Network Intrusion Detection

Abstract

This project presents a Network Intrusion Detection System (NIDS): a machine learning-based framework that analyzes network traffic to identify unauthorized access and anomalous behavior. The report combines classical machine learning algorithms with feature selection techniques; specifically, we use Optuna for feature selection to reduce dimensionality and optimize model performance. Several classification models, including K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Machines (SVM) with linear and non-linear kernels, Naive Bayes, and Decision Tree, are evaluated on their ability to identify malicious network activity. The models are assessed using standard performance metrics such as F1 score, precision, and recall. In addition, GridSearchCV with cross-validation is used to identify the best hyperparameters for each classifier, which improves model performance. Through experimentation, we identify the best-performing combination of feature selection and classifier, showing that effective feature engineering and model diversity lead to a robust NIDS applicable to real-world cybersecurity scenarios. The findings underscore the importance of continual experimentation in developing highly accurate intrusion detection systems.

Keywords: Network Traffic Analysis, Intrusion Detection, Optuna feature selection, KNN classifier, SVM kernels, Naive Bayes, Decision Tree, GridSearchCV, model evaluation, cybersecurity threats, dimensionality reduction

Problem Statement

Network Intrusion Detection using Machine Learning: A network administrator wants to identify suspicious activities, such as unauthorized data access, but cannot easily describe every possible threat in advance. The system therefore applies machine learning to network traffic data to detect these threats. By evaluating multiple models and optimizing the feature set, it identifies potential intrusions and strengthens cybersecurity.

Data Preprocessing:

The original dataset consisted of 42 features and 25,192 rows. Before use, it was processed as follows:

The three categorical features (Protocol, Service and Connection status) were encoded using a label encoder.
Columns in which more than 90% of the values were zero were dropped.
Columns with only one unique value were dropped.
The top 10 features were selected by mutual information for further use.

The final dataset had the shape (25192, 10).

The class distribution did not show any bias, so there was no need to drop any rows.
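
The preprocessing steps listed above can be sketched in Python roughly as follows. This is a minimal illustration that assumes pandas and scikit-learn; the function name `preprocess`, its parameters, and the column names passed to it are hypothetical, and the project's actual code may differ.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import LabelEncoder


def preprocess(df, target, categorical, zero_frac=0.90, k=10, seed=0):
    """Encode, prune, and rank features; returns (X_selected, y).

    All argument names are illustrative assumptions, not the project's API.
    """
    X = df.drop(columns=[target]).copy()
    y = LabelEncoder().fit_transform(df[target])

    # 1. Label-encode the categorical columns.
    for col in categorical:
        X[col] = LabelEncoder().fit_transform(X[col])

    # 2. Drop columns in which more than `zero_frac` of the values are zero.
    mostly_zero = (X == 0).mean() > zero_frac
    X = X.loc[:, ~mostly_zero]

    # 3. Drop columns that hold a single unique value.
    X = X.loc[:, X.nunique() > 1]

    # 4. Keep the k columns with the highest estimated mutual information
    #    with the class label.
    mi = mutual_info_classif(X, y, random_state=seed)
    top = pd.Series(mi, index=X.columns).nlargest(min(k, X.shape[1])).index
    return X[top], y
```

With the dataset described above, a call such as `preprocess(df, "class", ["protocol", "service", "status"])` (column names assumed) would be expected to return a feature frame with 10 columns.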

Model Results:

**Table 4: Performance of different models**
Model	Accuracy	Precision	Recall	F1 Score
KNN	99.36%	99.50%	99.50%	99.50%
Random Forest	99.50%	99.50%	99.50%	99.50%
AdaBoost	96.59%	96.59%	96.59%	96.59%
Decision Tree	99.41%	99.41%	99.41%	99.41%
Linear SVM	89.86%	85.29%	97.72%	91.13%
Non-Linear SVM	97.14%	95.24%	99.59%	97.39%
XGBoost	99.66%	99.66%	99.66%	99.66%
BernoulliNB	92.72%	92.72%	92.72%	92.71%
Logistic Regression	88.71%	88.72%	88.71%	88.70%
ANN	95.87%	95.87%	95.87%	95.87%
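
For reference, the four metrics reported in the table can be computed from true and predicted labels as in the sketch below. It assumes a binary problem with a single positive (attack) class; the report does not state how the scores were averaged, so this is an illustration of the definitions rather than the project's evaluation code. The helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 3 true positives, 1 false negative, 1 false positive, 5 true negatives.
scores = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                        [1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(scores)  # accuracy 0.8; precision, recall and f1 all 0.75
```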

Team

Luv Valecha

Shiv Jee Yadav

Ritik Nagar

Pratyush Chauhan

Dheeraj Kumar

Dhruv Sharma

Acknowledgment

We would like to express our heartfelt gratitude to Prof. Anand Mishra, Assistant Professor at IIT Jodhpur, for his invaluable guidance and mentorship throughout the development of this project. His insights, encouragement, and support were instrumental in shaping our approach and helping us overcome challenges along the way. This project would not have reached its current level of depth and quality without his expert supervision. We are truly thankful for the opportunity to learn under his mentorship.

For questions, please contact Shiv Jee Yadav or raise an issue on GitHub.
