Network Intrusion Detection System
IIT Jodhpur | P.R.M.L. Project
Abstract
This project presents a Network Intrusion Detection System (NIDS): a machine learning framework that analyzes network traffic to identify unauthorized access and anomalous behavior. The report describes an approach that combines classical machine learning algorithms with feature selection. We use Optuna for feature selection to reduce dimensionality and improve model performance. Several classification models, including K-Nearest Neighbors (KNN), Logistic Regression, Support Vector Machines (SVM) with linear and non-linear kernels, Naive Bayes, and Decision Tree, are evaluated on their ability to identify malicious network activity. Each model is assessed with standard performance metrics such as precision, recall, and F1 score. GridSearchCV with cross-validation is used to select the best hyperparameters for each classifier, which improves model performance. By comparing combinations of feature selection and classifier, we identify the best-performing configuration and show that careful feature selection and a diverse set of models yield a robust NIDS suited to real-world cybersecurity scenarios. The findings underscore the importance of continued experimentation in developing accurate intrusion detection systems.
Keywords: Network Traffic Analysis, Intrusion Detection, Optuna feature selection, KNN classifier, SVM kernels, Naive Bayes, Decision Tree, GridSearchCV, model evaluation, cybersecurity threats, dimensionality reduction
Problem Statement
Network Intrusion Detection using Machine Learning: A network administrator wants to identify suspicious activity, such as unauthorized data access, but cannot easily describe every possible threat in advance. The system therefore learns to detect threats from network traffic data using machine learning. Multiple models are evaluated on a reduced, optimized feature set to find the one that best identifies potential intrusions.
Data Preprocessing:
The original dataset had 42 features and 25,192 rows, and was processed as follows before use:
After processing, the dataset had shape (25192, 10).
The class distribution was not noticeably skewed, so no rows needed to be dropped.
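A class balance check of this kind can be sketched in a few lines. The label names and counts below are hypothetical placeholders, not the project's actual figures; the idea is to compare each class's share of the rows and the ratio between the largest and smallest class.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's fraction of the total number of rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical labels for illustration only.
labels = ["normal"] * 53 + ["anomaly"] * 47

shares = class_shares(labels)
# A ratio close to 1 means the classes are roughly balanced.
imbalance_ratio = max(shares.values()) / min(shares.values())
print(shares, round(imbalance_ratio, 2))
```

A ratio near 1 suggests that resampling or dropping rows is unnecessary; a much larger ratio would call for stratified splits or rebalancing.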
Model Results:
Model | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
KNN | 99.36% | 99.50% | 99.50% | 99.50% |
Random Forest | 99.50% | 99.50% | 99.50% | 99.50% |
AdaBoost | 96.59% | 96.59% | 96.59% | 96.59% |
Decision Tree | 99.41% | 99.41% | 99.41% | 99.41% |
Linear SVM | 89.86% | 85.29% | 97.72% | 91.13% |
Non-Linear SVM | 97.14% | 95.24% | 99.59% | 97.39% |
XGBoost | 99.66% | 99.66% | 99.66% | 99.66% |
BernoulliNB | 92.72% | 92.72% | 92.72% | 92.71% |
Logistic Regression | 88.71% | 88.72% | 88.71% | 88.70% |
ANN | 95.87% | 95.87% | 95.87% | 95.87% |
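The metrics in the table, and the GridSearchCV tuning described in the abstract, can be illustrated with a short sketch. It tunes a KNN classifier with 5-fold cross-validation and then reports accuracy, precision, recall, and F1 on a held-out split. The synthetic data, the parameter grid, the scaling step, and the 75/25 split are assumptions; the project's actual grids and splits are not stated here, and the numbers this sketch prints are unrelated to the table above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the preprocessed traffic features.
X, y = make_classification(n_samples=800, n_features=9, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [3, 5, 7],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

# Exhaustive search over the grid, scored by F1 with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluate the refitted best model on data the search never saw.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)
print("best params:", search.best_params_)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The same pattern applies to the other classifiers by swapping the estimator and its parameter grid. For multi-class labels, `average="binary"` would need to change to `"macro"` or `"weighted"`.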
Team
Acknowledgment
We would like to express our heartfelt gratitude to Prof. Anand Mishra, Assistant Professor at IIT Jodhpur, for his invaluable guidance and mentorship throughout the development of this project. His insights, encouragement, and support were instrumental in shaping our approach and helping us overcome challenges along the way. This project would not have reached its current level of depth and quality without his expert supervision. We are truly thankful for the opportunity to learn under his mentorship.