Machine Learning in Cybersecurity


Deployment: network intrusion example

ML-based NIDS life-cycle

Offline operations

Problem Definition and Goal: Begin by defining the scope and objectives of your NIDS. Determine what types of network intrusions you want to detect, such as DoS attacks, malware, unauthorized access, etc. Establish clear goals for the system’s accuracy and false positive/negative rates.
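Taking "intrusion" as the positive class, the rates mentioned above are commonly defined from the confusion-matrix counts TP, TN, FP and FN (true/false positives/negatives):

```latex
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{FNR} = \frac{FN}{FN + TP}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```

Intrusions are usually a small fraction of all traffic, so a model can reach high accuracy while still missing most attacks. For this reason, targets are normally stated for FPR and FNR separately rather than for accuracy alone.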

  • Data Collection: Collect a diverse dataset that includes both normal and anomalous network traffic. This dataset will be used to train and test your ML model. You can obtain this data from sources like network logs, packet captures, and security appliances.
  • Data Preprocessing: Clean and preprocess the collected data to make it suitable for training. This includes removing duplicates, handling missing values, and converting raw data into features that the ML model can understand. Feature engineering may involve extracting information like packet sizes, protocol types, source/destination IP addresses, etc.
  • Data Labelling: Annotate the dataset with labels that indicate whether each instance of network traffic is normal or an intrusion. This labelling is essential for supervised learning, where the ML model learns from labelled examples.
  • Feature Selection: Select the most relevant features from your preprocessed dataset to reduce dimensionality and improve model performance. This step helps to remove noise and focuses the model on the most meaningful aspects of the data.
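The preprocessing and labelling steps can be sketched on a handful of flow records. This is a minimal illustration only. The record fields (`src`, `proto`, `bytes`, `pkts`, `label`), the median imputation and the derived "mean packet size" feature are assumptions chosen for the example, not a standard NIDS schema. The IP addresses are simply dropped here to keep the feature vector numeric.

```python
from statistics import median

# Hypothetical raw flow records (e.g. parsed from logs or packet captures).
# The field names are illustrative assumptions, not a standard schema.
RAW_FLOWS = [
    {"src": "10.0.0.5", "dst": "8.8.8.8", "proto": "udp", "bytes": 120, "pkts": 2, "label": "normal"},
    {"src": "10.0.0.5", "dst": "8.8.8.8", "proto": "udp", "bytes": 120, "pkts": 2, "label": "normal"},
    {"src": "203.0.113.9", "dst": "10.0.0.7", "proto": "tcp", "bytes": None, "pkts": 900, "label": "dos"},
    {"src": "10.0.0.8", "dst": "10.0.0.7", "proto": "icmp", "bytes": 84, "pkts": 1, "label": "normal"},
]
PROTOCOLS = ["tcp", "udp", "icmp"]  # fixed vocabulary for one-hot encoding
FEATURE_NAMES = ["bytes", "pkts", "mean_pkt_size"] + [f"proto_{p}" for p in PROTOCOLS]


def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def impute_median(rows, field):
    """Replace missing (None) values of a numeric field with the median of the present ones."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def to_features(row):
    """Numeric features plus a one-hot protocol encoding (IP addresses are left out here)."""
    onehot = [1.0 if row["proto"] == p else 0.0 for p in PROTOCOLS]
    return [float(row["bytes"]), float(row["pkts"]), row["bytes"] / row["pkts"]] + onehot


def to_label(row):
    """Binary labelling: 0 = normal traffic, 1 = any kind of intrusion."""
    return 0 if row["label"] == "normal" else 1


def preprocess(rows):
    rows = impute_median(deduplicate(rows), "bytes")
    return [to_features(r) for r in rows], [to_label(r) for r in rows]


X, y = preprocess(RAW_FLOWS)
```

Collapsing every attack type into label 1 gives a binary detector. Keeping the original `label` strings would instead give a multi-class problem (e.g. distinguishing DoS from unauthorized access).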

We will see some feature ranking and selection techniques in this course.
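As a first taste, one simple filter method ranks each feature by the absolute Pearson correlation between its values and the binary label, then keeps the top k. This is only a sketch: the toy data and feature names are invented, and it is not necessarily one of the techniques covered later. Correlation only captures linear relationships and looks at one feature at a time.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column (it carries no information)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(X, y, names):
    """Return (name, |r|) pairs sorted from most to least label-correlated."""
    scores = [(name, abs(pearson([row[j] for row in X], y))) for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_top_k(X, y, names, k):
    """Keep only the k highest-ranked columns of X."""
    keep = [names.index(name) for name, _ in rank_features(X, y, names)[:k]]
    return [[row[j] for j in keep] for row in X], [names[j] for j in keep]


# Toy data: packet count separates the classes, "ttl" is constant, "noise" is weakly related.
names = ["pkts", "ttl", "noise"]
X = [[1, 64, 3], [2, 64, 1], [900, 64, 2], [850, 64, 3]]
y = [0, 0, 1, 1]
ranking = rank_features(X, y, names)
X_sel, kept = select_top_k(X, y, names, k=1)
```

A constant column such as `ttl` scores zero and is the first to go. In practice, the ranking should be computed on the training split only, so that the test data does not influence which features are kept.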

Model Selection: Choose an appropriate ML algorithm for your NIDS. Common choices include decision trees, random forests, support vector machines, and neural networks. The choice depends on factors like the complexity of the problem and the available computational resources.

  • Interpretability and explainability: Some algorithms, like linear models or decision trees, offer more interpretability, allowing you to understand the underlying patterns and feature importance. Deep Learning (DL) models or ensemble methods might provide better performance but can be more challenging to interpret.
  • Model Training: Divide your labelled dataset into training, validation, and test sets. Train the selected model on the training data and fine-tune its hyperparameters to optimise its performance. Use the validation set to monitor the model’s progress and prevent overfitting, and keep the test set aside for the final evaluation.
  • Model Tuning: If the model’s performance is not satisfactory, you might need to adjust the model architecture, hyperparameters, or data preprocessing steps. This iterative process helps in achieving better results.
  • Model Evaluation: Evaluate the trained model’s performance on the held-out test set, i.e. data not used for training or tuning. Metrics such as accuracy, precision, recall, F1 score, and ROC curves can help you assess how well the model identifies intrusions and normal traffic.
  • Deployment: Once you’re satisfied with the model’s performance, deploy it in a real or simulated network environment. This involves integrating the model into your network infrastructure to continuously monitor incoming traffic.
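The training, tuning and evaluation steps can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not a recommended NIDS model. It uses a k-nearest-neighbours classifier, which is not one of the algorithms listed above, only because it fits in a few lines. The two features and all data values are invented, and k plays the role of the hyperparameter tuned on the validation set. The test set is used only once, at the end.

```python
import math
import random
from collections import Counter


def stratified_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (features, label) pairs into train/validation/test, preserving class ratios."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted({lab for _, lab in rows}):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        a = int(len(group) * fractions[0])
        b = a + int(len(group) * fractions[1])
        train += group[:a]
        val += group[a:b]
        test += group[b:]
    return train, val, test


def knn_predict(train, x, k):
    """k-nearest-neighbours vote; 'training' a k-NN model just means storing the data."""
    nearest = sorted(train, key=lambda r: math.dist(r[0], x))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def metrics(y_true, y_pred):
    """Binary metrics with 1 = intrusion as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


def tune_k(train, val, candidates=(1, 3, 5)):
    """Pick the hyperparameter k with the best F1 on the validation set."""
    best_k, best_f1 = candidates[0], -1.0
    for k in candidates:
        preds = [knn_predict(train, x, k) for x, _ in val]
        f1 = metrics([lab for _, lab in val], preds)["f1"]
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k


# Hypothetical toy flows: (packets per second, distinct destination ports), 0 = normal, 1 = attack.
data = [((float(i), float(i % 3)), 0) for i in range(1, 11)]
data += [((800.0 + 10 * i, 50.0 + i), 1) for i in range(10)]

train, val, test = stratified_split(data)
k = tune_k(train, val)
report = metrics([lab for _, lab in test], [knn_predict(train, x, k) for x, _ in test])
```

Splitting per class (stratification) keeps the rare intrusion class represented in every subset. Tuning uses F1 rather than accuracy because, when attacks are rare, accuracy rewards a model that labels everything as normal.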

Online operations

  • Real-time Monitoring: In the deployment phase, the ML model actively monitors network traffic and makes predictions in real time. The predictions are based on the patterns and features it learned during training.
  • Alert Generation: When the ML model detects potentially malicious activity, it generates alerts. These alerts can be notifications to network administrators or can trigger automated responses, such as blocking suspicious IP addresses.
  • Continuous Monitoring and Maintenance (partly offline): Network environments evolve over time, and new attack patterns may emerge. It’s crucial to regularly update and retrain your ML model using fresh data, so that it keeps detecting new types of intrusions without confusing them with legitimate network activity (or vice versa).
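A minimal sketch of the online loop follows. It assumes the trained model is available as a function that maps a flow to an intrusion score in [0, 1]. The class name `FlowMonitor`, the 0.5 threshold, the "block a source after three alerts" policy and the sliding-window alert rate are all hypothetical choices. Automated blocking is included only to make the response path concrete.

```python
from collections import Counter, deque


class FlowMonitor:
    """Scores flows as they arrive, raises alerts, and blocks repeat offenders."""

    def __init__(self, score_fn, threshold=0.5, block_after=3, window=100):
        self.score_fn = score_fn          # trained model: flow -> intrusion score
        self.threshold = threshold        # alerting threshold on the model score
        self.block_after = block_after    # automated response: block a source after N alerts
        self.alerts = []                  # alerts to be reported to administrators
        self.blocklist = set()
        self._hits = Counter()
        self._recent = deque(maxlen=window)  # 1 = flagged, 0 = not flagged

    def process(self, flow):
        if flow["src"] in self.blocklist:
            return "dropped"
        score = self.score_fn(flow)
        flagged = score >= self.threshold
        self._recent.append(int(flagged))
        if not flagged:
            return "ok"
        self.alerts.append({"src": flow["src"], "score": score})
        self._hits[flow["src"]] += 1
        if self._hits[flow["src"]] >= self.block_after:
            self.blocklist.add(flow["src"])
        return "alert"

    def alert_rate(self):
        """Share of recent scored flows that were flagged."""
        return sum(self._recent) / len(self._recent) if self._recent else 0.0


# Stand-in for a trained model (hypothetical rule, for illustration only).
def toy_model(flow):
    return 1.0 if flow["pkts"] > 500 else 0.0


monitor = FlowMonitor(toy_model)
stream = [{"src": "10.0.0.5", "pkts": 3}] + [{"src": "203.0.113.9", "pkts": 900}] * 4
results = [monitor.process(f) for f in stream]
```

Tracking the alert rate over a sliding window is one cheap signal for the maintenance step. If it moves far from what was observed during validation, the recent traffic is worth inspecting, as the model may need retraining (or an attack may be under way).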

NIDS Deployment

Deploying an NIDS involves strategically placing the system within a network infrastructure to monitor and analyse network traffic for signs of malicious activities or security threats.

Deployment strategies (inline)

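Inline: The NIDS is placed directly in the path of network traffic, between network segments or between a network segment and the external internet connection. PROS: immediate action (malicious traffic can be blocked before it reaches its destination). CONS: latency overhead (every packet is analysed before being forwarded).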

Deployment strategies (out-of-band)

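Out-of-Band: The NIDS monitors a copy of the network traffic (for example, from a switch mirror/SPAN port or a network TAP), separate from the main data path. This minimises the impact on network performance and avoids potential interference with the primary network traffic. CONS: no immediate response (the original packets are delivered regardless, so the NIDS can only alert or trigger countermeasures after the fact).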
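The difference between the two placements comes down to the order of "inspect" and "deliver". The toy model below makes that order explicit. The function names and the `"exploit"` payload check are hypothetical stand-ins for a real detector, and the dictionaries are stand-ins for real packets.

```python
def inline_sensor(packet, is_malicious, deliver, alert):
    """Inline: the sensor sits in the data path, so it decides BEFORE the packet is delivered."""
    if is_malicious(packet):
        alert(packet)
        return              # packet dropped: it never reaches its destination
    deliver(packet)         # benign packets are forwarded after the inspection delay


def out_of_band_sensor(packet, is_malicious, deliver, alert):
    """Out-of-band: the original is delivered untouched; the sensor analyses a mirrored copy."""
    deliver(packet)         # delivery does not wait for the analysis
    if is_malicious(packet.copy()):
        alert(packet)       # the alert can only trigger a response after the fact


def run(sensor, packets):
    delivered, alerts = [], []
    for p in packets:
        sensor(p, lambda q: q["payload"] == "exploit", delivered.append, alerts.append)
    return delivered, alerts


PACKETS = [{"id": 1, "payload": "GET /"}, {"id": 2, "payload": "exploit"}]
inline_delivered, inline_alerts = run(inline_sensor, PACKETS)
oob_delivered, oob_alerts = run(out_of_band_sensor, PACKETS)
```

Both sensors raise the same alert, but only the inline one prevents delivery of packet 2. The price is that every benign packet also waits for `is_malicious` to finish.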

Deployment strategies (perimeter)

Perimeter: In a perimeter deployment, the NIDS is positioned at the network perimeter, usually between the internal network and the external internet connection. It focuses on detecting and preventing external threats before they can enter the internal network.
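Which flows a perimeter sensor can see follows directly from the addresses involved. The sketch below uses Python's standard `ipaddress` module. The trusted address ranges are a hypothetical example of an internal network.

```python
import ipaddress

# Hypothetical trusted (internal) address space behind the perimeter.
TRUSTED_NETS = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("192.168.0.0/16")]


def is_internal(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_NETS)


def perimeter_view(src, dst):
    """What a sensor at the internet edge sees for a flow: inbound, outbound, or nothing."""
    src_in, dst_in = is_internal(src), is_internal(dst)
    if src_in == dst_in:
        return None          # the flow never crosses the perimeter, so the sensor cannot see it
    return "inbound" if dst_in else "outbound"
```

For example, `perimeter_view("203.0.113.9", "10.0.0.7")` is `"inbound"`, while a flow between two internal hosts returns `None`. That last case is the blind spot of a perimeter-only deployment: lateral movement inside the network, such as an insider moving between hosts, never crosses the edge.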

Deployment strategies (distributed)

Distributed: In a distributed deployment, multiple NIDS sensors are strategically placed across different (physical or virtual) segments of the network, so that traffic inside the network, not only traffic crossing its perimeter, is monitored.
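The change in visibility can be sketched by assigning one sensor per subnet and correlating their alerts centrally. The sensor names, the subnets and the simple "group alerts by source IP" correlation are hypothetical choices for illustration.

```python
import ipaddress
from collections import defaultdict

# Hypothetical segment layout: one sensor per subnet.
SENSORS = {
    "sensor-users": ipaddress.ip_network("10.0.1.0/24"),
    "sensor-servers": ipaddress.ip_network("10.0.2.0/24"),
    "sensor-dmz": ipaddress.ip_network("192.168.100.0/24"),
}


def observing_sensors(src, dst):
    """Sensors whose segment contains either endpoint of the flow."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return sorted(name for name, net in SENSORS.items() if s in net or d in net)


def aggregate(alerts):
    """Central correlation: group (sensor, source IP) alerts by source IP."""
    by_src = defaultdict(set)
    for sensor, src in alerts:
        by_src[src].add(sensor)
    return {src: sorted(sensors) for src, sensors in by_src.items()}
```

In this model, traffic between two hosts of the users' subnet is seen by `sensor-users` even though it never reaches the internet edge. The aggregation step lets the NIDS notice a single source that triggers alerts on several segments.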

Deployment strategies (zero-trust approach)

The zero-trust approach advocates for not trusting any user or system by default, regardless of their location within the network. This approach is especially effective in mitigating insider threats, where individuals with authorized access to the network pose a risk to data and