Adversarial Machine Learning
Adversarial Machine Learning (AML) is the study of machine learning vulnerabilities in adversarial environments.
AML refers to the set of techniques and approaches in artificial intelligence and machine learning that focus on understanding, mitigating, and defending against adversarial attacks.
Adversarial attacks involve intentionally manipulating input data to deceive a machine learning model, causing it to make incorrect predictions or classifications.
Example of an AML attack
If an automotive company wanted to teach its automated car how to identify a stop sign, it would feed thousands of pictures of stop signs through an ML algorithm.
An adversarial ML attack might manipulate the input data, providing images that are not stop signs but are labelled as such.
The algorithm misinterprets the input data, causing the overall system to misidentify stop signs when the ML application is deployed in practice or production.
Adversarial space
An ML algorithm attempts to fit a hypothesis function $\hat{y} = h_w(x)$ that maps points drawn from an underlying data distribution into discrete categories (classification) or onto a numerical range (regression).
However, the training data we provide an ML algorithm is drawn from an incomplete segment of the theoretical distribution space.
When the time comes for evaluation of the model in the wild, the test set (drawn from the test data distribution) could contain a segment of data whose properties are not captured in the training data distribution.
We refer to this segment as the adversarial space. Attackers can exploit portions of the adversarial space to fool the ML algorithm.
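As a toy illustration (assuming Python with NumPy and scikit-learn; the concept, training slice, and query point are invented for this sketch), the snippet below fits a hypothesis $h_w$ on a restricted slice of the input space and then queries a point from the unseen region, where the learned rule no longer matches the underlying concept:

```python
# Minimal sketch of "adversarial space": a model trained on an incomplete
# slice of the data distribution can be fooled by points drawn outside it.
# The concept, data and query point are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def true_label(X):
    # Underlying concept: class 1 when x1 and x2 share the same sign.
    return (X[:, 0] * X[:, 1] > 0).astype(int)

# Training data only covers the right half-plane (x1 > 0),
# where the concept happens to look like the simple rule "x2 > 0".
X_train = np.column_stack([rng.uniform(0.5, 5, 1000), rng.uniform(-5, 5, 1000)])
y_train = true_label(X_train)
h_w = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # the hypothesis h_w(x)

# Query a point from the unseen left half-plane (adversarial space):
x_adv = np.array([[-3.0, 3.0]])
print("true label:", true_label(x_adv)[0])       # 0
print("model says:", h_w.predict(x_adv)[0])      # typically 1 -> fooled
```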
Black-box attacks
- Definition: In a black box attack, the attacker has little to no knowledge about the internal structure, parameters, or training data of the targeted machine learning model
- Method: Attackers perform these attacks by observing the model’s input-output behaviour and manipulating inputs to deceive the model, without any specific understanding of how it makes decisions (a toy sketch follows this list)
- Challenge: Black box attacks often require many iterations and experimentation because the attacker lacks detailed information about the model, making it harder to find effective adversarial examples
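A minimal sketch of this query-only setting, assuming Python with NumPy and scikit-learn; the victim model, the random-search strategy, and the query budget are invented for illustration, and practical black-box attacks use far more query-efficient search:

```python
# Black-box setting: the attacker can only call predict() on the victim model
# (no gradients, no parameters) and searches for a small input perturbation
# that flips the predicted label. Model, data and step sizes are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Stand-in "victim" model that the attacker cannot inspect, only query.
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
victim = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def black_box_attack(x, query, max_queries=5000, step=0.1):
    """Random search for a perturbation that changes the predicted label."""
    original = query(x)
    for i in range(max_queries):
        scale = step * (1 + i / 500)                 # crude escalation of the perturbation size
        candidate = x + rng.normal(scale=scale, size=x.shape)
        if query(candidate) != original:
            return candidate                         # adversarial example found
    return None                                      # attack failed within the query budget

x0 = X[0]
x_adv = black_box_attack(x0, lambda v: victim.predict(v.reshape(1, -1))[0])
print("adversarial example found:", x_adv is not None)
if x_adv is not None:
    print(f"perturbation norm: {np.linalg.norm(x_adv - x0):.2f}")
```

The heavy query count and the escalating perturbation size reflect the challenge noted above: without internal knowledge, the attacker pays in queries and in perturbation magnitude.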
Black-box AML DDoS Attack
Objective: Overwhelm the web server with HTTP requests while evading detection by the DDoS protection system
Trial and error: The attacker sends different variations of HTTP requests, adjusting parameters such as headers, payload, or frequency
- The goal is to find a pattern that consistently evades detection by the DDoS protection system
Response analysis: The attacker analyses the responses from the web server and observes how the DDoS protection system reacts to different types of HTTP traffic
- This could involve examining error messages, monitoring server latency, or studying any alerts triggered by the protection system
Refinement: Based on the observed responses, the attacker refines the crafted HTTP requests to improve their effectiveness in bypassing the DDoS protection
- The iterative process continues until a set of adversarial HTTP requests successfully overwhelms the web server without triggering the protection system
Deployment: The attacker deploys the crafted HTTP requests at scale to launch the DDoS attack on the web server
- The goal is to exploit the blind spots in the DDoS protection system, causing service disruption without raising alarms
White-box attacks
- Definition: In a white-box attack, the attacker has complete knowledge of the internal architecture, parameters, and possibly the training data of the ML model
- Method: Armed with this detailed information, attackers can craft more targeted and efficient adversarial examples by exploiting specific vulnerabilities in the model’s design or training data (see the sketch after this list)
- Advantage: White-box attacks can be more dangerous and require fewer iterations compared to black-box attacks because the attacker has a deeper understanding of how the model functions
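A minimal white-box sketch, assuming Python with NumPy and scikit-learn: because the attacker can read the trained weights, the perturbation direction comes straight from the gradient of the loss (an FGSM-style step). The model, data, and epsilon are invented for illustration:

```python
# White-box setting: the attacker knows the model's parameters, so an
# FGSM-style step perturbs each feature in the direction that increases the
# loss. Model, data and epsilon are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
v = rng.normal(size=10)                              # ground-truth direction (illustrative)
X = rng.normal(size=(2000, 10))
y = (X @ v > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

w, b = model.coef_[0], model.intercept_[0]           # white-box: parameters are known

def fgsm(x, label, eps=0.5):
    """One fast-gradient-sign step on the logistic loss of a linear model."""
    p = 1 / (1 + np.exp(-(w @ x + b)))               # predicted P(y = 1 | x)
    grad_x = (p - label) * w                         # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)                 # move so as to increase the loss

# Fresh test points, then their adversarially perturbed versions.
X_test = rng.normal(size=(200, 10))
y_test = (X_test @ v > 0).astype(int)
X_adv = np.array([fgsm(x, yt) for x, yt in zip(X_test, y_test)])

print(f"accuracy on clean inputs: {model.score(X_test, y_test):.2f}")
print(f"accuracy on FGSM inputs:  {model.score(X_adv, y_test):.2f}")
```

In contrast with the black-box search earlier, the gradient tells the attacker exactly how to move each feature, so one step and no extra model queries are needed.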
AML attacks against NIDSs
AML attacks can be perpetrated against a Network Intrusion Detection System.
The two most common AML attacks are:
- Poisoning: crafted (or mislabelled) samples are inserted into the set of data employed for training the detection function of NIDSs
- The goal is to mislead the learning algorithm and negatively affect NIDS performance in the classification phase
- Evasion: the intrusion traffic is carefully modified so that a trained NIDS will not be able to detect it at runtime (i.e., no alert will be raised)
Poisoning
In poisoning attacks, attackers are assumed to have control over a portion of the training data used by the learning algorithm
The larger the proportion of training data that attackers have control over, the more influence they have over the learning objectives and decision boundaries of the machine learning system
Use case: recurrent update of a one-class anomaly detection system
- System owners can easily detect when an online learner suddenly receives a high volume of garbage training data, since it shows up as a sharp spike in suspicious or abnormal behaviour
- So-called boiling frog attacks spread out the injection of adversarial training examples over an extended period so as not to trigger any alarm
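A toy boiling-frog sketch, assuming Python with NumPy; the detector design, thresholds, and traffic numbers are invented for illustration:

```python
# "Boiling frog" poisoning of an online anomaly detector that learns a running
# baseline of one traffic feature (requests per second). The attacker drifts
# the baseline upwards by only injecting traffic just below the alarm line.
import numpy as np

rng = np.random.default_rng(3)

class OnlineRateDetector:
    """Flags a sample if it exceeds 1.5x the learned baseline; otherwise the
    sample is treated as normal and folded into the baseline (EWMA update)."""
    def __init__(self, baseline, alpha=0.01):
        self.baseline, self.alpha = baseline, alpha

    def observe(self, rate):
        anomalous = rate > 1.5 * self.baseline
        if not anomalous:                            # only "normal" traffic updates the model
            self.baseline += self.alpha * (rate - self.baseline)
        return anomalous

detector = OnlineRateDetector(baseline=100.0)        # legitimate traffic: ~100 req/s
print("250 req/s flagged before poisoning?", detector.observe(250.0))   # True

# Thousands of rounds of genuine traffic interleaved with poison samples
# placed just under the alarm line, so no single round looks abnormal.
for _ in range(3000):
    detector.observe(rng.normal(100, 5))             # genuine traffic
    detector.observe(1.45 * detector.baseline)       # poison, never triggers the alarm

print(f"learned baseline after poisoning: {detector.baseline:.0f} req/s")
print("250 req/s flagged after poisoning?", detector.observe(250.0))    # now False
```

No single poisoned sample looks abnormal, yet after enough rounds a burst that the original detector would have flagged is accepted as normal.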
Poisoning scenarios
Model poisoning attacks are realistically observed only in online and federated learning systems
Online learning:
- Anomaly detection systems use online learning to automatically adjust model parameters over time as they detect changes in normal traffic
- In this way, laborious human intervention to continually tune models and adjust thresholds can be avoided
Federated learning:
- Malicious clients can manipulate the federated training process by providing model updates obtained using mislabelled data or specially crafted samples, resulting in a global model that fails to accurately classify certain types of attack traffic
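A toy federated poisoning sketch, assuming Python with NumPy and a hand-rolled logistic-regression FedAvg loop rather than any real FL framework; the data, client counts, and the deliberately large malicious fraction are invented so the effect is unmistakable in a toy, whereas published attacks need far fewer clients by scaling their updates:

```python
# Federated averaging with data-poisoning clients: each client runs a few
# local gradient steps of logistic regression, the server averages the
# weights, and malicious clients relabel attack flows as benign.
import numpy as np

rng = np.random.default_rng(4)

def make_client_data(n=400):
    """Half benign flows (label 0), half attack flows (label 1), 5 toy features."""
    benign = rng.normal(0.0, 1.0, size=(n // 2, 5))
    attack = rng.normal(2.5, 1.0, size=(n // 2, 5))
    X = np.vstack([benign, attack])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    return X, y

def local_update(w, X, y, lr=0.1, epochs=20):
    """A few gradient steps of logistic regression, starting from the global weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(Xb @ w)))
        w = w - lr * Xb.T @ (p - y) / len(y)
    return w

clients = [make_client_data() for _ in range(10)]
malicious = set(range(6))        # deliberately a majority, to make the toy effect unmistakable

def train_federated(poisoned):
    w = np.zeros(6)
    for _ in range(30):                              # federated averaging rounds
        updates = []
        for i, (X, y) in enumerate(clients):
            flip = poisoned and i in malicious
            y_used = np.zeros_like(y) if flip else y # poison: attack flows relabelled benign
            updates.append(local_update(w.copy(), X, y_used))
        w = np.mean(updates, axis=0)                 # FedAvg aggregation
    return w

def attack_recall(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    pred = (1 / (1 + np.exp(-(Xb @ w))) > 0.5).astype(int)
    return (pred[y == 1] == 1).mean()

X_test, y_test = make_client_data(2000)
print("attack detection rate, honest clients:", round(attack_recall(train_federated(False), X_test, y_test), 2))
print("attack detection rate, 6/10 poisoned: ", round(attack_recall(train_federated(True), X_test, y_test), 2))
```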
Evasion Attacks
Exploiting adversarial space to find adversarial examples that cause a misclassification in a machine learning classifier is called an evasion attack
- Evasion attacks are more generally applicable than poisoning attacks
- These attacks can affect any classifier, even when the attacker has no influence over the training phase
Researchers have demonstrated that small perturbations to attack traffic can evade state-of-the-art NIDSs [1]
[1] He, K., Kim, D. D., & Asghar, M. R. (2023). Adversarial machine learning for network intrusion detection systems: a comprehensive survey. IEEE Communications Surveys & Tutorials.
Evasion AML attacks against NIDSs
Feature | Description |
---|---|
Packet size | Modifying the size of packets, either by increasing or decreasing payload size, can impact the model’s ability to recognise normal network behaviour |
Packet order | Shuffling the order of packets in a communication sequence can introduce disorder and make it challenging for the model to identify malicious patterns |
Protocol manipulation | Switching between different network protocols or manipulating protocol-specific fields can confuse the model’s protocol-based detection mechanisms |
Payload content | Modifying the actual content of the payload, such as injecting or altering specific keywords or patterns, can help create adversarial examples that bypass payload analysis |
Frequency of requests | Adjusting the frequency of requests or the rate of data transmission can impact the model’s perception of normal traffic behaviour |
Traffic timing patterns | Introducing irregularities in the timing patterns of network traffic, such as altering inter-arrival times between packets, can challenge the model’s ability to recognise normal communication patterns |
Origin and destination address spoofing | Falsifying the source or destination addresses in network packets can be a tactic to evade detection by confusing the model’s understanding of network relationships |
Distributed attacks | Coordinating attacks from multiple sources or utilising a botnet can distribute the attack traffic, making it more challenging for the model to discern malicious intent |
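As a feature-space illustration of how manipulations like those in the table translate into evasion (assuming Python with NumPy and scikit-learn; the features, traffic statistics, and perturbation steps are invented), the sketch below trains a toy flow classifier on packet size and inter-arrival time, then pads and paces an attack flow until it is classified as benign:

```python
# Toy flow classifier on two features: [mean packet size (bytes),
# mean inter-arrival time (ms)]. An attack flow is nudged (padded packets,
# extra delay) until the classifier calls it benign. All values are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Benign flows: larger packets, slower cadence. Attack flows: small, rapid packets.
benign = np.column_stack([rng.normal(800, 150, 1000), rng.normal(50, 15, 1000)])
attack = np.column_stack([rng.normal(120, 30, 1000), rng.normal(5, 2, 1000)])
X = np.vstack([benign, attack])
y = np.concatenate([np.zeros(1000), np.ones(1000)]).astype(int)
nids = LogisticRegression(max_iter=5000).fit(X, y)

flow = np.array([120.0, 5.0])                        # a typical attack flow
print("original flow classified as attack?", bool(nids.predict([flow])[0]))

# Evasion: pad packets and space them out a little, re-querying the classifier
# until its decision flips; the malicious payload itself is unchanged.
while nids.predict([flow])[0] == 1:
    flow += np.array([50.0, 5.0])                    # +50 bytes padding, +5 ms delay

print("perturbed flow classified as attack?", bool(nids.predict([flow])[0]))
print("evading feature values:", flow)
```

The malicious payload itself never changes; only the statistical features the classifier relies on are nudged across the decision boundary.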