Adversarial Machine Learning
Adversarial Machine Learning (AML) is the study of machine learning vulnerabilities in adversarial environments.
AML refers to the set of techniques and approaches in artificial intelligence and machine learning that focus on understanding, mitigating, and defending against adversarial attacks.
Adversarial attacks involve intentionally manipulating input data to deceive a machine learning model, causing it to make incorrect predictions or classifications.
Example of an AML attack
If an automotive company wanted to teach its automated car how to identify a stop sign, it would feed thousands of pictures of stop signs through an ML algorithm.
An adversarial ML attack might manipulate the input data, providing images that are not stop signs but are labelled as such.
The algorithm misinterprets the input data, causing the overall system to misidentify stop signs when the ML application is deployed in practice or production.
Adversarial space
An ML algorithm attempts to fit a hypothesis function $\hat{y} = h_w(x)$ that maps points drawn from an underlying data distribution into discrete categories (classification) or onto a numerical range (regression).
However, the training data we provide an ML algorithm is drawn from an incomplete segment of the theoretical distribution space.
When the time comes for evaluation of the model in the wild, the test set (drawn from the test data distribution) could contain a segment of data whose properties are not captured in the training data distribution.
We refer to this segment as the adversarial space. Attackers can exploit portions of the adversarial space to fool the ML algorithm.
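As a toy illustration (assuming Python with NumPy and scikit-learn; the concept, training slice, and query point are invented for this sketch), the snippet below fits a hypothesis $h_w$ on a restricted slice of the input space and then queries a point from the unseen region, where the learned rule no longer matches the underlying concept:

```python
# Minimal sketch of "adversarial space": a model trained on an incomplete
# slice of the data distribution can be fooled by points drawn outside it.
# The concept, data and query point are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def true_label(X):
    # Underlying concept: class 1 when x1 and x2 share the same sign.
    return (X[:, 0] * X[:, 1] > 0).astype(int)

# Training data only covers the right half-plane (x1 > 0),
# where the concept happens to look like the simple rule "x2 > 0".
X_train = np.column_stack([rng.uniform(0.5, 5, 1000), rng.uniform(-5, 5, 1000)])
y_train = true_label(X_train)
h_w = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # the hypothesis h_w(x)

# Query a point from the unseen left half-plane (adversarial space):
x_adv = np.array([[-3.0, 3.0]])
print("true label:", true_label(x_adv)[0])       # 0
print("model says:", h_w.predict(x_adv)[0])      # typically 1 -> fooled
```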
Black-box attacks
- Definition: In a black box attack, the attacker has little to no knowledge about the internal structure, parameters, or training data of the targeted machine learning model
- Method: Attackers perform these attacks by observing the model’s input-output behaviour and manipulating inputs to deceive the model, without any specific understanding of how it makes decisions (a toy sketch follows this list)
- Challenge: Black box attacks often require many iterations and experimentation because the attacker lacks detailed information about the model, making it harder to find effective adversarial examples
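A minimal sketch of this query-only setting, assuming Python with NumPy and scikit-learn; the victim model, the random-search strategy, and the query budget are invented for illustration, and practical black-box attacks use far more query-efficient search:

```python
# Black-box setting: the attacker can only call predict() on the victim model
# (no gradients, no parameters) and searches for a small input perturbation
# that flips the predicted label. Model, data and step sizes are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Stand-in "victim" model that the attacker cannot inspect, only query.
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
victim = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def black_box_attack(x, query, max_queries=5000, step=0.1):
    """Random search for a perturbation that changes the predicted label."""
    original = query(x)
    for i in range(max_queries):
        scale = step * (1 + i / 500)                 # crude escalation of the perturbation size
        candidate = x + rng.normal(scale=scale, size=x.shape)
        if query(candidate) != original:
            return candidate                         # adversarial example found
    return None                                      # attack failed within the query budget

x0 = X[0]
x_adv = black_box_attack(x0, lambda v: victim.predict(v.reshape(1, -1))[0])
print("adversarial example found:", x_adv is not None)
if x_adv is not None:
    print(f"perturbation norm: {np.linalg.norm(x_adv - x0):.2f}")
```

The heavy query count and the escalating perturbation size reflect the challenge noted above: without internal knowledge, the attacker pays in queries and in perturbation magnitude.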
Black-box AML DDoS Attack
Objective: Overwhelm the web server with HTTP requests while evading detection by the DDoS protection system
Trial and error: The attacker sends different variations of HTTP requests, adjusting parameters such as headers, payload, or frequency
- The goal is to find a pattern that consistently evades detection by the DDoS protection system
Response analysis: The attacker analyses the responses from the web server and observes how the DDoS protection system reacts to different types of HTTP traffic
- This could involve examining error messages, monitoring server latency, or studying any alerts triggered by the protection system
Refinement: Based on the observed responses, the attacker refines the crafted HTTP requests to improve their effectiveness in bypassing the DDoS protection
- The iterative process continues until a set of adversarial HTTP requests successfully overwhelms the web server without triggering the protection system
Deployment: The attacker deploys the crafted HTTP requests at scale to launch the DDoS attack on the web server
- The goal is to exploit the blind spots in the DDoS protection system, causing service disruption without raising alarms
White-box attacks
- Definition: In a white-box attack, the attacker has complete knowledge of the internal architecture, parameters, and possibly the training data of the ML model
- Method: Armed with this detailed information, attackers can craft more targeted and efficient adversarial examples by exploiting specific vulnerabilities in the model’s design or training data (see the sketch after this list)
- Advantage: White-box attacks can be more dangerous and require fewer iterations compared to black-box attacks because the attacker has a deeper understanding of how the model functions
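A minimal white-box sketch, assuming Python with NumPy and scikit-learn: because the attacker can read the trained weights, the perturbation direction comes straight from the gradient of the loss (an FGSM-style step). The model, data, and epsilon are invented for illustration:

```python
# White-box setting: the attacker knows the model's parameters, so an
# FGSM-style step perturbs each feature in the direction that increases the
# loss. Model, data and epsilon are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
v = rng.normal(size=10)                              # ground-truth direction (illustrative)
X = rng.normal(size=(2000, 10))
y = (X @ v > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

w, b = model.coef_[0], model.intercept_[0]           # white-box: parameters are known

def fgsm(x, label, eps=0.5):
    """One fast-gradient-sign step on the logistic loss of a linear model."""
    p = 1 / (1 + np.exp(-(w @ x + b)))               # predicted P(y = 1 | x)
    grad_x = (p - label) * w                         # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)                 # move so as to increase the loss

# Fresh test points, then their adversarially perturbed versions.
X_test = rng.normal(size=(200, 10))
y_test = (X_test @ v > 0).astype(int)
X_adv = np.array([fgsm(x, yt) for x, yt in zip(X_test, y_test)])

print(f"accuracy on clean inputs: {model.score(X_test, y_test):.2f}")
print(f"accuracy on FGSM inputs:  {model.score(X_adv, y_test):.2f}")
```

In contrast with the black-box search earlier, the gradient tells the attacker exactly how to move each feature, so one step and no extra model queries are needed.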
AML attacks against NIDSs
AML attacks can be perpetrated against a Network Intrusion Detection System.
The two most common AML attacks are:
- Poisoning: crafted (or mislabelled) samples are inserted into the set of data employed for training the detection function of NIDSs
- The goal is to mislead the learning algorithm and negatively affect NIDS performance in the classification phase
- Evasion: the intrusion traffic is carefully modified so that a trained NIDS will not be able to detect it at runtime (i.e., no alert will be raised)
Poisoning
In poisoning attacks, attackers are assumed to have control over a portion of the training data used by the learning algorithm
The larger the proportion of training data that attackers have control over, the more influence they have over the learning objectives and decision boundaries of the machine learning system
Use case: recurrent update of a one-class anomaly detection system
- System owners can easily detect when an online learner suddenly receives a high volume of garbage training data, since it shows up as a sharp spike in suspicious or abnormal behaviour
- So-called boiling frog attacks spread out the injection of adversarial training examples over an extended period so as not to trigger any alarm
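A toy boiling-frog sketch, assuming Python with NumPy; the detector design, thresholds, and traffic numbers are invented for illustration:

```python
# "Boiling frog" poisoning of an online anomaly detector that learns a running
# baseline of one traffic feature (requests per second). The attacker drifts
# the baseline upwards by only injecting traffic just below the alarm line.
import numpy as np

rng = np.random.default_rng(3)

class OnlineRateDetector:
    """Flags a sample if it exceeds 1.5x the learned baseline; otherwise the
    sample is treated as normal and folded into the baseline (EWMA update)."""
    def __init__(self, baseline, alpha=0.01):
        self.baseline, self.alpha = baseline, alpha

    def observe(self, rate):
        anomalous = rate > 1.5 * self.baseline
        if not anomalous:                            # only "normal" traffic updates the model
            self.baseline += self.alpha * (rate - self.baseline)
        return anomalous

detector = OnlineRateDetector(baseline=100.0)        # legitimate traffic: ~100 req/s
print("250 req/s flagged before poisoning?", detector.observe(250.0))   # True

# Thousands of rounds of genuine traffic interleaved with poison samples
# placed just under the alarm line, so no single round looks abnormal.
for _ in range(3000):
    detector.observe(rng.normal(100, 5))             # genuine traffic
    detector.observe(1.45 * detector.baseline)       # poison, never triggers the alarm

print(f"learned baseline after poisoning: {detector.baseline:.0f} req/s")
print("250 req/s flagged after poisoning?", detector.observe(250.0))    # now False
```

No single poisoned sample looks abnormal, yet after enough rounds a burst that the original detector would have flagged is accepted as normal.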
Poisoning scenarios
Model poisoning attacks are realistically observed only in online and federated learning systems
Online learning:
- Anomaly detection systems use online learning to automatically adjust model parameters over time as they detect changes in normal traffic
- In this way, laborious human intervention to continually tune models and adjust thresholds can be avoided
Federated learning:
- Malicious clients can manipulate the federated training process by providing model updates obtained using mislabelled data or specially crafted samples, resulting in a global model that fails to accurately classify certain types of attack traffic
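A toy federated poisoning sketch, assuming Python with NumPy and a hand-rolled logistic-regression FedAvg loop rather than any real FL framework; the data, client counts, and the deliberately large malicious fraction are invented so the effect is unmistakable in a toy, whereas published attacks need far fewer clients by scaling their updates:

```python
# Federated averaging with data-poisoning clients: each client runs a few
# local gradient steps of logistic regression, the server averages the
# weights, and malicious clients relabel attack flows as benign.
import numpy as np

rng = np.random.default_rng(4)

def make_client_data(n=400):
    """Half benign flows (label 0), half attack flows (label 1), 5 toy features."""
    benign = rng.normal(0.0, 1.0, size=(n // 2, 5))
    attack = rng.normal(2.5, 1.0, size=(n // 2, 5))
    X = np.vstack([benign, attack])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    return X, y

def local_update(w, X, y, lr=0.1, epochs=20):
    """A few gradient steps of logistic regression, starting from the global weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(Xb @ w)))
        w = w - lr * Xb.T @ (p - y) / len(y)
    return w

clients = [make_client_data() for _ in range(10)]
malicious = set(range(6))        # deliberately a majority, to make the toy effect unmistakable

def train_federated(poisoned):
    w = np.zeros(6)
    for _ in range(30):                              # federated averaging rounds
        updates = []
        for i, (X, y) in enumerate(clients):
            flip = poisoned and i in malicious
            y_used = np.zeros_like(y) if flip else y # poison: attack flows relabelled benign
            updates.append(local_update(w.copy(), X, y_used))
        w = np.mean(updates, axis=0)                 # FedAvg aggregation
    return w

def attack_recall(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    pred = (1 / (1 + np.exp(-(Xb @ w))) > 0.5).astype(int)
    return (pred[y == 1] == 1).mean()

X_test, y_test = make_client_data(2000)
print("attack detection rate, honest clients:", round(attack_recall(train_federated(False), X_test, y_test), 2))
print("attack detection rate, 6/10 poisoned: ", round(attack_recall(train_federated(True), X_test, y_test), 2))
```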
Evasion Attacks
Exploiting adversarial space to find adversarial examples that cause a misclassification in a machine learning classifier is called an evasion attack
- Evasion attacks are more generally applicable than poisoning attacks
- These attacks can affect any classifier, even when the attacker has no influence over the training phase
Researchers have demonstrated that small perturbations to attack traffic can evade state-of-the-art NIDSs [1]
[1] He, K., Kim, D. D., & Asghar, M. R. (2023). Adversarial machine learning for network intrusion detection systems: a comprehensive survey. IEEE Communications Surveys & Tutorials.
Evasion AML attacks against NIDSs
Feature | Description |
---|---|
Packet size | Modifying the size of packets, either by increasing or decreasing payload size, can impact the model’s ability to recognise normal network behaviour |
Packet order | Shuffling the order of packets in a communication sequence can introduce disorder and make it challenging for the model to identify malicious patterns |
Protocol manipulation | Switching between different network protocols or manipulating protocol-specific fields can confuse the model’s protocol-based detection mechanisms |
Payload content | Modifying the actual content of the payload, such as injecting or altering specific keywords or patterns, can help create adversarial examples that bypass payload analysis |
Frequency of requests | Adjusting the frequency of requests or the rate of data transmission can impact the model’s perception of normal traffic behaviour |
Traffic timing patterns | Introducing irregularities in the timing patterns of network traffic, such as altering inter-arrival times between packets, can challenge the model’s ability to recognise normal communication patterns |
Origin and destination address spoofing | Falsifying the source or destination addresses in network packets can be a tactic to evade detection by confusing the model’s understanding of network relationships |
Distributed attacks | Coordinating attacks from multiple sources or utilising a botnet can distribute the attack traffic, making it more challenging for the model to discern malicious intent |
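As a feature-space illustration of how manipulations like those in the table translate into evasion (assuming Python with NumPy and scikit-learn; the features, traffic statistics, and perturbation steps are invented), the sketch below trains a toy flow classifier on packet size and inter-arrival time, then pads and paces an attack flow until it is classified as benign:

```python
# Toy flow classifier on two features: [mean packet size (bytes),
# mean inter-arrival time (ms)]. An attack flow is nudged (padded packets,
# extra delay) until the classifier calls it benign. All values are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Benign flows: larger packets, slower cadence. Attack flows: small, rapid packets.
benign = np.column_stack([rng.normal(800, 150, 1000), rng.normal(50, 15, 1000)])
attack = np.column_stack([rng.normal(120, 30, 1000), rng.normal(5, 2, 1000)])
X = np.vstack([benign, attack])
y = np.concatenate([np.zeros(1000), np.ones(1000)]).astype(int)
nids = LogisticRegression(max_iter=5000).fit(X, y)

flow = np.array([120.0, 5.0])                        # a typical attack flow
print("original flow classified as attack?", bool(nids.predict([flow])[0]))

# Evasion: pad packets and space them out a little, re-querying the classifier
# until its decision flips; the malicious payload itself is unchanged.
while nids.predict([flow])[0] == 1:
    flow += np.array([50.0, 5.0])                    # +50 bytes padding, +5 ms delay

print("perturbed flow classified as attack?", bool(nids.predict([flow])[0]))
print("evading feature values:", flow)
```

The malicious payload itself never changes; only the statistical features the classifier relies on are nudged across the decision boundary.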