Concept Drift in Anomaly Detection
Concept drift
Concept drift is a phenomenon in machine learning and data mining where the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways.
This means that the relationship between the input data and the model's output is no longer valid.
In simpler terms, the patterns the model learned from historical data are no longer accurate in predicting future outcomes because the underlying conditions have changed.
Examples of concept drift
Machine learning-based detection algorithms often operate under a closed-world assumption, where training and testing samples are independent and identically distributed.
Concept drift in industrial applications:
- Water Treatment: Changes in the quality and composition of incoming water due to seasonal variations, industrial discharges, or changes in source water can affect treatment processes
- Industry:
  - Gradual degradation of machinery and equipment can lead to changes in vibration patterns, energy consumption, and other operational metrics, affecting predictive maintenance models
  - Persistent exposure to electromagnetic noise can degrade the performance of sensors. Sensors might start to produce noisy or biased measurements, leading to a shift in the data distribution
  - Introduction of new electronic equipment or machinery that generates electromagnetic interference can cause variations in the noise level. This can lead to changes in the data patterns collected by sensors
Types of concept drift
- Sudden Drift: The change happens abruptly. For instance, sensor cleaning, redundant devices starting operations
- Incremental Drift: The change occurs gradually over time. An example could be the slow but steady increase of dust on optical sensors
- Seasonal Drift: Changes recur in a cyclical pattern, such as variations in ICS operations due to seasons or holidays
- Recurring Drift: Similar to seasonal drift, but the changes are not necessarily tied to a specific period
- For example, day-night shifts or varying demand periods
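These drift types are easy to visualise on synthetic data. The sketch below is a minimal illustration whose signal parameters (jump size, slope, period) are invented for the example; it superimposes a sudden, an incremental, and a seasonal drift on a stationary baseline.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
t = np.arange(1000)                       # time steps
baseline = rng.normal(0.0, 1.0, t.size)   # stationary "normal" signal

# Sudden drift: the mean jumps abruptly at t = 500
sudden = baseline + np.where(t >= 500, 3.0, 0.0)

# Incremental drift: the mean rises slowly but steadily
incremental = baseline + 0.005 * t

# Seasonal / recurring drift: the mean follows a cyclical pattern
seasonal = baseline + 2.0 * np.sin(2 * np.pi * t / 250)
```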
Impact on anomaly detection
Anomaly detection systems rely on building a model of normal behaviour based on historical data
When concept drift occurs, the model becomes outdated, leading to:
- Increased false positives: Normal data that deviates from the old model is incorrectly flagged as anomalous
- Increased false negatives: True anomalies that align with the new normal data distribution might be missed or ignored
The system’s ability to accurately detect anomalies deteriorates
Addressing concept drift
- Model Retraining: Regularly retraining the model, from scratch or starting from prior knowledge, using updated or additional data (a minimal sketch follows this list)
- Adaptive Learning: Implementing online learning algorithms that can adapt to new data in real-time
- Ensemble Methods: Combining multiple models to mitigate the impact of drift by leveraging diverse perspectives on the data
- Feedback Loops: Incorporating feedback from domain experts or automated systems to continuously refine and update the model based on real-time anomaly detection results
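As a concrete illustration of periodic retraining, the sketch below refits a simple normality model on a sliding window of recent scores so that the threshold tracks the current distribution. It is a minimal example under stated assumptions: the window size, the retraining interval, and the mean-plus-k-std "model" are all invented here, not a prescribed method.

```python
import numpy as np

def retrain_threshold(scores_window: np.ndarray, k: float = 2.0) -> float:
    """Refit a simple normality model (mean + k*std) on recent scores."""
    return float(scores_window.mean() + k * scores_window.std())

def detect_with_retraining(scores: np.ndarray,
                           window: int = 500,
                           retrain_every: int = 100) -> np.ndarray:
    """Flag anomalies while periodically retraining the threshold
    on the most recent `window` anomaly scores."""
    flags = np.zeros(scores.size, dtype=bool)
    threshold = retrain_threshold(scores[:window])
    for i in range(window, scores.size):
        if (i - window) % retrain_every == 0:
            threshold = retrain_threshold(scores[i - window:i])
        flags[i] = scores[i] > threshold
    return flags
```

Note that retraining blindly on recent data can fold true anomalies into the normality model, which is one reason the feedback loops mentioned above matter.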
Types of anomalies
Point Anomalies: These are individual data points that deviate significantly from the rest of the dataset
- For example, a sudden spike in temperature in a cooling system could be a point anomaly
Contextual Anomalies: Anomalies that are only unusual in a specific context
- For instance, a high temperature might be normal during peak operation hours but anomalous during downtime
Collective Anomalies: A series of data points that, as a group, represent an anomalous pattern, even if individual points are not
- For example, a sequence of individually plausible sensor readings that together trace an abnormal trend
Selecting the anomaly threshold
When only normal data is available for anomaly detection, the task is unsupervised
- The challenge is to identify a threshold that separates the normal data from potential anomalies
- Two common methods for anomaly threshold selection:
- Z-score
- Percentile
The Z-score
Z-score: The Z-score is a statistical measure of how many standard deviations a data point is from the mean of a dataset
$$Z(\mathbf{x}) = \frac{\mathbf{x} - \mu}{\sigma}$$
NOTE: In anomaly detection, a data point is the anomaly score of a sample (e.g., the reconstruction error)
- Z-score of 0: This means the data point’s value is the same as the mean value of the data set
- Z-score of 1.0: This value indicates one standard deviation above the mean
- Z-score greater than 1.0: The data point is considered unusual or farther from the mean
- Example: an autoencoder threshold applied to the reconstruction error $MSE(\mathbf{x}, \mathbf{x}')$
  - A Z-score of 2 means the data point is two standard deviations above the mean
  - Equivalent to $T = \mu + 2\sigma$ (see the worked example below)
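As a quick worked example with assumed values (not taken from any lab), suppose the validation reconstruction errors have $\mu = 0.02$ and $\sigma = 0.005$; a Z-score threshold of 2 then gives $T = 0.02 + 2 \times 0.005 = 0.03$, and any sample with $MSE(\mathbf{x}, \mathbf{x}') > 0.03$ is flagged as anomalous.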
Example: anomaly detection with autoencoders
In this lab, the anomaly threshold is computed as the sum of the mean ($\mu$) and the standard deviation ($\sigma$) of the reconstruction errors obtained on the validation samples (lines 3 and 5 in the code snippet)
$$T = \mu + \sigma$$
In the test phase (lines 13 and 17), the samples whose reconstruction error is larger than the threshold are considered anomalies
$$MSE(\mathbf{x}, \mathbf{x}') > T$$
This is equivalent to a Z-score threshold of 1 standard deviation:
$$T = \mu + \sigma \iff T - \mu = \sigma \iff \frac{T - \mu}{\sigma} = 1$$
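The lab's snippet itself is not reproduced in this section, so the following is only a minimal sketch of the computation it describes; the function names and the 2-D array layout (samples × features) are assumptions, and `x_val_rec`/`x_test_rec` stand for the autoencoder's reconstructions.

```python
import numpy as np

def mse_per_sample(x: np.ndarray, x_rec: np.ndarray) -> np.ndarray:
    """Per-sample reconstruction error MSE(x, x')."""
    return np.mean((x - x_rec) ** 2, axis=1)

def fit_threshold(x_val: np.ndarray, x_val_rec: np.ndarray) -> float:
    """Validation phase: T = mean + std of the validation errors."""
    errors = mse_per_sample(x_val, x_val_rec)
    return float(errors.mean() + errors.std())   # T = mu + sigma

def flag_anomalies(x_test: np.ndarray, x_test_rec: np.ndarray,
                   threshold: float) -> np.ndarray:
    """Test phase: a sample is anomalous if MSE(x, x') > T."""
    return mse_per_sample(x_test, x_test_rec) > threshold
```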
Percentile-based anomaly threshold
Percentile Method. A percentile is a measure that indicates the value below which a given percentage of observations in a dataset falls
- Example: the 90th percentile is the value below which 90% of the data points lie
Steps for Percentile-Based Anomaly Detection:
- Calculate Anomaly Scores. If you have raw data, you might first need to compute an anomaly score for each data point
  - This could be based on distance metrics, reconstruction errors (e.g., MSE), or other methods
- Compute Percentiles. Once you have your anomaly scores, calculate the relevant percentiles
  - For example, you could calculate the 95th, 99th, or even 99.9th percentiles, depending on how strict you want to be in identifying anomalies
- Set the Anomaly Threshold. Determine which percentile will serve to compute your threshold for flagging anomalies
  - Typically, a high percentile (e.g., 95th, 99th) is chosen
  - The threshold is the minimum anomaly score among the samples that fall outside the chosen percentile
  - This means any data point with an anomaly score above this threshold is considered an anomaly (a sketch of these steps follows)
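A minimal sketch of these steps with NumPy follows; the 99th percentile and the synthetic gamma-distributed scores are illustrative choices, not values from any lab, and here the percentile value itself serves as the decision boundary.

```python
import numpy as np

def percentile_threshold(scores: np.ndarray, q: float = 99.0) -> float:
    """Use the q-th percentile of the (normal) anomaly scores as the
    threshold: q% of the scores fall below the returned value."""
    return float(np.percentile(scores, q))

# Illustrative scores standing in for validation reconstruction errors
rng = np.random.default_rng(seed=0)
val_scores = rng.gamma(shape=2.0, scale=0.01, size=10_000)

threshold = percentile_threshold(val_scores, q=99.0)
test_scores = np.array([0.01, 0.05, 0.20])
print(test_scores > threshold)  # True entries are flagged as anomalies
```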
Anomaly threshold and concept drift
- Anomaly detection systems typically rely on the assumption that normal behaviour is consistent over time
- When concept drift occurs, what was once normal might now be considered anomalous or vice-versa
- This makes it difficult to maintain a fixed threshold for detecting anomalies
Static Threshold Issues
A threshold that works well in one period may become ineffective if the data distribution shifts.
For example, a threshold that was effective in detecting anomalies in a dataset of sensor readings might become either too sensitive (high false positive rate) or too conservative (high false negative rate) after a concept drift.
Adaptive Thresholding
One approach is to dynamically adjust the threshold in response to detected concept drift.
However, determining the optimal adjustment mechanism is complex and often requires real-time monitoring and tuning, which can be resource-intensive.
It might also involve human intervention to ensure that the data used to tune the threshold that discriminates normal behaviour from anomalies is correctly labelled (e.g., all the samples are normal).
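One possible realisation is sketched below: it keeps exponentially weighted moving estimates of the score mean and variance and recomputes $T = \mu + k\sigma$ on the fly. The smoothing factor `alpha`, the multiplier `k`, and the normal-only update rule are all assumptions of this sketch, not a prescribed method.

```python
import numpy as np

class AdaptiveThreshold:
    """Adaptive threshold T = mu + k*sigma, where mu and sigma are
    exponentially weighted moving estimates of the score stream."""

    def __init__(self, mu0: float, var0: float,
                 alpha: float = 0.01, k: float = 3.0):
        self.mu, self.var = mu0, var0   # initial calibration values
        self.alpha, self.k = alpha, k

    def update(self, score: float) -> bool:
        """Classify one score and, if it looks normal, adapt the threshold."""
        is_anomaly = score > self.mu + self.k * np.sqrt(self.var)
        if not is_anomaly:
            # Only scores judged normal update the estimates; mislabelled
            # samples would silently drag the threshold (hence the need
            # for correctly labelled data noted above).
            delta = score - self.mu
            self.mu += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta ** 2)
        return is_anomaly
```

Calibrating `mu0` and `var0` on validation scores and then calling `update` once per incoming score lets the boundary track gradual drift, at the cost of the monitoring and tuning burden described above.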
Drift detection delay
Detecting concept drift in real-time is difficult.
There is often a delay between the occurrence of the drift and its detection.
During this delay, the anomaly detection system may operate with a suboptimal threshold, leading to inaccurate detection rates.
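To make the delay concrete, here is a minimal sketch of the Page-Hinkley test, a classic drift detector; the tolerance `delta` and alarm threshold `lam` are illustrative values. The alarm fires only after the cumulative deviation from the running mean exceeds `lam`, so some lag after the drift onset is unavoidable.

```python
import numpy as np

class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in a stream's mean."""

    def __init__(self, delta: float = 0.05, lam: float = 50.0):
        self.delta, self.lam = delta, lam       # tolerance, alarm threshold
        self.n, self.mean = 0, 0.0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n   # running mean
        self.cum += x - self.mean - self.delta  # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.lam  # drift alarm

# The stream's mean jumps at index 500; the alarm fires some steps later,
# and that gap is the detection delay discussed above.
rng = np.random.default_rng(seed=0)
stream = np.concatenate([rng.normal(0.0, 1.0, 500),
                         rng.normal(3.0, 1.0, 500)])
ph = PageHinkley()
alarm_at = next((i for i, x in enumerate(stream) if ph.update(x)), None)
print(alarm_at)  # typically shortly after 500
```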