OpenWorld

Outline

  • The open-world challenge
  • Approaches to open-world network intrusion detection
  • The OpenMax classifier

The Open-world challenge

In network intrusion detection, open-world problems occur when the system needs to handle novel, previously unseen types of network attacks or anomalous behaviour that were not explicitly defined or seen during training

Closed-World Assumption: Traditional IDSs often operate under the assumption that every possible attack type is known in advance

However, real-world environments are open-world: new and evolving threats constantly emerge

Key Challenge: Detect and adapt to novel or unseen classes while maintaining strong performance on known classes

Open-World Examples in Network Intrusion Detection

New Types of Distributed Denial of Service (DDoS) Attacks

Scenario: A network intrusion detection system has been trained to recognize common DDoS attack patterns, such as SYN floods or UDP amplification attacks

However, a new DDoS variant that utilizes novel attack vectors (e.g., leveraging IoT devices or specific zero-day vulnerabilities) emerges

Challenge: The IDS is not explicitly trained to detect this new type of DDoS

As a result, the system needs to recognise that the network traffic deviates from known patterns and potentially flag it as an unknown (novel) threat

New Malware C&C Channels

Scenario: Malware often communicates with command-and-control (C&C) servers to receive instructions or exfiltrate data

  • The IDS might be trained on known C&C communication patterns, such as specific domain names or protocols (e.g., HTTP/S, DNS tunneling)

Challenge: A new type of malware emerges that uses a previously unseen communication method, such as a covert channel through an unconventional protocol (e.g., using VoIP or social media as the control channel)

Emerging Protocol Attacks

Scenario: As IoT devices proliferate, attackers may exploit weaknesses in non-standard or emerging network protocols used by these devices

The IDS might have been trained on traditional protocols (e.g., HTTP, FTP) but not on newer or specialised protocols common in IoT networks (e.g., MQTT, a publish-subscribe protocol)

Challenge: The introduction of new protocols or significant changes in protocol behaviour (e.g., custom IoT communication methods) could confuse a closed-world IDS, which doesn’t recognise these interactions as malicious or anomalous

Novel Network Scanning Techniques

Scenario: Attackers often use network scanning techniques like port scanning, fingerprinting, or reconnaissance to gather information about potential targets

The IDS might be trained to detect typical scanning methods, e.g.:

SYN scans (e.g., nmap -sS <target-ip>)

ICMP sweeps (e.g., fping -a -g 192.168.1.0/24)

Challenge: A new form of network scanning, such as slow, stealthy scans (spread over hours or days) or random-sequence scanning, may not be recognised by the system as malicious activity
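To make the slow-scan problem concrete, here is a minimal sketch of a rate-based scan detector. The function name, the 60-second window, the 20-port threshold and the IP addresses are all illustrative assumptions, not taken from any particular IDS. The detector counts distinct destination ports per source inside a sliding time window, so it catches a fast scan but never sees enough probes at once to flag a slow one.

```python
from collections import defaultdict

def flag_scanners(events, window=60.0, max_ports=20):
    """Return source IPs that touch more than max_ports distinct ports
    within any window of `window` seconds.

    events: iterable of (timestamp_seconds, src_ip, dst_port), sorted by time.
    """
    recent = defaultdict(list)   # src_ip -> [(timestamp, port), ...] still inside the window
    flagged = set()
    for ts, src, port in events:
        hits = recent[src]
        hits.append((ts, port))
        # Drop probes that have fallen out of the time window.
        while hits and hits[0][0] <= ts - window:
            hits.pop(0)
        if len({p for _, p in hits}) > max_ports:
            flagged.add(src)
    return flagged

# Fast scan: 100 ports in 10 seconds. Slow scan: 100 ports, one every 5 minutes.
fast = [(i * 0.1, "10.0.0.5", 1000 + i) for i in range(100)]
slow = [(i * 300.0, "10.0.0.9", 1000 + i) for i in range(100)]
events = sorted(fast + slow)

print(flag_scanners(events))   # only the fast scanner exceeds the threshold
```

Both sources probe the same 100 ports, yet only the fast one is reported. Widening the window would catch the slow scan, at the cost of more state and more false positives from long-lived legitimate clients.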

AI-based techniques used to handle open-world problems in cybersecurity

Open Set Recognition (OSR)

Goal: Detect both known threats (classes that the model was trained on) and unknown threats (new, previously unseen classes)

OpenMax: An extension of SoftMax that accounts for unknown classes by reducing the confidence scores of known classes, based on a statistical analysis of activations (modelled with Weibull distributions)

Anomaly Detection and Outlier Detection

Goal: Detect unusual or abnormal behaviour in network traffic, user activities, or system logs that may indicate new types of attacks

Autoencoders: Neural networks that learn a compressed representation of normal data

New, unknown attacks typically show a high reconstruction error, which can be flagged as anomalous
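The reconstruction-error idea can be sketched without a deep-learning library. The example below swaps the neural autoencoder for a linear autoencoder with a one-dimensional bottleneck (equivalent to projecting onto the first principal component), which is an assumption made to keep the code short. The two features (think packets per second and kilobytes per second), the synthetic data and the max-error threshold are likewise illustrative.

```python
import math
import random

def fit_linear_autoencoder(data, iters=100):
    """Fit a 1-D bottleneck linear autoencoder: centre the data and find the
    leading principal direction by power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return mean, v

def reconstruction_error(x, mean, v):
    """Squared error between x and its encode-then-decode reconstruction."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    code = sum(ci * vi for ci, vi in zip(c, v))   # encode to one number
    recon = [code * vi for vi in v]               # decode back to feature space
    return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))

# Synthetic "normal" traffic: the second feature is roughly twice the first.
random.seed(0)
normal = []
for _ in range(200):
    t = random.uniform(0.0, 10.0)
    normal.append([t + random.gauss(0, 0.1), 2 * t + random.gauss(0, 0.1)])

mean, v = fit_linear_autoencoder(normal)
# Assumed threshold: the largest error seen on the training data.
threshold = max(reconstruction_error(x, mean, v) for x in normal)

typical = [5.0, 10.0]   # follows the learned pattern
novel = [5.0, 0.0]      # breaks the learned relationship between the features
for name, x in [("typical", typical), ("novel", novel)]:
    err = reconstruction_error(x, mean, v)
    print(name, "anomalous" if err > threshold else "normal")
```

In practice the threshold is usually a high percentile of the training or validation errors rather than the maximum, which trades some false alarms for sensitivity.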

Adversarial Machine Learning

Goal: Improve model robustness against novel attacks, including adversarial attacks where attackers deliberately manipulate data to evade detection

  • Adversarial Training: Exposes the model to adversarial examples during training, making it more resilient to adversarial attacks (e.g., perturbing network packets or logs to evade detection)
  • GANs: GANs can be used both offensively and defensively
    • They can generate synthetic malicious data to improve detection systems or detect adversarial attacks by learning the distribution of normal data.
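As a rough illustration of adversarial training, the sketch below attacks a toy logistic-regression detector with the fast gradient sign method (FGSM) and then takes one training step on both the clean and the perturbed sample. The weights, the feature vector and the step sizes are made-up values, and a real IDS would need perturbations that keep the traffic valid.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that x is malicious under a logistic-regression detector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: move x by eps in the direction that increases
    the cross-entropy loss. For logistic regression the gradient of the loss
    with respect to x is (p - y) * w."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def adversarial_training_step(w, b, x, y, eps, lr=0.1):
    """One SGD step on the clean sample and on its FGSM counterpart."""
    for sample in (x, fgsm(w, b, x, y, eps)):
        err = predict(w, b, sample) - y
        w = [wi - lr * err * si for wi, si in zip(w, sample)]
        b = b - lr * err
    return w, b

# Hypothetical detector and one malicious sample (label y = 1).
w, b = [2.0, -1.0, 0.5], -1.0
x, y = [1.5, 0.2, 1.0], 1

x_adv = fgsm(w, b, x, y, eps=0.8)
print("clean score:      ", round(predict(w, b, x), 3))
print("adversarial score:", round(predict(w, b, x_adv), 3))

w2, b2 = adversarial_training_step(w, b, x, y, eps=0.8)
print("adversarial score after one training step:", round(predict(w2, b2, x_adv), 3))
```

The perturbed sample drops below the 0.5 decision boundary, and a single training step that includes it already raises its score. Full adversarial training repeats this over the whole dataset, regenerating the perturbations as the model changes.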

The OpenMax classifier

Overview

The OpenMax classifier is a machine learning technique designed to handle open-set recognition problems

  • In real-world scenarios, the model may encounter data that doesn’t belong to any of the known classes from training
  • Traditional classifiers will try to force an input into one of the known classes, potentially leading to misclassifications
  • Open-set recognition aims to not only classify data belonging to known categories but also detect and label inputs that are “unknown”

Core idea of OpenMax

OpenMax builds on traditional SoftMax classifiers, which output a probability distribution across known classes

  • It extends them to handle open-set data by introducing a mechanism for the rejection of unknown samples
  • SoftMax Limitation: SoftMax assigns probabilities based on the assumption that an input belongs to one of the known classes, which is not true in an open-world scenario
  • OpenMax Enhancement: OpenMax adjusts the final classification probabilities by modelling the uncertainty in the prediction and provides an additional option: “unknown class”
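The SoftMax limitation is easy to see numerically. In the sketch below the two logit vectors and the three class names are hypothetical. The point is that SoftMax always distributes a total probability of 1 across the known classes, so even an input unlike anything seen in training ends up with a confident-looking winner.

```python
import math

def softmax(logits):
    """Numerically stable SoftMax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three known classes: benign, SYN flood, port scan.
known_attack = [1.0, 6.0, 0.5]   # clearly a SYN flood
novel_attack = [2.1, 2.0, 4.5]   # resembles no training class, yet one logit still wins

for name, logits in [("known", known_attack), ("novel", novel_attack)]:
    probs = softmax(logits)
    print(name, [round(p, 3) for p in probs], "sum =", round(sum(probs), 3))
```

Because no probability mass is left over for "none of the above", a threshold on the top SoftMax score is the only rejection mechanism available, and it fails whenever an unknown input happens to produce one dominant logit.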

OpenMax vs SoftMax

How OpenMax works

Activation vectors

  • During inference, the penultimate layer of a neural network generates an activation vector (AV), which represents the learned features of the input
  • These activation vectors are typically fed into a SoftMax layer to compute class probabilities

  • OpenMax fits a Weibull distribution per class to the distances between training activation vectors and their class centroid (i.e., the mean activation vector of that class in the training set); at test time, these models score how far a sample's activation vector lies from each known class centroid
  • The intuition is that inputs that are far from any known class centroid are likely to belong to an unknown class
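The steps above can be sketched in plain Python. This is a simplified illustration under several assumptions, not a faithful reimplementation: the Weibull is fitted by a basic maximum-likelihood routine on the largest training distances, the weighting follows one commonly used formulation (the activations of the top `alpha` classes are reduced in proportion to the Weibull CDF of the distance), and all function names, the `tail` and `alpha` values and the synthetic activation vectors are made up for the example.

```python
import math
import random

def fit_weibull(samples, iters=100):
    """Maximum-likelihood fit of a two-parameter Weibull (shape k, scale lam)
    to positive samples, solving the shape equation by bisection."""
    top = max(samples)
    xs = [s / top for s in samples]          # rescale to (0, 1] to avoid overflow
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def g(k):                                # increasing in k; its root is the MLE shape
        pw = [x ** k for x in xs]
        return sum(p * l for p, l in zip(pw, logs)) / sum(pw) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1e3
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    lam = top * (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, lam

def weibull_cdf(d, k, lam):
    if d <= 0:
        return 0.0
    log_term = k * math.log(d / lam)
    if log_term > 50:                        # exp(-exp(50)) is 0 to machine precision
        return 1.0
    return 1.0 - math.exp(-math.exp(log_term))

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_openmax(train_avs, tail=20):
    """train_avs: {class_index: [activation vectors of correctly classified samples]}.
    Returns, per class, (mean activation vector, Weibull shape, Weibull scale)."""
    models = {}
    for c, avs in train_avs.items():
        mav = [sum(col) / len(avs) for col in zip(*avs)]
        dists = sorted(distance(av, mav) for av in avs)[-tail:]   # largest distances only
        models[c] = (mav, *fit_weibull(dists))
    return models

def openmax(av, models, alpha=2):
    """Return probabilities for [class 0, ..., class K-1, unknown]."""
    ranked = sorted(range(len(av)), key=lambda c: av[c], reverse=True)
    weights = [1.0] * len(av)
    for rank, c in enumerate(ranked[:alpha]):
        mav, k, lam = models[c]
        weights[c] = 1.0 - (alpha - rank) / alpha * weibull_cdf(distance(av, mav), k, lam)
    revised = [v * w for v, w in zip(av, weights)]
    unknown = sum(v * (1.0 - w) for v, w in zip(av, weights))   # mass removed from known classes
    scores = revised + [unknown]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Synthetic activation vectors for three known classes (illustrative only).
random.seed(0)
train_avs = {
    c: [[random.gauss(8.0 if j == c else 0.0, 0.5) for j in range(3)] for _ in range(200)]
    for c in range(3)
}
models = fit_openmax(train_avs)

known = openmax([8.2, 0.1, -0.3], models)     # close to the class 0 centroid
novel = openmax([20.0, 18.0, 19.0], models)   # far from every centroid
print("known-like input:", [round(p, 3) for p in known])
print("far-away input:  ", [round(p, 3) for p in novel])
```

The last entry of each output is the "unknown" probability. For the input near the class 0 centroid it stays small, while for the far-away input most of the activation mass is moved to the unknown class, even though plain SoftMax on the same vector would still pick a known class.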