Anomaly detection
Encyclopedia
Anomaly detection, also referred to as outlier detection refers to detecting patterns in a given data set that do not conform to an established normal behavior.
The patterns thus detected are called anomalies and often translate to critical and actionable information in several application domains. Anomalies are also referred to as outlier
s, change, deviation, surprise, aberrant, peculiarity, intrusion, etc.
In particular in the context of abuse and network intrusion detection
, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier
as a rare object, and many outlier detection methods (in particular unsupervised methods) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.
Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal by looking for instances that seem to fit least to the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier (the key difference to many other statistical classification problems is the inherent unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then testing the likelihood of a test instance to be generated by the learnt model.
, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting eco-system disturbances. It is often used in preprocessing to remove anomalous data from the dataset. In supervised learning
, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.
, and inductive learning. Types of statistics proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of Anomaly detection in Intrusion detection
is Misuse Detection
.
The patterns thus detected are called anomalies and often translate to critical and actionable information in several application domains. Anomalies are also referred to as outlier
Outlier
In statistics, an outlier is an observation that is numerically distant from the rest of the data. Grubbs defined an outlier as: An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs....
s, change, deviation, surprise, aberrant, peculiarity, intrusion, etc.
In particular in the context of abuse and network intrusion detection
Intrusion detection
In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. When Intrusion detection takes a preventive measure without direct human intervention, then it becomes an Intrusion-prevention...
, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier
Outlier
In statistics, an outlier is an observation that is numerically distant from the rest of the data. Grubbs defined an outlier as: An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs....
as a rare object, and many outlier detection methods (in particular unsupervised methods) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro clusters formed by these patterns.
Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal by looking for instances that seem to fit least to the remainder of the data set. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier (the key difference to many other statistical classification problems is the inherent unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then testing the likelihood of a test instance to be generated by the learnt model.
Applications
Anomaly detection is applicable in a variety of domains, such as intrusion detectionIntrusion detection
In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. When Intrusion detection takes a preventive measure without direct human intervention, then it becomes an Intrusion-prevention...
, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting eco-system disturbances. It is often used in preprocessing to remove anomalous data from the dataset. In supervised learning
Supervised learning
Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value...
, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.
Popular Anomaly Detection Techniques
Several anomaly detection techniques have been proposed in literature. Some of the popular techniques are:- Distance based techniques (k-nearest neighborK-nearest neighbor algorithmIn pattern recognition, the k-nearest neighbor algorithm is a method for classifying objects based on closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning where the function is only approximated locally and all computation is deferred until...
, Local Outlier FactorLocal Outlier FactorLocal outlier factor is an anomaly detection algorithm presented as "LOF: Identifying Density-based Local Outliers" by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander...
). - One Class Support Vector Machines.
- Replicator Neural NetworksNeural networkThe term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes...
. - Cluster analysis based outlier detection.
- Pointing at records that deviate from association rules
Application to Data Security
Anomaly detection was proposed for Intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with Soft computingSoft computing
Soft computing is a term applied to a field within computer science which is characterized by the use of inexact solutions to computationally-hard tasks such as the solution of NP-complete problems, for which an exact solution cannot be derived in polynomial time.-Introduction:Soft Computing became...
, and inductive learning. Types of statistics proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of Anomaly detection in Intrusion detection
Intrusion detection
In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. When Intrusion detection takes a preventive measure without direct human intervention, then it becomes an Intrusion-prevention...
is Misuse Detection
Misuse Detection
Misuse detection actively works against potential insider threats to vulnerable company data.-Misuse:Misuse detection is an approach in detecting attacks. In misuse detection approach, we define abnormal system behaviour at first, and then define any other behaviour, as normal behaviour...
.