
IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014, p. 303

Network Anomaly Detection: Methods, Systems and Tools
Monowar H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita

Abstract—Network anomaly detection is an important and dynamic research area. Many network intrusion detection methods and systems (NIDS) have been proposed in the literature. In this paper, we provide a structured and comprehensive overview of various facets of network anomaly detection so that a researcher can become quickly familiar with every aspect of network anomaly detection. We present attacks normally encountered by network intrusion detection systems. We categorize existing network anomaly detection methods and systems based on the underlying computational techniques used. Within this framework, we briefly describe and compare a large number of network anomaly detection methods and systems. In addition, we also discuss tools that can be used by network defenders and datasets that researchers in network anomaly detection can use. We also highlight research directions in network anomaly detection.

Index Terms—Anomaly detection, NIDS, attack, dataset, intrusion detection, classifier, tools

Manuscript received March 7, 2012; revised August 28, 2012 and February 27, 2013.
M. H. Bhuyan is with the Department of Computer Science and Engineering, Tezpur University, Napaam, Tezpur-784028, Assam, India (e-mail: mhb@tezu.ernet.in).
D. K. Bhattacharyya is with the Dept. of Computer Science and Engineering, Tezpur University, Napaam, Tezpur-784028, Assam, India (e-mail: dkb@tezu.ernet.in).
J. K. Kalita is with the Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA (e-mail: jkalita@uccs.edu).
Digital Object Identifier 10.1109/SURV.2013.052213.00046
1553-877X/14/$31.00 © 2014 IEEE

I. INTRODUCTION

Due to advancements in Internet technologies and the concomitant rise in the number of network attacks, network intrusion detection has become a significant research issue. In spite of remarkable progress and a large body of work, there are still many opportunities to advance the state-of-the-art in detecting and thwarting network-based attacks [1].

According to Anderson [2], an intrusion attempt or a threat is a deliberate and unauthorized attempt to (i) access information, (ii) manipulate information, or (iii) render a system unreliable or unusable. For example, (a) a Denial of Service (DoS) attack attempts to starve a host of its resources, which are needed to function correctly during processing; (b) worms and viruses exploit other hosts through the network; and (c) compromises obtain privileged access to a host by taking advantage of known vulnerabilities.

The term anomaly-based intrusion detection in networks refers to the problem of finding exceptional patterns in network traffic that do not conform to the expected normal behavior. These nonconforming patterns are often referred to as anomalies, outliers, exceptions, aberrations, surprises, peculiarities or discordant observations in various application domains [3], [4]. Out of these, anomalies and outliers are two of the most commonly used terms in the context of anomaly-based intrusion detection in networks.

Anomaly detection has extensive applications in areas such as fraud detection for credit cards, intrusion detection for cyber security, and military surveillance for enemy activities. For example, an anomalous traffic pattern in a computer network may mean that a hacked computer is sending out sensitive data to an unauthorized host.

The statistics community has been studying the problem of detection of anomalies or outliers from as early as the 19th century [5]. In recent decades, machine learning has started to play a significant role in anomaly detection. A good number of anomaly-based intrusion detection techniques in networks have been developed by researchers. Many techniques work in specific domains, although others are more generic.

Even though there are several surveys available in the literature on network anomaly detection [3], [6], [7], surveys such as [6], [7] discuss far fewer detection methods than we do. In [3], the authors discuss anomaly detection in general and cover the network intrusion detection domain only briefly. None of the surveys [3], [6], [7] include common tools used during execution of various steps in network anomaly detection. They also do not discuss approaches that combine several individual methods to achieve better performance. In this paper, we present a structured and comprehensive survey on anomaly-based network intrusion detection in terms of general overview, techniques, systems, tools and datasets with a discussion of challenges and recommendations. Our presentation is detailed with ample comparisons where necessary and is intended for readers who wish to begin research in this field.

A. Prior Surveys on Network Anomaly Detection

Network anomaly detection is a broad research area, which already boasts a number of surveys, review articles, as well as books. An extensive survey of anomaly detection techniques developed in machine learning and statistics has been provided by [8], [9]. Agyemang et al. [10] present a broad review of anomaly detection techniques for numeric as well as symbolic data. An extensive overview of neural networks and statistics-based novelty detection techniques is found in [11]. Patcha and Park [6] and Snyder [12] present surveys of anomaly detection techniques used specifically for cyber intrusion detection. A good amount of research on outlier detection in statistics is found in several books [13]–[15] as well as survey articles [16]–[18]. Exhaustive surveys of anomaly detection in several
domains have been presented in [3], [7]. Callado et al. [19] report major techniques and problems identified in IP traffic analysis, with an emphasis on application detection. Zhang et al. [20] present a survey on anomaly detection methods in networks. A review of flow-based intrusion detection is presented by Sperotto et al. [21], who explain the concepts of flow and classified attacks, and provide a detailed discussion of detection techniques for scans, worms, Botnets and DoS attacks.

Some work [22]–[25] has been reported in the context of wireless networks. Sun et al. [23] present a survey of intrusion detection techniques for mobile ad-hoc networks (MANET) and wireless sensor networks (WSN). They also present several important research issues and challenges in the context of building IDSs by integrating aspects of mobility. Sun et al. [22] discuss two domain independent online anomaly detection schemes (Lempel-Ziv based and Markov-based) using the location history obtained from traversal of a mobile user. Sun et al. [25] also introduce two distinct approaches to build IDSs for MANET, viz., Markov-chain based and Hotelling's T^2 test-based. They also propose an adaptive scheme for dynamic selection of normal profiles and corresponding thresholds. Sun et al. [24] construct a feature vector based on several parameters such as call duration, call inactivity period, and call destination to identify users' calling activities. They use classification techniques to detect anomalies.

An extensive survey of DoS and distributed DoS attack detection techniques is presented in [26]. Discussion of network coordinate systems, design and security is found in [27], [28]. Wu and Banzhaf [29] present an overview of applications of computational intelligence methods to the problem of intrusion detection. They include various methods such as artificial neural networks, fuzzy systems, evolutionary computation, artificial immune systems, swarm intelligence, and soft computing.

Dong et al. [30] introduce an Application Layer IDS based on sequence learning to detect anomalies. The authors demonstrate that their IDS is more effective than approaches using Markov models and k-means algorithms. A general comparison of various survey papers available in the literature with our work is shown in Table I. The survey presented in this paper covers most well-cited approaches and systems reported in the literature so far.

Our survey differs from the existing surveys in the following ways.
• Like [35], we discuss sources, causes and aspects of network anomalies, and also include a detailed discussion of sources of packet and flow level feature datasets. In addition, we include a large collection of up-to-date anomaly detection methods under the categories of statistical, classification-based, knowledge-based, soft computing, clustering-based and combination learners, rather than restricting ourselves to only statistical approaches. We also include several important research issues, open challenges and some recommendations.
• Like [36], we attempt to provide a classification of various anomaly detection methods, systems and tools introduced to date in addition to a classification of attacks and their characteristics. In addition, we perform detailed comparisons among these methods. Furthermore, like [36], we provide practical recommendations and a list of research issues and open challenges.
• Unlike [9], [19], our survey is not restricted to only IP traffic classification and analysis. It includes a large number of up-to-date methods, systems and tools and analysis. Like [19], we also include a detailed discussion on flow and packet level capturing and preprocessing. However, unlike [9], [19], we include ideas for developing better IDSs, in addition to providing a list of practical research issues and open challenges.
• Unlike [37], our survey is not restricted to those solutions introduced for a particular network technology, like CRN (Cognitive Radio Network). Also unlike [37], we include a discussion of a wide variety of attacks, instead of only CRN specific attacks.
• Unlike [27], our survey is focused on network anomalies, their sources and characteristics; and detection approaches, methods and systems, and comparisons among them. Like [27], we include performance metrics, in addition to a discussion of the datasets used for evaluation of any IDS.

B. The Problem of Anomaly Detection

To provide an appropriate solution in network anomaly detection, we need the concept of normality. The idea of normal is usually introduced by a formal model that expresses relations among the fundamental variables involved in system dynamics. Consequently, an event or an object is detected as anomalous if its degree of deviation with respect to the profile or behavior of the system, specified by the normality model, is high enough.

For example, let us take an anomaly detection system S that uses a supervised approach. It can be thought of as a pair S = (M, D), where M is the model of normal behavior of the system and D is a proximity measure that allows one to compute, given an activity record, the degree of deviation that such activities have with regard to the model M. Thus, each system has mainly two modules: (i) a modeling module and (ii) a detection module. One trains the system to get the normality model M. The obtained model is subsequently used by the detection module to evaluate new events or objects or traffic as anomalous or outliers. It is the measurement of deviation that allows classification of events or objects as anomalous or outliers. In particular, the modeling module needs to be adaptive to cope with dynamic scenarios.

C. Our Contributions

This paper provides a structured and broad overview of the extensive research on network anomaly detection methods and NIDSs. The major contributions of this survey are the following.
(a) Like the categorization of the network anomaly detection research suggested in ([8], [10]), we classify detection methods and NIDSs into a number of categories. In addition, we also provide an analysis of many methods in terms of their capability and performance, datasets
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 305

TABLE I
A COMPARISON OF OUR SURVEY WITH EXISTING SURVEY ARTICLES

Methods /NIDSs Topics covered [8] [10] [11] [6] [16] [17] [3] [7] [21] [26] [29] [31] [32] [33] [34] Our
/Tools √ √ √ √ √ √ √ √ √ √ √ √ survey

Statistical √ √ √ √ √ √ √ √ √ √ √
Classification-based √ √ √ √ √ √ √
Knowledge-based √ √ √
Soft computing √ √ √ √ √ √ √ √
Clustering-based √
Ensemble-based
Methods √
Fusion-based √
Hybrid √ √ √
Statistical √
Classification-based √ √
Soft computing √ √ √
NIDSs Knowledge-based √ √ √ √
Data Mining √
Ensemble-based √
Hybrid √
Tools Capturing,
Preprocessing,
Attack launching

used, matching mechanisms, number of parameters, and detection mechanisms.
(b) Most existing surveys do not cover ensemble approaches or data fusion for network anomaly detection, but we do.
(c) Most existing surveys avoid feature selection methods, which are crucial in the network anomaly detection task. We present several techniques to determine feature relevance in intrusion datasets and compare them.
(d) In addition to discussing detection methods, we present several NIDSs with architecture diagrams with components and functions, and also present a comparison among the NIDSs.
(e) We summarize tools used in various steps for network traffic anomaly detection.
(f) We also provide a description of the datasets used for evaluation.
(g) We discuss performance criteria used for evaluating methods and systems for network anomaly detection.
(h) We also provide recommendations or a wish list to the developers of ideal network anomaly detection methods and systems.
(i) Finally, we highlight several important research issues and challenges from both theoretical and practical viewpoints.

D. Organization

In this paper, we provide a comprehensive and exhaustive survey of anomaly-based network intrusion detection: fundamentals, detection methods, systems, tools and research issues as well as challenges. Section II discusses the basics of intrusion detection in networks while Section III presents network anomaly detection and its various aspects. Section IV discusses and compares various methods and systems for network anomaly detection. Section V reports criteria for performance evaluation of network anomaly detection methods and systems. Section VI presents recommendations to developers of network anomaly detection methods and systems. Section VII is devoted to research issues and challenges faced by anomaly-based network intrusion detection researchers. Opportunities for future research and concluding remarks are presented in Section VIII.

II. INTRUSION DETECTION

Intrusion is a set of actions aimed at compromising the security of computer and network components in terms of confidentiality, integrity and availability [38]. This can be done by an inside or outside agent to gain unauthorized entry and control of the security mechanism. To protect infrastructure of network systems, intrusion detection systems (IDSs) provide well-established mechanisms, which gather and analyze information from various areas within a host or a network to identify possible security breaches.

Intrusion detection functions include (i) monitoring and analyzing user, system, and network activities, (ii) configuring systems for generation of reports of possible vulnerabilities, (iii) assessing system and file integrity, (iv) recognizing patterns of typical attacks, (v) analyzing abnormal activity, and (vi) tracking user policy violations. An IDS uses vulnerability assessment to assess the security of a host or a network. Intrusion detection works on the assumption that intrusion activities are noticeably different from normal system activities and thus detectable.

A. Different Classes of Attacks

Anderson [2] classifies intruders into two types: external and internal. External intruders are unauthorized users of the machines they attack, whereas internal intruders have permission to access the system, but do not have privileges for the root or superuser mode. A masquerade internal intruder logs in as other users with legitimate access to sensitive data whereas a clandestine internal intruder, the most dangerous, has the power to turn off audit control for themselves.

There are various classes of intrusions or attacks [39], [40] in computer systems. A summary is reported in Table II.

TABLE II
CLASSES OF COMPUTER ATTACKS: CHARACTERISTICS AND EXAMPLES

Virus
  Characteristics: (i) A self-replicating program that infects the system without any knowledge or permission from the user. (ii) Increases the infection rate of a network file system if the system is accessed by another computer.
  Examples: Trivial.88.D, Polyboot.B, Tuareg

Worm
  Characteristics: (i) A self-replicating program that propagates through network services on computer systems without user intervention. (ii) Can severely harm a network by consuming network bandwidth.
  Examples: SQL Slammer, Mydoom, CodeRed, Nimda

Trojan
  Characteristics: (i) A malicious program that cannot replicate itself but can cause serious security problems in the computer system. (ii) Appears as a useful program but in reality contains secret code that can create a backdoor to the system, through which an attacker gains control of the system without user permission.
  Examples: Mail bomb, phishing attack

Denial of service (DoS)
  Characteristics: (i) Attempts to block access to system or network resources. (ii) The loss of service is the inability of a particular network or a host service, such as e-mail, to function. (iii) It is implemented by either forcing the targeted computer(s) to reset, or consuming resources. (iv) Intended users can no longer communicate adequately due to non-availability of service or because of obstructed communication media.
  Examples: Buffer overflow, ping of death (PoD), TCP SYN, smurf, teardrop

Network attack
  Characteristics: (i) Any process used to maliciously attempt to compromise the security of the network ranging from the data link layer to the application layer by various means such as manipulation of network protocols. (ii) Illegally using user accounts and privileges, performing actions to deplete network resources and bandwidth, performing actions that prevent legitimate authorized users from accessing network services and resources.
  Examples: Packet injection, SYN flood

Physical attack
  Characteristics: An attempt to damage the physical components of networks or computers.
  Examples: Cold boot, evil maid

Password attack
  Characteristics: Aims to gain a password within a short period of time, and is usually indicated by a series of login failures.
  Examples: Dictionary attack, SQL injection attack

Information gathering attack
  Characteristics: Gathers information or finds known vulnerabilities by scanning or probing computers or networks.
  Examples: SYN scan, FIN scan, XMAS scan

User to Root (U2R) attack
  Characteristics: (i) It is able to exploit vulnerabilities to gain privileges of superuser of the system while starting as a normal user on the system. (ii) Vulnerabilities include sniffing passwords, dictionary attack, or social engineering.
  Examples: Rootkit, loadmodule, perl

Remote to Local (R2L) attack
  Characteristics: (i) Ability to send packets to a remote system over a network without having any account on that system, gain access either as a user or as a root to the system and do harmful operations. (ii) Performs attack against public services (such as HTTP and FTP) or during the connection of protected services (such as POP and IMAP).
  Examples: Warezclient, warezmaster, imap, ftp write, multihop, phf, spy

Probe
  Characteristics: (i) Scans the networks to identify valid IP addresses and to collect information about hosts (e.g., what services they offer, operating system used). (ii) Provides information to an attacker with the list of potential vulnerabilities that can later be used to launch an attack against selected systems and services.
  Examples: IPsweep, portsweep
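
As a rough illustration of the Probe row in Table II, a portsweep could be flagged by counting how many distinct destination ports each source contacts within a short time window. The following is a minimal sketch under stated assumptions, not a method taken from the surveyed literature: the record layout (timestamp, source IP, destination port), the 10-second window, the 20-port threshold and the function name `detect_port_sweeps` are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


def detect_port_sweeps(records, window=10.0, max_ports=20):
    """Return the set of source IPs that contact more than `max_ports`
    distinct destination ports within any `window`-second span.

    `records` is an iterable of (timestamp, src_ip, dst_port) tuples
    sorted by timestamp. The layout and the thresholds are illustrative
    assumptions, not values taken from the survey.
    """
    recent = defaultdict(deque)  # src_ip -> deque of (timestamp, dst_port)
    flagged = set()
    for ts, src, port in records:
        attempts = recent[src]
        attempts.append((ts, port))
        # Drop connection attempts that have left the sliding window.
        while attempts and ts - attempts[0][0] > window:
            attempts.popleft()
        if len({p for _, p in attempts}) > max_ports:
            flagged.add(src)
    return flagged


# Hypothetical traffic: one host probes 50 ports in about 5 seconds,
# another host only ever talks to ports 80 and 443.
scan = [(i * 0.1, "10.0.0.5", 1000 + i) for i in range(50)]
normal = [(i * 1.0, "10.0.0.7", 80 if i % 2 else 443) for i in range(20)]
suspects = detect_port_sweeps(sorted(scan + normal))
print(suspects)
```

A fixed threshold like this is easy to evade with slow scans, which is one reason the survey treats profile-based and statistical detectors separately.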

B. Classification of Intrusion Detection and Intrusion Detection Systems

Network intrusion detection has been studied for almost 20 years. Generally, an intruder's behavior is noticeably different from that of a legitimate user and hence can be detected [41]. IDSs can also be classified based on their deployment in real time.

1) Host-based IDS (HIDS): A HIDS monitors and analyzes the internals of a computing system rather than its external interfaces [42]. A HIDS might detect internal activity such as which program accesses what resources and attempts illegitimate access. An example is a word processor that suddenly and inexplicably starts modifying the system password database. Similarly, a HIDS might look at the state of a system and its stored information whether it is in RAM or in the file system or in log files or elsewhere. One can think of a HIDS as an agent that monitors whether anything or anyone internal or external has circumvented the security policy that the operating system tries to enforce.

2) Network-based IDS (NIDS): A NIDS deals with detecting intrusions in network data. Intrusions typically occur as anomalous patterns though certain techniques model the data in a sequential fashion and detect anomalous subsequences [42]. The primary reason for these anomalies is attacks launched by outside attackers who want to gain unauthorized access to the network to steal information or to disrupt the network.

In a typical setting, a network is connected to the rest of the world through the Internet. The NIDS reads all incoming packets or flows, trying to find suspicious patterns. For example, if a large number of TCP connection requests to a very large number of different ports are observed within a short time, one could assume that someone is committing a 'port scan' at some of the computer(s) in the network. Various kinds of port scans, and tools to launch them, are discussed in detail in [43]. A NIDS also tries to detect incoming shellcode in the same manner that an ordinary intrusion detection system does. Apart from inspecting the incoming traffic, a NIDS also provides valuable information about intrusion from outgoing or local traffic. Some attacks might even be staged from the inside of a monitored network or network segment, and are therefore not regarded as incoming traffic at all. The data available for intrusion detection systems can be at different levels of granularity, e.g., packet level traces and IPFIX records. The data is typically high dimensional, with a mix of categorical as well as continuous attributes.

Misuse-based intrusion detection normally searches for known intrusive patterns but anomaly-based intrusion detection tries to identify unusual patterns. Intrusion detection techniques can be classified into three types based on the detection mechanism [1], [3], [44]. These are (i) misuse-based, (ii) anomaly-based, and (iii) hybrid, as described in Table III. Today, researchers mostly concentrate on anomaly-based network intrusion detection because it can detect known as well as unknown attacks.

There are several reasons that make intrusion detection a necessary part of the entire defense system. First, many traditional systems and applications were developed without security in mind. Such systems and applications were targeted to work in an environment where security was never a major issue. However, the same systems and applications, when deployed in the current network scenario, become major security headaches. For example, a system may be perfectly secure when it is isolated but becomes vulnerable when it is connected to the Internet. Intrusion detection provides a way to identify and thus allow response to attacks against these systems. Second, due to limitations of information security and software engineering practices, computer systems and applications may have design flaws or bugs that could be used by an intruder to attack systems or applications. As a result, certain preventive mechanisms (e.g., firewalls) may not be as effective as expected.

III. OVERVIEW OF NETWORK ANOMALY DETECTION

Anomaly detection attempts to find patterns in data, which do not conform to expected normal behavior. The importance of anomaly detection is due to the fact that anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains [45]. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized host. However, anomalies in a network may be caused by several different reasons.

As stated in [35], there are two broad categories of network anomalies: (a) performance related anomalies and (b) security related anomalies. Various examples of performance related anomalies are: broadcast storms, transient congestion, babbling node, paging across the network, and file server failure. Security related network anomalies may be due to malicious activity of intruder(s) who intentionally flood the network with unnecessary traffic to hijack the bandwidth so that legitimate users are unable to receive service(s). Security related anomalies are of three types: (i) point, (ii) contextual and (iii) collective anomalies. This classification scheme is described in Table IV. However, this survey is concerned with security related network anomalies only.

Currently, anomaly-based network intrusion detection is a principal focus of research and development in the field of intrusion detection. Various systems with anomaly-based network intrusion detection capabilities are becoming available, and many new schemes are being explored. However, the subject is far from mature and key issues remain to be solved before wide scale deployment of ANIDS platforms becomes practicable.

A. Generic Architecture of ANIDS

Many NIDSs have been developed by researchers and practitioners. However, the development of an efficient ANIDS architecture is still being investigated. A generic architecture of an ANIDS is shown in Figure 1.

Fig. 1. A generic architecture of ANIDS

The main components of the generic model of the ANIDS are discussed below.

1) Anomaly detection engine: This is the heart of any network intrusion detection system. It attempts to detect occurrence of any intrusion either online or offline. However, network traffic needs preprocessing before it is sent to the detection engine. If the attacks are known, they can be detected using the misuse detection approach. On the other hand, unknown attacks can be detected using the anomaly-based approach based on an appropriate matching mechanism.

Matching mechanism: It entails looking for a particular pattern or profile in network traffic that can be built by continuous monitoring of network behavior including known exploits or vulnerabilities. The following are some important requirements in the design of an efficient matching mechanism.
– Matching determines whether the new instance belongs to a known class defined by a high dimensional profile or not. Matching may be inexact.
– Matching must be fast.
– Effective organization of the profiles may facilitate faster search during matching.

2) Reference data: The reference data stores information about known intrusion signatures or profiles of normal behavior. Reference data needs to be stored in an efficient manner. Possible types of reference data used in the generic architecture of a NIDS are: profile, signature and rule. In case of an ANIDS, it is mostly profiles. The processing elements update the profiles as new knowledge about the observed behavior becomes available. These updates are performed at regular intervals in a batch oriented fashion.

3) Configuration data: This corresponds to intermediate results, e.g., partially created intrusion signatures. The space needed to store such information can be quite large. The steps for updating the configuration data are given in Figure 2. Intermediate results need to be integrated with existing knowledge to produce consistent, up-to-date results.

Fig. 2. Steps for updation of configuration data in ANIDS

4) Alarm: This component of the architecture is responsible for generation of alarm based on the indication received from the detection engine.

5) Human analyst: A human analyst is responsible for analysis, interpretation and for taking necessary action based on the alarm information provided by the detection engine. The analyst also takes necessary steps to diagnose the alarm

TABLE III
CHARACTERISTICS AND TYPES OF INTRUSION DETECTION TECHNIQUES

Misuse-based
  Characteristics: (i) Detection is based on a set of rules or signatures for known attacks. (ii) Can detect all known attack patterns based on the reference data. (iii) Writing a signature that encompasses all possible variations of the pertinent attack is a challenging task.

Anomaly-based
  Characteristics: (i) Principal assumption: All intrusive activities are necessarily anomalous. (ii) Such a method builds a normal activity profile and checks whether the system state varies from the established profile by a statistically significant amount to report intrusion attempts. (iii) Anomalous activities that are not intrusive may be flagged as intrusive. These are false positives. (iv) Threshold levels should be selected so that detection errors are not unreasonably magnified, and the selection of features to monitor should be optimized. (v) Computationally expensive because of overhead and possibly updating several system profile matrices.

Hybrid
  Characteristics: (i) Exploits benefits of both misuse and anomaly-based detection techniques. (ii) Attempts to detect known as well as unknown attacks.
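
The anomaly-based row of Table III (build a normal activity profile, then test whether the observed state deviates from it by a significant amount) can be sketched as a model M paired with a deviation measure D. The sketch below is only an illustration under loud assumptions: the per-feature mean and standard deviation profile, the z-score deviation, the threshold of 3, and the two feature names are hypothetical choices, not the design of any particular system covered in this survey.

```python
import statistics


def fit_profile(normal_records):
    """Modeling step: build the profile M as per-feature (mean, stdev)
    from records that are assumed to be attack-free."""
    columns = list(zip(*normal_records))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def deviation(profile, record):
    """Deviation measure D: the largest absolute z-score over all features."""
    scores = []
    for (mu, sd), x in zip(profile, record):
        if sd == 0:
            # A constant feature deviates only if the value differs at all.
            scores.append(0.0 if x == mu else float("inf"))
        else:
            scores.append(abs(x - mu) / sd)
    return max(scores)


def is_anomalous(profile, record, threshold=3.0):
    """Detection step: flag the record when D exceeds the threshold."""
    return deviation(profile, record) > threshold


# Hypothetical features: (packets per second, mean packet size in bytes).
normal = [(100, 500), (110, 520), (95, 480), (105, 510), (90, 490)]
profile = fit_profile(normal)
near = is_anomalous(profile, (102, 505))  # close to the learned profile
far = is_anomalous(profile, (900, 60))    # far from the learned profile
print(near, far)
```

Lowering the threshold catches more deviations but also flags more benign records, which is the false-positive trade-off noted in the table.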

TABLE IV
ANOMALY: TYPES, CHARACTERISTICS AND EXAMPLES

Point anomaly
  Characteristics: An instance of an individual data which has been found to be anomalous with respect to the rest of data.
  Example: Isolated network traffic instance from the normal instances at a particular time.

Contextual anomaly
  Characteristics: (i) A data instance which has been found anomalous in a specific context. (ii) Context is induced by the structure in the dataset. (iii) Two sets of attributes are used for defining a context: (a) contextual and (b) behavioral attributes.
  Example: Time interval between purchases in credit card fraud

Collective anomaly
  Characteristics: (i) A collection of related data instances found to be anomalous with respect to the entire dataset. (ii) Collection of events is an anomaly, but the individual events are not anomalies when they occur alone in the sequence.
  Example: A sequence such as the following: . . . http-web, buffer-overflow, http-web, http-web, ftp, http-web, ssh, http-web, ssh, buffer-overflow . . .

information as a post-processing activity to support reference or profile updates with the help of the security manager.
6) Post-processing: This is an important module in a NIDS for post-processing of the generated alarms for diagnosis of actual attacks.
7) Capturing traffic: Traffic capturing is an important module in a NIDS. The raw traffic data is captured at both packet and flow levels. Packet level traffic can be captured using a common tool, e.g., Wireshark¹, and then preprocessed before sending to the detection engine. Flow level data in high speed networks is comprised of information summarized from one or more packets. Some common tools to capture flow level network traffic include Nfdump², NfSen³, and Cisco Netflow V.9⁴.
8) Security manager: Stored intrusion signatures are updated by the Security Manager (SM) as and when new intrusions become known. The analysis of novel intrusions is a highly complex task.

B. Aspects of Network Anomaly Detection
In this section, we present some important aspects of anomaly-based network intrusion detection. The network intrusion detection problem is a classification or clustering problem formulated with the following components [3]: (i) types of input data, (ii) appropriateness of proximity measures, (iii) labelling of data, (iv) classification of methods based on the use of labelled data, (v) relevant feature identification and (vi) reporting anomalies. We discuss each of these topics in brief.
1) Types of input data: A key aspect of any anomaly-based network intrusion detection technique is the nature of the input data used for analysis. Input is generally a collection of data instances (also referred to as objects, records, points, vectors, patterns, events, cases, samples, observations, entities) [46]. Each data instance can be described using a set of attributes of binary, categorical or numeric type. Each data instance may consist of only one attribute (univariate) or multiple attributes (multivariate). In the case of multivariate data instances, all attributes may be of the same type or may be a mixture of data types. The nature of attributes determines the applicability of anomaly detection techniques.
2) Appropriateness of proximity measures: Proximity (similarity or dissimilarity) measures are necessary to solve many pattern recognition problems in classification and clustering. Distance is a quantitative degree of how far apart two objects are. Distance measures that satisfy metric properties [46] are simply called metrics, while other non-metric distance measures are occasionally called divergences. The choice of a proximity measure depends on the measurement type or representation of objects.
Generally, proximity measures are functions that take object pairs as arguments and return numerical values that become higher as the objects become more alike. A proximity measure is usually defined as follows.
Definition 3.1: A proximity measure S is a function X × X → R that has the following properties [47].
– Positivity: ∀x,y ∈ X, S(x, y) ≥ 0
– Symmetry: ∀x,y ∈ X, S(x, y) = S(y, x)
– Maximality: ∀x,y ∈ X, S(x, x) ≥ S(x, y)
where X is the data space (also called the universe) and x, y are the pair of k-dimensional objects.
The most common proximity measures for numeric [48]–[50], categorical [51] and mixed type [52] data are listed in Table V. For numeric data, it is assumed that the data is represented as real vectors. The attributes take their values from a continuous domain. In Table V, we assume that there are two objects, $x = x_1, x_2, x_3 \cdots x_d$ and $y = y_1, y_2, y_3 \cdots y_d$, and that $\Sigma^{-1}$ denotes the inverse of the data covariance matrix, with d attributes, i.e., dimensions.

¹ http://www.wireshark.org/
² http://nfdump.sourceforge.net/
³ http://nfsen.sourceforge.net/
⁴ http://www.cisco.com
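As an illustration only, a few of the numeric measures listed in Table V could be sketched in Python as follows. The function names are hypothetical, the objects are assumed to be equal-length lists of real numbers, and `cov_inv` is assumed to be a precomputed inverse covariance matrix given as a list of rows; none of this code comes from the cited references.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1: city block, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Largest absolute per-attribute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def cosine(x, y):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def mahalanobis(x, y, cov_inv):
    """sqrt((x - y)^t * cov_inv * (x - y)) with cov_inv as a list of rows."""
    diff = [a - b for a, b in zip(x, y)]
    weighted = [sum(row[j] * diff[j] for j in range(len(diff))) for row in cov_inv]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

# Hypothetical two-attribute objects; the identity matrix stands in for cov_inv,
# in which case the Mahalanobis distance reduces to the Euclidean distance.
x, y = [0.0, 0.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(minkowski(x, y, 2), minkowski(x, y, 1), chebyshev(x, y), mahalanobis(x, y, identity))
```

Note that the first three functions are distances (smaller means more alike), whereas cosine is a similarity in the sense of Definition 3.1 when the vectors are non-negative.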
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 309

For categorical data, computing similarity or proximity


measures is not straightforward owing to the fact that there
is no explicit notion of ordering among categorical values.
The simplest way to find similarity between two categorical
attributes is to assign a similarity of 1 if the values are identical
and a similarity of 0 if the values are not identical. In Table
V, Sk (xk , yk ) represents per-attribute similarity. The attribute
weight wk for attribute k is computed as shown in the table.
Consider a categorical dataset D containing n objects, defined
over a set of d categorical attributes where Ak denotes the kth
attribute. Sk (xk , yk ) is the per-attribute proximity between two
values for the categorical attribute Ak . Note that xk , yk ∈ Ak .
In Table V, IOF denotes Inverse Occurrence Frequency and
OF denotes Occurrence Frequency [51].
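The categorical measures of Table V can be sketched as below. This is a minimal illustration under stated assumptions: $f_k(v)$ is taken to be the number of objects whose kth attribute equals v, $n_k$ the number of distinct values of attribute k, and N the number of objects in the dataset, following common usage for these measures; natural logarithms are used, and both objects are assumed to occur in the dataset so that all frequencies are positive. The function names and the toy dataset are hypothetical.

```python
import math
from collections import Counter

def per_attribute_similarity(xk, yk, freq, n_rows, measure):
    """S_k(x_k, y_k) for one categorical attribute; freq maps value -> count."""
    if xk == yk:
        return 1.0  # all four measures assign 1 to identical values
    if measure == "overlap":
        return 0.0
    if measure == "eskin":
        n_k = len(freq)  # assumed: number of distinct values of the attribute
        return n_k ** 2 / (n_k ** 2 + 2)
    if measure == "iof":
        return 1.0 / (1.0 + math.log(freq[xk]) * math.log(freq[yk]))
    if measure == "of":
        return 1.0 / (1.0 + math.log(n_rows / freq[xk]) * math.log(n_rows / freq[yk]))
    raise ValueError("unknown measure: " + measure)

def categorical_similarity(x, y, data, measure="overlap"):
    """Weighted sum of per-attribute similarities with w_k = 1/d."""
    d = len(x)
    total = 0.0
    for k in range(d):
        freq = Counter(row[k] for row in data)
        total += (1.0 / d) * per_attribute_similarity(x[k], y[k], freq, len(data), measure)
    return total

# Hypothetical toy dataset with two categorical attributes (protocol, service).
data = [("tcp", "http"), ("tcp", "ftp"), ("udp", "dns"), ("tcp", "http")]
for name in ("overlap", "eskin", "iof", "of"):
    print(name, categorical_similarity(data[0], data[1], data, name))
```

Unlike overlap, the Eskin, IOF and OF variants give a non-zero score to mismatching values, so two objects that disagree on a frequent or rare value are not treated identically.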
Finally, mixed type data includes both categorical and nu-
meric values. A common practice in clustering a mixed dataset
is to transform categorical values into numeric values and then
use a numeric clustering algorithm. Another approach is to
compare the categorical values directly, in which two distinct values result in a distance of 1 while identical values result in a distance of 0. Of course, other measures for categorical data can be used as well. Two well-known proximity measures, general similarity coefficient and general distance coefficient [52] for mixed type data are shown in Table V. Such methods may not take into account the similarity information embedded in categorical values. Consequently, clustering may not faithfully reveal the similarity structure in the dataset [52], [53].
3) Labelling of data: The label associated with a data instance denotes if that instance is normal or anomalous. It should be noted that obtaining accurate labeled data of both normal and anomalous types is often prohibitively expensive. Labeling is often done manually by human experts and hence substantial effort is required to obtain the labeled training dataset [3]. Moreover, anomalous behavior is often dynamic in nature, e.g., new types of anomalies may arise, for which there is no labeled training data.
4) Classification of methods based on use of labelled data: Based on the extent to which labels are available, anomaly detection techniques can operate in three modes: supervised, semi-supervised and unsupervised.
In supervised mode, one assumes the availability of a training dataset which has labeled instances for the normal as well as the anomaly class. The typical approach in such cases is to build a predictive model for normal vs. anomaly classes. Any unseen data instance is compared against the model to determine which class it belongs to. There are two major issues that arise in supervised anomaly detection. First, anomalous instances are far fewer compared to normal instances in the training data. Issues that arise due to imbalanced class distributions have been addressed in data mining and machine learning literature [54]. Second, obtaining accurate and representative labels, especially for the anomaly class, is usually challenging. A number of techniques inject artificial anomalies in a normal dataset to obtain a labeled training dataset [55].
Semi-supervised techniques assume that the training data has labeled instances for only the normal class. Since they do not require labels for the anomaly class, they can be more readily used compared to supervised techniques. For example, in spacecraft fault detection [56], an anomaly scenario would signify an accident, which is not easy to model. The typical approach used in such techniques is to build a model for the class corresponding to normal behavior, and use the model to identify anomalies in the test data.
Finally, unsupervised techniques do not require training data, and thus are potentially most widely applicable. The techniques in this category make the implicit assumption that normal instances are far more frequent than anomalies in the test data [57]. When this assumption is not true, such techniques suffer from high false alarm rates. Many semi-supervised techniques can be adapted to operate in an unsupervised mode by using a sample of the unlabeled dataset as training data [58]. Such adaptation assumes that the test data contains very few anomalies and the model learnt during training is robust to these few anomalies.
5) Relevant feature identification: Feature selection plays an important role in detecting network anomalies. Feature selection methods are used in the intrusion detection domain for eliminating unimportant or irrelevant features. Feature selection reduces computational complexity, removes information redundancy, increases the accuracy of the detection algorithm, facilitates data understanding and improves generalization. The feature selection process includes three major steps: (a) subset generation, (b) subset evaluation and (c) validation. Three different approaches for subset generation are: complete, heuristic and random. Evaluation functions are categorized into five [59] distinct categories: score-based, entropy or mutual information-based, correlation-based, consistency-based and detection accuracy-based. Simulation and real world implementation are the two ways to validate the evaluated subset. A conceptual framework of the feature selection process is shown in Figure 3.

Fig. 3. Framework of feature selection process

Feature selection algorithms have been classified into three types: wrapper, filter and hybrid methods [60]. While wrapper methods try to optimize some predefined criteria with respect to the feature set as part of the selection process, filter methods

TABLE V
PROXIMITY MEASURES FOR NUMERIC, CATEGORICAL AND MIXED TYPE DATA

Numeric [48]
Name | Measure, $S_i(x_i, y_i)$
Euclidean | $\sqrt{\sum_{i=1}^{d} |x_i - y_i|^2}$
Weighted Euclidean | $\sqrt{\sum_{i=1}^{d} \alpha_i |x_i - y_i|^2}$
Squared Euclidean | $\sum_{i=1}^{d} |x_i - y_i|^2$
Squared-chord | $\sum_{i=1}^{d} (\sqrt{x_i} - \sqrt{y_i})^2$
Squared $\chi^2$ | $\sum_{i=1}^{d} \frac{(x_i - y_i)^2}{x_i + y_i}$
City block | $\sum_{i=1}^{d} |x_i - y_i|$
Minkowski | $\sqrt[p]{\sum_{i=1}^{d} |x_i - y_i|^p}$
Chebyshev | $\max_i |x_i - y_i|$
Canberra | $\sum_{i=1}^{d} \frac{|x_i - y_i|}{x_i + y_i}$
Cosine | $\frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2} \sqrt{\sum_{i=1}^{d} y_i^2}}$
Jaccard | $\frac{\sum_{i=1}^{d} x_i y_i}{\sum_{i=1}^{d} x_i^2 + \sum_{i=1}^{d} y_i^2 - \sum_{i=1}^{d} x_i y_i}$
Bhattacharyya | $-\ln \sum_{i=1}^{d} \sqrt{x_i y_i}$
Pearson | $\sum_{i=1}^{d} (x_i - y_i)^2$
Divergence | $2 \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{(x_i + y_i)^2}$
Mahalanobis | $\sqrt{(x - y)^t \Sigma^{-1} (x - y)}$

Categorical [51]
Name | $w_k$, $k = 1 \ldots d$ | Measure, $S_k(x_k, y_k)$
Overlap | $\frac{1}{d}$ | $1$ if $x_k = y_k$; $0$ otherwise
Eskin | $\frac{1}{d}$ | $1$ if $x_k = y_k$; $\frac{n_k^2}{n_k^2 + 2}$ otherwise
IOF | $\frac{1}{d}$ | $1$ if $x_k = y_k$; $\frac{1}{1 + \log f_k(x_k) \times \log f_k(y_k)}$ otherwise
OF | $\frac{1}{d}$ | $1$ if $x_k = y_k$; $\frac{1}{1 + \log \frac{N}{f_k(x_k)} \times \log \frac{N}{f_k(y_k)}}$ otherwise

Mixed [52]
Name | Measure
General Similarity Coefficient | $s_{gsc}(x, y) = \frac{1}{\sum_{k=1}^{d} w(x_k, y_k)} \sum_{k=1}^{d} w(x_k, y_k)\, s(x_k, y_k)$
General Distance Coefficient | $d_{gdc}(x, y) = \left( \frac{1}{\sum_{k=1}^{d} w(x_k, y_k)} \sum_{k=1}^{d} w(x_k, y_k)\, d^2(x_k, y_k) \right)^{1/2}$

For the General Similarity Coefficient:
• For numeric attributes, $s(x_k, y_k) = 1 - \frac{|x_k - y_k|}{R_k}$, where $R_k$ is the range of the kth attribute; $w(x_k, y_k) = 0$ if x or y has a missing value for the kth attribute; otherwise $w(x_k, y_k) = 1$.
• For categorical attributes, $s(x_k, y_k) = 1$ if $x_k = y_k$; otherwise $s(x_k, y_k) = 0$; $w(x_k, y_k) = 0$ if data point x or y has a missing value at the kth attribute; otherwise $w(x_k, y_k) = 1$.

For the General Distance Coefficient, $d^2(x_k, y_k)$ is the squared distance for the kth attribute and $w(x_k, y_k)$ is the same as in the General Similarity Coefficient:
• For numeric attributes, $d(x_k, y_k) = \frac{|x_k - y_k|}{R_k}$, where $R_k$ is the range of the kth attribute.
• For categorical attributes, $d(x_k, y_k) = 0$ if $x_k = y_k$; otherwise $d(x_k, y_k) = 1$.
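The two mixed-type coefficients in Table V can be sketched as follows. This is an illustrative sketch rather than the implementation of [52]: missing values are assumed to be encoded as `None`, attribute kinds are passed as a hypothetical `kinds` tuple of `"num"` or `"cat"`, and `ranges[k]` is assumed to hold $R_k$ for numeric attributes.

```python
import math

def _pairs(x, y, kinds, ranges):
    """Yield (kind, xk, yk, Rk) for attributes where w(xk, yk) = 1."""
    for k, (xk, yk) in enumerate(zip(x, y)):
        if xk is None or yk is None:
            continue  # w(xk, yk) = 0 when either value is missing
        yield kinds[k], xk, yk, ranges[k]

def general_similarity(x, y, kinds, ranges):
    """s_gsc: weighted mean of per-attribute similarities."""
    scores = []
    for kind, xk, yk, r in _pairs(x, y, kinds, ranges):
        if kind == "num":
            scores.append(1.0 - abs(xk - yk) / r)
        else:
            scores.append(1.0 if xk == yk else 0.0)
    if not scores:
        raise ValueError("no attribute is observed in both objects")
    return sum(scores) / len(scores)

def general_distance(x, y, kinds, ranges):
    """d_gdc: square root of the weighted mean of squared per-attribute distances."""
    squared = []
    for kind, xk, yk, r in _pairs(x, y, kinds, ranges):
        d = abs(xk - yk) / r if kind == "num" else (0.0 if xk == yk else 1.0)
        squared.append(d * d)
    if not squared:
        raise ValueError("no attribute is observed in both objects")
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical records: (duration, protocol, service); y-style missing value in x.
kinds = ("num", "cat", "cat")
ranges = (100.0, None, None)
x = (10.0, "tcp", None)
y = (30.0, "udp", "http")
print(general_similarity(x, y, kinds, ranges), general_distance(x, y, kinds, ranges))
```

Because attributes with a missing value receive weight 0, they drop out of both the numerator and the denominator, so the coefficients are averaged only over attributes observed in both objects.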

rely on the general characteristics of the training data to select features that are independent of each other and are highly dependent on the output. The hybrid feature selection method attempts to exploit the salient features of both wrapper and filter methods [60].
An example of a wrapper-based feature selection method is [61], where the authors propose an algorithm to build a lightweight IDS by using modified Random Mutation Hill Climbing (RMHC) as a search strategy to specify a candidate subset for evaluation, and using a modified linear Support Vector Machines (SVMs) based iterative procedure as a wrapper approach to obtain an optimum feature subset. The authors establish the effectiveness of their method in terms of efficiency in intrusion detection without compromising the detection rate. An example filter model for feature selection is [62], where the authors fuse correlation-based and minimal redundancy-maximal-relevance measures. They evaluate their method on benchmark intrusion datasets for classification accuracy. Several other methods for feature selection are [39], [63]–[65].
6) Reporting anomalies: An important aspect of any anomaly detection technique is the manner in which anomalies are reported [3]. Typically, the outputs produced by anomaly detection techniques are of two types: (a) a score, which is a value that combines (i) distance or deviation with reference to a set of profiles or signatures, (ii) influence of the majority in its neighborhood, and (iii) distinct dominance of the relevant subspace (as discussed in Section III-B5); and (b) a label, which is a value (normal or anomalous) given to each test instance. Usually the labelling of an instance depends on (i) the size of groups generated by an unsupervised technique, (ii) the compactness of the group(s), (iii) majority voting based on the outputs given by multiple indices (several example indices are given in Table VI), or (iv) distinct dominance of the subset of features.

IV. METHODS AND SYSTEMS FOR NETWORK ANOMALY DETECTION
The classification of network anomaly detection methods and systems that we adopt is shown in Figure 4. This scheme is based on the nature of algorithms used. It is not straightforward to come up with a classification scheme for network anomaly detection methods and systems, primarily because there is substantial overlap among the methods used

TABLE VI
CLUSTER VALIDITY MEASURES

Reference | Name of Index | Formula | Remark(s)
Dunn [66] | Dunn Index | $DI = \frac{d_{min}}{d_{max}}$, where $d_{min}$ denotes the smallest distance between two objects from different clusters and $d_{max}$ the largest distance within the same cluster. | (i) Can identify dense and well-separated clusters. (ii) A high Dunn index is more desirable for a clustering algorithm. (iii) May not perform well with noisy data.
Davies et al. [67] | Davies Bouldin's index | $DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \right)$, where n is the number of clusters; $\sigma_i$ is the average distance of all patterns in cluster i to their cluster center $c_i$; $\sigma_j$ is the average distance of all patterns in cluster j to their cluster center $c_j$; and $d(c_i, c_j)$ represents the proximity between the cluster centers $c_i$ and $c_j$. | (i) Validation is performed using cluster quantities and features inherent to the dataset. (ii) For compact clustering, DB values should be as small as possible. (iii) It is not designed to accommodate overlapping clusters.
Hubert and Schultz [68] | C-index | $C = \frac{S - S_{min}}{S_{max} - S_{min}}$, where S is the sum of distances over all pairs of objects from the same cluster, n is the number of those pairs, and $S_{min}$ and $S_{max}$ are the sums of the n smallest distances and n largest distances, respectively. | It needs to be minimized for better clustering.
Baker and Hubert [69] | Gamma Index | $G = \frac{S_{+} - S_{-}}{S_{+} + S_{-}}$, where $S_{+}$ represents the number of times that a pair of samples not clustered together have a larger separation than a pair that were in the same cluster; $S_{-}$ represents the reverse outcome. | This measure is widely used for hierarchical clustering.
Rohlf [70] | G+ Index | $G_{+} = \frac{2 S_{-}}{n (n - 1)}$, where $S_{-}$ is defined as for the gamma index and n is the number of within-cluster distances. | It uses the minimum value to determine the number of clusters in the data.
Rousseeuw [71] | Silhouette Index | $SI = \frac{b_i - a_i}{\max\{a_i, b_i\}}$, where $a_i$ is the average dissimilarity of the ith object to all other objects in the same cluster; $b_i$ is the minimum of the average dissimilarity of the object from all objects in other clusters. | This index cannot be applied to datasets with sub-clusters.
Goodman and Kruskal [72] | Goodman-Kruskal index | $GK = \frac{N_c - N_d}{N_c + N_d}$, where $N_c$ and $N_d$ are the numbers of concordant and disconcordant quadruples, respectively. | (i) It is robust in outlier detection. (ii) It requires high computation complexity in comparison to the C-index.
Jaccard [73] | Jaccard Index | $JI = \frac{a}{a + b + c}$, where a denotes the number of pairs of points with the same label in C and assigned to the same cluster in k, b denotes the number of pairs with the same label, but in different clusters, and c denotes the number of pairs in the same cluster, but with different class labels. | It uses less information than the Rand index measure.
Rand [74] | Rand Index | $RI = \frac{a + d}{a + b + c + d}$, where d denotes the number of pairs with a different label in C that were assigned to a different cluster in k; the rest are the same as in JI. | It gives equal weights to false positives and false negatives during computation.
Bezdek [75] | Partition coefficient | $PC = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n_c} u_{ij}^2$, where $n_c$ is the number of clusters, N is the number of objects in the dataset, and $u_{ij}$ is the degree of membership. | (i) It finds the number of overlaps between clusters. (ii) It lacks connection with the dataset.
Bezdek [76] | Classification entropy | $CE = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n_c} u_{ij} \log(u_{ij})$, with symbols as for the partition coefficient. | It measures the fuzziness of the cluster partitions.
Xie and Beni [77] | Xie-Beni Index | $XB = \frac{\pi}{N \cdot d_{min}}$, where $\pi = \frac{\sigma_i}{n_i}$ is called the compactness of cluster i, $n_i$ is the number of points in cluster i, $\sigma_i$ is the average variation in cluster i, and $d_{min} = \min \|k_i - k_j\|$. | (i) It combines the properties of membership degree and the geometric structure of the dataset. (ii) Smaller XB means more compact and better separated clusters.
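Two of the indices in Table VI, the Dunn index and the silhouette value of a single object, can be sketched as follows. The sketch assumes Euclidean distance between points given as tuples, crisp (non-fuzzy) cluster labels, at least two clusters, and at least two objects in every cluster; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """d_min / d_max: smallest between-cluster distance over largest within-cluster distance."""
    d_min = math.inf
    d_max = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp != lq:
            d_min = min(d_min, d)
        else:
            d_max = max(d_max, d)
    return d_min / d_max

def silhouette(i, points, labels):
    """(b_i - a_i) / max(a_i, b_i) for the object at index i."""
    by_cluster = {}
    for j, (p, label) in enumerate(zip(points, labels)):
        if j != i:
            by_cluster.setdefault(label, []).append(math.dist(points[i], p))
    own = by_cluster.pop(labels[i])          # distances to the object's own cluster
    a = sum(own) / len(own)
    b = min(sum(ds) / len(ds) for ds in by_cluster.values())
    return (b - a) / max(a, b)

# Hypothetical toy data: two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
print(dunn_index(points, labels), silhouette(0, points, labels))
```

For well-separated, compact clusters such as these, the Dunn index is large and the silhouette value is close to 1, in line with the remarks in the table.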

in the various classes in any particular scheme we may


adopt. We have decided on six distinct classes of methods
and systems. We call them statistical, classification-based,
clustering and outlier-based, soft computing, knowledge-based
and combination learners. Most methods have subclasses as
given in Figure 4. Figure 5 shows the approximate statistics
of papers published in each category.
We distinguish between network anomaly detection meth-
ods and systems in this paper, although such a distinction
is difficult to make sometimes. A network intrusion detection
system (NIDS) usually integrates a network intrusion detection
method within an architecture that comprises other associated
sub-systems to build a stand-alone practical system that can
perform the entire gamut of activities needed for intrusion
detection. We present several NIDSs with their architectures
and components as we discuss various anomaly detection
categories.
Fig. 4. Classification of network anomaly detection methods (GA-Genetic
Algorithm, ANN-Artificial Neural Network, AIS-Artificial Immune System)
A. Statistical methods and systems
Statistically speaking, an anomaly is an observation which is suspected of being partially or wholly irrelevant because it is not generated by the stochastic model assumed [78]. Normally, statistical methods fit a statistical model (usually for normal behavior) to the given data and then apply a statistical inference test to determine if an unseen instance belongs to this model. Instances that have a low probability to be generated from the learnt model based on the applied test statistic are declared anomalies. Both parametric and non-parametric techniques have been applied to design statistical models for anomaly detection. While parametric techniques

assume knowledge of the underlying distribution and estimate the parameters from the given data [79], non-parametric techniques do not generally assume knowledge of the underlying distribution [80].

Fig. 5. Statistics of the surveyed papers during the years 2000 to 2012

An example of a statistical IDS is HIDE [33]. HIDE is an anomaly-based network intrusion detection system that uses statistical models and neural network classifiers to detect intrusions. HIDE is a distributed system, which consists of several tiers with each tier containing several Intrusion Detection Agents (IDAs). IDAs are IDS components that monitor the activities of a host or a network. The probe layer (i.e., top layer as shown in Figure 6) collects network traffic at a host or in a network, abstracts the traffic into a set of statistical variables to reflect network status, and periodically generates reports to the event preprocessor. The event preprocessor layer receives reports from both the probe and IDAs of lower tiers, and converts the information into the format required by the statistical model. The statistical processor maintains a reference model of typical network activities, compares reports from the event preprocessor with the reference models, and forms a stimulus vector to feed into the neural network classifier. The neural network classifier analyzes the stimulus vector from the statistical model to decide whether the network traffic is normal. The post-processor generates reports for the agents at higher tiers. A major attraction of HIDE is its ability to detect UDP flooding attacks even with attack intensity as low as 10% of background traffic.

Fig. 6. Architecture of HIDE system

Of the many statistical methods and NIDSs [79], [81]–[89], only a few are described below in brief.
Bayesian networks [90] are capable of detecting anomalies in a multi-class setting. Several variants of the basic technique have been proposed for network intrusion detection and for anomaly detection in text data [3]. The basic technique assumes independence among different attributes. Several variations of the basic technique that capture the conditional dependencies among different attributes using more complex Bayesian networks have also been proposed. For example, the authors of [91] introduce an event classification-based intrusion detection scheme using Bayesian networks. The Bayesian decision process improves the detection decision and significantly reduces false alarms.
Manikopoulos and Papavassiliou [81] introduce a hierarchical multi-tier multi-window statistical anomaly detection system to operate automatically, adaptively, and proactively. It applies to both wired and wireless ad-hoc networks. This system uses statistical modeling and neural network classification to detect network anomalies and faults. The system achieves a high detection rate along with a low misclassification rate when the anomaly traffic intensity is at 5% of the background traffic, but the detection rate is lower at lower attack intensity levels such as 1% and 2%.
Association rule mining [92], conceptually a simple method based on counting of co-occurrences of items in transaction databases, has been used for one-class anomaly detection by generating rules from the data in an unsupervised fashion. The most difficult and dominating part of an association rules discovery algorithm is to find the itemsets that have strong support. Mahoney and Chan [83] present an algorithm known as LERAD that learns rules for finding rare events in time-series data with long range dependencies and finds anomalies in network packets over TCP sessions. LERAD uses an Apriori-like algorithm [92] that finds conditional rules over nominal attributes in a time series, e.g., a sequence of inbound client packets. The antecedent of a created rule is a conjunction of equalities, and the consequent is a set of allowed values, e.g., if port=80 and word3=HTTP/1.0 then word1=GET or POST. A value is allowed if it is observed in at least one training instance satisfying the antecedent. The idea is to identify rare anomalous events: those which have not occurred for a long time and which have a high anomaly score. LERAD is a two-pass algorithm. In the first pass, a candidate rule set is generated from a random sample of training data comprised of attack-free network traffic. In the second pass, rules are trained by obtaining the set of allowed values for each antecedent.
A payload-based anomaly detector for intrusion detection known as PAYL is proposed in [84]. PAYL attempts to detect the first occurrence of a worm either at a network system gateway or within an internal network from a rogue device and to prevent its propagation. It employs a language-independent n-gram based statistical model of sampled data streams. In fact, PAYL uses only a 1-gram model (i.e., it looks at the distribution of values contained within a single byte) which requires a linear scan of the data stream and a small 256-

element histogram. In other words, for each byte value in the range 0-255, it computes its mean frequency as well as the variance and standard deviation. Since payloads (i.e., arriving or departing contents) at different ports differ in length, PAYL computes these statistics for each specific observed payload length for each port open in the system. It first observes many exemplar payloads during the training phase and computes the payload profiles for each port for each payload length. During detection, each incoming payload is scanned and statistics are computed. The new payload distribution is compared against the model created during training. If there is a significant difference, PAYL concludes that the packet is anomalous and generates an alert. The authors found that this simple approach works surprisingly well.
Song et al. [85] propose a conditional anomaly detection method for computing differences among attributes and present three different expectation-maximization algorithms for learning the model. They assume that the data attributes are partitioned into indicator attributes and environmental attributes based on the decision taken by the user regarding which attributes indicate an anomaly. The method learns the typical indicator attribute values and observes subsequent data points, and labels them as anomalous or not, based on the degree to which the indicator attribute values differ from the usual indicator attribute values. However, if the indicator attribute values are not conditioned on environmental attribute values, the indicator attributes are effectively ignored. The precision/recall of this method is greater than 90 percent.
Lu and Ghorbani [87] present a network signal modeling technique for anomaly detection by combining wavelet approximation and system identification theory. They define and generate fifteen relevant traffic features as input signals to the system and model daily traffic based on these features. The output of the system is the deviation of the current input signal from the normal or regular signal behavior. Residuals are passed to the IDS engine to make decisions, and the method obtains 95% accuracy on the daily traffic.
Wattenberg et al. [88] propose a method to detect anomalies in network traffic, based on a nonrestricted α-stable first-order model and statistical hypothesis testing. The α-stable function is used to model the marginal distribution of real traffic, which is then classified using the Generalized Likelihood Ratio Test. They detect two types of anomalies, floods and flash-crowds, with promising accuracy. In addition, a nonparametric adaptive CUSUM (Cumulative Sum) method for detecting network intrusions is discussed in [89].
In addition to the detection methods, there are several statistical NIDSs. As mentioned earlier, a NIDS includes one or more intrusion detection methods that are integrated with other required sub-systems necessary to create a practically suitable system. We discuss a few below.
N@G (Network at Guard) [93] is a hybrid IDS that exploits both misuse and anomaly approaches. N@G has both network and host sensors. Anomaly-based intrusion detection is pursued using the chi-square technique on various network protocol parameters. It has four detection methodologies, viz., data collection, signature-based detection, network access policy violation and protocol anomaly detection, as a part of its network sensor. It includes audit trails, log analysis, statistical analysis and host access policies as components of the host sensor. The system has a separate IDS server, i.e., a management console to aggregate alerts from the various sensors with a user interface, a middle-tier and a data management component. It provides real time protection against malicious changes to network settings on client computers, which includes unsolicited changes to the Windows Hosts file and Windows Messenger service.
FSAS (Flow-based Statistical Aggregation Scheme) [94] is a flow-based statistical IDS. It comprises two modules: feature generator and flow-based detector. In the feature generator, the event preprocessor module collects the network traffic of a host or a network. The event handlers generate reports to the flow management module. The flow management module efficiently determines if a packet is part of an existing flow or if it should generate a new flow key. By inspecting flow keys, this module aggregates flows together, and dynamically updates per-flow accounting measurements. The event time module periodically calls the feature extraction module to convert the statistics regarding flows into the format required by the statistical model. The neural network classifier classifies the score vectors to prioritize flows by their amount of maliciousness. The higher the maliciousness of a flow, the higher is the possibility that the flow is an attack.
In addition to their inherent ability to detect network anomalies, statistical approaches have a number of additional distinct advantages as well.
• They do not require prior knowledge of normal activities of the target system. Instead, they have the ability to learn the expected behavior of the system from observations.
• Statistical methods can provide accurate notification or alarm generation of malicious activities occurring over long periods of time, subject to setting of appropriate thresholding or parameter tuning.
• They analyze the traffic based on the theory of abrupt changes, i.e., they monitor the traffic for a long time and report an alarm if any abrupt change (i.e., significant deviation) occurs.
Drawbacks of the statistical model for network anomaly detection include the following.
• They are susceptible to being trained by an attacker in such a way that the network traffic generated during the attack is considered normal.
• Setting the values of the different parameters or metrics is a difficult task, especially because the balance between false positives and false negatives is an issue. Moreover, a statistical distribution per variable is assumed, but not all behaviors can be modeled using stochastic methods. Furthermore, most schemes rely on the assumption of a quasi-stationary process [6], which is not always realistic.
• It takes a long time to report an anomaly for the first time because the building of the models requires extended time.
• Several hypothesis testing statistics can be applied to detect anomalies. Choosing the best statistic is often not straightforward. In particular, as stated in [88], constructing hypothesis tests for complex distributions that are required to fit high dimensional datasets is nontrivial.

TABLE VII
COMPARISON OF STATISTICAL NETWORK ANOMALY DETECTION METHODS

Author(s) | Year of publication | No. of parameters used | w | x | y | Data types | Dataset | z | Detection method
Eskin [79] | 2000 | 2 | O | N | P | Numeric | DARPA99 | C4 | Probability Model
Manikopoulos and Papavassiliou [81] | 2002 | 3 | D | N | P | Numeric | Real-life | C2, C5 | Statistical model with neural network
Mahoney and Chan [83] | 2003 | 2 | C | N | P | - | DARPA99 | C1 | LERAD algorithm
Chan et al. [82] | 2003 | 2 | C | N | P | Numeric | DARPA99 | C1 | Learning Rules
Wang and Stolfo [84] | 2004 | 3 | C | N | P | Numeric | DARPA99 | C1 | Payload-based algorithm
Song et al. [85] | 2007 | 3 | C | N | P | Numeric | KDDcup99 | Intrusive pattern | Gaussian Mixture Model
Chhabra et al. [86] | 2008 | 2 | D | N | P | Numeric | Real time | C6 | FDR method
Lu and Ghorbani [87] | 2009 | 3 | C | N | P, F | Numeric | DARPA99 | C1 | Wavelet Analysis
Wattenberg et al. [88] | 2011 | 4 | C | N | P | Numeric | Real-time | C2 | GLRT Model
Yu [89] | 2012 | 1 | C | N | P | Numeric | Real-time | C2 | Adaptive CUSUM

w: indicates centralized (C), distributed (D) or others (O)
x: the nature of detection as real time (R) or non-real time (N)
y: characterizes packet-based (P), flow-based (F), hybrid (H) or others (O)
z: represents the list of attacks handled: C1 (all attacks), C2 (denial of service), C3 (probe), C4 (user to root), C5 (remote to local), and C6 (anomalous)

Fig. 7. Linear and non-linear classification in 2-D

Fig. 8. Architecture of ADAM system

• Histogram-based techniques are relatively simple to implement, but a key shortcoming of such techniques for multivariate data is that they are not able to capture interactions among the attributes.
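The shortcoming named in the last bullet can be seen in a small sketch. The code is illustrative only and is not taken from any of the surveyed systems: the function names, the bin width, the scoring rule, and the toy data are all assumptions. It scores a record by combining independent per-attribute histograms, and shows that a record whose attribute values are each common, but never occur together in training, receives the same score as a normal record.

```python
from collections import Counter

def fit_histograms(rows, bin_width):
    """Build one histogram (bin index -> count) per attribute."""
    n_attrs = len(rows[0])
    hists = [Counter(int(r[j] // bin_width) for r in rows) for j in range(n_attrs)]
    return hists, len(rows)

def anomaly_score(row, hists, n, bin_width):
    """Sum over attributes of (1 - relative frequency of the value's bin).

    Each attribute is scored on its own, so joint behaviour is ignored.
    This scoring rule is an assumption chosen for illustration.
    """
    return sum(1.0 - hists[j][int(v // bin_width)] / n for j, v in enumerate(row))

# Toy training data: the two attributes always move together.
train = [(1.0, 1.0)] * 50 + [(9.0, 9.0)] * 50
hists, n = fit_histograms(train, bin_width=2.0)

seen = anomaly_score((1.0, 1.0), hists, n, 2.0)    # combination present in training
mixed = anomaly_score((1.0, 9.0), hists, n, 2.0)   # combination never seen together
unseen = anomaly_score((5.0, 5.0), hists, n, 2.0)  # values never seen at all

print(seen, mixed, unseen)
```

The record `(1.0, 9.0)` is scored exactly like the normal record, because each of its values falls in a well-populated bin of its own histogram. Only the record with values outside every populated bin stands out.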
A comparison of a few statistical network anomaly detection methods is given in Table VII.

B. Classification-based methods and systems

Classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership is known. Assuming we have two classes whose instances are shown as + and −, and each object can be defined in terms of two attributes or features x1 and x2, linear classification tries to find a line between the classes as shown in Figure 7(a). The classification boundary may be non-linear as in Figure 7(b). In intrusion detection, the data is high dimensional, not just two-dimensional. The attributes are usually mixed, numeric and categorical, as discussed earlier.

Thus, classification techniques are based on establishing an explicit or implicit model that enables categorization of network traffic patterns into several classes [95]–[100]. A singular characteristic of these techniques is that they need labeled data to train the behavioral model, a procedure that places high demands on resources [101]. In many cases, the applicability of machine learning principles such as classification coincides with that of statistical techniques, although the former technique is focused on building a model that improves its performance on the basis of previous results [7]. Several classification-based techniques (e.g., k-nearest neighbor, support vector machines, and decision trees) have been applied to anomaly detection in network traffic data.

An example of a classification-based IDS is Automated Data Analysis and Mining (ADAM) [32], which provides a testbed for detecting anomalous instances. An architecture diagram of ADAM is shown in Figure 8. ADAM exploits a combination of classification techniques and association rule mining to discover attacks in a tcpdump audit trail. First, ADAM builds a repository of “normal” frequent itemsets from attack-free periods. Second, ADAM runs a sliding-window based online algorithm that finds frequent itemsets in the connections and compares them with those stored in the normal itemset repository, discarding those that are deemed normal. ADAM uses a classifier which has been trained to classify suspicious connections as either a known type of attack or an unknown type or a false alarm.

A few classification-based network anomaly detection methods and NIDSs are described below in brief.

Abbes et al. [102] introduce an approach that uses decision trees with protocol analysis for effective intrusion detection. They construct an adaptive decision tree for each application layer protocol. Detection of anomalies classifies data records

into two classes: benign and anomalies. The anomalies include a large variety of types such as DoS, scans, and botnets. Thus, multi-class classifiers are a natural choice, but like any classifier they require expensive hand-labeled datasets and are also not able to identify unknown attacks.

Wagner et al. [103] use one-class classifiers that can detect new anomalies, i.e., data points that do not belong to the learned class. In particular, they use a one-class SVM classifier proposed by Schölkopf et al. [104]. In such a classifier, the training data is presumed to belong to only one class, and the learning goal during training is to determine a function which is positive when applied to points on the circumscribed boundary around the training points and negative outside. This is also called semi-supervised classification. Such an SVM classifier can be used to identify outliers and anomalies. The authors develop a special kernel function that projects data points to a higher dimension before classification. Their kernel function takes into consideration properties of Netflow data and enables determination of similarity between two windows of IP flow records. They obtain 92% accuracy on average for all attack classes.

Classification-based anomaly detection methods can usually give better results than unsupervised methods (e.g., clustering-based) because of the use of labeled training examples. In traditional classification, new information can be incorporated by re-training with the entire dataset. However, this is time-consuming. Incremental classification algorithms [105] make such training more efficient. Although classification-based methods are popular, they cannot detect or predict unknown attacks or events until relevant training information is fed for retraining.

For a comparison of several classification-based network anomaly detection methods, see Table VIII.

Several authors have used a combination of classifiers and clustering for network intrusion detection, leveraging the advantages of the two methods. For example, Muda et al. [107] present a two-stage model for network intrusion detection. Initially, k-means clustering is used to group the samples into three clusters: C1 to group attack data such as Probe, U2R and R2L; C2 to group DoS attack data; and C3 for normal non-attack data. The authors achieve this by initializing the cluster centers with the mean values obtained from known data points of appropriate groups. Since the initial centroids are obtained from known labeled data, the authors find that k-means clustering is very good at clustering the data into three classes. Next, the authors use a Naive Bayes classifier to classify the data in the final stage into the five more accurate classes: Normal, DoS, Probe, R2L and U2R.

Gaddam et al. [96] present a method to detect anomalous activities based on a combined approach that uses the k-means clustering algorithm and the ID3 algorithm for decision tree learning [108]. In addition to descriptive features, each data instance includes a label saying whether the instance is normal or anomalous. The first stage of the algorithm partitions the training data into k clusters using Euclidean distance similarity. Obviously, the clustering algorithm does not consider the labels on instances. The second stage of the algorithm builds a decision tree on the instances in a cluster. It does so for each cluster so that k separate decision trees are built. The purpose of building decision trees is to overcome two problems that k-means faces: a) forced assignment: if the value of k is lower than the number of natural groups, dissimilar instances are forced into the same cluster, and b) class dominance, which arises when a cluster contains a large number of instances from one class, and fewer instances from other classes. The hypothesis is that a decision tree trained on each cluster learns the subgroupings (if any) present within each cluster by partitioning the instances over the feature space. To obtain a final decision on classification of a test instance, the decisions of the k-means and ID3 algorithms are combined using two rules: (a) the nearest-neighbor rule and (b) the nearest-consensus rule. The authors claim that the detection accuracy of the k-means+ID3 method is very high with an extremely low false positive rate on network anomaly data.

Support Vector Machines (SVMs) are very successful maximum margin linear classifiers [109]. However, SVMs take a long time for training when the dataset is very large. Khan et al. [106] reduce the training time for SVMs when classifying large intrusion datasets by using a hierarchical clustering method called Dynamically Growing Self-Organizing Tree (DGSOT) intertwined with the SVMs. DGSOT, which is based on artificial neural networks, is used to find the boundary points between two classes. The boundary points are the most qualified points to train SVMs. An SVM computes the maximal margins separating the two classes of data points. Only points closest to the margins, called support vectors, affect the computation of these margins. Other points can be discarded without affecting the final results. Khan et al. approximate support vectors by using DGSOT. They use clustering in parallel with the training of SVMs, without waiting till the end of the building of the tree to start training the SVM. The authors find that their approach significantly improves training time for the SVMs without sacrificing generalization accuracy, in the context of network anomaly detection.

In addition to the several detection methods noted above, we also discuss a classification-based IDS known as DNIDS (Dependable Network Intrusion Detection System) [110]. This IDS is developed based on the Combined Strangeness and Isolation measure of the k-Nearest Neighbor (CSI-KNN) algorithm. DNIDS can effectively detect network intrusion while providing continued service under attack. The intrusion detection algorithm analyzes characteristics of the network data by employing two measures: strangeness and isolation. These measures are used by a correlation unit to raise an intrusion alert along with the confidence information. For faster information, DNIDS exploits multiple CSI-KNN classifiers in parallel. It also includes an intrusion-tolerant mechanism to monitor the hosts and the classifiers running on them, so that failure of any component can be handled carefully. Sensors capture network packets from a network segment and transform them into connection-based vectors. The Detector is a collection of CSI-KNN classifiers that analyze the vectors supplied by the sensors. The Manager, Alert Agents, and Maintenance Agents are designed for intrusion tolerance and are installed on a secure administrative server called Station. The Manager executes the tasks of generating mobile agents and dispatching them for task execution.
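As an illustration of the first stage of the two-stage model attributed to Muda et al. [107] earlier in this subsection, the sketch below runs k-means with cluster centers initialized from the means of known labeled points. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-dimensional toy data, the group names, and the fixed iteration count are hypothetical, and the second (Naive Bayes) stage is omitted.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def seeded_kmeans(data, labeled, iterations=10):
    """k-means whose initial centers are the means of known labeled groups.

    labeled maps a group name to a list of points known to belong to it.
    Returns the group names and the final centers, in matching order.
    """
    names = sorted(labeled)
    centers = [mean(labeled[name]) for name in names]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in data:
            clusters[nearest(p, centers)].append(p)
        # Keep the previous center when a cluster receives no points.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return names, centers

# Hypothetical labeled seeds for three groups (attack, DoS, normal).
labeled = {
    "normal": [(0.0, 0.0), (1.0, 0.0)],
    "dos": [(10.0, 10.0), (9.0, 10.0)],
    "other_attack": [(0.0, 10.0), (1.0, 10.0)],
}
data = [(0.5, 0.5), (0.2, 0.1), (9.5, 9.8), (10.2, 10.1), (0.3, 9.7), (0.8, 10.4)]
names, centers = seeded_kmeans(data, labeled)

def assign(point):
    """Name of the group whose final center is nearest to point."""
    return names[nearest(point, centers)]

print(assign((0.4, 0.3)), assign((9.9, 9.9)), assign((0.5, 9.0)))
```

Seeding from labeled means gives each cluster a known identity from the start, so a new record can be named by its nearest center without a separate cluster-labeling step.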

TABLE VIII
COMPARISON OF CLASSIFICATION-BASED NETWORK ANOMALY DETECTION METHODS

Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Tong et al. [95] | 2005 | 4 | O | N | P | Numeric | DARPA99, TCPSTAT | C1 | KPCC model
Gaddam et al. [96] | 2007 | 3 | C | N | P | Numeric | NAD, DED, MSD | C1 | k-means+ID3
Khan et al. [106] | 2007 | 3 | C | N | P | Numeric | DARPA98 | C1 | DGSOT + SVM
Das et al. [97] | 2008 | 3 | O | N | P | Categorical | KDDcup99 | C1 | APD Algorithm
Lu and Tong [98] | 2009 | 2 | O | N | P | Numeric | DARPA99 | C1 | CUSUM-EM
Qadeer et al. [99] | 2010 | - | C | R | P | - | Real time | C2 | Traffic statistics
Wagner et al. [103] | 2011 | 2 | C | R | F | Numeric | Flow Traces | C2 | Kernel OCSVM
Muda et al. [107] | 2011 | 2 | O | N | O | Numeric | KDDcup99 | C1 | KMNB algorithm
Kang et al. [100] | 2012 | 2 | O | N | P | Numeric | DARPA98 | C1 | Differentiated SVDD

w-indicates centralized (C) or distributed (D) or others (O)
x-the nature of detection as real time (R) or non-real time (N)
y-characterizes packet-based (P) or flow-based (F) or hybrid (H) or others (O)
z-represents the list of attacks handled: C1-all attacks, C2-denial of service, C3-probe, C4-user to root, and C5-remote to local

Fig. 9. Clustering and outliers in 2-D, where the Ci are clusters in (a) and the Oi are outliers in (b)

Fig. 10. Architecture of MINDS system

Classification-based anomaly detection approaches are popular for detecting network anomalies. The following are some advantages.
• These techniques are flexible for training and testing. They are capable of updating their execution strategies with the incorporation of new information. Hence, adaptability is possible.
• They have a high detection rate for known attacks subject to appropriate threshold setting.
Though such methods are popular, they have the following disadvantages.
• The techniques are highly dependent on the assumptions made by the classifiers.
• They consume more resources than other techniques.
• They cannot detect or predict unknown attacks or events until relevant training information is fed.

C. Clustering and Outlier-based methods and systems

Clustering is the task of assigning a set of objects into groups called clusters so that the objects in the same cluster are more similar in some sense to each other than to those in other clusters. Clustering is used in exploratory data mining. For example, if we have a set of unlabeled objects in two dimensions, we may be able to cluster them into 5 clusters by drawing circles or ellipses around them, as in Figure 9(a). Outliers are those points in a dataset that are highly unlikely to occur given a model of the data, as in Figure 9(b). Examples of outliers in a simple dataset are seen in [111]. Clustering and outlier finding are examples of unsupervised machine learning.

Clustering can be performed in network anomaly detection in an offline environment. Such an approach adds additional depth to the administrators’ defenses, and allows them to more accurately determine threats against their network through the use of multiple methods on data from multiple sources. Hence, the extensive amount of activities that may be needed to detect intrusion near real time in an online NIDS may be obviated, achieving efficiency [112].

For example, MINDS (Minnesota Intrusion Detection System) [34] is a data mining-based system for detecting network intrusions. The architecture of MINDS is given in Figure 10. It accepts NetFlow data collected through flow tools as input. Flow tools only capture packet header information and build one-way sessions of flows. The analyst uses MINDS to analyze these data files in batch mode. The reason for running the system in batch mode is not due to the time it takes to analyze these files, but because it is convenient for the analyst to do so. Before data is fed into the anomaly detection module, a data filtering step is executed to remove network traffic in which the analyst is not interested.

The first step of MINDS is to extract important features that are used. Then, it summarizes the features based on time windows. After the feature construction step, the known attack detection module is used to detect network connections that correspond to attacks for which signatures are available, and to remove them from further analysis. Next, an outlier technique is activated to assign an anomaly score to each network connection. A human analyst then looks at only the most anomalous connections to determine if they are actual attacks or represent other interesting behavior. The association pattern

analysis module of this system is dedicated to summarizing the network connections as per the assigned anomaly rank. The analyst provides feedback after analyzing the summaries created and decides whether these summaries are helpful in creating new rules that may be used in known attack detection.

Clustering techniques are frequently used in anomaly detection. These include single-link clustering algorithms, k-means (squared error clustering), and hierarchical clustering algorithms, to mention a few [113]–[118].

Sequeira and Zaki [119] present an anomaly-based intrusion detection system known as ADMIT that detects intruders by creating user profiles. It keeps track of the sequence of commands a user uses as he/she uses a computer. A user profile is represented by clustering the sequences of the user’s commands. The data collection and processing are thus host-based. The system clusters a user’s command sequence using LCS (Longest Common Subsequence) as the similarity metric. It uses a dynamic clustering algorithm that creates an initial set of clusters and then refines them by splitting and merging as necessary. When a new user types a sequence of commands, it compares the sequence to profiles of users it already has. If it is a long sequence, it is broken up into a number of smaller sequences. A sequence that is not similar to a normal user’s profile is considered anomalous. One anomalous sequence is tolerated as noise, but a sequence of anomalous sequences typed by one single user causes the user to be marked as a masquerader or as exhibiting concept drift. The system can also use incremental clustering to detect masqueraders.

Zhang et al. [115] report a distributed intrusion detection algorithm that clusters the data twice. The first clustering chooses candidate anomalies at Agent IDSs, which are placed in a distributed manner in a network, and a second clustering computation attempts to identify true attacks at the central IDS. The first clustering algorithm is essentially the same as the one proposed by [120]. At each agent IDS, small clusters are assumed to contain anomalies and all small clusters are merged to form a single candidate cluster containing all anomalies. The candidate anomalies from various Agent IDSs are sent to the central IDS, which clusters again using a simple single-link hierarchical clustering algorithm. It chooses the smallest k clusters as containing true anomalies. They obtain a 90% attack detection rate on test intrusion data.

Worms are often intelligent enough to hide their activities and evade detection by IDSs. Zhuang et al. [121] propose a method called PAIDS (Proximity-Assisted IDS) to identify new worms as they begin to spread. PAIDS works differently from other IDSs and has been designed to work collaboratively with existing IDSs such as an anomaly-based IDS for enhanced performance. The goal of the designers of PAIDS is to identify new and intelligent fast-propagating worms and thwart their spread, particularly as the worm is just beginning to spread. Neither signature-based nor anomaly-based techniques can achieve such capabilities. Zhuang et al.’s approach is based mainly on the observation that during the starting phase of a new worm, the infected hosts are clustered in terms of geography, IP address and possibly even the DNSes used.

Bhuyan et al. [122] present an unsupervised network anomaly detection method for large intrusion datasets. It exploits tree-based subspace clustering and an ensemble-based cluster labelling technique to achieve a better detection rate over real life network traffic data for the detection of known as well as unknown attacks. They obtain a 98% detection rate on average in detecting network anomalies.

Some advantages of using clustering are given below.
• For a partitioning approach, if k can be provided accurately then the task is easy.
• Incremental clustering (in supervised mode) techniques are effective for fast response generation.
• It is advantageous in case of large datasets to group into similar number of classes for detecting network anomalies, because it reduces the computational complexity during intrusion detection.
• It provides a stable performance in comparison to classifiers or statistical methods.

Drawbacks of clustering-based methods include the following.
• Most techniques have been proposed to handle continuous attributes only.
• In clustering-based intrusion detection techniques, an assumption is that the larger clusters are normal and smaller clusters are attack or intrusion [57]. Without this assumption, it is difficult to evaluate the technique.
• Use of an inappropriate proximity measure affects the detection rate negatively.
• Dynamic updating of profiles is time consuming.

Several outlier-based network anomaly identification techniques are available in [18]. When we use outlier-based algorithms, the assumption is that anomalies are uncommon events in a network. Intrusion datasets usually contain mixed, numeric and categorical attributes. Many early outlier detection algorithms worked with continuous attributes only; they ignored categorical attributes or modeled them in manners that caused considerable loss of information.

To overcome this problem, Otey et al. [123] develop a distance measure for data containing a mix of categorical and continuous attributes and use it for outlier-based anomaly detection. They define an anomaly score which can be used to identify outliers in mixed attribute space by considering dependencies among attributes of different types. Their anomaly score function is based on a global model of the data that can be easily constructed by combining local models built independently at each node. They develop an efficient one-pass approximation algorithm for anomaly detection that works efficiently in distributed detection environments with very little loss of detection accuracy. Each node computes its own outliers and the inter-node communication needed to compute global outliers is not significant. In addition, the authors show that their approach works well in dynamic network traffic situations where data, in addition to being streaming, also changes in nature as time progresses, leading to concept drift.

Bhuyan et al. [124] introduce an outlier score function to rank each candidate object w.r.t. the reference points for network anomaly detection. The reference points are computed from the clusters obtained from variants of the k-means clustering technique. The method is effective on real life intrusion datasets.
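The general idea of ranking candidate objects against reference points derived from clusters can be sketched as follows. This is not the score function of [124], whose definition is not given here. As a stand-in assumption, the score of a point is simply its Euclidean distance to the nearest reference point, and the reference points are taken to be two hypothetical cluster centroids. All names and data are illustrative.

```python
import math

def outlier_scores(points, reference_points):
    """Score each point by its distance to the nearest reference point.

    Assumed scoring rule: a larger distance means the point is more outlying.
    """
    return [min(math.dist(p, r) for r in reference_points) for p in points]

def rank_outliers(points, reference_points, top=1):
    """Return the `top` highest-scoring points as (point, score) pairs."""
    scores = outlier_scores(points, reference_points)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return [(points[i], scores[i]) for i in order[:top]]

# Hypothetical reference points, e.g. centroids produced by a k-means run.
reference_points = [(0.0, 0.0), (10.0, 10.0)]
candidates = [(0.5, 0.2), (9.7, 10.1), (5.0, 5.0), (0.1, 0.3)]

ranked = rank_outliers(candidates, reference_points, top=2)
print(ranked)
```

Candidates that lie close to some centroid receive low scores, while the point midway between the two centroids is ranked first. In practice the number of reported outliers, or a score threshold, would be a tuning parameter, which is consistent with the parameter dependence of outlier-based methods.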

Some of the advantages of outlier-based anomaly detection are the following.
• It is easy to detect outliers when the datasets are smaller in size.
• Bursty and isolated attacks can be identified efficiently using this method.

Drawbacks of outlier-based anomaly detection include the following.
• Most techniques use both clustering and outlier detection. In such cases the complexity may be high in comparison to other techniques.
• The techniques are highly parameter dependent.

A comparison of a few clustering and outlier-based network anomaly detection methods is given in Table IX.

D. Soft computing methods and systems

Soft computing techniques are suitable for network anomaly detection because often one cannot find exact solutions. Soft computing is usually thought of as encompassing methods such as Genetic Algorithms, Artificial Neural Networks, Fuzzy Sets, Rough Sets, Ant Colony Algorithms and Artificial Immune Systems. We describe several soft computing methods and systems for network anomaly detection below.

1) Genetic algorithm approaches: Genetic algorithms are population-based adaptive heuristic search techniques based on evolutionary ideas. The approach begins with conversion of a problem into a framework that uses a chromosome-like data structure. Balajinath and Raghavan [127] present a genetic intrusion detector (GBID) based on learning of individual user behavior. User behavior is described as a 3-tuple <matching index, entropy index, newness index> and is learnt using a genetic algorithm. This behavior profile is used to detect intrusion based on past behavior.

Khan [128] uses genetic algorithms to develop rules for network intrusion detection. A chromosome in an individual contains genes corresponding to attributes such as the service, flags, logged in or not, and super-user attempts. Khan concludes that attacks that are common can be detected more accurately compared to uncommon attributes.

2) Artificial Neural Network approaches: Artificial Neural Networks (ANN) are motivated by the recognition that the human brain computes in an entirely different way from the conventional digital computer [129]. The brain organizes its constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer. To achieve good performance, real neural networks employ massive interconnections of neurons. Neural networks acquire knowledge of the environment through a process of learning, which systematically changes the interconnection strengths, or synaptic weights, of the network to attain a desired design objective.

An example of an ANN-based IDS is RT-UNNID [130]. This system is capable of intelligent real time intrusion detection using unsupervised neural networks (UNN). The architecture of RT-UNNID is given in Figure 11. The first module captures and preprocesses the real time network traffic data for the protocols: TCP, UDP and ICMP. It also extracts the numeric features and converts them into binary or normalized form. The converted data is sent to the UNN-based detection engine that uses Adaptive Resonance Theory (ART) and Self-Organizing Map (SOM) [131], [132] neural networks. Finally, the output of the detection engine is sent to the responder for recording in the user’s system log file and to generate an alarm when detecting attacks. RT-UNNID can work in real time to detect known and unknown attacks in network traffic with a high detection rate.

Fig. 11. Architecture of RT-UNNID system

Cannady’s approach [133] autonomously learns new attacks rapidly using modified reinforcement learning. His approach uses feedback for signature update when a new attack is encountered and achieves satisfactory results. An improved approach to detect network anomalies using a hierarchy of neural networks is introduced in [134]. The neural networks are trained using data that spans the entire normal space and are able to recognize unknown attacks effectively.

Liu et al. [135] report a real time solution to detect known and new attacks in network traffic using unsupervised neural nets. It uses a hierarchical intrusion detection model using Principal Components Analysis (PCA) neural networks to overcome the shortcomings of single-level structures.

Sun et al. [136] present a wavelet neural network (WNN) based intrusion detection method. It reduces the number of the wavelet basis functions by analyzing the sparseness property of sample data to optimize the wavelet network to a large extent. The learning algorithm trains the network using gradient descent.

Yong and Feng [137] use recurrent multilayered perceptrons (RMLP) [138], a dynamic extension of well-known feed-forward layered networks, to classify network data into anomalous and normal. An RMLP network has the ability to encode temporal information. They develop an incremental kernel principal components algorithm to pre-process the data that goes into the neural network and obtain effective results.

In addition to the detection methods, we discuss a few IDSs below.

NSOM (Network Self-Organizing Maps) [139] is a network IDS developed using Self-Organizing Maps (SOM). It detects anomalies by quantifying the usual or acceptable behavior and flags irregular behavior as potentially intrusive. To classify real

TABLE IX
COMPARISON OF CLUSTERING AND OUTLIER-BASED NETWORK ANOMALY DETECTION METHODS

Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method
Sequeira and Zaki [119] | 2002 | 4 | C | R | P | Numeric, Categorical | Real life | Synthetic intrusions | ADMIT
Zhang et al. [115] | 2005 | 2 | D | N | P | Numeric | KDDcup99 | C1 | Cluster-based DIDS
Leung and Leckie [116] | 2005 | 3 | C | N | P | Numeric | KDDcup99 | C1 | fpMAFIA algorithm
Otey et al. [123] | 2006 | 5 | C | N | P | Mixed | KDDcup99 | C1 | FDOD algorithm
Jiang et al. [125] | 2006 | 3 | C | N | P | Mixed | KDDcup99 | C1 | CBUID algorithm
Chen and Chen [126] | 2008 | - | O | N | - | - | - | C3 | AAWP model
Zhang et al. [117] | 2009 | 2 | O | N | P | Mixed | KDDcup99 | C1 | KD algorithm
Zhuang et al. [121] | 2010 | 2 | R | C | P | - | Real time | C6 | PAIDS model
Bhuyan et al. [124] | 2011 | 2 | N | C | P,F | Numeric | KDDcup99 | C1 | NADO algorithm
Casas et al. [118] | 2012 | 2 | N | C | F | Numeric | KDDcup99, Real time | C1 | UNIDS method

w-indicates centralized (C) or distributed (D) or others (O)
x-the nature of detection as real time (R) or non-real time (N)
y-characterizes packet-based (P) or flow-based (F) or hybrid (H) or others (O)
z-represents the list of attacks handled: C1-all attacks, C2-denial of service, C3-probe, C4-user to root, C5-remote to local, and C6-worms

time traffic, it uses a structured SOM. It continuously collects network data from a network port, preprocesses that data and selects the features necessary for classification. Then it starts the classification process - a chunk of packets at a time - and then sends the resulting classification to a graphical tool that portrays the activities that are taking place on the network port dynamically as it receives more packets. The hypothesis is that routine traffic that represents normal behavior would be clustered around one or more cluster centers and any irregular traffic representing abnormal and possibly suspicious behavior would be clustered in addition to the normal traffic clustering. The system is capable of classifying regular vs. irregular and possibly intrusive network traffic for a given host.

POSEIDON (Payl Over Som for Intrusion DetectiON) [140] is a two-tier network intrusion detection system. The first tier consists of a self-organizing map (SOM), and is used exclusively to classify payload data. The second tier consists of a light modification of the PAYL system [84]. Tests using the DARPA99 dataset show a higher detection rate and a lower number of false positives than PAYL and PHAD [141].

3) Fuzzy set theoretic approaches: Fuzzy network intrusion detection systems exploit fuzzy rules to determine the likelihood of specific or general network attacks [142], [143]. A fuzzy input set can be defined for traffic in a specific network.

Tajbakhsh et al. [144] describe a novel method for building classifiers using fuzzy association rules and use it for network intrusion detection. The fuzzy association rule sets are used to describe different classes: normal and anomalous. Such fuzzy association rules are class association rules where the consequents are specified classes. Whether a training instance belongs to a specific class is determined by using matching metrics proposed by the authors. The fuzzy association rules are induced using normal training samples. A test sample is classified as normal if the compatibility of the rule set generated is above a certain threshold; those with lower compatibility are considered anomalous. The authors also propose a new method to speed up the rule induction algorithm by reducing items from extracted rules.

Mabu et al. report a novel fuzzy class-association-rule mining method based on genetic network programming (GNP) for detecting network intrusions [145]. GNP is an evolutionary optimization technique, which uses directed graph structures instead of strings in standard genetic algorithms, leading to enhanced representation ability with compact descriptions derived from possible node reusability in a graph.

Xian et al. [146] present a novel unsupervised fuzzy clustering method based on clonal selection for anomaly detection. The method is able to obtain globally optimal clusters more quickly than competing algorithms with greater accuracy.

In addition to the fuzzy set theoretic detection methods, we discuss two IDSs, viz., NFIDS and FIRE below.

NFIDS [147] is a neuro-fuzzy anomaly-based network intrusion detection system. It comprises three tiers. Tier-I contains several Intrusion Detection Agents (IDAs). IDAs are IDS components that monitor the activities of a host or a network and report the abnormal behavior to Tier-II. Tier-II agents detect the network status of a LAN based on the network traffic that they observe as well as the reports from the Tier-I agents within the LAN. Tier-III combines higher-level reports, correlates data, and sends alarms to the user interface. There are four main types of agents in this system: TCPAgent, which monitors TCP connections between hosts and on the network; UDPAgent, which looks for unusual traffic involving UDP data; ICMPAgent, which monitors ICMP traffic; and PortAgent, which looks for unusual services in the network.

FIRE (Fuzzy Intrusion Recognition Engine) [142] is an anomaly-based intrusion detection system that uses fuzzy logic to assess whether malicious activity is taking place on a network. The system combines simple network traffic metrics with fuzzy rules to determine the likelihood of specific or general network attacks. Once the metrics are available, they are evaluated using a fuzzy set theoretic approach. The system takes fuzzy network traffic profiles as inputs to its rule set and reports maliciousness.

4) Rough Set approaches: A rough set is an approximation of a crisp set (i.e., a regular set) in terms of a pair of sets that are its lower and upper approximations. In the standard and original version of rough set theory [148], the two approximations are crisp sets, but in other variations the approximating sets may be fuzzy sets. The mathematical framework of rough set theory enables modeling of relationships with a minimum number of rules.

Rough sets have two useful features [149]: (i) enabling learning with small size training datasets and (ii) overall
320 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014

simplicity. They can be applied to anomaly detection by


modeling normal behavior in network traffic. For example, in
[150], the authors present a Fuzzy Rough C-means clustering
technique for network intrusion detection by integrating fuzzy
set theory and rough set theory to achieve high detection rate.
Adetunmbi et al. [151] use rough sets and a k-NN classifier
to detect network intrusions with high detection rate and low
false alarm rate. Chen et al. present a two-step classifier for
network intrusion detection [152]. Initially, it uses rough set
theory for feature reduction and then a support vector machine
classifier for final classification. They obtain 89% accuracy on Fig. 12. Architecture of STAT system
network anomaly data.
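As a minimal illustration of the lower and upper approximations that define a rough set, the following Python sketch groups toy connection records into equivalence classes over a chosen attribute subset. The record fields, their values and the `approximations` helper are hypothetical and are not taken from any of the cited systems.

```python
from collections import defaultdict

def approximations(records, attributes, target):
    """Return (lower, upper) approximations of `target` under `attributes`."""
    # Records that agree on every chosen attribute are indiscernible
    # and fall into the same equivalence class.
    classes = defaultdict(set)
    for rid, rec in records.items():
        key = tuple(rec[a] for a in attributes)
        classes[key].add(rid)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # class lies entirely inside the target set
            lower |= block
        if block & target:       # class overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical toy data: five connection records and a labeled attack set.
records = {
    "c1": {"proto": "tcp", "flag": "SF"},
    "c2": {"proto": "tcp", "flag": "SF"},
    "c3": {"proto": "tcp", "flag": "S0"},
    "c4": {"proto": "udp", "flag": "SF"},
    "c5": {"proto": "tcp", "flag": "S0"},
}
attack = {"c2", "c3", "c5"}

lower, upper = approximations(records, ["proto", "flag"], attack)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

In this toy case the lower approximation holds the records that are certainly attacks given the chosen attributes, while the boundary region (upper minus lower) holds records that the attributes cannot separate from normal traffic.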
5) Ant Colony and Artificial Immune System approaches: Ant colony optimization [153] and related algorithms are probabilistic techniques for solving computational problems which can be reformulated to find optimal paths through graphs. The algorithms are based on the behavior of ants seeking a path between their colony and a source of food. Gao et al. [154] use ant colony optimization for feature selection for an SVM classifier for network intrusion detection. The features are represented as graph nodes with the edges between them denoting the addition of the next feature. Ants traverse the graph to add nodes until the stopping criterion is encountered.

Artificial Immune Systems (AIS) represent a computational method inspired by the principles of the human immune system. The human immune system is adept at performing anomaly detection. Visconti and Tahayori [155] present a performance-based AIS for detecting individual anomalous behavior. It monitors the system by analyzing a set of parameters to provide general information on its state. The interval type-2 fuzzy set paradigm is used to dynamically generate system status.

Advantages of soft computing-based anomaly detection methods include the following.
• Such learning systems detect or categorize persistent features without any feedback from the environment.
• Due to the adaptive nature of ANNs, it is possible to train and test instances incrementally using certain algorithms. Multi-level neural network-based techniques are more efficient than single level neural networks.
• Unsupervised learning using competitive neural networks is effective in data clustering, feature extraction and similarity detection.
• Rough sets are useful in resolving inconsistency in the dataset and in generating a minimal, non-redundant and consistent rule set.

Some of the disadvantages of soft computing methods are pointed out below.
• Over-fitting may happen during neural network training.
• If a credible amount of normal traffic data is not available, the training of the techniques becomes very difficult.
• Most methods have scalability problems.
• Rough set-based rule generation suffers from proof of completeness.
• In fuzzy association rule-based techniques, identifying a reduced, relevant rule subset and updating rules dynamically at runtime are difficult tasks.

Table X gives a comparison of several soft computing-based anomaly detection methods.

E. Knowledge-based methods and systems

In knowledge-based methods, network or host events are checked against predefined rules or patterns of attack. The goal is to represent the known attacks in a generalized fashion so that handling of actual occurrences becomes easier. Examples of knowledge-based methods are expert systems, rule-based, ontology-based, logic-based and state-transition analysis [156]–[159].

These techniques search for instances of known attacks by attempting to match with pre-determined attack representations. The search begins like other intrusion detection techniques, with a complete lack of knowledge. Subsequent matching of activities against a known attack helps acquire knowledge and enter into a region with higher confidence. Finally, it can be shown that an event or activity has reached the maximum anomaly score.

An example knowledge-based system is STAT (State Transition Analysis Tool) [160]. Its architecture is given in Figure 12. It models traffic data as a series of state changes that lead from a secure state to a target compromised state. STAT is composed of three main components: knowledge base, inference engine and decision engine. The audit data preprocessor reformats the raw audit data to send as input to the inference engine. The inference engine monitors the state transitions extracted from the preprocessed audit data and then compares these states with the states available within the knowledge base. The decision engine monitors the improvement of the inference engine for matching accuracy of the state transitions. It also specifies the action(s) to be taken based on results of the inference engine and the decision table. Finally, the decision results are sent to the SSO (Site Security Officer) interface for action. STAT can detect cooperative attackers and attacks across user sessions well.

A few prominent knowledge-based network anomaly detection methods and NIDS are given below.

1) Rule-based and Expert system approaches: The expert system approach is one of the most widely used knowledge-based methods [161], [162]. An expert system, in the traditional sense, is a rule-based system, with or without an associated knowledge base. An expert system has a rule engine that matches rules against the current state of the system, and depending on the results of matching, fires one or more rules.
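The match-and-fire cycle of a rule engine described above can be sketched in a few lines of Python. This is only an illustrative forward-chaining loop under simplifying assumptions: the `Rule` class, the rule names, the state fields and the thresholds are all hypothetical and do not come from any of the cited expert systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]

@dataclass
class Rule:
    name: str
    condition: Callable[[State], bool]   # antecedent, tested against the state
    action: Callable[[State], None]      # consequent, run when the rule fires

def run_engine(rules: List[Rule], state: State) -> List[str]:
    """Fire each matching rule at most once, repeating until nothing new matches."""
    fired: List[str] = []
    progress = True
    while progress:
        progress = False
        for rule in rules:
            if rule.name not in fired and rule.condition(state):
                rule.action(state)       # firing may change the state
                fired.append(rule.name)
                progress = True
    return fired

# Hypothetical rules: the second rule depends on a fact set by the first.
rules = [
    Rule("raise_alert",
         lambda s: bool(s.get("suspicious")) and s.get("dst_port") == 22,
         lambda s: s.update(alert="possible SSH brute force")),
    Rule("many_failed_logins",
         lambda s: s.get("failed_logins", 0) > 5,
         lambda s: s.update(suspicious=True)),
]

state: State = {"failed_logins": 9, "dst_port": 22}
fired = run_engine(rules, state)
print(fired, state.get("alert"))
```

Because firing a rule can change the state, the loop re-scans the rule set until no further rule matches; this is why `raise_alert` fires on the second pass even though it is listed first.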

TABLE X
COMPARISON OF SOFT COMPUTING-BASED NETWORK ANOMALY DETECTION METHODS

| Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method |
|---|---|---|---|---|---|---|---|---|---|
| Cannady [133] | 2000 | 2 | O | N | P | Numeric | Real-life | C2 | CMAC-based model |
| Balajinath and Raghavan [127] | 2001 | 3 | O | N | O | Categorical | User command | C4 | Behavior Model |
| Lee and Heinbuch [134] | 2001 | 3 | C | N | P | - | Simulated data | C2 | TNNID model |
| Xian et al. [146] | 2005 | 3 | C | N | P | Numeric | KDDcup99 | C1 | Fuzzy k-means |
| Amini et al. [130] | 2006 | 2 | C | R | P | Categorical | KDDcup99, Real-life | C1 | RT-UNNID system |
| Chimphlee et al. [150] | 2006 | 3 | C | N | P | Numeric | KDDcup99 | C1 | Fuzzy Rough C-means |
| Liu et al. [135] | 2007 | 2 | C | N | P | Numeric | KDDcup99 | C1 | HPCANN Model |
| Adetunmbi et al. [151] | 2008 | 2 | C | N | P | Numeric | KDDcup99 | C1 | LEM2 and K-NN |
| Chen et al. [152] | 2009 | 3 | C | N | P | Numeric | DARPA98 | C2 | RST-SVM technique |
| Mabu et al. [145] | 2011 | 3 | C | N | P | Numeric | KDDcup99 | C1 | Fuzzy ARM-based on GNP |
| Visconti and Tahayori [155] | 2011 | 2 | O | N | P | Numeric | Real-life | C2 | Interval type-2 fuzzy set |
| Geramiraz et al. [143] | 2012 | 2 | O | N | P | Numeric | KDDcup99 | C1 | Fuzzy rule-based model |

w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), hybrid (H) or others (O)
z: attacks handled; C1: all attacks, C2: denial of service, C3: probe, C4: user to root, C5: remote to local

Snort [113] is a popular rule-based IDS. This open-source IDS matches each packet it observes against a set of rules. The antecedent of a Snort rule is a boolean formula composed of predicates that look for specific values of fields present in IP headers, transport headers and in the payload. Thus, Snort rules identify attack packets based on IP addresses, TCP or UDP port numbers, ICMP codes or types, and contents of strings in the packet payload. Snort’s rules are arranged into priority classes based on potential impact of alerts that match the rules. Snort’s rules have evolved over its history of 15 years. Each Snort rule has associated documentation with the potential for false positives and negatives, together with corrective actions to be taken when the rule raises an alert. Snort rules are simple and easily understandable. Users can contribute rules when they observe new types of anomalous or malicious traffic. Currently, Snort has over 20,000 rules, inclusive of those submitted by users.

An intrusion detection system like Snort can run on a general purpose computer and can try to inspect all packets that go through the network. However, monitoring packets comprehensively in a large network is obviously an expensive task since it requires fast inspection on a large number of network interfaces. Many hundreds of rules may have to be matched concurrently, making scaling almost impossible.

To scale to large networks that collect flow statistics ubiquitously, Duffield et al. [163] use the machine learning algorithm called Adaboost [164] to translate packet level signatures to work with flow level statistics. The algorithm is used to correlate the packet and flow information. In particular, the authors associate packet level network alarms with a feature vector they create from flow records on the same traffic. They create a set of rules using flow information with features similar to those used in Snort rules. They also add numerical features such as the number of packets of a specific kind flowing within a certain time period. Duffield et al. train Adaboost on concurrent flow and packet traces. They evaluate the system using real time network traffic data with more than a billion flows over 29 days, and show that their performance is comparable to Snort’s with flow data.

Prayote and Compton [165] present an approach to anomaly detection that attempts to address the brittleness problem in which an expert system makes a decision that human common sense would recognize as impossible. They use a technique called prudence [166], in which for every rule, the upper and lower bounds of each numerical variable in the data seen by the rule are recorded, as well as a list of values seen for enumerated variables. The expert system raises a warning when a new value or a value outside the range is seen in a data instance. They improve the approach by using a simple probabilistic technique to decide if a value is an outlier. When working with network anomaly data, the authors partition the problem space into smaller subspaces of homogeneous traffic, each of which is represented with a separate model in terms of rules. The authors find that this approach works reasonably well for new subspaces when little data has been observed. They claim a 0% false negative rate in addition to a very low false positive rate.

Scheirer and Chuah [167] report a syntax-based scheme that uses variable-length partition with multiple break marks to detect many polymorphic worms. The prototype is the first NIDS that provides semantics-aware capability, and can capture polymorphic shell codes with additional stack sequences and mathematical operations.

2) Ontology and logic-based approaches: It is possible to model attack signatures using expressive logic structures in real time by incorporating constraints and statistical properties. Naldurg et al. [168] present a framework for intrusion detection based on temporal logic specification. Intrusion patterns are specified as formulae in an expressively rich and efficiently monitorable logic called EAGLE and evaluated using DARPA log files.

Estevez-Tapiador et al. [169] describe a finite state machine (FSM) methodology, where a sequence of states and transitions among them seems appropriate to model network protocols. If the specifications are complete enough, the model is able to detect illegitimate behavioral patterns effectively.

Shabtai et al. [170] describe an approach for detecting previously un-encountered malware targeting mobile devices. Time-stamped security data is continuously monitored within the target mobile devices like smart phones and PDAs. Then it is processed by the knowledge-based temporal abstraction (KBTA) methodology. The authors evaluate the KBTA model by using a lightweight host-based intrusion detection system, combined with central management capabilities for Android-based mobile phones.
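To make the finite state machine idea of Estevez-Tapiador et al. more concrete, the sketch below checks an event sequence against a table of allowed protocol transitions and reports the first event the model does not permit. The states, event names and transition table are a deliberately simplified, hypothetical handshake model and are not the specification used in [169].

```python
# Allowed transitions of a simplified, hypothetical TCP-like handshake model.
# Keys are (current state, observed event); values are the next state.
TRANSITIONS = {
    ("CLOSED", "SYN"): "SYN_SEEN",
    ("SYN_SEEN", "SYN_ACK"): "SYN_ACK_SEEN",
    ("SYN_ACK_SEEN", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "DATA"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "CLOSED",
}

def first_violation(events, start="CLOSED"):
    """Return the index of the first event the model does not allow, or None."""
    state = start
    for i, event in enumerate(events):
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:          # no such transition: illegitimate behavior
            return i
        state = nxt
    return None

normal = ["SYN", "SYN_ACK", "ACK", "DATA", "FIN"]
odd = ["SYN", "ACK", "DATA"]     # skips the SYN_ACK step

print(first_violation(normal), first_violation(odd))
```

The quality of such a detector depends on how complete the transition table is: any legitimate transition missing from the specification would be flagged as a violation.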
Hung and Liu [171] use ontologies as a way of describing the knowledge of a domain, expressing the intrusion detection system in terms of the end user domain. Ontologies are used as a conceptual modeling tool allowing a non-expert person to model intrusion detection applications using the concepts of intrusion detection more intuitively.

A comparison of knowledge-based anomaly detection methods is given in Table XI.

The main advantages of knowledge-based anomaly detection methods include the following.
• These techniques are robust and flexible.
• These techniques have a high detection rate if a substantial knowledge base can be acquired properly about attacks as well as normal instances.

Some disadvantages of knowledge-based methods are listed below.
• The development of high-quality knowledge is often difficult and time-consuming.
• Due to non-availability of unbiased normal and attack data, such a method may generate a large number of false alarms.
• Such a method may not be able to detect rare or unknown attacks.
• Dynamic updating of the rule or knowledge base is costly.

F. Combination learner methods and systems

In this section, we present a few methods and systems which use combinations of multiple techniques, usually classifiers.

1) Ensemble-based methods and systems: The idea behind the ensemble methodology is to weigh several individual classifiers, and combine them to obtain an overall classifier that outperforms every one of them [172]–[176]. These techniques weigh the individual opinions, and combine them to reach a final decision. The ensemble-based methods are categorized based on the approaches used. Three main approaches to develop ensembles are (i) bagging, (ii) boosting, and (iii) stack generalization. Bagging (Bootstrap Aggregating) increases classification accuracy by combining the outputs of learnt classifiers into a single prediction, creating an improved composite classifier. Boosting builds an ensemble incrementally by training each new model on instances misclassified by the previous model. Stack generalization achieves high generalization accuracy by using output probabilities for every class label from the base-level classifiers.

Octopus-IIDS [177] is an example of an ensemble IDS. The architecture of this system is shown in Figure 13. It is developed using two types of learning models, a Kohonen network and Support Vector Machines. The system is composed of two layers: classifier and anomaly detection. The classifier is responsible for capturing and preprocessing of network traffic data. It classifies the data into four main categories: DoS, probe, U2R and R2L. A specific class of attack is identified in the anomaly detection layer. The authors claim that the IDS works effectively in small scale networks.

Fig. 13. Architecture of Octopus-IIDS system

Chebrolu et al. [178] present an ensemble approach by combining two classifiers, Bayesian networks (BN) and Classification and Regression Trees (CART) [90], [179]. A hybrid architecture for combining different feature selection algorithms for real world intrusion detection is also incorporated for getting better results. Perdisci et al. [180] construct a high speed payload anomaly IDS using an ensemble of one-class SVM classifiers intended to be accurate and hard to evade.

Folino et al. [181] introduce a distributed data mining algorithm to improve detection accuracy when classifying malicious or unauthorized network activity using genetic programming (GP) extended with the ensemble paradigm. Their data is distributed across multiple autonomous sites and the learner component acquires useful knowledge from data in a cooperative way and uses network profiles to predict abnormal behavior with better accuracy.

Nguyen et al. [58] build an individual classifier using both the input feature space and an additional subset of features given by k-means clustering. The ensemble combination is calculated based on the classification ability of classifiers on different local data segments given by k-means clustering.

Beyond the above methods, some ensemble-based IDSs are given below.

The paradigm of multiple classifier system (MCS) has also been used to build misuse detection IDSs. Classifiers trained on different feature subsets can be combined to achieve better classification accuracy than the individual classifiers. In such a NIDS, network traffic is serially processed by each classifier. At each stage, a classifier may either decide for one attack class or send the pattern to another stage, which is trained on more difficult cases. Reported results show that an MCS improves the performance of IDSs based on statistical pattern recognition techniques. For example, CAMNEP [182] is a fast prototype agent-based NIDS designed for high-speed networks. It integrates several anomaly detection techniques, and operates on a collective trust model within a group of collaborative detection agents. The anomalies are used as input for trust modeling. Aggregation is performed by extended trust models of generalized situated identities, represented by a set of observable features. The system is able to perform real time surveillance of gigabit networks.

McPAD (Multiple classifier Payload-based Anomaly Detection) [183] is an effective payload-based anomaly detection

TABLE XI
COMPARISON OF KNOWLEDGE-BASED NETWORK ANOMALY DETECTION METHODS

| Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method |
|---|---|---|---|---|---|---|---|---|---|
| Noel et al. [156] | 2002 | - | O | N | O | - | - | - | Attack Guilt Model |
| Sekar et al. [157] | 2002 | 3 | O | N | P | Numeric | DARPA99 | C1 | Specification-Based Model |
| Tapiador et al. [169] | 2003 | 3 | C | N | P | Numeric | Real-life | C2 | Markov Chain Model |
| Hung and Liu [171] | 2008 | - | O | N | P | Numeric | KDDcup99 | C1 | Ontology-based |
| Shabtai et al. [170] | 2010 | 2 | O | N | O | - | Real-life | C2 | Incremental KBTA |

w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), hybrid (H) or others (O)
z: attacks handled; C1: all attacks, C2: denial of service, C3: probe, C4: user to root, C5: remote to local

TABLE XII
COMPARISON OF ENSEMBLE-BASED NETWORK ANOMALY DETECTION METHODS

| Author(s) | Year of publication | Combination strategy | w | x | y | Data types | Dataset used | z | Detection method |
|---|---|---|---|---|---|---|---|---|---|
| Chebrolu et al. [178] | 2005 | Weighted voting | O | N | P | Numeric | KDDcup99 | C1 | Class specific ensemble model |
| Perdisci et al. [180] | 2006 | Majority voting | O | N | pay | - | Operational points | Synthetic intrusions | One-class classifier model |
| Borji [173] | 2007 | Majority voting | O | N | P | Numeric | DARPA98 | C1 | Heterogeneous classifiers model |
| Perdisci et al. [183] | 2009 | Min and Max probability | O | R | pay | - | DARPA98 | C1 | McPAD model |
| Folino et al. [181] | 2010 | Weighted majority voting | O | N | P | Numeric | KDDcup99 | C1 | GEdIDS model |
| Noto et al. [176] | 2010 | Information theoretic | O | N | - | Numeric | UCI | None | FRaC model |
| Nguyen et al. [58] | 2011 | Majority voting | O | N | P | Numeric | KDDcup99 | C1 | Cluster ensemble |
| Khreich et al. [184] | 2012 | Learn and combine | O | N | pay | Numeric | UNM | C4 | EoHMMs model |

w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), payload-based (pay), hybrid (H) or others (O)
z: attacks handled; C1: all attacks, C2: denial of service, C3: probe, C4: user to root, C5: remote to local

system that consists of an ensemble of one-class classifiers. It is very accurate in detecting network attacks that bear some form of shell-code in the malicious payload. This detector performs well even in the case of polymorphic attacks. Furthermore, the authors tested their IDS with advanced polymorphic blending attacks and showed that even in the presence of such sophisticated attacks, it is able to obtain a low false positive rate.

An ensemble method is advantageous because it obtains higher accuracy than the individual techniques. The following are the major advantages.
• Even if the individual classifiers are weak, the ensemble methods perform well by combining multiple classifiers.
• Ensemble methods can scale for large datasets.
• Ensemble classifiers need a set of controlling parameters that are comprehensive and can be easily tuned.
• Among existing approaches, Adaboost and Stack generalization are more effective because they can exploit the diversity in predictions by multiple base level classifiers.

Here are some disadvantages of ensemble-based methods.
• Selecting a subset of consistently performing and unbiased classifiers from a pool of classifiers is difficult.
• The greedy approach for selecting sample datasets is slow for large datasets.
• It is difficult to obtain real time performance.

A comparison of ensemble-based network anomaly detection methods is given in Table XII.

2) Fusion-based methods and systems: With the evolving need for automated decision making, it is important to improve classification accuracy compared to the stand-alone general decision-based techniques even though such a system may have several disparate data sources. So, a suitable combination of these is the focus of the fusion approach. Several fusion-based techniques have been applied to network anomaly detection [185]–[189]. A classification of such techniques is as follows: (i) data level, (ii) feature level, and (iii) decision level. Some methods only address the issue of operating in a space of high dimensionality with features divided into semantic groups. Others attempt to combine classifiers trained on different features divided based on hierarchical abstraction levels or the type of information contained.

Giacinto et al. [185] provide a pattern recognition approach to network intrusion detection employing a fusion of multiple classifiers. Five different decision fusion methods are assessed by experiments and their performances compared. Shifflet [186] discusses a platform that enables a multitude of techniques to work together towards creating a more realistic fusion model of the state of a network, able to detect malicious activity effectively. A heterogeneous data level fusion for network anomaly detection is proposed by Chatzigiannakis et al. [190]. They use the Dempster-Shafer Theory of Evidence and Principal Components Analysis for developing the technique.

dLEARNIN [187] is an ensemble of classifiers that combines information from multiple sources. It is explicitly tuned to minimize the cost of errors. dLEARNIN is shown to achieve state-of-the-art performance, better than competing algorithms. The cost minimization strategy, dCMS, attempts to minimize the cost to a significant level. Gong et al. [191] contribute a neural network-based data fusion method for intrusion data

analysis and pruning to filter information from multi-sensors to get high detection accuracy.

HMMPayl [192] is an example of a fusion-based IDS, where the payload is represented as a sequence of bytes, and the analysis is performed using Hidden Markov Models (HMM). The algorithm extracts features and uses HMM to guarantee the same expressive power as that of n-gram analysis, while overcoming its computational complexity. HMMPayl follows the Multiple Classifiers System paradigm to provide better classification accuracy, to increase the difficulty of evading the IDS, and to mitigate the weaknesses due to a non-optimal choice of HMM parameters.

Some advantages of fusion methods are given below.
• Data fusion is effective in increasing timeliness of attack identification and in reducing false alarm rates.
• Decision level fusion with appropriate training data usually yields a high detection rate.

Some of the drawbacks are given below.
• The computational cost is high for rigorous training on the samples.
• Feature level fusion is a time consuming task. Also, the biases of the base classifiers affect the fusion process.
• Building hypotheses for different classifiers is a difficult task.

A comparison of fusion-based network anomaly detection methods is given in Table XIII.

3) Hybrid methods and systems: Most current network intrusion detection systems employ either misuse detection or anomaly detection. However, misuse detection cannot detect unknown intrusions, and anomaly detection usually has a high false positive rate [193]. To overcome the limitations of these techniques, hybrid methods are developed by exploiting features from several network anomaly detection approaches [194]–[196]. Hybridization of several methods increases performance of IDSs.

For example, RT-MOVICAB-IDS, a hybrid intelligent IDS, is introduced in [197]. It combines ANN and CBR (case-based reasoning) within a Multi-Agent System (MAS) to detect intrusion in dynamic computer networks. The dynamic real time multi-agent architecture allows the addition of prediction agents (both reactive and deliberative). In particular, two of the deliberative agents deployed in the system incorporate temporal-bounded CBR. This upgraded CBR is based on an anytime approximation, which allows the adaptation of this paradigm to real time requirements.

A hybrid approach to host security that prevents binary code injection attacks, known as the FLIPS (Feedback Learning IPS) model, is proposed by Locasto et al. [198]. It incorporates three major components: an anomaly-based classifier, a signature-based filtering scheme, and a supervision framework that employs Instruction Set Randomization (ISR). Capturing the injected code allows FLIPS to construct signatures for zero-day exploits. Peddabachigari et al. [199] present a hybrid approach that combines Decision trees (DT) and SVMs as a hierarchical hybrid intelligent system model (DTSVM) for intrusion detection. It maximizes detection accuracy and minimizes computational complexity.

Zhang et al. [200] propose a systematic framework that applies a data mining algorithm called random forests in building a misuse, anomaly, and hybrid network-based IDS. The hybrid detection system improves detection performance by combining the advantages of both misuse and anomaly detection. Tong et al. [201] discuss a hybrid RBF/Elman neural network model that can be employed for both anomaly detection and misuse detection. It can detect temporally dispersed and collaborative attacks effectively because of its memory of past events.

An intelligent hybrid IDS model based on neural networks is introduced by Yu [202]. The model is flexible, can be extended to meet different network environments, and improves detection performance and accuracy. Selim et al. [203] report a hybrid intelligent IDS to improve the detection rate for known and unknown attacks. It consists of multiple levels: hybrid neural networks and decision trees. The technique is evaluated using the NSL-KDD dataset and the results are promising.

Advantages of hybrid methods include the following.
• Such a method exploits major features from both signature and anomaly-based network anomaly detection.
• Such methods can handle both known and unknown attacks.

Drawbacks include the following.
• Lack of appropriate hybridization may lead to high computational cost.
• Dynamic updating of rules, profiles or signatures still remains difficult.

Table XIV presents a comparison of a few hybrid network anomaly detection methods.

G. Discussion

After a long and elaborate discussion of many intrusion detection methods and anomaly-based network intrusion detection systems under several categories, we make a few observations.

(i) Each class of anomaly-based network intrusion detection methods and systems has unique strengths and weaknesses. The suitability of an anomaly detection technique depends on the nature of the problem it attempts to address. Hence, providing a single integrated solution to every anomaly detection problem may not be feasible.

(ii) Different methods face different challenges when complex datasets are used. Nearest neighbor and clustering techniques suffer when the number of dimensions is high because the distance measures in high dimensions are not able to differentiate well between normal and anomalous instances.

Spectral techniques explicitly address the high dimensionality problem by mapping data to a lower dimensional projection. But their performance is highly dependent on the assumption that normal instances and anomalies are distinguishable in the projected space. A classification technique often performs better in such a scenario. However, it requires labeled training data for both normal and attack classes. The improper distribution of these training data often makes the task of learning more challenging.

TABLE XIII
COMPARISON OF FUSION-BASED NETWORK ANOMALY DETECTION METHODS

| Author(s) | Year of publication | Fusion level | w | x | y | Data types | Dataset used | z | Detection method |
|---|---|---|---|---|---|---|---|---|---|
| Giacinto et al. [185] | 2003 | Decision | O | N | P | Numeric | KDDcup99 | C1 | MCS Model |
| Shifflet [186] | 2005 | Data | O | N | O | - | - | None | HSPT algorithm |
| Chatzigiannakis et al. [190] | 2007 | Data | C | N | P | - | NTUA, GRNET | C2 | D-S algorithm |
| Parikh and Chen [187] | 2008 | Data | C | N | P | Numeric | KDDcup99 | C1 | dLEARNIN system |
| Gong et al. [191] | 2010 | Data | C | N | P | Numeric | KDDcup99 | C1 | IDEA model |
| Ariu et al. [192] | 2011 | Decision | C | R | pay | - | DARPA98, real-life | C1 | HMMPayl model |
| Yan and Shao [189] | 2012 | Decision | O | N | F | Numeric | Real time | C2, C3 | EWMA model |

w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), payload-based (pay), hybrid (H) or others (O)
z: attacks handled; C1: all attacks, C2: denial of service, C3: probe, C4: user to root, C5: remote to local

TABLE XIV
COMPARISON OF HYBRID NETWORK ANOMALY DETECTION METHODS

| Author(s) | Year of publication | No. of parameters | w | x | y | Data types | Dataset used | z | Detection method |
|---|---|---|---|---|---|---|---|---|---|
| Locasto et al. [198] | 2005 | 2 | C | R | P | - | Real-life | C2 | FLIPS model |
| Zhang and Zulkernine [194] | 2006 | 2 | C | N | P | Numeric | KDDcup99 | C1 | Random forest-based hybrid algorithm |
| Peddabachigari et al. [199] | 2007 | 2 | C | N | P | Numeric | KDDcup99 | C1 | DT-SVM hybrid model |
| Zhang et al. [200] | 2008 | 2 | C | N | P | Numeric | KDDcup99 | C1 | RFIDS model |
| Aydin et al. [195] | 2009 | 3 | C | N | P | - | DARPA98, IDEVAL | C1 | Hybrid signature-based IDS |
| Tong et al. [201] | 2009 | 1 | C | N | P | Numeric | DARPA-BSM | C1 | Hybrid RBF/Elman NN |
| Yu [202] | 2010 | 1 | C | N | - | - | - | - | Hybrid NIDS |
| Arumugam et al. [193] | 2010 | - | C | N | P | Numeric | KDDcup99 | C1 | Multi-stage hybrid IDS |
| Selim et al. [203] | 2011 | - | C | N | P | Numeric | KDDcup99 | C1 | Hybrid multi-level IDS |
| Panda et al. [196] | 2012 | 2 | C | N | P | Numeric | NSL-KDD, KDDcup99 | C1 | DTFF and FFNN |

w: centralized (C), distributed (D) or others (O)
x: nature of detection, real time (R) or non-real time (N)
y: packet-based (P), flow-based (F), hybrid (H) or others (O)
z: attacks handled; C1: all attacks, C2: denial of service, C3: probe, C4: user to root, C5: remote to local

Semi-supervised nearest neighbor and clustering techniques that only use normal labels, can often be more effective than classification-based techniques. In situations where identifying a good distance measure is difficult, classification or statistical techniques may be a better choice. However, the success of the statistical techniques is largely influenced by the applicability of the statistical assumptions in the specific real life scenarios.
(iii) For real time intrusion detection, the complexity of the anomaly detection process plays a vital role. In case of classification, clustering, and statistical methods, although training is expensive, they are still acceptable because testing is fast and training is offline. In contrast, techniques such as nearest neighbor and spectral techniques which do not have a training phase, have an expensive testing phase which can be a limitation in a real setting.
(iv) Anomaly detection techniques typically assume that anomalies in data are rare when compared to normal instances. Generally, such assumptions are valid, but not always. Often unsupervised techniques suffer from large false alarm rates, when anomalies are in bulk amounts. Techniques operating in supervised or semi-supervised modes [204] can be applied to detect bulk anomalies.

We perform a comparison of the anomaly-based network intrusion detection systems that we have discussed throughout this paper based on parameters such as mode of detection (host-based, network-based or both), detection approach (misuse, anomaly or both), nature of detection (online or offline), nature of processing (centralized or distributed), data gathering mechanism (centralized or distributed) and approach of analysis. A comparison chart is given in Table XV.

V. EVALUATION CRITERIA

To evaluate performance, it is important that the system identifies the attack and normal data correctly. There are several datasets and evaluation measures available for evaluating network anomaly detection methods and systems. The most commonly used datasets and evaluation measures are given below.

A. Datasets

Capturing and preprocessing high speed network traffic is essential prior to detection of network anomalies. Different tools are used for capture and analysis of network traffic data. We list a few commonly used tools and their features in Table XVI. These are commonly used by both the network defender and the attacker at different time points.

The following are various datasets that have been used for evaluating network anomaly detection methods and systems. A taxonomy of different datasets is given in Figure 14.

1) Synthetic datasets: Synthetic datasets are generated to meet specific needs or conditions or tests that real data satisfy. This can be useful when designing any type of system for theoretical analysis so that the design can be refined. This allows for finding a basic solution or remedy, if the results

TABLE XV
COMPARISON OF EXISTING NIDSs

Name of IDS Year of publication a b c d e Approach


STAT [160] 1995 H M R C C Knowledge-based
FIRE [142] 2000 N A N C C Fuzzy Logic
ADAM [32] 2001 N A R C C Classification
HIDE [33] 2001 N A R C D Statistical
NSOM [139] 2002 N A R C C Neural network
MINDS [34] 2003 N A R C C Clustering and Outlier-based
NFIDS [147] 2003 N A N C C Neuro Fuzzy Logic
N@G [93] 2003 H B R C C Statistical
FSAS [94] 2006 N A R C C Statistical
POSEIDON [140] 2006 N A R C C SOM & Modified PAYL
RT-UNNID [130] 2006 N A R C C Neural Network
DNIDS [110] 2007 N A R C C CSI-KNN based
CAMNEP [182] 2008 N A R C C Agent-based Trust and Reputation
McPAD [183] 2009 N A N C C Multiple classifier
Octopus-IIDS [177] 2010 N A N C C Neural network & SVM
HMMPayl [192] 2011 N A R C C HMM model
RT-MOVICAB-IDS [197] 2011 N A R C C Hybrid IDS
a-represents the types of detection such as host-based (H) or network-based (N) or hybrid (H)
b-indicates the class of detection mechanism as misuse (M) or anomaly (A) or both (B)
c-denotes the nature of detection as real time (R) or non-real time (N)
d-characterizes the nature of processing as centralized (C) or distributed (D)
e-indicates the data gathering mechanism as centralized (C) or distributed (D)

TABLE XVI
TOOLS USED IN DIFFERENT STEPS IN NETWORK TRAFFIC ANOMALY DETECTION AND THEIR DESCRIPTION

Tool Name | Purpose | Characteristics | Source
Wireshark | Packet capture | (i) Free and open-source packet analyzer. (ii) Can be used for network troubleshooting, analysis, software and communications protocol development, and education. (iii) Uses cross-platform GTK+ widget toolkit to implement its user interface, and uses pcap to capture packets. (iv) Similar to tcpdump, but has a graphical front-end, plus some integrated sorting and filtering options. (v) Works in mirrored ports to capture network traffic to analyze for any tampering. | http://www.wireshark.org/
Gulp | Lossless gigabit remote packet capturing | (i) It allows much higher packet capture rate by dropping far fewer packets. (ii) It has ability to read directly from the network, but is able to even pipe output from legacy applications before writing to disk. (iii) If the data rate increases, Gulp realigns its writes to even block boundaries for optimum writing efficiency. (iv) When it receives an interrupt, it stops filling its ring buffer but does not exit until it has finished writing whatever remains in the ring buffer. | http://staff.washington.edu/corey/gulp/
tcptrace | TCP-based feature extraction | (i) Can take input files produced by several popular packet-capture programs, including tcpdump, snoop, etherpeek, HP Net Metrix, Wireshark, and WinDump. (ii) Produces several types of output containing information on each connection seen, such as elapsed time, bytes and segments sent and received, retransmissions, round trip times, window advertisements, and throughput. (iii) Can also produce a number of graphs with packet statistics for further analysis. | http://jarok.cs.ohiou.edu/software/tcptrace/
nfdump | netflow data collection | (i) Can collect and process netflow data on the command line. (ii) It is limited only by the disk space available for all the netflow data. (iii) Can be optimized in speed for efficient filtering. The filter rules look like the syntax of tcpdump. | http://nfdump.sourceforge.net/
nfsen | netflow data collection and visualization | (i) NfSen is a graphical Web-based front end for the nfdump netflow tool. (ii) It allows display of netflow data as flows, packets and bytes using RRD (Round Robin Database). (iii) Can process the netflow data within a specified time span. (iv) Can create history as well as continuous profiles. (v) Can set alerts, based on various conditions. | http://nfsen.sourceforge.net/
nmap | Scanning port | (i) Nmap (Network Mapper) is a free and open source utility for network exploration or security auditing. (ii) Uses raw IP packets in novel ways to determine what hosts are available on the network, what services (application name and version) those hosts offer, what operating systems are running, type of firewall or packet filter used, and many other characteristics. (iii) It is easy, flexible, powerful, well documented tool for discovering hosts in large network. | http://nmap.org/
rnmap | Coordinated scanning | (i) Remote Nmap (Rnmap) contains both client and server programs. (ii) Various clients can connect to one centralized Rnmap server and do their port scanning. (iii) Server performs user authentication and uses excellent Nmap scanner to do actual scanning. | http://rnmap.sourceforge.net/
Targa | Attack simulation | (i) Targa is free and powerful attack generation tool. (ii) It integrates bonk, jolt, land, nestea, netear, syndrop, teardrop, and winnuke into one multi-platform DoS attack. | http://www10.org/cdrom/papers/409/

prove to be satisfactory. Synthetic data is used in testing and creating many different types of test scenarios. It enables designers to build realistic behavior profiles for normal users and attackers based on the generated dataset to test a proposed system.

2) Benchmark datasets: In this subsection, we present six publicly available benchmark datasets generated using simulated environments that include a number of networks and by executing different attack scenarios.

(a) KDDcup99 dataset: Since 1999, the KDDcup99 dataset [205] has been the most widely used dataset for the evaluation of network-based anomaly detection methods and systems. This dataset was prepared by Stolfo et al. [206] and is built on the data captured in the DARPA98 IDS evaluation program. The KDD training dataset consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labeled as either normal or attack with a specific attack type. The test dataset contains about 300,000 samples with 24 training attack types, with an additional 14 attack types in the test set only. The names and descriptions of the attack types are available in [205].

Fig. 14. Taxonomy of different datasets

(b) NSL-KDD dataset: Analysis of the KDD dataset showed that there were two important issues in the dataset, which highly affect the performance of evaluated systems resulting in poor evaluation of anomaly detection methods [207]. To solve these issues, a new dataset known as NSL-KDD [208], consisting of selected records of the complete KDD dataset was introduced. This dataset is publicly available for researchers⁵ and has the following advantages over the original KDD dataset.
• It does not include redundant records in the training set, so that the classifiers will not be biased towards more frequent records.
• There are no duplicate records in the test set. Therefore, the performance of the learners is not biased by the methods which have better detection rates on the frequent records.
• The number of selected records from each difficulty level is inversely proportional to the percentage of records in the original KDD dataset. As a result, the classification rates of various machine learning methods vary in a wide range, which makes it more efficient to have an accurate evaluation of various learning techniques.
• The number of records in the training and testing sets are reasonable, which makes it affordable to run experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research groups are consistent and comparable.

(c) DARPA 2000 dataset: A DARPA⁶ evaluation project [209] targeted the detection of complex attacks that contain multiple steps. Two attack scenarios were simulated in the 2000 evaluation contest, namely, LLDOS (Lincoln Laboratory scenario DDoS) 1.0 and LLDOS 2.0. To achieve the necessary variations, these two attack scenarios were carried out over several network and audit scenarios. These sessions were grouped into four attack phases: (a) probing, (b) breaking into the system by exploiting vulnerability, (c) installing DDoS software for the compromised system and (d) launching DDoS attack against another target. LLDOS 2.0 is different from LLDOS 1.0 in the sense that attacks are more stealthy and thus harder to detect. Since this dataset contains multi-stage attack scenarios, it is also commonly used for evaluation of alert correlation methods.

(d) DEFCON dataset: The DEFCON⁷ dataset is another commonly used dataset for evaluation of IDSs [210]. It contains network traffic captured during the hacker competition called Capture The Flag (CTF), in which competing teams are divided into two groups: attackers and defenders. The traffic produced during CTF is very different from real world network traffic since it contains only intrusive traffic without any normal background traffic. Due to this reason, the DEFCON dataset has been found useful in evaluating alert correlation techniques.

(e) CAIDA dataset: CAIDA⁸ collects many different types of data and makes it available to the research community. Most CAIDA datasets [211] are very specific to particular events or attacks (e.g., CAIDA DDoS attack 2007 dataset). All backbone traces are anonymized and do not have payload information.

(f) LBNL dataset: LBNL’s (Lawrence Berkeley National Laboratory) internal enterprise traces are full header network traces [212], without payload. This dataset has undergone heavy anonymization to the extent that scanning traffic was extracted and separately anonymized to remove any information which could identify individual IPs. The packet traces were obtained at the two central routers of the LBNL network and they contain more than one hundred hours of traffic generated from several thousand internal hosts.

3) Real life datasets: In this subsection, we present three real life datasets created by collecting network traffic on several days, which include both normal as well as attack instances in appropriate proportions in the authors’ respective campus networks.

(a) UNIBS dataset: The UNIBS packet traces [213] were collected on the edge router of the campus network of the University of Brescia, Italy, on three consecutive working days. This dataset includes traffic captured or collected and stored through 20 workstations running the GT client daemon. The authors collected the traffic by running tcpdump on the faculty router, which was a dual Xeon Linux box that connected their network to the Internet through a dedicated 100Mb/s uplink. The traces were captured and stored on a dedicated disk of a workstation connected to the router through a dedicated ATA controller.

(b) ISCX-UNB dataset: Real packet traces [214] were analyzed to create profiles for agents that generate real traffic for HTTP, SMTP, SSH, IMAP, POP3 and FTP protocols. Various multi-stage attack scenarios were explored for generating malicious traffic.

(c) TUIDS dataset: The TUIDS⁹ dataset [215], [216] has been prepared at the Network Security Lab at Tezpur University, India based on several attack scenarios. Initially, the creators capture network traffic in both packet and flow level using gulp [217] and nfdump [218], then preprocess the raw traffic and label each traffic as attack or normal. The authors extract features such as basic, content, time, window and connectionless features from the preprocessed data, then correlate the features and generate the final datasets.

⁵ http://www.iscx.ca/NSL-KDD/
⁶ http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/index.html
⁷ http://cctf.shmoo.com/data/
⁸ http://www.caida.org/home/
⁹ http://agnigarh.tezu.ernet.in/~dkb/resources/

These datasets are valuable assets for the intrusion detection community. However, the benchmark datasets suffer from the

fact that they are not good representatives of real world traffic. For example, the DARPA dataset has been questioned about the realism of the background traffic [219], [220] because it is synthetically generated. In addition to the difficulty of simulating real time network traffic, there are some other challenges in IDS evaluation [221]. A comparison of datasets is shown in Table XVII.

B. Evaluation Measures

An evaluation of a method or a system in terms of accuracy or quality is a snapshot in time. As time passes, new vulnerabilities may evolve, and current evaluations may become irrelevant. In this section, we discuss various measures used to evaluate network intrusion detection methods and systems.

1) Accuracy: Accuracy is a metric that measures how correctly an IDS works, measuring the percentage of detection and failure as well as the number of false alarms that the system produces [223], [224]. If a system has 80% accuracy, it means that it correctly classifies 80 instances out of 100 to their actual classes. While there is a big diversity of attacks in intrusion detection, the main focus is that the system be able to detect an attack correctly. From real life experience, one can easily conclude that the actual percentage of abnormal data is much smaller than that of the normal [57], [225], [226]. Consequently, intrusions are harder to detect than normal traffic, resulting in excessive false alarms as the biggest problem facing IDSs. The following are some accuracy measures.

(a) Sensitivity and Specificity: These two measures [227] attempt to measure the accuracy of classification for a 2-class problem. When an IDS classifies data, its decision can be either right or wrong. It assumes true for right and false for wrong, respectively.

If S is a detector and D_t is the set of test instances, there are four possible outcomes described using the confusion matrix given in Figure 15. When an anomalous test instance (p) is predicted as anomalous (Y) by the detector S, it is counted as true positive (TP); if it is predicted as normal (N), it is counted as false negative (FN). On the other hand, if a normal (n) test instance is predicted as normal (N) it is known as true negative (TN), while it is a false positive (FP) if it is predicted as anomalous (Y) [40], [227], [228].

The true positive rate (TPR) is the proportion of anomalous instances classified correctly over the total number of anomalous instances present in the test data. TPR is also known as sensitivity. The false positive rate (FPR) is the proportion of normal instances incorrectly classified as anomalous over the total number of normal instances contained in the test data. The true negative rate (TNR) is also called specificity. TPR, FPR, TNR, and the false negative rate (FNR) can be defined for the normal class. We illustrate all measures related to the confusion matrix in Figure 16.

Fig. 15. Confusion matrix and related evaluation measures

Fig. 16. Illustration of confusion matrix in terms of related evaluation measures

Sensitivity is also known as the hit rate. Between sensitivity and specificity, sensitivity is set at high priority when the system is to be protected at all cost, and specificity gets more priority when efficiency is of major concern [227]. Consequently, the aim of an IDS is to produce as many TPs and TNs as possible while trying to reduce numbers of both FPs and FNs. The majority of evaluation criteria use these variables and the relations among them to model the accuracy of the IDSs.

(b) ROC Curves: The Receiver Operating Characteristics (ROC) analysis originates from signal processing theory. Its applicability is not limited only to intrusion detection, but extends to a large number of practical fields such as medical diagnosis, radiology, bioinformatics as well as in artificial intelligence and data mining. In intrusion detection, ROC curves are used on the one hand to visualize the relation between TP and FP rates of a classifier while tuning it and also to compare the accuracy of two or more classifiers. The ROC space [229], [230] uses the orthogonal coordinate system to visualize the classifier accuracy. Figure 17 illustrates the ROC approach normally used for network anomaly detection methods and systems evaluation.

(c) Misclassification rate: This measure attempts to estimate the probability of disagreement between the true and predicted cases by dividing the sum of FN and FP by the total number of pairs observed, i.e., (TP+FP+FN+TN). In other words, misclassification rate is defined as (FN+FP)/(TP+FP+FN+TN).
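The following Python sketch is illustrative only and is not taken from any of the surveyed systems. It shows how the measures defined in terms of TP, FN, FP and TN can be computed from a list of actual and predicted labels, and how sweeping a decision threshold over anomaly scores yields the (FPR, TPR) points of an ROC curve. The names `Confusion`, `confusion`, `ratio`, `measures` and `roc_points`, the example scores, and the convention of returning 0.0 for a zero denominator are all assumptions made for the example. Precision, recall and F-measure are included with their standard definitions, which item (e) of this subsection discusses.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Confusion:
    """The four outcomes of a 2-class detector (positive = anomalous)."""
    tp: int  # anomalous instance flagged as anomalous
    fn: int  # anomalous instance passed as normal
    fp: int  # normal instance flagged as anomalous (false alarm)
    tn: int  # normal instance passed as normal


def confusion(actual: Sequence[bool], predicted: Sequence[bool]) -> Confusion:
    """Count outcomes; True stands for 'anomalous' in both sequences."""
    pairs = list(zip(actual, predicted))
    return Confusion(
        tp=sum(1 for a, p in pairs if a and p),
        fn=sum(1 for a, p in pairs if a and not p),
        fp=sum(1 for a, p in pairs if not a and p),
        tn=sum(1 for a, p in pairs if not a and not p),
    )


def ratio(num: float, den: float) -> float:
    """Assumed convention: an undefined ratio (zero denominator) is 0.0."""
    return num / den if den else 0.0


def measures(c: Confusion) -> Dict[str, float]:
    """Rates derived from the confusion matrix counts."""
    total = c.tp + c.fn + c.fp + c.tn
    precision = ratio(c.tp, c.tp + c.fp)
    recall = ratio(c.tp, c.tp + c.fn)  # same quantity as TPR / sensitivity
    return {
        "tpr": recall,
        "fpr": ratio(c.fp, c.fp + c.tn),
        "tnr": ratio(c.tn, c.fp + c.tn),  # specificity
        "fnr": ratio(c.fn, c.tp + c.fn),
        "accuracy": ratio(c.tp + c.tn, total),
        "misclassification": ratio(c.fn + c.fp, total),
        "precision": precision,
        "recall": recall,
        # harmonic mean of precision and recall
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


def roc_points(actual: Sequence[bool], scores: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """One (FPR, TPR) point per threshold; score >= threshold means 'anomalous'."""
    points = []
    for t in thresholds:
        m = measures(confusion(actual, [s >= t for s in scores]))
        points.append((m["fpr"], m["tpr"]))
    return points


if __name__ == "__main__":
    # Hypothetical test set: 4 anomalous and 6 normal instances with made-up scores.
    actual = [True] * 4 + [False] * 6
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]
    predicted = [s >= 0.5 for s in scores]
    for name, value in measures(confusion(actual, predicted)).items():
        print(f"{name:18s} {value:.3f}")
    print(roc_points(actual, scores, [1.0, 0.5, 0.0]))
```

In this toy example, lowering the threshold raises both the TPR and the FPR. An ROC curve makes exactly this trade-off visible when a detector is tuned.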

TABLE XVII
LIST OF DATASETS AVAILABLE AND THEIR DESCRIPTIONS

Dataset u v w No. of instances No. of attributes x y z Some references


Synthetic No No Yes user dependent user dependent Not known any user dependent [111], [124]
KDDcup99 Yes No Yes 805050 41 BCTW P C1 [107], [115], [117], [123]
NSL-KDD Yes No Yes 148517 41 BCTW P C1 [207]
DARPA 2000 Yes No No Huge Not known P Raw C2 [214]
DEFCON No No No Huge Not known Raw P C2 [214]
CAIDA Yes Yes No Huge Not known Raw P C1 [214]
LBNL Yes Yes No Huge Not known Raw P C2 [222]
ISCX-UNB Yes Yes Yes Huge Not known Raw P A [214]
TUIDS Yes Yes Yes 301760 50,24 BCTW P, F C1 [124], [215]
u-realistic network configuration
v-indicates realistic traffic
w-describes the label information
x-types of features extracted as basic features (B), content-based features (C), time-based features (T) and window-based features (W)
y-explains the types of data as packet-based (P) or flow-based (F) or hybrid (H) or Others (O)
z-represents the attack category as C1-all attacks, C2-denial of service, C3-probe, C4-user to root, C5-remote to local, and A-application layer attacks

Fig. 17. Illustration of ROC measure where A, B, C represent the accuracy of a detection method or a system in ascending order.

(d) Confusion Matrix: The confusion matrix is a ranking method that can be applied to any kind of classification problem. The size of this matrix depends on the number of distinct classes to be detected. The aim is to compare the actual class labels against the predicted ones as shown in Figure 15. The diagonal represents correct classification. The confusion matrix for intrusion detection is defined as a 2-by-2 matrix, since there are only two classes known as intrusion and normal [40], [226], [228]. Thus, the TNs and TPs that represent the correctly predicted cases lie on the matrix diagonal while the FNs and FPs are on the right and left sides. As a side effect of creating the confusion matrix, all four values are displayed in a way that the relation between them can be easily understood.

(e) Precision, Recall and F-measure: Precision is a measure of how a system identifies attacks or normals. A flagging is accurate if the identified instance indeed comes from a malicious user, which is referred to as true positive. The final quantity of interest is recall, a measure of how many instances are identified correctly (see Figure 15). Precision and recall are often inversely proportional to each other and there is normally a trade-off between these two ratios. An algorithm that produces low precision and low recall is most likely defective with conceptual errors in the underlying theory. The types of attacks that are not identified can indicate which areas of the algorithm need more attention. Exposing these flaws and establishing the causes assist future improvement.

The F-measure mixes the properties of the previous two measures as the harmonic mean of precision and recall [40], [228]. If we want to use only one accuracy metric as an evaluation criterion, F-measure is the most preferable. Note that when precision and recall both reach 100%, the F-measure is the maximum, i.e., 1, meaning that the classifier has 0% false alarms and detects 100% of the attacks. Thus, a good classifier is expected to obtain F-measure as high as possible.

2) Performance: The evaluation of an IDS performance is an important task. It involves many issues that go beyond the IDS itself. Such issues include the hardware platform, the operating system or even the deployment of the IDS. For a NIDS, the most important evaluation criterion for its performance is the system’s ability to process traffic on a high speed network with minimum packet loss when working real time. In real network traffic, the packets can be of various sizes, and the effectiveness of a NIDS depends on its ability to handle packets of any size. In addition to the processing speed, the CPU and memory usage can also serve as measurements of NIDS performance [231]. These are usually used as indirect measures that take into account the time and space complexities of intrusion detection algorithms. Finally, the performance of any NIDS is highly dependent upon (i) its individual configuration, (ii) the network it is monitoring, and (iii) its position in that network.

3) Completeness: The completeness criterion represents the space of the vulnerabilities and attacks that can be covered by an IDS. This criterion is very hard to assess because having omniscience of knowledge about attacks or abuses of privilege is impossible. The completeness of an IDS is judged against a complete set of known attacks. The ability of an IDS is considered complete, if it covers all the known vulnerabilities and attacks.

4) Timeliness: An IDS that performs its analysis as quickly as possible enables the human analyst or the response engine to promptly react before much damage is done within a specific time period. This prevents the attacker from subverting the audit source or the IDS itself. The response generated by the system while combating an attack is very important. Since the data must be processed to discover intrusions, there is

always a delay between the actual moment of the attack and the response of the system. This is called total delay. Thus, the total delay is the difference between t_attack and t_response. The smaller the total delay, the better an IDS is with respect to its response. No matter if an IDS is anomaly-based or signature-based, there is always a gap between the starting time of an attack and its detection.

5) Data Quality: Evaluating the quality of data is another important task in NIDS evaluation. Quality of data is influenced by several factors, such as (i) source of data (should be from reliable and appropriate sources), (ii) selection of sample (should be unbiased), (iii) sample size (neither over nor under-sampling), (iv) time of data (should be frequently updated real time data), (v) complexity of data (data should be simple enough to be handled easily by the detection mechanism), and so on.

6) Unknown attack detection: New vulnerabilities are evolving almost every day. An anomaly-based network intrusion detection system should be capable of identifying unknown attacks, in addition to known attacks. The IDS should show consistent abilities of detecting unknown or even modified intrusions.

7) Profile Update: Once new vulnerabilities or exploits are discovered, signatures or profiles must be updated for future detection. However, writing new or modified profiles or signatures without conflict is a challenge, considering the current high-speed network scenario.

8) Stability: Any anomaly detection system should perform consistently in different network scenarios and in different circumstances. It should consistently report identical events in a similar manner. Allowing the users to configure different alerts to provide different messages in different network environments may lead to an unstable system.

9) Information provided to Analyst: Alerts generated by an IDS should be meaningful enough to clearly identify the reasons behind the event to be raised, and the reasons this event is of interest. It should also assist the analyst in determining the relevance and appropriate reaction to a particular alert. The alert should also specify the source of the alert and the target system.

10) Interoperability: An effective intrusion detection mechanism is supposed to be capable of correlating information from multiple sources, such as system logs, other HIDSs, NIDSs, firewall logs and any other sources of information available. This helps in maintaining interoperability, while installing a range of HIDSs or NIDSs from various vendors.

VI. RECOMMENDATIONS

The following are some recommendations one needs to be mindful of when developing a network anomaly detection method or a system.
• Most existing IDSs for the wired environment work in three ways: flow level traffic or packet level feature data analysis, protocol analysis or payload inspection. Each of these categories has its own advantages and limitations. So, a hybridization of these (e.g., protocol level analysis followed by flow level traffic analysis) may give better performance in terms of known (with high detection rate) as well as unknown attack detection.
• Network anomalies may originate from various sources as discussed in Section III. So, a better IDS should be able to recognize origins of the anomalies before initiating the detection process.
• An IDS, to be capable of identifying both known as well as unknown attacks, should exploit both supervised (rule or signature-based learning) as well as unsupervised (clustering or outlier-based) at multiple levels for real time performance with low false alarm rates.
• The IDS developer should choose the basic components, method(s), techniques or rule/signature/profile base to overcome four important limitations: subjective effectiveness, limited scalability, scenario dependent efficiency and restricted security.
• The performance of a better IDS needs to be established both qualitatively and quantitatively.
• A better anomaly classification or identification method enables us to tune it (the corresponding normal profiles, thresholds, etc.) depending on the network scenario.

VII. OPEN ISSUES AND CHALLENGES

Although many methods and systems have been developed by the research community, there are still a number of open research issues and challenges. The suitability of performance metrics is a commonly identified drawback in intrusion detection. In evaluating IDSs, the three most important qualities that need to be measured are completeness, correctness, and performance. The current state-of-the-art in intrusion detection restricts evaluation of new systems to tests over incomplete datasets and micro-benchmarks that test narrowly defined components of the system. A number of anomaly-based systems have been tested using contrived datasets. Such evaluation is limited by the quality of the dataset that the system is evaluated against. Construction of a dataset which is unbiased, realistic and comprehensive is an extremely difficult task.

A formal proof of correctness [6] in the intrusion detection domain is exceptionally challenging and expensive. Therefore, “pretty good assurance” presents a way in which systems can be measured allowing fuzzy decisions, trade-offs, and priorities. Such a measure must take into consideration the amount of work required to discover a vulnerability or weakness to exploit for an attack and execute an attack on the system.

After a study of existing NIDSs, we find that it is still extremely difficult to design a new NIDS to ensure robustness, scalability and high performance. In particular, practitioners find it difficult to decide where to place the NIDS and how to best configure it for use within an environment with multiple stakeholders. We sort out some of the important issues as challenges and enumerate them below.
(i) Runtime limitation presents an important challenge for a NIDS. Without losing any packets, a real time IDS should be ideally able to capture and inspect each packet.
(ii) Most NIDSs and network intrusion detection methods depend on the environment. Ideally, a system or method should be independent of the environment.
(iii) The nature of anomalies keeps changing over time as intruders adapt their network attacks to evade existing

intrusion detection solutions. So, adaptability of a NIDS REFERENCES


or detection method is necessary to update with the
current anomalies encountered in the local network or [1] A. Sundaram, “An introduction to intrusion detection,” Crossroads,
vol. 2, no. 4, pp. 3–7, April 1996.
the Internet. [2] J. P. Anderson, “Computer Security Threat Monitoring and Surveil-
(iv) Ideally, a NIDS or detection method should avoid a high lance,” James P Anderson Co, Fort Washington, Pennsylvania, Tech.
rate of false alarms. However, it is not possible to escape Rep., April 1980.
totally from false alarms, even though it needs to aim [3] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection : A
Survey,” ACM Computing Surveys, vol. 41, no. 3, pp. 15:1–15:58,
for that in any environment and facilitate adaptability September 2009.
at runtime. This is another challenge for the NIDS [4] N. K. Ampah, C. M. Akujuobi, M. N. O. Sadiku, and S. Alam, “An
development community. intrusion detection technique based on continuous binary communica-
tion channels,” International J. Security and Networks, vol. 6, no. 2/3,
(v) Dynamic updation of profiles in anomaly-based NIDSs pp. 174–180, November 2011.
without conflict and without compromising performance [5] F. Y. Edgeworth, “On discordant observations,” Philosophy Mag.,
is an important task. The profile database needs to be vol. 23, no. 5, pp. 364–375, 1887.
[6] A. Patcha and J. M. Park, “An overview of anomaly detection tech-
updated whenever a new kind of attack is detected and niques: Existing solutions and latest technological trends,” Computer
addressed by the system. Networks, vol. 51, no. 12, pp. 3448–3470, 2007.
(vi) Preparing an unbiased network intrusion dataset with all [7] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Macia-Fernandez, and
normal variations in profiles is another challenging task. E. Vazquez, “Anomaly-based network intrusion detection : Techniques,
systems and challenges,” Computers & Security, vol. 28, no. 1-2, pp.
The number of normal instances is usually large and 18–28, 2009.
their proportion with attack instances is very skewed in [8] V. Hodge and J. Austin, “A survey of outlier detection methodologies,”
the existing publicly available intrusion datasets. Only a Artificial Intellligence Review, vol. 22, no. 2, pp. 85–126, 2004.
[9] T. Nguyen and G. Armitage, “A Survey of Techniques for Internet
few intrusion datasets with sufficient amount of attack Traffic Classification using Machine Learning,” IEEE Commun. Sur-
information are available publicly. Thus, there is an veys Tutorials, vol. 10, no. 4, pp. 56–76, 2008.
overarching need for benchmark intrusion datasets for [10] M. Agyemang, K. Barker, and R. Alhajj, “A comprehensive survey
of numeric and symbolic outlier mining techniques,” Intelligence Data
evaluating NIDSs and detection methods. Analysis, vol. 10, no. 6, pp. 521–538, 2006.
(vii) Reducing computational complexity in preprocessing, [11] J. Ma and S. Perkings, “Online novelty detection on temporal se-
training and deployment is another task that needs to quences,” in Proc. 9th ACM SIGKDD International Conference on
be addressed. Knowledge Discovery and Data Mining. ACM, 2003, pp. 613–618.
[12] D. Snyder, “Online intrusion detection using sequences of system
(viii) Developing an appropriate and fast feature selection calls,” Master’s thesis, Department of Computer Science, Florida State
method for each attack class is yet another challenge. University, 2001.
(ix) Selection of an appropriate number of non-correlated, [13] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier
Detection. John Wiley & Sons, 1987.
unbiased classifiers from a pool of classifiers by gen- [14] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley &
erating classifier hypothesis for building an effective Sons, 1994.
ensemble approach for network anomaly detection is [15] D. Hawkins, Identification of Outliers. New York: Chapman and Hall,
1980.
another challenge.
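Challenges (iv) and (vi) interact: when normal instances vastly
outnumber attack instances, overall accuracy says little about a
detector. The following is a minimal sketch of this effect, not a
method from this survey. It uses the standard confusion-matrix
definitions (detection rate = TP / (TP + FN), false alarm rate =
FP / (FP + TN)). The function names, the "attack"/"normal" labels and
the toy trace of 990 normal and 10 attack records are all hypothetical
and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, attack="attack"):
    """Return (tp, fp, tn, fn), treating `attack` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack and pred == attack:
            tp += 1
        elif truth == attack:
            fn += 1
        elif pred == attack:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(y_true, y_pred, attack="attack"):
    """Compute accuracy, detection rate and false alarm rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, attack)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


def normal_to_attack_ratio(y_true, attack="attack"):
    """Number of normal records per attack record (inf if no attacks)."""
    counts = Counter("attack" if y == attack else "normal" for y in y_true)
    if counts["attack"] == 0:
        return float("inf")
    return counts["normal"] / counts["attack"]


# Hypothetical skewed trace: 990 normal records followed by 10 attacks.
y_true = ["normal"] * 990 + ["attack"] * 10

# A degenerate detector that never raises an alarm.
never_alarm = ["normal"] * 1000

# A detector that raises 20 false alarms and catches 8 of the 10 attacks.
some_alarms = (["attack"] * 20 + ["normal"] * 970
               + ["attack"] * 8 + ["normal"] * 2)

baseline = rates(y_true, never_alarm)
candidate = rates(y_true, some_alarms)
```

On this toy trace the detector that never raises an alarm has the
higher accuracy of the two, although its detection rate is zero. This
is one reason to report detection rate and false alarm rate together,
along with the class proportions of the evaluation dataset.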

VIII. CONCLUDING REMARKS

In this paper, we have examined the state of the art in
modern anomaly-based network intrusion detection. The
discussion has emphasized two well-known criteria to classify
and evaluate NIDSs: detection strategy and evaluation datasets.
We have also presented many detection methods, systems and
tools. In addition, we have discussed several evaluation criteria
for testing the performance of a detection method or system.
A brief description of the different existing datasets and their
taxonomy is also provided. Finally, we have outlined several
research issues and challenges for future researchers and
practitioners who may attempt to develop new detection methods
and systems for the latest network scenarios.

ACKNOWLEDGMENT

This work is supported by the Department of Information
Technology, MCIT and the Council of Scientific & Industrial
Research (CSIR), Government of India. It is also partially
supported by NSF (US) grants CNS-0851783 and CNS-1154342. The
authors are thankful to the funding agencies. The authors are
also thankful to the esteemed reviewers for their extensive
comments to improve the quality of the article.
332 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014

[27] M. Al-Kuwaiti, N. Kyriakopoulos, and S. Hussein, “A comparative [51] S. Boriah, V. Chandola, and V. Kumar, “Similarity measures for
analysis of network dependability, fault-tolerance, reliability, security, categorical data: A comparative evaluation,” in Proc. 8th SIAM In-
and survivability,” IEEE Commun. Surveys Tutorials, vol. 11, no. 2, ternational Conference on Data Mining, 2008, pp. 243–254.
pp. 106–124, April 2009. [52] G. Gan, C. Ma, and J. Wu, Data Clustering Theory, Algorithms and
[28] B. Donnet, B. Gueye, and M. A. Kaafar, “A Survey on Network Applications. SIAM, 2007.
Coordinates Systems, Design, and Security,” IEEE Commun. Surveys [53] C. C. Hsu and S. H. Wang, “An integrated framework for visualized
Tutorials, vol. 12, no. 4, pp. 488–503, October 2010. and exploratory pattern discovery in mixed data,” IEEE Trans. Knowl.
[29] S. X. Wu and W. Banzhaf, “The use of computational intelligence Data Eng., vol. 18, no. 2, pp. 161–173, 2005.
in intrusion detection systems: A review,” Applied Soft Computing, [54] M. V. Joshi, R. C. Agarwal, and V. Kumar, “Mining needle in a
vol. 10, no. 1, pp. 1–35, January 2010. haystack: classifying rare classes via two-phase rule induction,” in
[30] Y. Dong, S. Hsu, S. Rajput, and B. Wu, “Experimental Analysis of Proc. 7th ACM SIGKDD International Conference on Knowledge
Application Level Intrusion Detection Algorithms,” International J. Discovery and Data Mining. ACM, 2001, pp. 293–298.
Security and Networks, vol. 5, no. 2/3, pp. 198–205, 2010. [55] J. Theiler and D. M. Cai, “Resampling approach for anomaly detection
[31] M. Tavallaee, N. Stakhanova, and A. A. Ghorbani, “Toward credible in multispectral images,” in Proc. SPIE, vol. 5093. SPIE, 2003, pp.
evaluation of anomaly-based intrusion-detection methods,” IEEE Trans. 230–240.
Syst. Man Cybern. C Appl. Rev., vol. 40, no. 5, pp. 516–524, September [56] R. Fujimaki, T. Yairi, and K. Machida, “An approach to spacecraft
2010. anomaly detection problem using kernel feature space,” in Proc. 11th
[32] B. Daniel, C. Julia, J. Sushil, and W. Ningning, “ADAM: a testbed for ACM SIGKDD International Conference on Knowledge Discovery in
exploring the use of data mining in intrusion detection,” ACM SIGMOD Data Mining. USA: ACM, 2005, pp. 401–410.
Record, vol. 30, no. 4, pp. 15–24, 2001. [57] L. Portnoy, E. Eskin, and S. J. Stolfo, “Intrusion detection with
[33] Z. Zhang, J. Li, C. N. Manikopoulos, J. Jorgenson, and J. Ucles, unlabeled data using clustering,” in Proc. ACM Workshop on Data
“HIDE: a Hierarchical Network Intrusion Detection System Using Mining Applied to Security, 2001.
Statistical Preprocessing and Neural Network Classification,” in Proc. [58] H. H. Nguyen, N. Harbi, and J. Darmont, “An efficient local region
IEEE Man Systems and Cybernetics Information Assurance Workshop, and clustering-based ensemble system for intrusion detection,” in Proc.
2001. 15th Symposium on International Database Engineering & Applica-
[34] L. Ertoz, E. Eilertson, A. Lazarevic, P. Tan, V. Kumar, and J. Srivastava, tions. USA: ACM, 2011, pp. 185–191.
Data Mining - Next Generation Challenges and Future Directions. [59] M. Dash and H. Liu, “Feature Selection for Classification,” Intelligent
MIT Press, 2004, ch. MINDS - Minnesota Intrusion Detection System. Data Analysis, vol. 1, pp. 131–156, 1997.
[35] M. Thottan and C. Ji, “Anomaly detection in IP networks,” IEEE Trans. [60] Y. Chen, Y. Li, X. Q. Cheng, and L. Guo, “Survey and taxonomy of
Signal Process., vol. 51, no. 8, pp. 2191–2204, 2003. feature selection algorithms in intrusion detection system,” in Proc. 2nd
[36] J. M. Estevez-Tapiador, P. Garcia-Teodoro, and J. E. Diaz-Verdejo, SKLOIS conference on Information Security and Cryptology. Berlin,
“Anomaly detection methods in wired networks : a survey and tax- Heidelberg: Springer-Verlag, 2006, pp. 153–167.
onomy,” Computer Communication, vol. 27, no. 16, pp. 1569–1584, [61] Y. Li, J. L. Wang, Z. Tian, T. Lu, and C. Young, “Building lightweight
October 2004. intrusion detection system using wrapper-based feature selection mech-
[37] A. Fragkiadakis, E. Tragos, and I. Askoxylakis, “A Survey on Security anisms,” Computers & Security, vol. 28, no. 6, pp. 466–475, 2009.
Threats and Detection Techniques in Cognitive Radio Networks,” IEEE [62] H. T. Nguyen, K. Franke, and S. Petrovic, “Towards a Generic Feature-
Commun. Surveys Tutorials, vol. PP, no. 99, pp. 1–18, January 2012. Selection Measure for Intrusion Detection,” in Proc. 20th International
Conference on Pattern Recognition, August 2010, pp. 1529–1532.
[38] R. Heady, G. Luger, A. Maccabe, and M. Servilla, “The Architecture
[63] A. H. Sung and S. Mukkamala, “Identifying Important Features for
of a Network Level Intrusion Detection System,” Computer Science
Intrusion Detection Using Support Vector Machines and Neural Net-
Department, University of New Mexico, Tech. Rep. TR-90, 1990.
works,” in Proc. Symposium on Applications and the Internet. USA:
[39] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, “Selecting
IEEE CS, 2003, pp. 209–217.
Features for Intrusion Detection: A Feature Relevance Analysis on
[64] H. Peng, F. Long, and C. Ding, “Feature Selection Based on Mutual
KDD 99 Intrusion Detection Datasets,” in Proc. 3rd Annual Conference
Information : Criteria of Max-Dependency, Max-Relevance, and Min-
on Privacy, Security and Trust, October 2005.
Redundancy,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8,
[40] A. A. Ghorbani, W. Lu, and M. Tavallaee, Network Intrusion Detection pp. 1226–1238, August 2005.
and Prevention : Concepts and Techniques, ser. Advances in Informa- [65] F. Amiri, M. M. R. Yousefi, C. Lucas, A. Shakery, and N. Yazdani,
tion Security. Springer-verlag, October 28 2009. “Mutual information-based feature selection for intrusion detection
[41] P. Ning and S. Jajodia, Intrusion Detection Techniques. H Bidgoli systems,” J. Network and Computer Applications, vol. 34, no. 4, pp.
(Ed.), The Internet Encyclopedia, 2003. 1184–1199, 2011.
[42] F. Wikimedia, “Intrusion detection system,” [66] J. Dunn, “Well separated clusters and optimal fuzzy partitions,” J.
http://en.wikipedia.org/wiki/Intrusion-detection system, Feb 2009. Cybernetics, vol. 4, pp. 95–104, 1974.
[43] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Surveying Port [67] D. L. Davies and D. W. Bouldin, “A Cluster Separation Measure,”
Scans and Their Detection Methodologies,” The Computer Journal, IEEE Trans. Pattern Anal. Mach. Intell., vol. 1, no. 2, pp. 224–227,
vol. 54, no. 10, pp. 1565–1581, October 2011. 1979.
[44] B. C. Park, Y. J. Won, M. S. Kim, and J. W. Hong, “Towards [68] L. Hubert and J. Schultz, “Quadratic assignment as a general data
automated application signature generation for traffic identification,” analysis strategy,” British J. Mathematical and Statistical Psychology,
in Proc. IEEE/IFIP Network Operations and Management Symposium: vol. 29, no. 2, pp. 190–241, 1976.
Pervasive Management for Ubiquitous Networks and Services, 2008, [69] F. B. Baker and L. J. Hubert, “Measuring the power of hierarchical
pp. 160–167. cluster analysis,” J. American Statistics Association, vol. 70, no. 349,
[45] V. Kumar, “Parallel and distributed computing for cybersecurity,” IEEE pp. 31–38, 1975.
Distributed Systems Online, vol. 6, no. 10, 2005. [70] F. J. Rohlf, “Methods of Comparing Classifications,” Annual Review
[46] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. of Ecology and Systematics, vol. 5, no. 1, pp. 101–113, 1974.
Addison-Wesley, 2005. [71] P. J. Rousseeuw, “Silhouettes : a graphical aid to the interpretation
[47] M. J. Lesot and M. Rifqi, “Anomaly-based network intrusion detection and validation of cluster analysis,” J. Computational and Applied
: Techniques, systems and challenges,” International J. Knowledge Mathematics, vol. 20, no. 1, pp. 53–65, 1987.
Engineering and Soft Data Paradigms, vol. 1, no. 1, pp. 63–84, 2009. [72] L. Goodman and W. Kruskal, “Measures of associations for cross-
[48] S. H. Cha, “Comprehensive Survey on Distance/Similarity Measures validations,” J. American Statistics Association, vol. 49, pp. 732–764,
between Probability Density Functions,” International J. Mathematical 1954.
Models and Methods in Applied Science, vol. 1, no. 4, pp. 300–307, [73] P. Jaccard, “The distribution of flora in the alpine zone,” New Phytol-
November 2007. ogist, vol. 11, no. 2, pp. 37–50, 1912.
[49] S. Choi, S. Cha, and C. C. Tappert, “A Survey of Binary Similarity and [74] W. M. Rand, “Objective criteria for the evaluation of clustering
Distance Measures,” J. Systemics, Cybernetics and Informatics, vol. 8, methods,” J. American Statistical Association, vol. 66, no. 336, pp.
no. 1, pp. 43–48, 2010. 846–850, 1971.
[50] M. J. Lesot, M. Rifqi, and H. Benhadda, “Similarity measures for [75] J. C. Bezdek, “Numerical taxonomy with fuzzy sets,” J. Mathematical
binary and numerical data: a survey,” International J. Knowledge Biology, vol. 1, no. 1, pp. 57–71, 1974.
Engineering and Soft Data Paradigms, vol. 1, no. 1, pp. 63–84, [76] , “Cluster Validity with fuzzy sets,” J. Cybernetics, vol. 3, no. 3,
December 2009. pp. 58–78, 1974.
BHUYAN et al.: NETWORK ANOMALY DETECTION: METHODS, SYSTEMS AND TOOLS 333

[77] X. L. Xie and G. Beni, “A Validity measure for Fuzzy Clustering,” Networks. Washington, DC, USA: IEEE Computer Society, 2010, pp.
IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4, pp. 841–847, 313–317.
1991. [100] I. Kang, M. K. Jeong, and D. Kong, “A differentiated one-class
[78] F. J. Anscombe and I. Guttman, “Rejection of outliers,” Technometrics, classification method with applications to intrusion detection,” Expert
vol. 2, no. 2, pp. 123–147, 1960. Systems with Applications, vol. 39, no. 4, pp. 3899–3905, March 2012.
[79] E. Eskin, “Anomaly detection over noisy data using learned probabil- [101] C. F. Tsai, Y. F. Hsu, C. Y. Lin, and W. Y. Lin, “Intrusion detection
ity distributions,” in Proc. 7th International Conference on Machine by machine learning: A review,” Expert Systems with Applications,
Learning. Morgan Kaufmann, 2000, pp. 255–262. vol. 36, no. 10, pp. 11 994–12 000, December 2009.
[80] M. Desforges, P. Jacob, and J. Cooper, “Applications of probability [102] T. Abbes, A. Bouhoula, and M. Rusinowitch, “Efficient decision tree
density estimation to the detection of abnormal conditions in engineer- for protocol analysis in intrusion detection,” International J. Security
ing,” in Proc. Institute of Mechanical Engineers, vol. 212, 1998, pp. and Networks, vol. 5, no. 4, pp. 220–235, December 2010.
687–703. [103] C. Wagner, J. François, R. State, and T. Engel, “Machine Learning
[81] C. Manikopoulos and S. Papavassiliou, “Network Intrusion and Fault Approach for IP-Flow Record Anomaly Detection,” in Proc. 10th
Detection: A Statistical Anomaly Approach,” IEEE Commun. Mag., International IFIP TC 6 conference on Networking - Volume Part I,
vol. 40, no. 10, pp. 76–82, October 2002. 2011, pp. 28–39.
[82] P. K. Chan, M. V. Mahoney, and M. H. Arshad, “A machine learning [104] B. Schölkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and
approach to anomaly detection,” Department of Computer Science, R. C. Williamson, “Estimating the Support of a High-Dimensional
Florida Institute of Technology, Tech. Rep. CS-2003-06, 2003. Distribution,” Neural Computation, vol. 13, no. 7, pp. 1443–1471, July
[83] M. V. Mahoney and P. K. Chan, “Learning rules for anomaly detection 2001.
of hostile network traffic,” in Proc. 3rd IEEE International Conference [105] M. Y. Su, G. J. Yu, and C. Y. Lin, “A real-time network intrusion
on Data Mining. Washington: IEEE CS, 2003. detection system for large-scale attacks based on an incremental mining
[84] K. Wang and S. J. Stolfo, “Anomalous Payload-Based Network In- approach,” Computers & Security, vol. 28, no. 5, pp. 301–309, 2009.
trusion Detection,” in Proc. Recent Advances in Intrusion Detection. [106] L. Khan, M. Awad, and B. Thuraisingham, “A New Intrusion Detection
springer, 2004, pp. 203–222. System Using Support Vector Machines and Hierarchical Clustering,”
[85] X. Song, M. Wu, C. Jermaine, and S. Ranka, “Conditional Anomaly The VLDB Journal, vol. 16, no. 4, pp. 507–521, October 2007.
Detection,” IEEE Trans. Knowl. Data Eng., vol. 19, pp. 631–645, 2007. [107] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, “A K-means
[86] P. Chhabra, C. Scott, E. D. Kolaczyk, and M. Crovella, “Distributed and naive bayes learning approach for better intrusion detection,”
Spatial Anomaly Detection,” in Proc. 27th IEEE International Confer- Information Technology J., vol. 10, no. 3, pp. 648–655, 2011.
ence on Computer Communications, 2008, pp. 1705–1713. [108] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1,
[87] W. Lu and A. A. Ghorbani, “Network Anomaly Detection Based on no. 1, pp. 81–106, March 1986.
Wavelet Analysis,” EURASIP J. Advances in Signal Processing, vol. [109] H. Yu and S. Kim, Handbook of Natural Computing. Springer, 2003,
2009, no. 837601, January 2009. ch. SVM Tutorial - Classification, Regression and Ranking.
[88] F. S. Wattenberg, J. I. A. Perez, P. C. Higuera, M. M. Fernandez, and [110] L. V. Kuang, “DNIDS: A Dependable Network Intrusion Detection
I. A. Dimitriadis, “Anomaly Detection in Network Traffic Based on System Using the CSI-KNN Algorithm,” Master’s thesis, Queen’s
Statistical Inference and α-Stable Modeling,” IEEE Trans. Dependable University Kingston, Ontario, Canada, Sep 2007.
Secure Computing, vol. 8, no. 4, pp. 494–509, July/August 2011. [111] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “RODD:
[89] M. Yu, “A Nonparametric Adaptive CUSUM Method And Its Appli- An Effective Reference-Based Outlier Detection Technique for Large
cation In Network Anomaly Detection,” International J. Advancements Datasets,” in Advanced Computing. Springer, 2011, vol. 133, pp.
in Computing Technology, vol. 4, no. 1, pp. 280–288, 2012. 76–84.
[90] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian Network Clas- [112] W. Lee, S. J. Stolfo, and K. W. Mok, “Adaptive Intrusion Detection
sifiers,” Machine Learning, vol. 29, no. 2-3, pp. 131–163, November : A Data Mining Approach,” Artificial Intelligence Review, vol. 14,
1997. no. 6, pp. 533–567, 2000.
[91] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, “Bayesian event [113] M. Roesch, “Snort - Lightweight Intrusion Detection for Networks,” in
classification for intrusion detection,” in Proc. 19th Annual Computer Proc. 13th USENIX Conference on System Administration, Washington,
Security Applications Conference, 2003. 1999, pp. 229–238.
[92] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association [114] B. Neumann, “Knowledge Management and Assistance Systems,”
Rules in Large Databases,” in Proc. 20th International Conference on http://kogs-www.informatik.uni-hamburg.de/ neumann/, 2007.
Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann, [115] Y. F. Zhang, Z. Y. Xiong, and X. Q. Wang, “Distributed intrusion
1994, pp. 487–499. detection based on clustering,” in Proc. International Conference on
[93] N. Subramoniam, P. S. Pawar, M. Bhatnagar, N. S. Khedekar, S. Gun- Machine Learning and Cybernetics, vol. 4, August 2005, pp. 2379–
tupalli, N. Satyanarayana, V. A. Vijayakumar, P. K. Ampatt, R. Ranjan, 2383.
and P. S. Pandit, “Development of a Comprehensive Intrusion Detection [116] K. Leung and C. Leckie, “Unsupervised anomaly detection in net-
System - Challenges and Approaches,” in Proc. 1st International work intrusion detection using clusters,” in Proc. 28th Australasian
Conference on Information Systems Security, Kolkata, India, 2005, pp. conference on Computer Science - Volume 38. Darlinghurst, Australia,
332–335. Australia: Australian Computer Society, Inc., 2005, pp. 333–342.
[94] S. Song, L. Ling, and C. N. Manikopoulo, “Flow-based Statistical [117] C. Zhang, G. Zhang, and S. Sun, “A Mixed Unsupervised Clustering-
Aggregation Schemes for Network Anomaly Detection,” in Proc. IEEE Based Intrusion Detection Model,” in Proc. 3rd International Confer-
International Conference on Networking, Sensing, 2006. ence on Genetic and Evolutionary Computing. USA: IEEE CS, 2009,
[95] H. Tong, C. Li, J. He, J. Chen, Q. A. Tran, H. X. Duan, and X. Li, pp. 426–428.
“Anomaly Internet Network Traffic Detection by Kernel Principle [118] P. Casas, J. Mazel, and P. Owezarski, “Unsupervised Network Intru-
Component Classifier,” in Proc. 2nd International Symposium on sion Detection Systems: Detecting the Unknown without Knowledge,”
Neural Networks, vol. LNCS. 3498, 2005, pp. 476–481. Computer Communications, vol. 35, no. 7, pp. 772–783, April 2012.
[96] S. R. Gaddam, V. V. Phoha, and K. S. Balagani, “K-Means+ID3: A [119] K. Sequeira and M. Zaki, “ADMIT: anomaly-based data mining for
Novel Method for Supervised Anomaly Detection by Cascading K- intrusions,” in Proc. eighth ACM SIGKDD international conference on
Means Clustering and ID3 Decision Tree Learning Methods,” IEEE Knowledge discovery and data mining. New York, NY, USA: ACM,
Trans. Knowl. Data Eng., vol. 19, no. 3, pp. 345–354, Mar 2007. 2002, pp. 386–395.
[97] K. Das, J. Schneider, and D. B. Neill, “Anomaly pattern detection [120] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, Applications
in categorical datasets,” in Proc. 14th ACM SIGKDD International of Data Mining in Computer Security. Kluwer Academic, 2002, ch.
Conference on Knowledge Discovery and Data Mining. USA: ACM, A geometric framework for unsupervised anomaly detection: Detecting
2008, pp. 169–176. intrusions in unlabeled data.
[98] W. Lu and H. Tong, “Detecting Network Anomalies Using CUSUM [121] Z. Zhuang, Y. Li, and Z. Chen, “Enhancing Intrusion Detection System
and EM Clustering,” in Proc. 4th International Symposium on Ad- with proximity information,” International J. Security and Networks,
vances in Computation and Intelligence. Springer-verlag, 2009, pp. vol. 5, no. 4, pp. 207–219, December 2010.
297–308. [122] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “An effective
[99] M. A. Qadeer, A. Iqbal, M. Zahid, and M. R. Siddiqui, “Network unsupervised network anomaly detection method,” in Proc. Interna-
Traffic Analysis and Intrusion Detection Using Packet Sniffer,” in tional Conference on Advances in Computing, Communications and
Proc. 2nd International Conference on Communication Software and Informatics. New York, NY, USA: ACM, 2012, pp. 533–539.
334 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 16, NO. 1, FIRST QUARTER 2014

[123] M. E. Otey, A. Ghoting, and S. Parthasarathy, “Fast distributed outlier [147] M. Mohajerani, A. Moeini, and M. Kianie, “NFIDS: A Neuro-Fuzzy
detection in mixed-attribute data sets,” Data Mining and Knowledge Intrusion Detection System,” in Proc. 10th IEEE International Confer-
Discovery, vol. 12, no. 2-3, pp. 203–228, 2006. ence on Electronics, Circuits and Systems, vol. 1, December 2003, pp.
[124] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “NADO: network 348–351.
anomaly detection using outlier approach,” in Proc. ACM International [148] Z. Pawlak, “Rough sets,” International J. Parallel Programming,
Conference on Communication, Computing & Security. USA: ACM, vol. 11, no. 5, pp. 341–356, 1982.
2011, pp. 531–536. [149] Z. Cai, X. Guan, P. Shao, Q. Peng, and G. Sun, “A rough set theory
[125] S. Jiang, X. Song, H. Wang, J.-J. Han, and Q.-H. Li, “A clustering- based method for anomaly intrusion detection in computer network
based method for unsupervised intrusion detections,” Pattern Recogni- systems,” Expert Systems, vol. 20, no. 5, pp. 251–259, November 2003.
tion Letters, vol. 27, no. 7, pp. 802–810, May 2006. [150] W. Chimphlee, A. H. Abdullah, M. S. M. Noor, S. Srinoy, and S. Chim-
[126] Z. Chen and C. Chen, “A Closed-Form Expression for Static Worm- phlee, “Anomaly-Based Intrusion Detection using Fuzzy Rough Clus-
Scanning Strategies,” in Proc. IEEE International Conference on tering,” in Proc. International Conference on Hybrid Information
Communications. Beijing, China: IEEE CS, May 2008, pp. 1573– Technology, vol. 01. Washington, DC, USA: IEEE Computer Society,
1577. 2006, pp. 329–334.
[127] B. Balajinath and S. V. Raghavan, “Intrusion detection through learning [151] A. O. Adetunmbi, S. O. Falaki, O. S. Adewale, and B. K. Alese,
behavior model,” Computer Communications, vol. 24, no. 12, pp. “Network Intrusion Detection based on Rough Set and k-Nearest
1202–1212, July 2001. Neighbour,” International J. Computing and ICT Research, vol. 2,
[128] M. S. A. Khan, “Rule based Network Intrusion Detection using Genetic no. 1, pp. 60–66, 2008.
Algorithm,” International J. Computer Applications, vol. 18, no. 8, pp. [152] R. C. Chen, K. F. Cheng, Y. H. Chen, and C. F. Hsieh, “Using Rough
26–29, March 2011. Set and Support Vector Machine for Network Intrusion Detection
[129] S. Haykin, Neural Networks. New Jersey: Prentice Hall, 1999. System,” in Proc. First Asian Conference on Intelligent Information
[130] M. Amini, R. Jalili, and H. R. Shahriari, “RT-UNNID: A practical solu- and Database Systems. Washington, DC, USA: IEEE Computer
tion to real-time network-based intrusion detection using unsupervised Society, 2009, pp. 465–470.
neural networks,” Computers & Security, vol. 25, no. 6, pp. 459–468, [153] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization
2006. by a colony of cooperating agents,” IEEE Trans. Syst. Man Cybern. B,
[131] G. Carpenter and S. Grossberg, “Adaptive resonance theory,” CAS/CNS Cybern., vol. 26, no. 1, pp. 29–41, 1996.
Technical Report Series, no. 008, 2010. [154] H. H. Gao, H. H. Yang, and X. Y. Wang, “Ant colony optimization
[132] T. Kohonen, “The self-organizing map,” Proc. IEEE, vol. 78, no. 9, based network intrusion feature selection and detection,” in Proc.
pp. 1464–1480, 1990. International Conference on Machine Learning and Cybernetics, vol. 6,
[133] J. Cannady, “Applying CMAC-Based On-Line Learning to Intrusion aug. 2005, pp. 3871–3875.
Detection,” in Proc. IEEE-INNS-ENNS International Joint Conference [155] A. Visconti and H. Tahayori, “Artificial immune system based on
on Neural Networks, vol. 5, 2000, pp. 405–410. interval type-2 fuzzy set paradigm,” Applied Soft Computing, vol. 11,
[134] S. C. Lee and D. V. Heinbuch, “Training a neural-network based no. 6, pp. 4055–4063, September 2011.
intrusion detector to recognize novel attacks,” IEEE Trans. Syst. Man [156] S. Noel, D. Wijesekera, and C. Youman, “Modern Intrusion Detection,
Cybern. A, vol. 31, no. 4, pp. 294–299, 2001. Data Mining, and Degrees of Attack Guilt,” in Proc. International
[135] G. Liu, Z. Yi, and S. Yang, “A hierarchical intrusion detection model Conference on Applications of Data Mining in Computer Security.
based on the PCA neural networks,” Neurocomputing, vol. 70, no. 7-9, Springer, 2002.
pp. 1561–1568, 2007. [157] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang,
[136] J. Sun, H. Yang, J. Tian, and F. Wu, “Intrusion Detection Method Based and et al., “Specification-based anomaly detection: a new approach
on Wavelet Neural Network,” in Proc. 2nd International Workshop on for detecting network intrusions,” in Proc. 9th ACM Conference on
Knowledge Discovery and Data Mining. USA: IEEE CS, 2009, pp. Computer and Communications Security, 2002, pp. 265–274.
851–854. [158] X. Xu, “Sequential anomaly detection based on temporal-difference
[137] H. Yong and Z. X. Feng, “Expert System Based Intrusion Detection learning: Principles, models and case studies,” Applied Soft Computing,
System,” in Proc. International Conference on Information Manage- vol. 10, no. 3, pp. 859–867, 2010.
ment, Innovation Management and Industrial Engineering, vol. 4, [159] A. Prayote, “Knowledge Based Anomaly Detection,” Ph.D. disserta-
November 2010, pp. 404–407.
[138] A. Parlos, K. Chong, and A. Atiya, “Application of the recurrent multilayer perceptron in modeling complex process dynamics,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 255–266, 1994.
[139] K. Labib and R. Vemuri, “NSOM: A Tool To Detect Denial Of Service Attacks Using Self-Organizing Maps,” Department of Applied Science, University of California, Davis, California, U.S.A., Tech. Rep., 2002.
[140] D. Bolzoni, S. Etalle, P. H. Hartel, and E. Zambon, “POSEIDON: a 2-tier Anomaly-based Network Intrusion Detection System,” in Proc. 4th IEEE International Workshop on Information Assurance, 2006, pp. 144–156.
[141] M. V. Mahoney and P. K. Chan, “PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic,” Dept. of Computer Science, Florida Tech, Tech. Rep. cs-2001-04, 2001.
[142] J. E. Dickerson, “Fuzzy network profiling for intrusion detection,” in Proc. 19th International Conference of the North American Fuzzy Information Processing Society, Atlanta, July 2000, pp. 301–306.
[143] F. Geramiraz, A. S. Memaripour, and M. Abbaspour, “Adaptive Anomaly-Based Intrusion Detection System Using Fuzzy Controller,” International Journal of Network Security, vol. 14, no. 6, pp. 352–361, 2012.
[144] A. Tajbakhsh, M. Rahmati, and A. Mirzaei, “Intrusion detection using fuzzy association rules,” Applied Soft Computing, vol. 9, no. 2, pp. 462–469, March 2009.
[145] S. Mabu, C. Chen, N. Lu, K. Shimada, and K. Hirasawa, “An Intrusion-Detection Model Based on Fuzzy Class-Association-Rule Mining Using Genetic Network Programming,” IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 41, no. 1, pp. 130–139, 2011.
[146] J. Q. Xian, F. H. Lang, and X. L. Tang, “A novel intrusion detection method based on clonal selection clustering algorithm,” in Proc. International Conference on Machine Learning and Cybernetics. USA: IEEE Press, 2005, vol. 6.
tion, School of Computer Science and Engineering, The University of New South Wales, November 2007.
[160] K. Ilgun, R. A. Kemmerer, and P. A. Porras, “State transition analysis: A rule-based intrusion detection approach,” IEEE Trans. Software Eng., vol. 21, no. 3, pp. 181–199, 1995.
[161] D. E. Denning and P. G. Neumann, “Requirements and model for IDES a real-time intrusion detection system,” Computer Science Laboratory, SRI International, USA, Tech. Rep. 83F83-01-00, 1985.
[162] D. Anderson, T. F. Lunt, H. Javitz, A. Tamaru, and A. Valdes, “Detecting unusual program behaviour using the statistical component of the next-generation intrusion detection expert system (NIDES),” Computer Science Laboratory, SRI International, USA, Tech. Rep. SRI-CSL-95-06, 1995.
[163] N. G. Duffield, P. Haffner, B. Krishnamurthy, and H. Ringberg, “Rule-Based Anomaly Detection on IP Flows,” in Proc. 28th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies. Rio de Janeiro, Brazil: IEEE Press, 2009, pp. 424–432.
[164] R. E. Schapire, “A brief introduction to boosting,” in Proc. 16th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1999, pp. 1401–1406.
[165] A. Prayote and P. Compton, “Detecting anomalies and intruders,” AI 2006: Advances in Artificial Intelligence, pp. 1084–1088, 2006.
[166] G. Edwards, B. Kang, P. Preston, and P. Compton, “Prudent expert systems with credentials: Managing the expertise of decision support systems,” International Journal of Biomedical Computing, vol. 40, no. 2, pp. 125–132, 1995.
[167] W. Scheirer and M. C. Chuah, “Syntax vs. semantics: competing approaches to dynamic network intrusion detection,” International Journal of Security and Networks, vol. 3, no. 1, pp. 24–35, December 2008.
[168] P. Naldurg, K. Sen, and P. Thati, “A Temporal Logic Based Framework for Intrusion Detection,” in Proc. 24th IFIP WG 6.1 International
Conference on Formal Techniques for Networked and Distributed Systems, 2004, pp. 359–376.
[169] J. M. Estévez-Tapiador, P. García-Teodoro, and J. E. Díaz-Verdejo, “Stochastic protocol modeling for anomaly based network intrusion detection,” in Proc. 1st International Workshop on Information Assurance. IEEE CS, 2003, pp. 3–12.
[170] A. Shabtai, U. Kanonov, and Y. Elovici, “Intrusion detection for mobile devices using the knowledge-based, temporal abstraction method,” J. System Software, vol. 83, no. 8, pp. 1524–1537, August 2010.
[171] S. S. Hung and D. S. M. Liu, “A user-oriented ontology-based approach for network intrusion detection,” Computer Standards & Interfaces, vol. 30, no. 1-2, pp. 78–88, January 2008.
[172] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits Syst. Mag., vol. 6, no. 3, pp. 21–45, 2006.
[173] A. Borji, “Combining heterogeneous classifiers for network intrusion detection,” in Proc. 12th Asian Computing Science Conference on Advances in Computer Science: Computer and Network Security. Springer, 2007, pp. 254–260.
[174] G. Giacinto, R. Perdisci, M. D. Rio, and F. Roli, “Intrusion detection in computer networks by a modular ensemble of one-class classifiers,” Information Fusion, vol. 9, no. 1, pp. 69–82, January 2008.
[175] L. Rokach, “Ensemble-based classifiers,” Artificial Intelligence Review, vol. 33, no. 1-2, pp. 1–39, February 2010.
[176] K. Noto, C. Brodley, and D. Slonim, “Anomaly Detection Using an Ensemble of Feature Models,” in Proc. IEEE International Conference on Data Mining. USA: IEEE CS, 2010, pp. 953–958.
[177] P. M. Mafra, V. Moll, J. D. S. Fraga, and A. O. Santin, “Octopus-IIDS: An Anomaly Based Intelligent Intrusion Detection System,” in Proc. IEEE Symposium on Computers and Communications. USA: IEEE CS, 2010, pp. 405–410.
[178] S. Chebrolu, A. Abraham, and J. P. Thomas, “Feature deduction and ensemble design of intrusion detection systems,” Computers & Security, vol. 24, no. 4, pp. 295–307, 2005.
[179] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks, 1984.
[180] R. Perdisci, G. Gu, and W. Lee, “Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems,” in Proc. 6th International Conference on Data Mining. USA: IEEE CS, 2006, pp. 488–498.
[181] G. Folino, C. Pizzuti, and G. Spezzano, “An ensemble-based evolutionary framework for coping with distributed intrusion detection,” Genetic Programming and Evolvable Machines, vol. 11, no. 2, pp. 131–146, June 2010.
[182] M. Rehak, M. Pechoucek, P. Celeda, J. Novotny, and P. Minarik, “CAMNEP: Agent-based Network Intrusion Detection System,” in Proc. 7th International Joint Conference on Autonomous Agents and Multiagent Systems: Industrial Track. Richland, SC: IFAAMAS, 2008, pp. 133–136.
[183] R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, and W. Lee, “McPAD: A multiple classifier system for accurate payload-based anomaly detection,” Computer Networks, vol. 53, no. 6, pp. 864–881, April 2009.
[184] W. Khreich, E. Granger, A. Miri, and R. Sabourin, “Adaptive ROC-based ensembles of HMMs applied to anomaly detection,” Pattern Recognition, vol. 45, no. 1, pp. 208–230, January 2012.
[185] G. Giacinto, F. Roli, and L. Didaci, “Fusion of multiple classifiers for intrusion detection in computer networks,” Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, August 2003.
[186] J. Shifflet, “A Technique Independent Fusion Model For Network Intrusion Detection,” in Proc. Midstates Conference on Undergraduate Research in Computer Science and Mathematics, vol. 3, 2005, pp. 13–19.
[187] D. Parikh and T. Chen, “Data Fusion and Cost Minimization for Intrusion Detection,” IEEE Trans. Inf. For. Security, vol. 3, no. 3, pp. 381–389, 2008.
[188] L. Zhi-dong, Y. Wu, W. Wei, and M. Da-peng, “Decision-level fusion model of multi-source intrusion detection alerts,” J. Communications, vol. 32, no. 5, pp. 121–128, 2011.
[189] R. Yan and C. Shao, “Hierarchical Method for Anomaly Detection and Attack Identification in High-speed Network,” Information Technology J., vol. 11, no. 9, pp. 1243–1250, 2012.
[190] V. Chatzigiannakis, G. Androulidakis, K. Pelechrinis, S. Papavassiliou, and V. Maglaris, “Data fusion algorithms for network anomaly detection: classification and evaluation,” in Proc. 3rd International Conference on Networking and Services. Greece: IEEE CS, 2007, pp. 50–57.
[191] W. Gong, W. Fu, and L. Cai, “A Neural Network Based Intrusion Detection Data Fusion Model,” in Proc. 3rd International Joint Conference on Computational Science and Optimization - Volume 02. USA: IEEE CS, 2010, pp. 410–414.
[192] D. Ariu, R. Tronci, and G. Giacinto, “HMMPayl: An intrusion detection system based on Hidden Markov Models,” Computers & Security, vol. 30, no. 4, pp. 221–241, 2011.
[193] M. Arumugam, P. Thangaraj, P. Sivakumar, and P. Pradeepkumar, “Implementation of two class classifiers for hybrid intrusion detection,” in Proc. International Conference on Communication and Computational Intelligence, December 2010, pp. 486–490.
[194] J. Zhang and M. Zulkernine, “A Hybrid Network Intrusion Detection Technique Using Random Forests,” in Proc. 1st International Conference on Availability, Reliability and Security. USA: IEEE CS, 2006, pp. 262–269.
[195] M. A. Aydin, A. H. Zaim, and K. G. Ceylan, “A hybrid intrusion detection system design for computer network security,” Computers & Electrical Engineering, vol. 35, no. 3, pp. 517–526, May 2009.
[196] M. Panda, A. Abraham, and M. R. Patra, “Hybrid intelligent systems for detecting network intrusions,” Computer Physics Communications, vol. Early, 2012.
[197] A. Herrero, M. Navarro, E. Corchado, and V. Julian, “RT-MOVICAB-IDS: Addressing real-time intrusion detection,” Future Generation Computer Systems, vol. 29, no. 1, pp. 250–261, 2011.
[198] M. E. Locasto, K. Wang, A. D. Keromytis, and S. J. Stolfo, “FLIPS: Hybrid Adaptive Intrusion Prevention,” in Recent Advances in Intrusion Detection, 2005, pp. 82–101.
[199] S. Peddabachigari, A. Abraham, C. Grosan, and J. Thomas, “Modeling intrusion detection system using hybrid intelligent systems,” J. Network and Computer Applications, vol. 30, no. 1, pp. 114–132, January 2007.
[200] J. Zhang, M. Zulkernine, and A. Haque, “Random-Forests-Based Network Intrusion Detection Systems,” IEEE Trans. Syst. Man Cybern. C, vol. 38, no. 5, pp. 649–659, 2008.
[201] X. Tong, Z. Wang, and H. Yu, “A research using hybrid RBF/Elman neural networks for intrusion detection system secure model,” Computer Physics Communications, vol. 180, no. 10, pp. 1795–1801, 2009.
[202] X. Yu, “A New Model of Intelligent Hybrid Network Intrusion Detection System,” in Proc. International Conference on Bioinformatics and Biomedical Technology. IEEE CS, 2010, pp. 386–389.
[203] S. Selim, M. Hashem, and T. M. Nazmy, “Hybrid Multi-level Intrusion Detection System,” International J. Computer Science and Information Security, vol. 9, no. 5, pp. 23–29, 2011.
[204] A. Soule, K. Salamatian, and N. Taft, “Combining filtering and statistical methods for anomaly detection,” in Proc. 5th ACM SIGCOMM Conference on Internet Measurement. USA: ACM, 2005, pp. 1–14.
[205] KDDcup99, “Knowledge discovery in databases DARPA archive,” http://www.kdd.ics.uci.edu/databases/kddcup99/task.html, 1999.
[206] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. K. Chan, “Cost-Based Modeling for Fraud and Intrusion Detection: Results from the JAM Project,” in Proc. DARPA Information Survivability Conference and Exposition, vol. 2. USA: IEEE CS, 2000, pp. 130–144.
[207] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. 2nd IEEE International Conference on Computational Intelligence for Security and Defense Applications. USA: IEEE Press, 2009, pp. 53–58.
[208] NSL-KDD, “NSL-KDD data set for network-based intrusion detection systems,” http://iscx.cs.unb.ca/NSL-KDD/, March 2009.
[209] I. S. T. G. MIT Lincoln Lab, “DARPA Intrusion Detection Data Sets,” http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html, March 2000.
[210] Defcon, “The Shmoo Group,” http://cctf.shmoo.com/, 2011.
[211] CAIDA, “The Cooperative Association for Internet Data Analysis,” http://www.caida.org, 2011.
[212] LBNL, “Lawrence Berkeley National Laboratory and ICSI, LBNL/ICSI Enterprise Tracing Project,” http://www.icir.org/enterprise-tracing/, 2005.
[213] UNIBS, “University of Brescia dataset,” http://www.ing.unibs.it/ntw/tools/traces/, 2009.
[214] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Towards developing a systematic approach to generate benchmark datasets for intrusion detection,” Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[215] P. Gogoi, M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Packet and Flow Based Network Intrusion Datasets,” in Proc. 5th International Conference on Contemporary Computing, vol. LNCS-CCIS 306. Springer, August 6-8 2012, pp. 322–334.
[216] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “AOCD: An Adaptive Outlier Based Coordinated Scan Detection Approach,” International J. Network Security, vol. 14, no. 6, pp. 339–351, 2012.
[217] C. Satten, “Lossless Gigabit Remote Packet Capture With Linux,” http://staff.washington.edu/corey/gulp/, 2007.
[218] NFDUMP, “NFDUMP Tool,” http://nfdump.sourceforge.net/, 2011.
[219] M. V. Mahoney and P. K. Chan, “An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection,” in Proc. 6th International Symposium on Recent Advances in Intrusion Detection. Springer, 2003, pp. 220–237.
[220] J. McHugh, “Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory,” ACM Trans. Inf. System Security, vol. 3, no. 4, pp. 262–294, November 2000.
[221] P. Mell, V. Hu, R. Lippmann, J. Haines, and M. Zissman, “An Overview of Issues in Testing Intrusion Detection Systems,” http://citeseer.ist.psu.edu/621355.html, 2003.
[222] J. Xu and C. R. Shelton, “Intrusion Detection using Continuous Time Bayesian Networks,” J. Artificial Intelligence Research, vol. 39, pp. 745–774, 2010.
[223] S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,” ACM Trans. Inf. System Security, vol. 3, no. 3, pp. 186–205, August 2000.
[224] R. P. Lippmann, D. J. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, “Evaluating Intrusion Detection Systems: The 1998 DARPA Offline Intrusion Detection Evaluation,” in Proc. DARPA Information Survivability Conference and Exposition, January 2000, pp. 12–26.
[225] M. V. Joshi, R. C. Agarwal, and V. Kumar, “Predicting rare classes: can boosting make any weak learner strong?” in Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. USA: ACM, 2002, pp. 297–306.
[226] P. Dokas, L. Ertoz, A. Lazarevic, J. Srivastava, and P. N. Tan, “Data Mining for Network Intrusion Detection,” in Proc. NSF Workshop on Next Generation Data Mining, November 2002.
[227] Y. Wang, Statistical Techniques for Network Security: Modern Statistically-Based Intrusion Detection and Protection. Hershey, PA: Information Science Reference, IGI Publishing, 2008.
[228] S. M. Weiss and T. Zhang, The handbook of data mining. Lawrence Erlbaum Assoc Inc, 2003, ch. Performance Analysis and Evaluation, pp. 426–439.
[229] F. J. Provost and T. Fawcett, “Robust Classification for Imprecise Environments,” Machine Learning, vol. 42, no. 3, pp. 203–231, 2001.
[230] R. A. Maxion and R. R. Roberts, “Proper Use of ROC Curves in Intrusion/Anomaly Detection,” School of Computing Science, University of Newcastle upon Tyne, Tech. Rep. CS-TR-871, November 2004.
[231] R. Sekar, Y. Guang, S. Verma, and T. Shanbhag, “A high-performance network intrusion detection system,” in Proc. 6th ACM Conference on Computer and Communications Security. USA: ACM, 1999, pp. 8–17.

Monowar Hussain Bhuyan received his M.Tech. in Information Technology from the Department of Computer Science and Engineering, Tezpur University, Assam, India in 2009. Currently, he is pursuing his Ph.D. in Computer Science and Engineering from the same university. He is a life member of IETE, India. His research areas include machine learning, computer and network security, and pattern recognition. He has published 15 papers in international journals and refereed conference proceedings.

Dhruba K. Bhattacharyya received his Ph.D. in Computer Science from Tezpur University in 1999. He is a Professor in the Computer Science & Engineering Department at Tezpur University. His research areas include Data Mining, Network Security and Content-based Image Retrieval. Prof. Bhattacharyya has published 150+ research papers in the leading international journals and conference proceedings. In addition, Dr Bhattacharyya has written/edited 8 books. He is a Programme Committee/Advisory Body member of several international conferences/workshops.

Jugal K. Kalita is a professor of Computer Science at the University of Colorado at Colorado Springs. He received his Ph.D. from the University of Pennsylvania. His research interests are in natural language processing, machine learning, artificial intelligence and bioinformatics. He has published 120 papers in international journals and refereed conference proceedings and has written two books.
