2010 International Conference on Computational and Information Sciences

IDSV: Intrusion Detection Algorithm based on Statistics Variance Method in User Transmission Behavior

TAO Jun, LIN Hui, LIU Chunlin

School of Business, Nanjing University, Nanjing 210093, China
School of Computer Sci. & Engr., Southeast University, Nanjing 210093, China
juntao@seu.edu.cn

978-0-7695-4270-6/10 $26.00 © 2010 IEEE
DOI 10.1109/ICCIS.2010.292

Abstract—With the growing and diverse demands of Internet applications, network security issues are becoming more acute. Addressing network security through network intrusion detection has become an important research topic in network security. In this paper, user behavior features are extracted to model user transmission behavior, and the requirements of anomaly detection and the specific characteristics of audit data are studied. An intrusion detection algorithm based on the statistics variance method in user transmission behavior (IDSV) and its implementation framework are provided. The IDSV algorithm is then applied to ARP spoofing detection. The simulation results show that the IDSV algorithm achieves a good intrusion detection rate and good detection performance across different application features. They also show that it can detect intrusions effectively under user behavior in different applications.

Keywords-Network Security; Anomaly Detection; User Transmitting Behavior; Intrusion Detection

I. INTRODUCTION

The distributed character of the Internet and the complexity of its applications make it difficult to manage and control [1], and many network security problems arise as a result. Improving network security with intrusion detection systems has become an important research direction in network control [2]. In this paper, the object of study is the datagram, which embodies user behavior, and in particular user transmission behavior. The anomaly detection requirements and the datagram features to be audited, such as IP address, port number, packet type and packet length, are analyzed. Then the statistics variance method and the nearest neighbor algorithm of machine learning are applied to model user behavior. Based on this model, an intrusion detection algorithm is implemented and applied to an anomaly detection system.

According to the analysis method used, intrusion detection technologies are generally divided into anomaly detection technology [3] and misuse detection technology. The basic principle of anomaly detection is to create a profile of normal user behavior. It then decides whether any intrusion exists by comparing the current user behavior with the normal behavior profile. Examples include feature-selection-based and data-mining-based methods [4], and technologies based on application patterns, statistical methods and artificial neural networks [5, 6]. These technologies are independent of attack characteristics, are able to find unknown attacks, and are suitable for detecting legitimate inside users who act beyond their authority. They also have defects, such as a high false rate, low detection efficiency, and difficulty in defining the normal behavior profile and classifying intrusions. Misuse detection technologies, on the other hand, extract attack features from known attacks to constitute a feature profile. The data to be tested are then pattern-matched against this profile to detect intrusion. Examples include methods based on conditional probability, state transition and misuse detection rules. Misuse detection technologies are relatively mature, with high detection accuracy and efficiency and a low false rate. However, their effectiveness depends on the completeness of the feature profile, and they cannot detect unknown intrusion behavior. The profile must therefore be updated in time, which makes profile maintenance a heavy workload.

In this paper, we analyze transmission behavior to detect intrusion. As most users' behaviors fall within the defined behavior profile, we adopt anomaly detection technology.

II. THE PROBLEM OF NETWORK INTRUSION

Figure 1. Network Intrusion in Internet

As shown in Fig. 1, after gaining access to the Internet, malicious nodes try to invade the target node by sending it attack packets mixed with normal packets. Intrusion activities include unauthorized users attempting to access and process data, or preventing the normal operation of the computer. Intrusion detection monitors or prevents those who attempt to take control of system or network resources illegally. Intrusion
detection technology detects whether there are any security policy violations or attacks by collecting and analyzing key information in the network and hosts, and sends alarms when it finds them. It is the key technology of a network security defense system.

The concept of the Intrusion Detection System (IDS) originated in a 1972 report by Anderson. An IDS integrates electronic data processing, security auditing, pattern matching and statistical techniques. It analyzes the data to be audited, or data captured directly from the network, to find acts and activities that are contrary to security policy or endanger system security. An IDS usually has three types of components (shown in Fig. 2): the Data Collecting component, which provides records; the Intrusion Detection Engine, which finds intrusion events; and the Security Control component, which responds to the output of the engine.

Figure 2. Framework of Intrusion Detection System

The essence of intrusion detection is to detect security policy violations and attacks. For this reason, intrusion detection has been studied with different technologies, such as user behavior analysis, artificial immune systems and statistical techniques. In this paper, we study intrusion detection under a user behavior model.

III. MODEL OF USER TRANSMISSION BEHAVIOR

Detection feature analysis is an important part of intrusion detection in an IDS [7]. As the size of audit data grows, extracting proper features from massive data becomes a difficult problem in intrusion detection. Many researchers have proposed feature selection algorithms, such as principal component analysis. Appropriate feature combinations help to detect the corresponding attacks, while inappropriate or unrelated ones may even reduce detection performance.

In this paper, the statistics variance method in user transmission behavior is adopted to detect intrusion. The key to the method is therefore how to select the features and assign each feature a proper weight. To represent the features better, we define the measurable set and the record feature set as follows.

Definition 1. Let F be a user behavior set that includes all behavior features appearing in the audit data or connections. Then F is called the measurable set.

Definition 2. Let $V = \{Attr_1, Attr_2, \ldots, Attr_n\}$ be a subset of F that consists of the behavior features used by the algorithm. Then V is called the record feature set.

Users' behaviors are varied, but our focus is intrusion detection and network attack prevention. We are therefore concerned with user behavior, especially transmission behavior. For transmission behavior, features such as user object, transmission time, type and data quantity should be studied. We use a four-tuple (T, B, A, M) to describe transmission behavior:
• T is the time spent in transmission.
• B is the type of transmission behavior, represented by the two-tuple (P, V). P is the transport layer protocol, such as TCP or UDP. V is the application layer protocol, such as HTTP, FTP or TELNET, which also corresponds to the port number of the application.
• A is the address of the user's network location, represented by the two-tuple (IP, Mac). IP consists of the source and destination IP addresses, and Mac consists of the source and destination MAC addresses.
• M is the data quantity in transmission, represented by the two-tuple $(M_s, M_d)$. $M_s$ is the quantity of data sent from the source side, and $M_d$ is the quantity of data received by the destination side.

The key to intrusion detection is to analyze the audit records and find abnormal behavior. According to Def. 1, all behavior features constitute the measurable set. Here, we select the Service feature to preprocess the audit data, and WEB service is compared with undifferentiated applications. Classifying the audit data by the Service feature reduces its size and improves detection speed. As feature $M_s$ directly reflects transmission behavior, $M_s$ is used to design the intrusion detection algorithm.

Let x be the sum of $M_s$. By the central limit theorem, x approximately follows a normal distribution. In the following, the feature $M_s$ is regarded as an independent random variable and is used to calculate the average and the threshold value.

Assume the number of bytes transferred in the i-th connection is $t_i$. The total and average numbers of bytes over n connections are $T_n$ and $\bar{T}_n$ respectively, where $\bar{T}_n = T_n / n = \sum_{i=1}^{n} t_i / n$.

srcM denotes the value of feature $M_s$ in the audit data. If the srcM values lie close to their average, the confidence interval will be short. The unbiased estimate of the
population variance is

$$ s_n^2 = \frac{\sum_{i=1}^{n} t_i^2 - n\bar{T}_n^2}{n-1}. $$

A confidence interval for the population average of feature $M_s$ is constructed from the average number of transmitted bytes and the standard deviation. The standard deviation of the audit data average is $s_{\bar{x}_n} = s_n / \sqrt{n}$. Therefore, the confidence interval with confidence degree $1-\sigma$ is

$$ \left( \bar{T}_n - z_{\sigma/2}\cdot\frac{s_n}{\sqrt{n}},\; \bar{T}_n + z_{\sigma/2}\cdot\frac{s_n}{\sqrt{n}} \right) \tag{1} $$

where $z_{\sigma/2}$ is the corresponding quantile of the standard normal distribution. The larger the number of records n, the smaller the variance of the average of srcM. Hence, if the srcM of the current network connection lies in the interval of formula (1), the behavior is normal; otherwise, it is anomalous.

IV. IDSV ALGORITHM

In the following, an Intrusion Detection algorithm based on the Statistics Variance method, IDSV, is presented. To improve the detection rate, the confidence interval is weighted according to the actual detection effect of the IDSV algorithm.

A. Framework of IDSV Algorithm

Setting the threshold value in the IDSV algorithm is difficult. If the threshold interval is too short, there will be numerous false alarms; if it is too long, the miss rate will be high. Because the confidence interval of formula (1) is too strict, we introduce a weighted factor θ to adjust the size of the interval elastically:

$$ \left( \bar{T}_n - \theta\cdot z_{\sigma/2}\cdot\frac{s_n}{\sqrt{n}},\; \bar{T}_n + \theta\cdot z_{\sigma/2}\cdot\frac{s_n}{\sqrt{n}} \right) \tag{2} $$

where $z_{\sigma/2} = 1.96$ and θ is the weighted factor. The confidence interval is (Min, Max). The IDSV algorithm is as follows:

Input: n user connection records (n ≥ 2) selected by Service feature, with srcM values t_1, …, t_n; weighted factor θ.
Output: normal/anomaly

IsNormal( ) {
    Sum = 0; M = 0;
    for i = 1, …, n {
        Sum += t_i; M += t_i * t_i;      // sum and square sum of audit data
    }
    Avg = Sum / n;                       // the average of srcM
    V = sqrt((M - n * Avg * Avg) / ((n - 1) * n)) * z_{σ/2} * θ;
    Min = Avg - V; Max = Avg + V;        // confidence interval
    Obtain the current connection record from the audit data;
    if (its feature value is in the confidence interval)
        return 1;                        // 1 is normal
    else
        return 0;                        // 0 is anomaly
}

The time complexity of the IDSV algorithm is O(n) for n connections, and the space complexity is O(1).

B. ARP Attack Detection with IDSV Algorithm

ARP spoofing usually modifies the ARP table of the target by sending ARP reply packets. The usual preventive measure against ARP spoofing is to bind MAC addresses to IP addresses statically. Here we present an ARP spoofing detection method that considers the mapping between IP and MAC addresses. Because ARP spoofing modifies this mapping, the location feature of transmission, the two-tuple (IP, Mac), is used to select the data. Three cases are studied:
1) Fixed IP and MAC address. This case covers most users with normal behavior records. In the IDSV algorithm, the average of the IP and MAC address is kept unchanged and the threshold value is 0, namely IP and MAC addresses are mapped one to one.
2) One-to-many mapping of IP and MAC. In this case, a white list containing the legitimate IP-MAC mappings is applied to filter the audit data.
3) Dynamic IP allocation by DHCP. The IP-MAC mapping table is maintained by periodically deleting outdated address mappings.

ARP spoofing detection with the IDSV algorithm follows the framework shown in Fig. 2. ARP reply packets are captured by the data collecting component. Their source IP and MAC addresses are extracted and compared with the mapping table. If multiple MAC addresses are found to map to one IP address, an ARP spoofing alarm is reported together with the source and destination addresses of the attack.

V. SIMULATIONS

Simulations are carried out to detect anomalous behaviors, and the influence of the weighted confidence interval on the IDSV algorithm is assessed in terms of detection accuracy and false rate. The simulations are based on the KDD Cup 1999 audit data set [8], which contains around 5,000,000 records. The connection records consist of behavior features. A training subset of 494,020 records is selected from the audit data. The weighted factors for undifferentiated application data and WEB application data are 23 and 197, respectively.

The detection performance of the IDSV algorithm on undifferentiated and WEB applications is compared in the following figures. With different sizes of audit data, the false rates of
undifferentiated and WEB applications differ little, while the detection rate for WEB is clearly better than that for undifferentiated applications. The reason is that audit data selected according to the transmission feature become more structured. Moreover, WEB applications are not sensitive to the size of the audit data. To some extent, we can infer that the adaptability of an IDS could be enhanced by classifying audit records according to the Service feature.

Figure 3. Comparison of IDSV Detection Rate

Figure 4. Comparison of IDSV False Rate

Furthermore, we analyze why the detection rate is so low with 2,500 audit records in Fig. 3. The detection rates for different sets of 2,500 audit records are shown in Table I.

TABLE I. DETECTION RATE WITH DIFFERENT 2,500-RECORD SETS

                  Records 1   Records 2   Records in Fig. 3
Detection Rate    1.0         0.915593    0.446237

We find that the records in Fig. 3 include many attacks that the statistical model is not good at detecting, so the detection rate in Fig. 3 is low. This also shows that specific detection models and features are effective for specific kinds of intrusion. The detection rate does not depend on the number of audit records. It depends on whether the detection feature is suitable and whether the detection model is good at detecting the attacks in question.

A detection model based on weighted multi-random decision trees is presented in [9], with a detection rate of 0.9929. In this paper, the average detection rate of the IDSV algorithm is 0.9975, which is higher than that of [9]. Consequently, the IDSV algorithm, which classifies audit records by features, achieves higher detection performance.

VI. CONCLUSIONS

In this paper, user behavior features are studied to create a model of user behavior. An intrusion detection algorithm based on the statistics variance method in user transmission behavior (IDSV) is provided and applied to intrusion detection. The simulation results show that the IDSV algorithm achieves good intrusion detection performance with different behavior features. These results suggest that the IDSV algorithm is feasible and effective.

ACKNOWLEDGMENT

This paper is supported by the National Natural Science Foundation of China (70872046, 70671054) and the National Basic Research Program of China (2009CB320501).

REFERENCES

[1] Alpcan T, Basar T. A game theoretic approach to decision and analysis in network intrusion detection [C]. In: Proc. of 43rd IEEE Conference on Decision and Control (CDC), Paradise Island, Bahamas: IEEE Computer Society Press, 2006: 2595-2600.
[2] Todd H, Gihan D, Karl Levitt. A network security monitor [C]. In: Proceedings of the 1990 IEEE Symposium on Research in Security and Privacy, USA: IEEE Computer Society Press, 1990: 296-304.
[3] Kalle Burbeck. Current Research and Use of Anomaly Detection [C]. In: Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise, 2005.
[4] LTC Bruce D. Caulkins USA, Joohan Lee, Morgan Wang. A Dynamic Data Mining Technique for Intrusion Detection Systems [C]. In: 43rd ACM Southeast Conference, March 18-20, 2005.
[5] Yu-Fang Zhang, Gui-Hua Sun, Zhong-Yang Xiong. A Novel Method of Intrusion Detection based on Artificial Immune System [C]. In: Proc. of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 13-16 August, 2006.
[6] Baoyi WANG, Shaomin ZHANG. A New Intrusion Detection Method Based on Artificial Immune System [C]. In: IFIP International Conference on Network and Parallel Computing-Workshops, 2007.
[7] Chen You, Cheng Xueqi, Li Yang, Dai Lei. Lightweight Intrusion Detection System Based on Feature Selection [J]. Journal of Software, 2007, 18(7): 1639-1651.
[8] Salvatore J. Stolfo, Wei Fan, Wenke Lee, et al. Task description of Kddcup'99 [OL]. http://kdd.ics.uci.edu/databases/kddcup99/task.html, 1999.
[9] Zhao Xiaofeng, Ye Zhen. Research on weighted multi-random decision tree and its application to intrusion detection [J]. Computer Engineering and Applications, 2007.5 (18).
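Section III describes two things: the four-tuple (T, B, A, M) and the Service-feature preprocessing step. Together they can be pictured as a small record type plus a filter. The following Python sketch is illustrative only and is not the authors' code. The names TransmissionRecord and select_by_service are hypothetical. Reading the Service feature as the application-layer protocol V of B is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TransmissionRecord:
    """Hypothetical record type for the four-tuple (T, B, A, M); not from the paper."""
    duration: float   # T: time spent in transmission (seconds)
    transport: str    # B.P: transport-layer protocol, e.g. "tcp"
    service: str      # B.V: application-layer protocol, assumed to be the Service feature
    src_ip: str       # A.IP, source side
    dst_ip: str       # A.IP, destination side
    src_mac: str      # A.Mac, source side
    dst_mac: str      # A.Mac, destination side
    m_s: int          # M_s: bytes sent from the source side
    m_d: int          # M_d: bytes received by the destination side


def select_by_service(records: Iterable[TransmissionRecord], service: str) -> List[int]:
    """Keep the records of one Service class and return their M_s values (the t_i)."""
    return [r.m_s for r in records if r.service == service]
```

Under these assumptions, select_by_service(records, "http") would give the $t_i$ values of a WEB class. Taking r.m_s for every record without filtering corresponds to the undifferentiated case. The "http" label is itself an assumed encoding of WEB traffic.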

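Formula (1) and the unbiased estimate $s_n^2$ that precedes it can be checked numerically. This is a minimal Python sketch using only the standard library. It assumes the $t_i$ are the per-connection $M_s$ byte counts of one Service class. The function name confidence_interval and the default z = 1.96 (confidence degree 0.95) are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

# Assumed default: z_{sigma/2} = 1.96, i.e. confidence degree 1 - sigma = 0.95.
Z_HALF_SIGMA = 1.96


def confidence_interval(t: Sequence[float], z: float = Z_HALF_SIGMA) -> Tuple[float, float]:
    """Hypothetical helper: interval of formula (1) for the average of the t_i."""
    n = len(t)
    if n < 2:
        raise ValueError("at least two connections are needed (n >= 2)")
    t_bar = sum(t) / n                                          # average bytes per connection
    s2 = (sum(x * x for x in t) - n * t_bar * t_bar) / (n - 1)  # unbiased estimate s_n^2
    s2 = max(s2, 0.0)                                           # guard tiny negative rounding
    half = z * math.sqrt(s2) / math.sqrt(n)                     # z_{sigma/2} * s_n / sqrt(n)
    return t_bar - half, t_bar + half
```

For example, confidence_interval([100, 200, 300]) has $\bar{T}_n = 200$ and $s_n = 100$, so the interval is about (86.84, 313.16). The sum-of-squares form mirrors the text. statistics.variance(t) from the standard library returns the same unbiased $s_n^2$ with better numerical behavior. statistics.NormalDist().inv_cdf(1 - sigma / 2) gives $z_{\sigma/2}$ for other confidence degrees.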
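The IsNormal() listing of Section IV-A can be made runnable as follows. This Python sketch is one reading under stated assumptions, not the authors' implementation:
• The function name is_normal and its parameters are hypothetical.
• The $t_i$ are taken to be the srcM values of the n records already selected by Service feature.
• `current` is the srcM of the connection under test.
• θ scales the half-width as in formula (2).

```python
import math
from typing import Sequence


def is_normal(t: Sequence[float], current: float, theta: float, z: float = 1.96) -> int:
    """Hypothetical sketch of IsNormal(): 1 if `current` lies in interval (2), else 0."""
    n = len(t)
    if n < 2:
        raise ValueError("IDSV needs n >= 2 connection records")
    total = 0.0    # Sum: sum of the t_i
    square = 0.0   # M: square sum of the t_i
    for x in t:    # one pass: O(n) time, O(1) extra space
        total += x
        square += x * x
    avg = total / n
    # V = theta * z_{sigma/2} * s_n / sqrt(n); max() guards tiny negative rounding error
    v = theta * z * math.sqrt(max(square - n * avg * avg, 0.0) / ((n - 1) * n))
    low, high = avg - v, avg + v              # (Min, Max)
    return 1 if low < current < high else 0   # 1 is normal, 0 is anomaly
```

Take t = [100, 200, 300]. A current value of 400 is reported as an anomaly (0) for θ = 1 but as normal (1) for θ = 2. This shows how a larger weighted factor widens the interval, accepting a higher miss rate in exchange for fewer false alarms. The single accumulation loop keeps the O(n) time and O(1) space stated in the text.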

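The three cases and the reply check of Section IV-B can be sketched as an IP-to-MAC mapping table. This is a hypothetical Python sketch, not the authors' implementation:
• The class ArpMappingTable, its methods and the max_age parameter are illustrative names.
• The threshold of 0 in case 1 is modelled as an exact IP-to-MAC match.
• Packet capture is assumed to happen in the data collecting component. The code therefore only receives the extracted source IP, source MAC and destination IP of each ARP reply.

```python
import time
from typing import Dict, Optional, Set, Tuple


class ArpMappingTable:
    """Hypothetical IP-to-MAC table for the three cases of Section IV-B (illustrative names)."""

    def __init__(self, whitelist: Dict[str, Set[str]], max_age: float) -> None:
        self.whitelist = whitelist   # case 2: IPs allowed to map to several MACs
        self.max_age = max_age       # case 3: seconds before a (DHCP) entry is outdated
        self.table: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, last time seen)

    def expire(self, now: float) -> None:
        """Case 3: delete outdated mappings (meant to be called periodically)."""
        self.table = {ip: (mac, seen) for ip, (mac, seen) in self.table.items()
                      if now - seen <= self.max_age}

    def check_reply(self, src_ip: str, src_mac: str, dst_ip: str,
                    now: Optional[float] = None) -> Optional[str]:
        """Check the source IP/MAC of one captured ARP reply; return an alarm or None."""
        if now is None:
            now = time.time()
        if src_mac in self.whitelist.get(src_ip, set()):
            return None                          # case 2: white-listed mapping
        known = self.table.get(src_ip)
        if known is None or known[0] == src_mac:
            self.table[src_ip] = (src_mac, now)  # case 1: learn or refresh 1-to-1 mapping
            return None
        # One IP now maps to a second MAC: report source and destination addresses.
        return (f"ARP spoofing alarm: {src_ip} claimed by {src_mac} "
                f"(table has {known[0]}), reply sent to {dst_ip}")
```

On a conflict, the sketch keeps the mapping already in the table rather than overwriting it, so a forged reply cannot silently replace the legitimate entry. Stale DHCP leases leave the table only through expire().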