CN106230772A

CN106230772A - Industrial Internet abnormal behavior mining scheme

Info

Publication number: CN106230772A
Application number: CN201610527355.5A
Authority: CN
Inventors: 俞海国; 刘文泉; 马先; 张洪平; 张海宁; 刘世良; 苏生平; 尚西元; 李楠芳; 刘忠魁; 赵明明; 林亮成; 任凤伟; 王迎鹤
Original assignee: Middle Electricity Runs (beijing) Information Technology Co Ltd; State Grid Qinghai Electric Power Co Ltd
Current assignee: Middle Electricity Runs (beijing) Information Technology Co Ltd; State Grid Qinghai Electric Power Co Ltd
Priority date: 2016-07-07
Filing date: 2016-07-07
Publication date: 2016-12-14

Abstract

Based on the characteristics of data behavior in industrial control networks in the mobile Internet environment, an improved hybrid multi-class naive Bayes algorithm and a massive-data incremental learning algorithm based on two-step screening are provided and applied to the mining and analysis of abnormal behavior in mobile industrial control systems. The abnormal behavior mining scheme consists of abnormal behavior classification and mining algorithm design, covering the method of suspicious behavior classification and the data mining process. In an industrial control network in the mobile Internet environment, the mining of behavior data is divided into two stages: a classifier learning stage and a network behavior monitoring stage. Once the behavior classifiers have been obtained, data mining of malware behavior enters the second stage, the network behavior monitoring stage. Under the premise that the classification categories are independent, the naive Bayes classification algorithm is fast to compute, accurate, and robust, and it has been widely used.

Description

Industrial Internet Abnormal Behavior Mining Solution

Technical field

The invention relates to a mining scheme, in particular to a mining scheme for abnormal behavior in the industrial Internet.

Background

Security protection against abnormal behavior of industrial control systems in the mobile Internet environment mainly covers two aspects: network-side protection and terminal-side protection.

1. Network-side protection

Network-side protection usually means using a feature matching engine to analyze the network traffic of industrial control systems in the mobile Internet environment. This mainly covers industrial-control-specific protocols such as OPC and DNP3, and also includes analysis of the abnormal behaviors, sample files, and so on of various mobile terminals. Intrusion detection technology for industrial control systems uses bypass mode to monitor abnormal behavior in the industrial control network.

2. Terminal-side protection

Terminal-side protection mainly prevents attacks through security detection of the mobile terminals in the industrial control network. Commonly used security analysis techniques include:

· Malicious behavior signature scanning. This method detects known abnormal malicious behaviors with high precision, but cannot detect unknown or new ones.

· Static sampling analysis. This approach performs fixed sampling analysis of an application on the mobile terminal to determine whether the application's behavior is abnormal and malicious. The sampling analysis relies on security options set by the user in advance, so it also cannot guard in time against abnormal behaviors in the mobile industrial control network, especially new ones.

· Dynamic behavior analysis. To save computing resources on mobile terminals, this technique uses a cloud-based mobile malware detection system: a cloud malware detection server deployed in the mobile Internet examines uploaded mobile malicious programs. In effect, the method trades bandwidth for savings in the terminal's computing resources.

In industrial control networks in the mobile Internet environment, traditional techniques such as feature matching achieve good detection results for known abnormal behaviors. However, as mobile terminals become more intelligent, the services they carry, especially industrial control services, are growing rapidly, and new and variant attacks against mobile terminals keep emerging.

Summary of the invention

Purpose of the invention: based on the behavior characteristics of industrial control networks in the mobile Internet environment, the invention provides an improved hybrid multi-class naive Bayes algorithm and a massive-data incremental learning algorithm based on two-step screening, and gives an abnormal behavior monitoring system and processing flow for industrial control networks in the mobile Internet environment. In other words, data mining algorithms are introduced for abnormal behavior analysis.

The invention is realized as follows. The industrial Internet abnormal behavior mining scheme comprises abnormal behavior classification and mining algorithm design:

1. Method of suspicious behavior classification and the data mining process. In an industrial control network in the mobile Internet environment, the mining of behavior data is divided into two stages: the classifier learning stage and the network behavior monitoring stage. Once the behavior classifiers have been obtained, data mining of malware behavior enters the second stage, the network behavior monitoring stage.

2. Network behavior mining analysis. Key packet header fields, traffic statistics, and key content fields are selected from the TCP connection and application-layer protocol parts of the acquired live-network behavior data as the features for abnormal behavior analysis.

3. Abnormal behavior mining algorithm design: the hybrid multi-class naive Bayes algorithm and the two-step screening incremental learning method. First, a whitelist scanning engine scans the live-network behavior data to obtain normal behaviors for incremental learning, and the output of the known-abnormal-behavior feature matching engine provides the abnormal behaviors. This yields the original incremental training set DT, containing both abnormal and normal behaviors. After two-step screening, the data is added to the incremental training set to train the existing model.

Compared with the prior art, the positive effect of the invention is as follows. Data mining algorithms detect abnormal behaviors well in network behavior analysis and intrusion detection on the traditional Internet. Chandrashekhar et al. used a clustering algorithm to mine and analyze network intrusion detection data, and Modi et al. applied the Bayesian classification algorithm to intrusion detection on a cloud computing platform. Under the premise that the classification categories are independent, the naive Bayes classification algorithm is fast to compute, accurate, and robust, and it has been widely used.

Description of the drawings

Fig. 1 is a schematic flow chart of the learning stage of the network behavior classifier of the present invention.

Fig. 2 is a schematic flow chart of the network behavior monitoring stage of the present invention.

Detailed description

Malware in industrial control networks in the mobile Internet environment behaves differently at each stage of infection, so classifying its behavior helps improve monitoring accuracy. Abnormal behaviors of mobile terminals in such networks may include:

· Attacks on mobile industrial control systems by malicious programs on mobile terminals, such as bots, Trojans, and viruses, as well as deliberate human attacks on mobile industrial control systems. These attacks cause great harm and heavy losses. An attacker can take over the core operations of the mobile industrial control system through malicious instructions, or maliciously download confidential information from the industrial control system.

· Infected terminals spreading malicious code to other terminals in various ways. Because mobile terminals have strong communication capabilities, an infected terminal can use SMS to send other terminals links that trick users into downloading malicious programs, or it can spread malicious programs to other terminals over Bluetooth, infrared, and similar channels.

· Mobile terminals with industrial mobile applications installed visiting malicious websites. Such a terminal may upload confidential information from the industrial control system and thereby leak it to a malicious third party, or it may download viruses, Trojans, and other malware from the malicious website, after which attacks on the industrial control system are launched through the infected terminal.

Milligan et al. summarized the security hazards of mobile malicious programs, including information leakage, theft of confidential data, malicious attacks, network fraud attacks, and network denial-of-service attacks.

Because mobile terminals are closely tied to industrial applications and users, abnormal behaviors are often associated with the mobile terminal type, the client program type, user information, and so on. The transmission protocols are also diverse, as are the attack methods. Abnormal behavior analysis in industrial control networks in the mobile Internet environment must therefore consider terminal information and user information as well as the relevant feature attributes of the transmission protocols at every layer.

Based on the analysis of abnormal behavior in industrial control networks in the mobile Internet environment, the malware infection period can be divided into three stages: the diffusion stage, the malicious server access stage, and the attack stage. In the diffusion stage, malware can be sent to other mobile terminals through MMS, HTTP, FTP, email, and similar means. After infecting a mobile terminal, the malware connects to a malicious server to download update files and control instructions, or uploads system information gathered by the terminal to a malicious website. Finally, the malware uses the infected terminal to launch a variety of attacks on the industrial control system, including maliciously downloading data from the industrial control system, issuing illegal control commands to attack the system, and sending private and confidential data to other terminals or websites at will.

Following these three stages, malware behaviors are divided into three categories: diffusion behavior, malicious website access behavior, and attack behavior. Each category is handled by a different classifier to improve the recognition of malicious behavior.

Abnormal behavior classification and mining algorithm design:

1. Method of suspicious behavior classification and the data mining process. In an industrial control network in the mobile Internet environment, the mining of behavior data is divided into two stages: the classifier learning stage and the network behavior monitoring stage. The learning-stage process of behavior classifier data mining is shown in Fig. 1. In the learning stage, the model uses known mobile malware and normal network access as the learning data for the behavior classifiers. The malware used as learning data exhibits three stages of behavior: diffusion behavior, malicious access behavior, and attack behavior. Normal network access data has similar types of behavior, such as normal information exchange between mobile terminals, file downloads, system access, and the issuing of normal control commands to the control system. The behavior classification module divides the learning data into three behavior subsets according to the characteristics of the data behavior: the diffusion behavior subset, the malicious access behavior subset, and the attack behavior subset. The data of these three subsets is then used to train three different naive Bayes classifiers: the diffusion behavior classifier F1, the malicious access behavior classifier F2, and the attack behavior classifier F3.

Once the behavior classifiers have been obtained, data mining of malware behavior enters the second stage, the network behavior monitoring stage, shown in Fig. 2. In the monitoring stage, real data from the mobile network is fed into the behavior classifiers obtained in the first stage. The behavior classification module divides the network data into three subsets according to its behavior characteristics. The behavior data of each subset is then input into the corresponding behavior classifier for analysis and classification, to judge whether the network behavior data is malicious.
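The two stages described above can be sketched as a small routing pipeline. This is only an illustration under stated assumptions: the behavior-type names, the record layout, and the `behavior_of` and `train` callables are hypothetical placeholders, and any classifier (for example a naive Bayes model) can be plugged in for `train`.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical names for the three behavior subsets described in the text.
BEHAVIOR_TYPES = ("diffusion", "malicious_access", "attack")

Record = dict  # one network behavior record; its keys are illustrative only


def split_by_behavior(records: Iterable[Record],
                      behavior_of: Callable[[Record], str]) -> Dict[str, List[Record]]:
    """Behavior classification module: partition records into three subsets."""
    subsets: Dict[str, List[Record]] = {b: [] for b in BEHAVIOR_TYPES}
    for record in records:
        subsets[behavior_of(record)].append(record)
    return subsets


def learn(records, behavior_of, train):
    """Learning stage: train one classifier (F1, F2, F3) per behavior subset."""
    subsets = split_by_behavior(records, behavior_of)
    return {b: train(subsets[b]) for b in BEHAVIOR_TYPES}


def monitor(records, behavior_of, classifiers):
    """Monitoring stage: send each record to the classifier of its subset.

    Each classifier is a callable returning True when the record is malicious.
    """
    return [classifiers[behavior_of(record)](record) for record in records]
```

Keeping `train` as a parameter leaves the choice of model open; the routing itself only needs each trained classifier to be callable on a single record.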

2. Network behavior mining analysis. Traditional mining analysis of network intrusion behavior usually uses the KDD'99 intrusion detection data. The KDD'99 network intrusion behaviors were collected in the DARPA'98 intrusion detection system project and include denial-of-service attacks, privilege escalation attacks, remote attacks, and scanning attacks. Each connection in the data is described by 41 features, including basic connection features, traffic statistics features, content features, and host-based network traffic statistics features. Because attack behavior in industrial control networks in the mobile Internet environment differs from that in traditional networks, this system uses high-speed acquisition probes to collect abnormal and normal behavior traffic on the existing mobile industrial control network. Matching against known attack signatures for the mobile Internet industrial control network yields labeled abnormal behavior data, and scanning with the whitelist scanning engine yields labeled normal behavior data. With reference to the KDD'99 description of network behavior features, the invention selects key packet header fields, traffic statistics, and key content fields from the TCP connection and application-layer protocol parts of the acquired live-network behavior data as the features for abnormal behavior analysis. For example, in an information-stealing attack, the key content fields usually contain certain special strings.
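As a rough illustration of turning one connection into discrete features of the three kinds named above (header field, traffic statistic, key content), the sketch below builds a small feature tuple. The record keys, the size thresholds, and the marker strings are all hypothetical and are not taken from the patent.

```python
from typing import NamedTuple, Sequence


class Features(NamedTuple):
    dst_port: int      # packet header field
    protocol: str      # application-layer protocol name
    size_bucket: str   # coarse traffic statistic
    has_marker: bool   # key content field contains a watched string


def extract_features(conn: dict, markers: Sequence[bytes]) -> Features:
    """Map one connection record (hypothetical keys) to discrete features."""
    total = conn["bytes_sent"] + conn["bytes_received"]
    # Illustrative thresholds: bucketing keeps the attribute categorical.
    if total < 1_000:
        bucket = "small"
    elif total < 100_000:
        bucket = "medium"
    else:
        bucket = "large"
    payload = conn.get("payload", b"")
    return Features(conn["dst_port"], conn["protocol"], bucket,
                    any(marker in payload for marker in markers))
```

Discretizing the traffic statistic is a design choice made here so that every attribute takes a finite set of values, which suits a count-based naive Bayes model.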

3. Abnormal behavior mining algorithm design: the hybrid multi-class naive Bayes algorithm

Let X = {x₁, x₂, ..., x_k} be a data tuple described by k attributes {A₁, A₂, ..., A_k}, and let D be the set of training tuples and their associated class labels (the training set). Assume there are n+1 class attribute values C = {C₀, C₁, ..., C_n}. For a given tuple X, the naive Bayes classifier predicts that X belongs to the class with the highest probability conditioned on X; that is, X is assigned to class C_i if and only if

P(C_i|X) > P(C_j|X), (0 ≤ j ≤ n, i ≠ j) (1)

Since P(X) is a fixed constant for all classes, by Bayes' theorem (formula 2),

$P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}$ (2)

it is only necessary to find the maximum of P(X|C_i)P(C_i). That is, to predict the class label of X, compute P(X|C_i)P(C_i) for each class C_i.

The attribute values selected from network requests in the mobile Internet industrial control network are taken to be mutually independent, so the probability can be computed from the independent per-attribute probabilities P(x₁|C_i), P(x₂|C_i), ..., P(x_k|C_i):

$P(X|C_i) = \prod_{j=1}^{k} P(x_j|C_i)$ (3)
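Formulas (1) to (3) can be sketched as a count-based classifier that scores each class by log P(X|C_i) + log P(C_i). This is a minimal illustration: the add-one smoothing and the function names are assumptions made here, not details stated at this point in the text.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (attribute_tuple, class_label) pairs."""
    class_counts = Counter(label for _, label in samples)
    cond_counts = defaultdict(Counter)  # (label, j) -> counts of values of x_j
    values = defaultdict(set)           # j -> observed values of attribute A_j
    for x, label in samples:
        for j, value in enumerate(x):
            cond_counts[(label, j)][value] += 1
            values[j].add(value)
    return class_counts, cond_counts, values


def log_score(model, x, label):
    """log(P(X|C_i) * P(C_i)) with add-one smoothing (an assumption)."""
    class_counts, cond_counts, values = model
    total = sum(class_counts.values())
    score = math.log((1 + class_counts[label]) / (len(class_counts) + total))
    for j, value in enumerate(x):
        numerator = 1 + cond_counts[(label, j)][value]
        denominator = len(values[j]) + class_counts[label]
        score += math.log(numerator / denominator)  # one factor of formula (3)
    return score


def predict(model, x):
    """Formula (1): pick the class with the largest score."""
    return max(model[0], key=lambda label: log_score(model, x, label))
```

Working in log space avoids underflow when k is large; P(X) is dropped because it is the same for every class.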

If a binary naive Bayes algorithm were used to classify malicious behavior, n would equal 1 and the total number of classes would be 2: normal behavior and abnormal behavior only. Because abnormal behavior may be caused by many kinds of malicious programs whose behaviors differ, a hybrid multi-class naive Bayes algorithm is used here instead. During modeling, the behaviors of the different kinds of malicious programs are added to the training set D for multi-class training; during detection, a binary decision is made. For the set C of n+1 classes, define C₀ as the normal behavior class and C′ as the abnormal behavior class, which contains the n malicious program behavior subsets C′ = {C₁, C₂, ..., C_n}, so that C = {C₀, C′}.

When a network behavior X is classified, the output of the hybrid multi-class naive Bayes algorithm is given by formula 4. For network behavior X, when the conditional probability P(C₀|X) of the normal behavior class C₀ is greater than the maximum conditional probability over the abnormal behavior classes, X is judged to be normal behavior; otherwise it is judged to be abnormal behavior.

$C(X) = \arg\max\big(\max(P(C_1|X), P(C_2|X), \ldots, P(C_n|X)),\ P(C_0|X)\big)$ (4)
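The decision in formula (4) can be sketched as below: the best abnormal class is found first, and the verdict is normal only when the normal class strictly beats it. The function name and the default label "C0" are illustrative assumptions.

```python
def hybrid_decision(scores, normal_label="C0"):
    """scores: class label -> P(C_i|X), or any value proportional to it.

    Returns (verdict, class), where verdict is "normal" or "abnormal".
    """
    abnormal = {c: s for c, s in scores.items() if c != normal_label}
    best_abnormal = max(abnormal, key=abnormal.get)
    # Normal only when P(C0|X) is strictly greater than the abnormal maximum.
    if scores[normal_label] > abnormal[best_abnormal]:
        return "normal", normal_label
    return "abnormal", best_abnormal
```

For example, `hybrid_decision({"C0": 0.3, "C1": 0.3, "C2": 0.4})` yields an abnormal verdict attributed to `C2`, while a tie between `C0` and the best abnormal class also counts as abnormal.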

Two-step screening incremental learning method:

In industrial control networks in the mobile Internet environment, both network behavior detection and incremental model learning involve massive data processing. It is therefore necessary to select, for incremental learning, data that helps correct the model and has a relatively high class support probability. In actual detection, a whitelist scanning engine first scans the live-network behavior data to obtain normal behaviors for incremental learning, and the output of the known-abnormal-behavior feature matching engine provides the abnormal behaviors. This yields the original incremental training set DT, containing both abnormal and normal behaviors. After two-step screening, the data is added to the incremental training set to train the existing model.

The first screening step uses the existing model to run multi-class naive Bayes classification on the original incremental training set DT. The output falls into two cases:

· First case: the class of an abnormal network behavior in the original incremental training set belongs to one of the classes in the detection model, for example the behavior of a kind of malicious program already in the original training model. The existing model is used for detection, and formula (1) determines whether the classification is accurate. If it is accurate, the data is not needed for incremental learning. If it is inaccurate, formula (4) is used to check further whether the abnormal behavior was judged to be normal behavior. If so, the data deviates strongly from the original model, so it also cannot be added for incremental learning.

· Second case: the class of an abnormal network behavior in the original incremental training set does not belong to any class in the detection model. Formula (4) is used directly to determine whether the classification is accurate, and if it is, the data is passed to the next screening step.
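The two cases of the first screening step can be sketched as a single filter. The text does not say what happens to a known-class sample that is misclassified but still judged abnormal; passing it on to the second step is an assumption made here, as are the function and parameter names.

```python
def first_step_screen(dt, predict, known_classes, normal_label="C0"):
    """dt: list of (x, true_label); predict(x) -> most probable class label.

    Returns the samples that move on to the second screening step.
    """
    kept = []
    for x, label in dt:
        predicted = predict(x)
        predicted_normal = predicted == normal_label
        truly_normal = label == normal_label
        if label in known_classes:
            if predicted == label:
                continue  # formula (1) is already right: nothing to learn
            if not truly_normal and predicted_normal:
                continue  # abnormal judged normal: deviates too far from model
            kept.append((x, label))  # assumption: other errors pass on
        elif predicted_normal == truly_normal:
            kept.append((x, label))  # unknown class, binary verdict is right
    return kept
```

Here `known_classes` is the set of class labels the existing model was trained on, including the normal class.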

The second screening step computes the relative class support probability PS for the data remaining in the training set DT. Let C_m be the most probable class:

$P(C_m|X) = \max\big(P(C_0|X), P(C_1|X), P(C_2|X), \ldots, P(C_n|X)\big), \quad 0 \le m \le n$ (5)

The relative class support probability PS is:

$PS = \frac{P(C_m|X)}{\prod_{i=0,\, i \neq m}^{n} P(C_i|X)}$ (6)

A data screening threshold TH is set. Only training data whose relative class support probability satisfies PS > TH is added to the final training set DT′ used for the incremental computation. The model is then trained incrementally on this set using the following formula:

$P'(C_i) = \frac{1 + count(C_i) + count'(C_i)}{|C| + |D| + |DT'|}$ (7)

where:

· count(C_i) is the number of network behaviors of class C_i in the training set D;

· count′(C_i) is the number of network behaviors of class C_i in the newly added incremental training set DT′;

· |C| is the total number of network behavior classes in the training set D;

· |D| is the total number of network behaviors in the training set D;

· |DT′| is the total number of network behaviors in the newly added incremental training set DT′;

· count(C_i ∧ x_j) is the number of network behaviors in the training set D whose class is C_i and whose attribute A_j takes the value x_j;

· count′(C_i ∧ x_j) is the number of network behaviors in the newly added incremental training set DT′ whose class is C_i and whose attribute A_j takes the value x_j;

· |A_j| is the number of values that attribute A_j can take.
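The second screening step and the incremental update can be sketched as follows. `relative_class_support`, `second_step_screen`, and `updated_prior` follow formulas (6) and (7). The text defines count(C_i ∧ x_j) and |A_j| but gives no formula that uses them, so `updated_conditional` is only an assumed add-one analogue of formula (7), not the patent's stated formula.

```python
import math


def relative_class_support(posteriors):
    """Formula (6): top P(C_m|X) divided by the product of all the others.

    Assumes every posterior is positive (true when smoothing is used).
    """
    m = max(posteriors, key=posteriors.get)
    rest = math.prod(p for c, p in posteriors.items() if c != m)
    return posteriors[m] / rest


def second_step_screen(dt, posterior_of, th):
    """Keep only samples whose PS exceeds the threshold TH."""
    return [(x, label) for x, label in dt
            if relative_class_support(posterior_of(x)) > th]


def updated_prior(label, count_d, count_dt, num_classes):
    """Formula (7); count_d / count_dt map class label -> count in D / DT'."""
    d_size = sum(count_d.values())
    dt_size = sum(count_dt.values())
    numerator = 1 + count_d.get(label, 0) + count_dt.get(label, 0)
    return numerator / (num_classes + d_size + dt_size)


def updated_conditional(count_cx_d, count_cx_dt, count_c_d, count_c_dt, num_values):
    """Assumed update for P'(x_j|C_i); the source gives no formula for it."""
    return (1 + count_cx_d + count_cx_dt) / (num_values + count_c_d + count_c_dt)
```

Because only counts are combined, the update does not require revisiting the original training set D, which is the point of learning incrementally on massive data.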

Claims

1. An industrial Internet abnormal behavior mining scheme, characterized in that: the abnormal behavior mining scheme comprises abnormal behavior classification and mining algorithm design, namely the method of suspicious behavior classification and the data mining process; in an industrial control network in the mobile Internet environment, the mining of behavior data is divided into two stages, a classifier learning stage and a network behavior monitoring stage; after the behavior classifiers of each class are obtained, the data mining of malware behavior enters the second stage, the network behavior monitoring stage.

2. The abnormal behavior mining scheme as claimed in claim 1, characterized in that: in the network behavior mining analysis, key packet header fields, traffic statistics, and key content fields are selected from the TCP connection and application-layer protocol parts of the acquired live-network behavior data as the features for abnormal behavior analysis.

3. The abnormal behavior mining scheme as claimed in claim 1, characterized in that: the abnormal behavior mining algorithm design comprises a hybrid multi-class naive Bayes algorithm and a two-step screening incremental learning algorithm; first, a whitelist scanning engine scans the live-network behavior data to obtain normal behaviors for incremental learning; the output of a known-abnormal-behavior feature matching engine provides the abnormal behaviors; the original incremental training set DT, containing both abnormal and normal behaviors, is thereby obtained, and after two-step screening the data is added to the incremental training set to train the existing model.