CN111597549A - A method and system for identifying network security behavior based on big data - Google Patents

Publication number
CN111597549A
Authority
CN
China
Prior art keywords
behavior
user
risk
network security
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010304316.5A
Other languages
Chinese (zh)
Inventor
杨力强
王坤
曹建伟
何纪成
钱海峰
梁野
陈永炜
李慧勋
胡宗宁
张志军
王昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tailun Electric Power Group Co ltd
Beijing Kedong Electric Power Control System Co Ltd
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Beijing Kedong Electric Power Control System Co Ltd
Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kedong Electric Power Control System Co Ltd and Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202010304316.5A
Publication of CN111597549A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a big-data-based method and system for identifying network security behavior, in the technical field of network security management. It addresses the following problem in the prior art: risk-behavior criteria are formulated mainly by operation and maintenance personnel manually configuring blacklists, an approach that has no scientific basis, cannot exhaustively cover possible risk behaviors, has obvious limitations, and does not help ensure the safety of the power system. The method comprises the following steps: constructing a normal behavior model based on historical data; and identifying the user's behavior based on the normal behavior model and generating a corresponding risk level, wherein the risk levels include high-risk behavior.

Description

A method and system for identifying network security behavior based on big data

Technical Field

The invention relates to a big-data-based method and system for identifying network security behavior, and belongs to the technical field of network security management.

Background

In a network security management platform, to prevent users from performing erroneous or unauthorized operations, a monitoring device judges the collected operation logs against risk-behavior criteria and determines whether each operation is a risk behavior, so that corresponding protective measures can be taken.

At present, risk-behavior criteria are formulated mainly by operation and maintenance personnel manually configuring blacklists. Most blacklists use factory defaults or are set by operation and maintenance personnel from personal experience. This approach has no scientific basis and cannot exhaustively cover possible risk behaviors; it has obvious limitations and does not help ensure the safety of the power system.

Summary of the Invention

In view of the deficiencies of the prior art, the purpose of the present invention is to provide a big-data-based method and system for identifying network security behavior. It aims to solve the technical problem that, in the prior art, risk-behavior criteria are formulated mainly by operation and maintenance personnel manually configuring blacklists, which has no scientific basis, cannot exhaustively cover possible risk behaviors, has obvious limitations, and does not help ensure the safety of the power system.

To solve the above technical problem, the technical solution adopted by the present invention is as follows:

A big-data-based method for identifying network security behavior comprises the following steps:

constructing a normal behavior model based on historical data;

identifying the user's behavior based on the normal behavior model and generating a corresponding risk level, wherein the risk levels include high-risk behavior.

Further, the normal behavior model is constructed using the K-means algorithm and the SVM algorithm.

Further, constructing the normal behavior model based on historical data comprises:

collecting historical user behavior data and building a user behavior database;

performing association analysis on the user behavior database to build behavior data sets based on user identity;

abstracting the behavior data sets, extracting the basic feature values of each user's daily behavior, and building user behavior feature sets based on user identity;

combining the authorization information of user roles, performing cluster analysis on at least two behavior feature sets, and building a user behavior benchmark feature library based on user roles.

Further, identifying the user's behavior based on the normal behavior model comprises:

comparing the user's behavior with the user behavior benchmark feature library to determine the user's role category.

Further, the user's behavior includes operation directories and/or operation commands, and identifying the user's behavior based on the normal behavior model further comprises:

comparing the user's behavior with the user behavior database; if the behavior does not match any operation directory and/or operation command in the user behavior database, it is judged to be high-risk behavior.

Further, identifying the user's behavior based on the normal behavior model further comprises:

comparing the user's behavior with the authorization information of the determined role category; if the behavior exceeds that authorization, it is judged to be high-risk behavior.

Further, identifying the user's behavior based on the normal behavior model further comprises: performing outlier analysis on the user's behavior based on the determined role category, and determining the risk level from the analysis result.

Further, the user behavior includes user behavior in a power monitoring system.

To achieve the above purpose, the present invention also provides a big-data-based system for identifying network security behavior, comprising:

a normal behavior model building module, used to construct a normal behavior model based on historical data;

a risk level generation module, used to identify the user's behavior based on the normal behavior model and generate a corresponding risk level, wherein the risk levels include high-risk behavior.

Further, the user's behavior includes operation directories and/or operation commands, and the risk level generation module comprises:

a role category determination submodule, used to compare the user's behavior with the user behavior benchmark feature library and determine the user's role category;

a risk level determination submodule, used to compare the user's behavior with the user behavior database and judge the behavior to be high-risk if it does not match any operation directory and/or operation command in the database; to compare the user's behavior with the authorization information of the determined role category and judge the behavior to be high-risk if it exceeds that authorization; and to perform outlier analysis on the user's behavior based on the determined role category and determine the risk level from the analysis result.

Compared with the prior art, the present invention achieves the following beneficial effects. The method and system combine the K-means algorithm and the SVM algorithm for big-data machine learning, taking advantage of the SVM algorithm's good classification performance across many data sets while applying some tuning on top of the original algorithm. A normal behavior model is built from historical data through machine learning, and the model then automatically discriminates and classifies users' network security behavior. The model is continuously learned and adjusted according to the characteristics of the power monitoring system. This significantly improves production efficiency, simplifies operation, reduces the workload of power monitoring system personnel, and helps ensure the safe operation of the power system.

Detailed Description

The present invention is further described below with reference to embodiments. The following embodiments are intended only to illustrate the technical solution of the present invention more clearly and do not limit its scope of protection.

A specific embodiment of the present invention provides a big-data-based method for identifying network security behavior. The method performs security analysis based on two elements: business scenarios and historical data. Security analysis based on business scenarios mainly defines and collects user behavior. Analysis based on historical data identifies users' reasonable daily behavior and builds a normal behavior model, which is then used to analyze user behavior in real time and monitor abnormal behavior.

The method mainly comprises the following steps:

1. Behavior definition

Study the user behavior data in the power monitoring system, define which data relate to user behavior, and determine the format and content scope of the user behavior data.

2. Data collection

Using the logs or audit records of the business application systems, determine the scope of user behavior data collection. Through the data collection function of the network security management platform, collect the relevant user behavior data from host servers, database servers, network devices, and security devices, and perform a preliminary classification to form the user behavior database.

3. Data association

Using data-mining-based association techniques, perform association analysis on the user behavior database according to user identity information, and aggregate the results into a behavior data set for each user. Each behavior data set is therefore based on user identity.

4. Feature extraction

Abstract the user behavior in each user's behavior data set, extract the basic feature values of the user's daily behavior, and build one user behavior feature set per user.
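Steps 3 and 4 can be illustrated with a minimal Python sketch. The text does not specify the log format or which feature values are extracted, so everything here is an assumption made for illustration: the `(user, directory, command)` record layout, the function names `group_by_user` and `extract_features`, and the three features themselves (share of read-only commands, share of operations under "/home", and ratio of distinct commands).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log record: (user, directory, command).
Record = Tuple[str, str, str]

# Read-only commands, taken from the level 1 example in this document.
READ_ONLY = {"ls", "cd", "ifconfig", "netstat", "ping"}


def group_by_user(records: List[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Step 3 (association): collect each user's operations under the user identity."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for user, directory, command in records:
        grouped[user].append((directory, command))
    return dict(grouped)


def extract_features(ops: List[Tuple[str, str]]) -> List[float]:
    """Step 4 (feature extraction): reduce one user's operations to a feature vector.

    The three features are illustrative assumptions, not the patent's feature set.
    """
    total = len(ops)
    if total == 0:
        return [0.0, 0.0, 0.0]
    read_only = sum(1 for _, cmd in ops if cmd in READ_ONLY)
    in_home = sum(1 for d, _ in ops if d == "/home" or d.startswith("/home/"))
    distinct = len({cmd for _, cmd in ops})
    return [read_only / total, in_home / total, distinct / total]


def build_feature_sets(records: List[Record]) -> Dict[str, List[float]]:
    """Build one feature vector per user, keyed by user identity."""
    return {user: extract_features(ops) for user, ops in group_by_user(records).items()}
```

The resulting per-user vectors are the kind of input that the clustering in step 5 would consume.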

5. Model building

Combining user authorization information, perform cluster analysis on the historical behavior feature values of users of the same category (role), and build a user behavior benchmark feature library organized by role.

6. Classification and evaluation

Using the role-based user behavior benchmark feature library, analyze the current set of user behaviors in real time, determine the role category it belongs to, and evaluate the confidence of that classification from a security standpoint.

7. Anomaly analysis

Perform outlier analysis on user behavior in the context of the business application scenario, so that abnormal behavior is monitored on the basis of the user behavior model.
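The text does not say which outlier analysis is used in step 7. As one possible reading, the sketch below scores a current behavior feature vector by how far it lies from the centroid of its role's historical vectors, relative to the spread of the historical distances. The function names, the z-score formulation, and the threshold of 3.0 are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the role's historical feature vectors."""
    return [mean(column) for column in zip(*vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def outlier_score(role_vectors: Sequence[Sequence[float]], current: Sequence[float]) -> float:
    """Z-score of the current vector's distance to the role centroid.

    Assumption: distances of historical vectors to the centroid define
    what counts as a normal spread for the role.
    """
    center = centroid(role_vectors)
    distances = [euclidean(v, center) for v in role_vectors]
    mu, sigma = mean(distances), pstdev(distances)
    d = euclidean(current, center)
    if sigma == 0:
        # All historical vectors are equally far from the centroid.
        return 0.0 if d <= mu else math.inf
    return (d - mu) / sigma


def is_outlier(role_vectors: Sequence[Sequence[float]], current: Sequence[float],
               threshold: float = 3.0) -> bool:
    """Flag the behavior when its score exceeds a hypothetical threshold."""
    return outlier_score(role_vectors, current) > threshold
```

How a score maps onto the four risk levels is not specified in the text, so the sketch stops at a yes/no flag.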

In this embodiment, a user's behavior consists of the operation directory and the operation commands executed in that directory. The risk of a user's behavior is divided into four levels: level 1 user behavior (no-risk behavior, green alert), level 2 user behavior (low-risk behavior, yellow alert), level 3 user behavior (medium-risk behavior, orange alert), and level 4 user behavior (high-risk and abnormal behavior, red alert). Specific examples are as follows:

Level 1 user behavior

Operations in the "/home" directory where the commands are limited to "ls, cd, ifconfig, netstat, ping", that is, only looking up files, IP addresses, and ports, and testing network connectivity;

Level 2 user behavior

Operations in the "/opt", "/usr", "/etc", "/var", "/proc", or "/tmp" directories, with any operation command;

Level 3 user behavior

(1) Operations in the "/home" directory with the commands "rm, cp, su, passwd, chown";

(2) Operations in the "/root" directory or in "/" with the commands "ls, cd, ifconfig, netstat, ping";

Level 4 user behavior

(1) Operations in the "/root" directory or in "/" where the commands entered include "rm, pwd, reboot, pkill, su, chown", covering deleting files, changing passwords, restarting the system, terminating processes, modifying permissions, and similar operations;

(2) Behavior that does not match any historical data or historical operation directory;

(3) Operations that exceed the user's permissions.

The risk level of a user's behavior is determined in step 7 above: a behavior is assigned a risk level when it meets the corresponding criteria listed above. In particular:

For case (2) of level 4 user behavior, the user's behavior is compared with the user behavior database; if the behavior does not match any operation directory and/or operation command in the database, it is judged to be high-risk behavior.

For case (3) of level 4 user behavior, the user's behavior is compared with the authorization information of the determined role category; if the behavior exceeds that authorization, it is judged to be high-risk behavior.
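The example rules above can be written as a small lookup function. This is a sketch under stated assumptions, not the embodiment's implementation: the function name `risk_level` and its parameters are hypothetical, the history check treats a behavior as unmatched when either its directory or its command has never been seen (one reading of "and/or"), and combinations the examples do not cover return `None`. The command lists are copied from the text as given.

```python
from typing import Optional, Set

READ_ONLY = {"ls", "cd", "ifconfig", "netstat", "ping"}
HOME_SENSITIVE = {"rm", "cp", "su", "passwd", "chown"}
# Level 4 list exactly as it appears in the text.
ROOT_SENSITIVE = {"rm", "pwd", "reboot", "pkill", "su", "chown"}
LEVEL2_DIRS = ("/opt", "/usr", "/etc", "/var", "/proc", "/tmp")


def under(path: str, root: str) -> bool:
    """True when path is root itself or lies below it."""
    return path == root or path.startswith(root + "/")


def risk_level(directory: str, command: str,
               history_dirs: Set[str], history_cmds: Set[str],
               authorized_cmds: Set[str]) -> Optional[int]:
    """Return 1 to 4 for the example rules, or None when no rule applies."""
    # Level 4, case (3): the command exceeds the role's authorization.
    if command not in authorized_cmds:
        return 4
    # Level 4, case (2): assumed to mean an unseen directory or an unseen command.
    if directory not in history_dirs or command not in history_cmds:
        return 4
    if directory == "/" or under(directory, "/root"):
        if command in ROOT_SENSITIVE:
            return 4  # level 4, case (1)
        if command in READ_ONLY:
            return 3  # level 3, case (2)
        return None
    if under(directory, "/home"):
        if command in READ_ONLY:
            return 1
        if command in HOME_SENSITIVE:
            return 3  # level 3, case (1)
        return None
    if any(under(directory, d) for d in LEVEL2_DIRS):
        return 2
    return None
```

Checking the level 4 cases first reflects that an unauthorized or never-seen operation is high-risk regardless of where it runs.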

More specifically, the normal behavior model is built using the K-means algorithm. K-means is a typical distance-based clustering algorithm: it uses distance as the similarity measure, so the closer two objects are, the more similar they are considered to be. Euclidean distance or cosine similarity is commonly used as the measure. K denotes the target number of clusters and "means" denotes the average, so K-means is an algorithm that clusters data points by their means.

The K-means algorithm alternates between two steps, cluster assignment and moving the cluster centers:

1) Randomly select k objects, each serving as the initial mean (center) of a cluster. Assign each remaining object to the nearest cluster according to its distance from each cluster center.

2) Recompute the mean of each cluster. Repeat the process until the criterion function E converges, that is, until the cluster centers no longer change appreciably.

The sum-of-squared-errors criterion function E is usually used as the performance measure. It is the sum of the squared distances from every sample point to the mean vector of its own cluster; the smaller E is, the more similar the samples within each cluster. Minimizing E exactly is an NP-hard problem, so the clustering algorithm can be viewed as a coordinate descent procedure: it fixes one variable and adjusts the other, repeats this iteratively, and finally arrives at a locally optimal solution.
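The two alternating steps and the criterion E can be sketched in plain Python as follows. This is a minimal illustration rather than the embodiment's code: the function name `kmeans`, the fixed random seed, the convergence tolerance, the choice of squared Euclidean distance, and the rule of keeping the old center when a cluster becomes empty are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           tol: float = 1e-9, seed: int = 0) -> Tuple[List[List[float]], List[int], float]:
    """Return (centers, labels, E) where E is the sum of squared errors."""
    rng = random.Random(seed)
    # Step 1: pick k objects at random as the initial cluster means.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Cluster assignment: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Step 2: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers no longer change appreciably
            break
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse
```

Because the result is only a local optimum, a different seed can give a different clustering on harder data; rerunning with several seeds and keeping the run with the smallest E is a common remedy.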

Once clustering is complete, the data has been converted from unlabeled to labeled data and therefore has the properties required of training data for supervised learning. To perform classification and prediction, the data must first be split into a training set and a test set, and the distribution of sample classes in the training set should match that of the test set as closely as possible; otherwise the evaluation of the model is affected. The proportions of the sample classes should also be kept balanced, because imbalanced class proportions cause the trained model to overfit or underfit. Therefore, when the data is first split into training and test sets, the following should be ensured:

1) The class proportions in the training set and in the test set are as consistent as possible;

2) The proportions of all classes among the samples in the data set are kept as balanced as possible.

The support vector machine (SVM) performs well on many data sets and is therefore used here as the classifier. It is a typical hyperplane classifier, and its basic principle is as follows:

Assume [formula image BDA0002455176730000061] is the client input data, where x is a d-dimensional vector represented by [formula image BDA0002455176730000062]; [formula image BDA0002455176730000063] is the mapping rule; $\{c_1, \ldots, c_k\}$ is the set of classification planes; $c_{k^*} = C_w(x)$ is the classification result, where $k^* \in \{1, \ldots, k\}$; and [formula image BDA0002455176730000066] is the classification result for x under model w.

The hyperplane is computed as follows:

(1) First compute the classification accuracy for the training samples using a fitness function. The larger the fitness value, the more reliable the SVM classification. The fitness function is:

[formula image BDA0002455176730000064]

where just is the fitness value, $d_i$ and $d_j$ are the average distances from classes i and j to their classification planes, and $d_{max}$ is the maximum distance between any two different classes.

(2) A weight formula is added on top of the classification accuracy:

[formula image BDA0002455176730000065]

where weight is the weight value, n is the number of classes, and b is the constant term of the polynomial expression in the classification criterion. The weight of each feature is computed, and features with small weights are removed.

(3) A class matching algorithm determines the class. The weight formula above provides the coefficients of the matching algorithm, and a distance factor is added to the matching algorithm:

[formula image BDA0002455176730000071]

where α is the distance factor, $md_i$ is the average distance, and $c_i$ is the distance from a point to the classification plane. The average distance is the mean distance from the points of a class to the classification plane, computed as one or more of the arithmetic mean, geometric mean, quadratic mean, harmonic mean, and weighted mean.

For an unknown sample, its distance with respect to the different classification planes is computed as:

[formula image BDA0002455176730000072]

where $d_{ik}$ is the distance from each point in class k to the classification plane, and $weight_k$ is the weight coefficient of class k. The class is the value of k for which this distance is largest.
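The formula images for the matching step are not reproduced in this text, so the exact expression is unknown. The Python sketch below shows only the general shape of the final selection, under assumptions that are not taken from the patent: each class k has one linear classification plane given by a normal vector and an offset, the score of a sample for class k is $weight_k$ times its signed distance to that plane, and the class with the largest score is chosen. The names `signed_distance` and `match_class` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of one classification plane: (normal vector w, offset b).
Hyperplane = Tuple[Sequence[float], float]


def signed_distance(x: Sequence[float], plane: Hyperplane) -> float:
    """Signed distance from x to the plane w.x + b = 0."""
    w, b = plane
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def match_class(x: Sequence[float], planes: Sequence[Hyperplane],
                weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Return the index of the class with the largest weighted distance, plus all scores.

    Assumed scoring rule: score_k = weight_k * signed_distance(x, plane_k).
    """
    scores = [wt * signed_distance(x, plane) for plane, wt in zip(planes, weights)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

With equal weights this reduces to picking the plane the sample lies furthest on the positive side of; unequal weights let a class with a more reliable plane win closer calls.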

A specific embodiment of the present invention also provides a big-data-based system for identifying network security behavior. The system is used to implement the method described above and comprises:

a normal behavior model building module, used to construct a normal behavior model based on historical data;

a risk level generation module, used to identify the user's behavior based on the normal behavior model and generate a corresponding risk level, wherein the risk levels include high-risk behavior. It comprises the following submodules:

a role category determination submodule, used to compare the user's behavior with the user behavior benchmark feature library and determine the user's role category;

a risk level determination submodule, used to compare the user's behavior with the user behavior database and judge the behavior to be high-risk if it does not match any operation directory and/or operation command in the database; to compare the user's behavior with the authorization information of the determined role category and judge the behavior to be high-risk if it exceeds that authorization; and to perform outlier analysis on the user's behavior based on the determined role category and determine the risk level from the analysis result.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also fall within the scope of protection of the present invention.

Claims (10)

1. A method for identifying network security behavior based on big data, characterized by comprising the following steps:
building a normal behavior model based on historical data;
identifying a user's behavior based on the normal behavior model and generating a corresponding risk level, wherein the risk levels include high-risk behavior.

2. The method for identifying network security behavior based on big data according to claim 1, wherein the methods for building the normal behavior model include the K-means algorithm and the SVM algorithm.

3. The method for identifying network security behavior based on big data according to claim 1, wherein building the normal behavior model based on historical data comprises:
collecting historical user behavior data and building a user behavior database;
performing association analysis on the user behavior database to build behavior data sets based on user identity;
abstracting the behavior data sets, extracting the basic feature values of each user's daily behavior, and building user behavior feature sets based on user identity;
combining the authorization information of user roles, performing cluster analysis on no fewer than two behavior feature sets to build a user behavior baseline feature library based on user roles.

4. The method for identifying network security behavior based on big data according to claim 3, wherein identifying the user's behavior based on the normal behavior model comprises:
comparing the user's behavior with the user behavior baseline feature library to determine the user's role category.

5. The method for identifying network security behavior based on big data according to claim 4, wherein the user's behavior includes operation directories and/or operation commands, and identifying the user's behavior based on the normal behavior model further comprises:
comparing the user's behavior with the user behavior database, and determining the behavior to be high-risk if it does not match any operation directory and/or operation command in the user behavior database.

6. The method for identifying network security behavior based on big data according to claim 4, wherein identifying the user's behavior based on the normal behavior model further comprises:
comparing the user's behavior with the authorization information corresponding to the determined role category, and determining the behavior to be high-risk if it exceeds that authorization information.

7. The method for identifying network security behavior based on big data according to claim 4, wherein identifying the user's behavior based on the normal behavior model further comprises:
performing outlier analysis on the user's behavior based on the determined role category, and determining the risk level according to the analysis result.

8. The method for identifying network security behavior based on big data according to claim 1, wherein the user behavior includes user behavior in a power monitoring system.

9. A system for identifying network security behavior based on big data, characterized by comprising:
a normal behavior model building module, configured to build a normal behavior model based on historical data;
a risk level generation module, configured to identify a user's behavior based on the normal behavior model and generate a corresponding risk level, wherein the risk levels include high-risk behavior.

10. The system for identifying network security behavior based on big data according to claim 9, wherein the user's behavior includes operation directories and/or operation commands, and the risk level generation module comprises:
a role category determination sub-module, configured to compare the user's behavior with the user behavior baseline feature library and determine the user's role category;
a risk level determination sub-module, configured to: compare the user's behavior with the user behavior database and determine the behavior to be high-risk if it does not match any operation directory and/or operation command in the user behavior database; compare the user's behavior with the authorization information corresponding to the determined role category and determine the behavior to be high-risk if it exceeds that authorization information; and perform outlier analysis on the user's behavior based on the determined role category and determine the risk level according to the analysis result.
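
The identification steps in claims 4 to 7 can be read as a short decision sequence. The Python sketch below is only an illustration under stated assumptions and is not the applicants' implementation: the names `RoleBaseline`, `classify_role`, and `risk_level`, the nearest-centroid role matching, the distance-threshold outlier test, and the "medium" and "low" levels are all hypothetical. The claims themselves name only high-risk behavior and do not say how the outlier analysis is graded.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class RoleBaseline:
    """Hypothetical entry of the role-based baseline feature library."""
    centroid: tuple        # assumed: mean feature vector of the role's cluster
    authorized: frozenset  # assumed: commands the role is allowed to run
    radius: float          # assumed: distance threshold for the outlier test


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_role(features, baselines):
    """Claim 4 (sketch): choose the role whose baseline centroid is nearest."""
    return min(baselines, key=lambda r: distance(features, baselines[r].centroid))


def risk_level(command, features, known_commands, baselines):
    """Return 'high', 'medium' or 'low' for one observed behavior."""
    # Claim 5: a command absent from the user behavior database is high risk.
    if command not in known_commands:
        return "high"
    role = classify_role(features, baselines)
    base = baselines[role]
    # Claim 6: a command beyond the determined role's authorization is high risk.
    if command not in base.authorized:
        return "high"
    # Claim 7: outlier analysis within the role. The two thresholds and the
    # 'medium'/'low' grades are assumptions made for this sketch.
    d = distance(features, base.centroid)
    if d > 2 * base.radius:
        return "high"
    if d > base.radius:
        return "medium"
    return "low"


# Hypothetical example data: two roles and the commands seen historically.
baselines = {
    "operator": RoleBaseline((1.0, 1.0), frozenset({"ls", "cat"}), 1.0),
    "admin": RoleBaseline((10.0, 10.0), frozenset({"ls", "cat", "reboot"}), 1.0),
}
known_commands = {"ls", "cat", "reboot"}

print(risk_level("reboot", (1.2, 0.9), known_commands, baselines))
```

In this sketch the database check runs first because it needs no role decision, and the outlier test runs last because it is the only step that yields a graded result rather than a direct high-risk verdict. The claims do not prescribe an order for these checks.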
CN202010304316.5A 2020-04-17 2020-04-17 A method and system for identifying network security behavior based on big data Pending CN111597549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010304316.5A CN111597549A (en) 2020-04-17 2020-04-17 A method and system for identifying network security behavior based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010304316.5A CN111597549A (en) 2020-04-17 2020-04-17 A method and system for identifying network security behavior based on big data

Publications (1)

Publication Number Publication Date
CN111597549A true CN111597549A (en) 2020-08-28

Family

ID=72185162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010304316.5A Pending CN111597549A (en) 2020-04-17 2020-04-17 A method and system for identifying network security behavior based on big data

Country Status (1)

Country Link
CN (1) CN111597549A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269935A (en) * 2021-07-16 2021-08-17 融讯伟业(北京)科技有限公司 Visual weighing method and weighing system based on screen-free weighing device
CN115150125A (en) * 2022-05-23 2022-10-04 国网安徽省电力有限公司黄山供电公司 A network security situational awareness system suitable for power system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106067088A (en) * 2016-05-30 2016-11-02 中国邮政储蓄银行股份有限公司 E-bank accesses detection method and the device of behavior
CN106209893A (en) * 2016-07-27 2016-12-07 中国人民解放军信息工程大学 The inside threat detecting system excavated based on business process model and detection method thereof
CN106778259A (en) * 2016-12-28 2017-05-31 北京明朝万达科技股份有限公司 A kind of abnormal behaviour based on big data machine learning finds method and system
CN107229849A (en) * 2016-03-24 2017-10-03 全球能源互联网研究院 Towards the database user behavior safety auditing method on power information intranet and extranet border
CN107276980A (en) * 2017-05-02 2017-10-20 广东电网有限责任公司信息中心 A kind of user's anomaly detection method and system based on association analysis
CN107888574A (en) * 2017-10-27 2018-04-06 深信服科技股份有限公司 Method, server and the storage medium of Test database risk
CN108875365A (en) * 2018-04-22 2018-11-23 北京光宇之勋科技有限公司 A kind of intrusion detection method and intrusion detection detection device
CN109492857A (en) * 2018-09-18 2019-03-19 中国电力科学研究院有限公司 A kind of distribution network failure risk class prediction technique and device
US20190179906A1 (en) * 2017-12-12 2019-06-13 Institute For Information Industry Behavior inference model building apparatus and behavior inference model building method thereof
CN109918906A (en) * 2017-12-12 2019-06-21 财团法人资讯工业策进会 Abnormal behavior detection model generation device and abnormal behavior detection model generation method
CN109936549A (en) * 2017-12-18 2019-06-25 航天信息股份有限公司 Audit data processing method and device based on PKI platform
CN110765087A (en) * 2019-10-14 2020-02-07 西安交通大学 User account abuse auditing method and system based on network security device log data
CN110781930A (en) * 2019-10-14 2020-02-11 西安交通大学 User portrait grouping and behavior analysis method and system based on log data of network security equipment

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229849A (en) * 2016-03-24 2017-10-03 全球能源互联网研究院 Towards the database user behavior safety auditing method on power information intranet and extranet border
CN106067088A (en) * 2016-05-30 2016-11-02 中国邮政储蓄银行股份有限公司 E-bank accesses detection method and the device of behavior
CN106209893A (en) * 2016-07-27 2016-12-07 中国人民解放军信息工程大学 The inside threat detecting system excavated based on business process model and detection method thereof
CN106778259A (en) * 2016-12-28 2017-05-31 北京明朝万达科技股份有限公司 A kind of abnormal behaviour based on big data machine learning finds method and system
CN107276980A (en) * 2017-05-02 2017-10-20 广东电网有限责任公司信息中心 A kind of user's anomaly detection method and system based on association analysis
CN107888574A (en) * 2017-10-27 2018-04-06 深信服科技股份有限公司 Method, server and the storage medium of Test database risk
CN109918906A (en) * 2017-12-12 2019-06-21 财团法人资讯工业策进会 Abnormal behavior detection model generation device and abnormal behavior detection model generation method
US20190179906A1 (en) * 2017-12-12 2019-06-13 Institute For Information Industry Behavior inference model building apparatus and behavior inference model building method thereof
CN109936549A (en) * 2017-12-18 2019-06-25 航天信息股份有限公司 Audit data processing method and device based on PKI platform
CN108875365A (en) * 2018-04-22 2018-11-23 北京光宇之勋科技有限公司 A kind of intrusion detection method and intrusion detection detection device
CN109492857A (en) * 2018-09-18 2019-03-19 中国电力科学研究院有限公司 A kind of distribution network failure risk class prediction technique and device
CN110765087A (en) * 2019-10-14 2020-02-07 西安交通大学 User account abuse auditing method and system based on network security device log data
CN110781930A (en) * 2019-10-14 2020-02-11 西安交通大学 User portrait grouping and behavior analysis method and system based on log data of network security equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269935A (en) * 2021-07-16 2021-08-17 融讯伟业(北京)科技有限公司 Visual weighing method and weighing system based on screen-free weighing device
CN113269935B (en) * 2021-07-16 2021-11-30 融讯伟业(北京)科技有限公司 Visual weighing method and weighing system based on screen-free weighing device
CN115150125A (en) * 2022-05-23 2022-10-04 国网安徽省电力有限公司黄山供电公司 A network security situational awareness system suitable for power system

Similar Documents

Publication Publication Date Title
CN112769796B (en) A cloud-network-terminal collaborative defense method and system based on terminal-side edge computing
WO2021077642A1 (en) Network space security threat detection method and system based on heterogeneous graph embedding
CN111614491B (en) A method and system for selection of security situation assessment indicators for power monitoring systems
CN110636066B (en) Network security threat situation assessment method based on unsupervised generative reasoning
WO2021189831A1 (en) Log optimization method, apparatus and device, and readable storage medium
CN111199252A (en) Fault diagnosis method for intelligent operation and maintenance system of power communication network
CN114943475A (en) Power distribution operation risk assessment method and system based on key element data of power distribution network
Pathak et al. Study on decision tree and KNN algorithm for intrusion detection system
CN109143848A (en) Industrial control system intrusion detection method based on FCM-GASVM
CN111597549A (en) A method and system for identifying network security behavior based on big data
CN113706100B (en) Method and system for real-time detection and identification of IoT terminal equipment in distribution network
CN109992484A (en) A kind of network alarm correlation analysis method, device and medium
CN111817971A (en) A data center network traffic splicing method based on deep learning
Shi et al. Three-layer hybrid intrusion detection model for smart home malicious attacks
CN117236699A (en) Network risk identification method and system based on big data analysis
CN116628554A (en) Industrial Internet data anomaly detection method, system and equipment
CN116069607A (en) Abnormal Behavior Detection Method of Mobile Office Users Based on Graph Convolutional Neural Network
CN112202718A (en) An operating system identification method, storage medium and device based on XGBoost algorithm
CN116032526A (en) An abnormal network traffic detection method based on machine learning model optimization
CN108121912B (en) Malicious cloud tenant identification method and device based on neural network
CN114493246A (en) Power information network node risk assessment method based on DW-Degree centrality
CN114866325B (en) Prediction methods for power system cyber attacks
Rachburee et al. Big data analytics: feature selection and machine learning for intrusion detection on microsoft azure platform
CN116308115A (en) Power information asset identification and analysis method based on network detection technology
Dong et al. The research on user short-term electricity load forecasting for judging electric theft

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210924

Address after: 313000 No. 728-5, Fenghuang Road, Wuxing District, Huzhou City, Zhejiang Province

Applicant after: State Grid Zhejiang Electric Power Co., Ltd. Huzhou Power Supply Co.

Applicant after: BEIJING KEDONG POWER CONTROL SYSTEM Co.,Ltd.

Applicant after: ZHEJIANG TAILUN ELECTRIC POWER GROUP CO.,LTD.

Address before: 313000 No. 728-5, Fenghuang Road, Wuxing District, Huzhou City, Zhejiang Province

Applicant before: State Grid Zhejiang Electric Power Co., Ltd. Huzhou Power Supply Co.

Applicant before: BEIJING KEDONG POWER CONTROL SYSTEM Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200828
