CN111597549A

CN111597549A - A method and system for identifying network security behavior based on big data

Info

Publication number: CN111597549A
Application number: CN202010304316.5A
Authority: CN
Inventors: 杨力强; 王坤; 曹建伟; 何纪成; 钱海峰; 梁野; 陈永炜; 李慧勋; 胡宗宁; 张志军; 王昊
Original assignee: Beijing Kedong Electric Power Control System Co Ltd; Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: Zhejiang Tailun Electric Power Group Co ltd; Beijing Kedong Electric Power Control System Co Ltd; Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2020-08-28

Abstract

The invention discloses a big-data-based method and system for identifying network security behavior, in the technical field of network security management. It aims to solve the technical problem that, in the prior art, risk behavior criteria are formulated mainly by operation and maintenance personnel manually configuring blacklists, an approach that has no scientific basis, cannot exhaust the possible risk behaviors, has obvious limitations, and does not help ensure power system security. The method comprises the following steps: building a normal behavior model based on historical data; and identifying the user's behavior based on the normal behavior model and generating a corresponding risk level, where the risk levels include high-risk behavior.

Description

A method and system for identifying network security behavior based on big data

Technical field

The invention relates to a big-data-based method and system for identifying network security behavior, and belongs to the technical field of network security management.

Background

In a network security management platform, to prevent erroneous or unauthorized operations by users, a monitoring device must judge the collected operation logs against risk behavior criteria and confirm whether each operation is a risk behavior, so that corresponding protective measures can be taken.

At present, risk behavior criteria are formulated mainly by operation and maintenance personnel manually configuring blacklists. Most blacklists use factory default settings or are set by operation and maintenance personnel according to personal experience. This approach has no scientific basis and cannot exhaust the possible risk behaviors; it has obvious limitations and does not help ensure power system security.

Summary of the invention

In view of the deficiencies of the prior art, the purpose of the present invention is to provide a big-data-based method and system for identifying network security behavior, so as to solve the technical problem that, in the prior art, risk behavior criteria are formulated mainly by operation and maintenance personnel manually configuring blacklists, which has no scientific basis, cannot exhaust the possible risk behaviors, has obvious limitations, and does not help ensure power system security.

To solve the above technical problem, the technical solution adopted by the present invention is:

A method for identifying network security behavior based on big data, comprising the following steps:

building a normal behavior model based on historical data;

identifying the user's behavior based on the normal behavior model and generating a corresponding risk level, where the risk levels include high-risk behavior.

Further, the methods used to construct the normal behavior model include the K-means algorithm and the SVM algorithm.

Further, building the normal behavior model based on historical data includes:

collecting historical user behavior data and building a user behavior database;

performing association analysis on the user behavior database to construct behavior data sets based on user identity;

abstracting the behavior data sets, extracting the basic feature values of each user's daily behavior, and constructing user behavior feature sets based on user identity;

combining the authorization information of user roles, performing cluster analysis on no fewer than two behavior feature sets, and constructing a user behavior benchmark feature library based on user roles.
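The association and abstraction steps above can be illustrated with a minimal sketch. The log record layout `(user_id, directory, command)`, the sample records, and the choice of relative frequencies as "basic feature values" are all assumptions made for illustration; the patent does not specify them.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user_id, directory, command).
LOG = [
    ("alice", "/home", "ls"),
    ("alice", "/home", "cd"),
    ("alice", "/opt", "ls"),
    ("bob", "/root", "rm"),
    ("bob", "/root", "reboot"),
]


def group_by_user(records):
    """Association step: gather each user's records into one behavior data set."""
    per_user = defaultdict(list)
    for user, directory, command in records:
        per_user[user].append((directory, command))
    return dict(per_user)


def extract_features(actions, vocabulary):
    """Abstraction step (assumed form): relative frequency of each
    (directory, command) pair, in the fixed order given by `vocabulary`."""
    counts = Counter(actions)
    total = len(actions)
    return [counts[pair] / total for pair in vocabulary]


behavior_sets = group_by_user(LOG)
vocabulary = sorted({(d, c) for _, d, c in LOG})
feature_sets = {
    user: extract_features(actions, vocabulary)
    for user, actions in behavior_sets.items()
}
```

Each user ends up with one fixed-length feature vector, which is the kind of input a role-level cluster analysis could then consume.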

Further, identifying the user's behavior based on the normal behavior model includes:

comparing the user's behavior with the user behavior benchmark feature library to determine the user's role category.

Further, the user's behavior includes an operation directory and/or an operation command, and identifying the user's behavior based on the normal behavior model also includes:

comparing the user's behavior with the user behavior database; if the user's behavior does not match any operation directory and/or operation command in the user behavior database, it is determined to be high-risk behavior.

Further, identifying the user's behavior based on the normal behavior model also includes:

comparing the user's behavior with the authorization information corresponding to the determined role category; if the user's behavior exceeds that authorization information, it is determined to be high-risk behavior.

Further, identifying the user's behavior based on the normal behavior model also includes: performing outlier analysis on the user's behavior based on the determined role category, and determining the risk level according to the analysis result.

Further, the user behavior includes user behavior in a power monitoring system.

To achieve the above purpose, the present invention also provides a big-data-based network security behavior identification system, including:

a normal behavior model building module, used to build a normal behavior model based on historical data;

a risk level generation module, used to identify the user's behavior based on the normal behavior model and generate a corresponding risk level, where the risk levels include high-risk behavior.

Further, the user's behavior includes an operation directory and/or an operation command, and the risk level generation module includes:

a role category determination sub-module, used to compare the user's behavior with the user behavior benchmark feature library to determine the user's role category;

a risk level determination sub-module, used to compare the user's behavior with the user behavior database and, if the behavior does not match any operation directory and/or operation command in the user behavior database, determine it to be high-risk behavior; to compare the user's behavior with the authorization information corresponding to the determined role category and, if the behavior exceeds that authorization information, determine it to be high-risk behavior; and to perform outlier analysis on the user's behavior based on the determined role category and determine the risk level according to the analysis result.

Compared with the prior art, the present invention achieves the following beneficial effects. The method and system combine the K-means algorithm and the SVM algorithm for big data machine learning, making full use of the SVM algorithm's good classification performance across many data sets while applying some tuning on that basis. A normal behavior model is built from historical data through big data machine learning, and the model then automatically discriminates and classifies users' network security behavior. The model also continues to learn and adjust according to the characteristics of the power monitoring system. This significantly improves production efficiency, simplifies operation, reduces the workload of power monitoring system personnel, and helps ensure the safe operation of the power system.

Detailed description

The present invention is further described below with reference to embodiments. The following embodiments are only used to illustrate the technical solutions of the present invention more clearly and cannot be used to limit its scope of protection.

A specific embodiment of the present invention provides a big-data-based method for identifying network security behavior. The method performs security analysis based on two elements: business scenarios and historical data. Security analysis based on business scenarios mainly defines and collects user behavior. At the same time, users' reasonable daily behavior is analyzed from historical data to build a normal behavior model, and user behavior is then analyzed in real time against this model to monitor abnormal behavior.

The method of the present invention mainly comprises the following steps:

1. Behavior definition

Study the user behavior data in the power monitoring system, define which data relates to user behavior, and determine the format and content scope of the user behavior data.

2. Data collection

Using the logs or audit records of the business application systems, determine the scope of user behavior data collection. Through the data collection function of the network security management platform, collect the relevant user behavior data from host servers, database servers, network devices, and security devices, and preliminarily classify and organize it to form the user behavior database.

3. Data association

Using data-mining-based data association technology, perform association analysis on the user behavior database according to user identity information, and aggregate the records into a behavior data set for each user; that is, each behavior data set is based on user identity.

4. Feature extraction

Abstract the user behavior in each user's behavior data set, extract the basic feature values of the user's daily behavior, and build one user behavior feature set per user.

5. Model building

Combining user authorization information, perform cluster analysis on the historical behavior feature values of users in the same category (role), and build a user behavior benchmark feature library organized by role.

6. Classification assessment

Using the role-based user behavior benchmark feature library, analyze the current user behavior set in real time, determine the role category it belongs to, and evaluate the credibility of that classification for security purposes.

7. Anomaly analysis

Combining business application scenarios, perform outlier analysis on user behavior to monitor abnormal behavior based on the user behavior model.
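Steps 6 and 7 above can be sketched as a nearest-centroid role assignment followed by a simple distance-based outlier score. This is only one possible reading: the patent does not state how the role is matched or how outliers are scored, so the `BENCHMARK` layout, the `typical_distance` field, and the ratio used as an outlier score are all hypothetical.

```python
import math

# Hypothetical benchmark feature library: one centroid per role, plus an
# assumed typical distance of that role's historical members to the centroid.
BENCHMARK = {
    "operator": {"centroid": [0.8, 0.2, 0.0], "typical_distance": 0.10},
    "admin": {"centroid": [0.1, 0.3, 0.6], "typical_distance": 0.15},
}


def determine_role(features):
    """Step 6 (assumed form): pick the role whose centroid is nearest."""
    best_role, best_dist = None, math.inf
    for role, entry in BENCHMARK.items():
        dist = math.dist(features, entry["centroid"])
        if dist < best_dist:
            best_role, best_dist = role, dist
    return best_role, best_dist


def outlier_ratio(features):
    """Step 7 (assumed form): distance to the role centroid, expressed as a
    multiple of the role's typical distance. Larger means more unusual."""
    role, dist = determine_role(features)
    return role, dist / BENCHMARK[role]["typical_distance"]


role_a, ratio_a = outlier_ratio([0.75, 0.25, 0.0])  # close to "operator"
role_b, ratio_b = outlier_ratio([0.5, 0.0, 0.5])    # far from both centroids
```

How such a ratio would map onto the risk levels is left open here, since the source gives no thresholds.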

In this embodiment, the user's behavior includes the operation directory and the operation commands issued in that directory. The risk of the user's behavior is divided into four levels: level 1 user behavior (risk-free behavior, green alert), level 2 user behavior (low-risk behavior, yellow alert), level 3 user behavior (medium-risk behavior, orange alert), and level 4 user behavior (high-risk and abnormal behavior, red alert). Specific examples are as follows:

Level 1 user behavior

Operating in the "/home" directory, where the commands include only "ls, cd, ifconfig, netstat, ping", that is, only searching for files, IP addresses, and ports, and testing network connections;

Level 2 user behavior

Operating in the "/opt", "/usr", "/etc", "/var", "/proc", or "/tmp" directory, with any operation command;

Level 3 user behavior

(1) Operating in the "/home" directory with the commands "rm, cp, su, passwd, chown";

(2) Operating in the "/root" directory or in "/" with the commands "ls, cd, ifconfig, netstat, ping";

Level 4 user behavior

(1) Operating in the "/root" directory or in "/", where the entered commands include "rm, pwd, reboot, pkill, su, chown", covering deleting files, changing passwords, restarting the system, terminating processes, modifying permissions, and other operations;

(2) Behavior that does not match any historical data or historical operation directory;

(3) Executing commands that exceed the user's authorization.
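The directory and command examples above can be written down as a small lookup, shown here as a sketch rather than as the patent's implementation. The function names are hypothetical, matching on the first path component is an assumption, and combinations that the examples do not cover return `None` instead of a guessed level. Level 4 cases (2) and (3) depend on the history database and on authorization data, so they are not handled here.

```python
LEVEL1_CMDS = {"ls", "cd", "ifconfig", "netstat", "ping"}
LEVEL2_DIRS = {"/opt", "/usr", "/etc", "/var", "/proc", "/tmp"}
LEVEL3_HOME_CMDS = {"rm", "cp", "su", "passwd", "chown"}
LEVEL4_ROOT_CMDS = {"rm", "pwd", "reboot", "pkill", "su", "chown"}


def top_level(path):
    """Return the first path component, e.g. '/home/alice' -> '/home'.
    Matching on the first component is an assumption of this sketch."""
    parts = [p for p in path.split("/") if p]
    return "/" + parts[0] if parts else "/"


def rule_level(path, command):
    """Map (directory, command) to a risk level 1-4 using the listed
    examples; return None when the examples do not cover the combination."""
    top = top_level(path)
    if top == "/home":
        if command in LEVEL1_CMDS:
            return 1
        if command in LEVEL3_HOME_CMDS:
            return 3
    elif top in LEVEL2_DIRS:
        return 2
    elif top in {"/root", "/"}:
        if command in LEVEL4_ROOT_CMDS:
            return 4
        if command in LEVEL1_CMDS:
            return 3
    return None
```

For example, `rule_level("/home/alice", "ls")` yields level 1, while the same command under "/root" yields level 3.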

The risk level of a user's behavior is determined in step 7 above: when the behavior meets one of the criteria listed above, it is assigned the corresponding risk level. In particular:

For case (2) of level 4 user behavior, the determination process is: compare the user's behavior with the user behavior database; if the behavior does not match any operation directory and/or operation command in the user behavior database, it is determined to be high-risk behavior.

For case (3) of level 4 user behavior, the determination process is: compare the user's behavior with the authorization information corresponding to the determined role category; if the behavior exceeds that authorization information, it is determined to be high-risk behavior.
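The two level 4 checks just described can be sketched as follows. The data shapes are assumptions: `history` stands in for the user behavior database as a set of `(directory, command)` pairs, and `authorized` stands in for the authorization information of the determined role. The source's "and/or" wording is read here as flagging an action when either its directory or its command has never been seen, which is one possible interpretation.

```python
def high_risk_reason(action, history, authorized):
    """Return a reason string if the action is high risk, else None.

    action:     (directory, command) observed now
    history:    set of (directory, command) pairs seen historically (assumed shape)
    authorized: set of pairs permitted for the determined role (assumed shape)
    """
    directory, command = action
    known_dirs = {d for d, _ in history}
    known_cmds = {c for _, c in history}
    # Case (2): no match in the historical behavior database.
    if directory not in known_dirs or command not in known_cmds:
        return "no match in user behavior database"
    # Case (3): the action exceeds the role's authorization.
    if action not in authorized:
        return "exceeds role authorization"
    return None


history = {("/home", "ls"), ("/opt", "cat")}
authorized = {("/home", "ls")}
```

Either reason would place the behavior at level 4 (red alert) in the embodiment's scheme.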

More specifically, the normal behavior model described above is constructed using the K-means algorithm. K-means is a typical distance-based clustering algorithm that uses distance as the similarity measure: the closer two objects are, the more similar they are considered to be. Euclidean distance or cosine similarity is commonly used to compute the distance. K denotes the target number of clusters and "means" denotes the mean, so K-means is an algorithm that clusters data points by their means.

The K-means algorithm consists of two steps, cluster assignment and moving the cluster centers:

1) Randomly select k objects, each of which initially represents the mean (center) of one cluster. Assign each remaining object to the nearest cluster according to its distance from each cluster center.

2) Recompute the mean of each cluster. Repeat the process until the criterion function E converges, that is, until the cluster centers no longer change significantly.

The sum-of-squared-errors criterion function E is usually used as the performance measure. It is the sum of the squared distances from all sample points to the mean vectors of their respective clusters; the smaller E is, the more similar the samples within each cluster. Minimizing the criterion function E is an NP-hard problem, and the clustering algorithm can be regarded as a coordinate descent algorithm: it fixes one variable, adjusts the other, and keeps adjusting through an iterative process until it reaches a locally optimal solution.
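The two alternating steps and the criterion function E can be made concrete with a minimal pure-Python sketch. It uses Euclidean distance and stops when the centers move by less than a small tolerance; the function signature, the tolerance, the fixed random seed, and the toy data are illustrative choices, not details taken from the patent.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means: returns (centers, labels, E), where E is the sum of
    squared distances from each point to its assigned cluster center."""
    rng = random.Random(seed)
    # Step 1: pick k objects at random as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]

    def assign(current):
        # Cluster assignment: index of the nearest center for every point.
        return [min(range(k), key=lambda j: math.dist(p, current[j]))
                for p in points]

    for _ in range(max_iter):
        labels = assign(centers)
        # Step 2: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers no longer change significantly
            break

    labels = assign(centers)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, sse


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, sse = kmeans(pts, 2)
```

Because the result is only a local optimum, a practical run would typically repeat the procedure with several random initializations and keep the one with the smallest E.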

Once clustering is complete, the data has been converted from unlabeled to labeled data and so satisfies the requirements for supervised-learning training data. To perform classification prediction, the data must first be divided into a training set and a test set, and the distribution of sample categories in the training set should match that in the test set as closely as possible; otherwise the evaluation of the model is affected. The proportions of the sample categories should also be kept balanced, since imbalanced categories can cause the trained model to over-fit or under-fit. Therefore, when the samples are first divided into training and test sets, the following should be ensured:

1) The proportion of each sample category is as consistent as possible between the training set and the test set;

2) The proportions of all categories among the samples in the data set are kept as balanced as possible.

The support vector machine (SVM) performs well on many data sets and is therefore used for classification. It is a typical hyperplane classifier, and its basic principle is as follows:

Let [formula not reproduced] be the user's input data, where x is a d-dimensional vector represented by [formula not reproduced]; [formula not reproduced] is the mapping rule; {c_1, ..., c_k} is the set of classification planes; c_{k*} = C_w(x) is the classification result, where k* ∈ {1, ..., k}; and [formula not reproduced] is the classification result for x under model w.
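Before a classifier of this kind is trained, the cluster-labelled data has to be divided into training and test sets with matching class proportions, as required earlier. A minimal stratified split might look like the sketch below; the function name, the `test_fraction` parameter, and the rule of keeping at least one test sample per class are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.25, seed=0):
    """Split per class so that the training and test sets keep roughly the
    same class proportions. Returns (train, test) as lists of (sample, label)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    train, test = [], []
    for label, members in by_class.items():
        members = members[:]          # do not mutate the caller's data
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test += [(s, label) for s in members[:n_test]]
        train += [(s, label) for s in members[n_test:]]
    return train, test


samples = list(range(12))
labels = [0] * 8 + [1] * 4
train, test = stratified_split(samples, labels)
```

This addresses only requirement 1); balancing the classes themselves (requirement 2) would need resampling or class weighting, which the source does not detail.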

The hyperplane is calculated as follows:

(1) First, the classification accuracy on the training samples is computed by means of a fitness function. The larger the fitness value, the more reliable the SVM classification. The fitness function is:

[formula not reproduced]

where just is the fitness value, d_i and d_j are the average distances from the respective classes to the classification plane, and d_max is the maximum distance between any two distinct classes.

(2) A weight formula is added on top of the classification accuracy, as follows:

[formula not reproduced]

where weight is the weight value, n is the number of classes, and b is the constant in the polynomial expression of the classification criterion. The weight of each feature is computed, and some of the features with smaller weights are removed.

(3) A class matching algorithm determines the class. The weight formula above supplies the coefficient of the matching algorithm, and a distance factor is also added to the matching algorithm, as follows:

[formula not reproduced]

where α is the distance factor, md_i is the average distance, and c_i is the distance from a point to the classification plane. The average distance is the mean distance from the points in a class to the classification plane, computed using one or more of the arithmetic mean, geometric mean, quadratic mean, harmonic mean, and weighted mean.

For the distance of an unknown sample with respect to the different classification planes, the calculation formula is as follows:

[formula not reproduced]

where d_ik is the distance from each point in class k to the classification plane, and weight_k is the weight coefficient of class k. The class is the value of k for which this distance is largest.
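The final selection rule, choosing the k with the largest weighted distance, can be illustrated as below. The formula itself is not reproduced in this text, so the product `weight_k * d_k` used as the score is purely an assumption, as are the function name and the sample values; the distance factor α is left out because its role cannot be recovered from the text.

```python
def classify(distances, weights):
    """Return the class whose weighted distance is largest.

    distances: {class_name: distance of the unknown sample w.r.t. that class's plane}
    weights:   {class_name: weight coefficient of that class}
    The product form of the score is an assumption of this sketch.
    """
    scores = {k: weights[k] * distances[k] for k in distances}
    return max(scores, key=scores.get)


predicted = classify(
    {"operator": 0.4, "admin": 0.9},
    {"operator": 1.0, "admin": 0.5},
)
```

The sketch only shows the arg-max structure of the rule; the actual scoring expression would have to come from the original formula.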

A specific embodiment of the present invention also provides a big-data-based network security behavior identification system, which is used to implement the method described above. The system includes:

A risk level generation module, used to identify the user's behavior based on the normal behavior model and generate a corresponding risk level, where the risk levels include high-risk behavior. Specifically, it includes the following sub-modules:

The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make further improvements and modifications without departing from the technical principles of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims

1. A network security behavior identification method based on big data, characterized in that it comprises the following steps:

Build a normal behavior model based on historical data;

The user's behavior is identified based on the normal behavior model, and a corresponding risk level is generated, where the risk level includes high-risk behaviors.

2. The big-data-based network security behavior identification method according to claim 1, wherein the construction method of the normal behavior model comprises the K-means algorithm and the SVM algorithm.

3. The network security behavior identification method based on big data according to claim 1, characterized in that building the normal behavior model based on historical data comprises:

Collect user behavior historical data and build a user behavior database;

Perform association analysis on the user behavior database to construct a user identity-based behavior data collection;

Abstracting the behavior data set, extracting the basic feature values of the user's daily behavior, and constructing a user behavior feature set based on user identity;

Combined with the authorization information of user roles, cluster analysis is performed on no less than two behavior feature sets, and a user behavior benchmark feature library based on user roles is constructed.

4. The network security behavior identification method based on big data according to claim 3, characterized in that identifying the user's behavior based on the normal behavior model comprises:

The user's behavior is compared with the user behavior benchmark feature database to determine the user's role category.

5. The method for identifying network security behavior based on big data according to claim 4, wherein the user's behavior includes an operation directory or/and an operation command; the user's behavior is identified based on a normal behavior model, further comprising:

The user's behavior is compared with the user behavior database, and if the user's behavior does not conform to any operation catalogue or/and operation command in the user behavior database, it is determined as a high-risk behavior.

6. The network security behavior identification method based on big data according to claim 4, characterized in that identifying the user's behavior based on the normal behavior model further comprises:

The user's behavior is compared with the authorization information corresponding to the determined role category. If the user's behavior exceeds the authorization information corresponding to the determined role category, it is determined as a high-risk behavior.

7. The network security behavior identification method based on big data according to claim 4, wherein identifying the user's behavior based on the normal behavior model further comprises: performing outlier analysis on the user's behavior based on the determined role category, and determining the risk level according to the analysis result.

8. The method for identifying network security behavior based on big data according to claim 1, wherein the user behavior includes user behavior in a power monitoring system.

9. A network security behavior identification system based on big data, characterized by comprising:

Normal behavior model building module: used to build a normal behavior model based on historical data;

Risk level generation module: used to identify the user's behavior based on the normal behavior model, and generate a corresponding risk level, where the risk level includes high-risk behaviors.

10. The big data-based network security behavior identification system according to claim 9, wherein the user's behavior includes an operation directory or/and an operation command, and the risk level generation module includes:

Role category determination sub-module: used to compare the user's behavior with the user behavior benchmark feature library to determine the user's role category;

Risk level determination sub-module: used to compare the user's behavior with the user behavior database and, if the behavior does not match any operation directory and/or operation command in the user behavior database, determine it to be high-risk behavior; to compare the user's behavior with the authorization information corresponding to the determined role category and, if the behavior exceeds that authorization information, determine it to be high-risk behavior; and to perform outlier analysis on the user's behavior based on the determined role category and determine the risk level according to the analysis result.