CN103532760B - Analysis device, system and method for analyzing commands executed on each host

Info

Publication number: CN103532760B
Application number: CN201310492700.2A
Authority: CN
Inventors: 张卓; 杨卿; 刘小雄; 李洪亮
Original assignee: Beijing Qianxin Technology Co Ltd
Current assignee: Qax Technology Group Inc
Priority date: 2013-10-18
Filing date: 2013-10-18
Publication date: 2018-11-09
Anticipated expiration: 2033-10-18
Also published as: CN103532760A

Abstract

The invention discloses an analysis device, system and method for analyzing commands executed on each host. The analysis device includes: a centralized data collector configured to at least collect the current commands transmitted over the network by each host terminal, together with the identifier of the host to which each command belongs; a command analyzer configured to identify the current commands collected by the centralized data collector, at least distinguishing abnormal commands from normal commands; and an alarm device configured to judge, according to the identification result of the command analyzer, whether an alarm condition is met and, if so, to issue alarm information indicating that the corresponding host is abnormal. The invention makes it possible to raise timely alarms for risky abnormal commands entered on any host in a network system, improving the security of the system.

Description

Analysis device, system and method for analyzing commands executed on host computers

Technical field

The invention relates to the field of computer technology, and in particular to an analysis device, system and method for analyzing commands executed on each host.

Background

With the rapid development of networks, network systems have emerged that must serve large numbers of users. Such systems are usually distributed across many servers running, for example, Linux or Unix, and system administrators operate these servers by entering commands. Administrators may not be fully familiar with the services running on a given server, so their commands can prevent the server from working properly or even cause serious consequences. In addition, as the number of servers grows, some may be compromised by attackers, who can then perform malicious operations that disrupt normal server operation.

This situation is not limited to servers; it can also arise on other, similar host devices. How to monitor the commands executed on servers and other host devices, and raise an alarm promptly when an anomaly occurs, is therefore a problem in urgent need of a solution.

Summary of the invention

In view of the above problems, the present invention is proposed to provide an analysis device and system for analyzing commands executed on each host, and a corresponding analysis method, that overcome the above problems or at least partly solve them.

An embodiment of the present invention discloses an analysis device for analyzing commands executed on each host, including: a centralized data collector configured to at least collect the current commands transmitted over the network by each host terminal and the identifier of the host to which each belongs; a command analyzer configured to identify the current commands collected by the centralized data collector, at least identifying abnormal commands and normal commands; and an alarm device configured to judge, according to the identification result of the command analyzer, whether an alarm condition is met and, if so, to issue alarm information indicating that the corresponding host is abnormal.

Optionally, the command analyzer includes a filtering module configured to filter the current commands collected by the centralized data collector using preset suspicious rules, identify a current command hit by a suspicious rule as an abnormal command, and output the alarm weight of the abnormal command hit by that suspicious rule, the alarm weight being obtained from that rule's overall hit rate on commands; the alarm device is specifically configured to judge whether the alarm condition is met according to the alarm weight of the abnormal command.

Optionally, the alarm weight of an abnormal command output by the filtering module is obtained as follows: the alarm weight of an abnormal command hit by a suspicious rule is given by a monotonically decreasing function whose independent variable is that rule's overall hit rate on existing commands.

Optionally, the alarm device is specifically configured to collect all abnormal commands on a given host identified by the command analyzer within one alarm period, aggregate the alarm weights corresponding to these abnormal commands, and judge from the aggregated value whether a preset alarm condition is met.

Optionally, the command analyzer includes a classification module configured to classify the current commands received by the centralized data collector according to the training sample set of an existing classification model, obtain the probability that a current command is a normal command and the probability that it is an abnormal command, and thereby identify whether the current command is an abnormal command.

Optionally, the alarm device is specifically configured to collect all abnormal commands on a given host identified by the command analyzer within one alarm period, aggregate the abnormal-command probabilities corresponding to these abnormal commands, and judge from the aggregated value whether a preset alarm condition is met.

Optionally, the command analyzer includes: a filtering module configured to filter the current commands received by the centralized data collector using preset suspicious rules, pass the current commands hit by a suspicious rule to a classification module, and output the alarm weight of each current command hit by a suspicious rule, the alarm weight being obtained from that rule's overall hit rate on commands; and a classification module configured to further classify the current commands received from the filtering module according to the training sample set of an existing classification model, obtain the probability that a current command is a normal command and the probability that it is an abnormal command, and thereby identify whether the current command is an abnormal command.

Optionally, the command analyzer further includes a learning module configured to merge newly added current commands with the existing training sample set, perform machine learning on the merged set, and update the existing training sample set used by the classification module.

Optionally, the alarm device is specifically configured to collect all abnormal commands on a given host identified by the command analyzer within one alarm period, multiply the abnormal-command probability of each abnormal command by its alarm weight to obtain the corresponding alarm index, aggregate the alarm indices of these abnormal commands, and judge from the aggregated value whether a preset alarm condition is met.
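The alarm-index variant just described can be illustrated with a minimal sketch. The text says only that the per-command indices are "aggregated" and compared against a preset condition; the use of a plain sum and a numeric threshold here is an assumption, and the names `alarm_index`, `should_alarm` and `threshold` are hypothetical.

```python
from typing import Iterable, Tuple


def alarm_index(abnormal_probability: float, alarm_weight: float) -> float:
    """Alarm index of one abnormal command: probability times alarm weight."""
    return abnormal_probability * alarm_weight


def should_alarm(abnormal_commands: Iterable[Tuple[float, float]],
                 threshold: float) -> bool:
    """Decide whether one host meets the alarm condition for one alarm period.

    abnormal_commands holds (abnormal_probability, alarm_weight) pairs for
    the abnormal commands seen on that host during the period.
    Assumption: the indices are aggregated by summation and the alarm
    condition is "sum >= threshold"; the source does not fix either choice.
    """
    total = sum(alarm_index(p, w) for p, w in abnormal_commands)
    return total >= threshold
```

Other aggregations (maximum, average, weighted sum) would fit the same description; summation is used only because it is the simplest.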

An embodiment of the present invention also discloses a system for analyzing commands executed on each host, including the analysis device described above and several host terminals; the host terminals are configured to at least transmit the current commands on each host, and the identifier of the host to which each belongs, over the network to the centralized data collector.

Optionally, the host terminal includes a command sending module configured to modify the command interpreter (shell) of each host, adding a function that transmits the current command received by the shell and the host IP to the centralized data collector.

Optionally, the system further includes a monitor configured to monitor the deployment of the command sending module on each host; when it finds a newly added host on which the command sending module has not been deployed, or a host on which the command sending module has failed, it automatically logs in to that host via the host's IP and deploys the command sending module on it.

An embodiment of the present invention also discloses a method for analyzing commands executed on each host, including: collecting the current commands transmitted by each host over the network and the identifier of the host to which each belongs; identifying the collected current commands, at least identifying abnormal commands and normal commands; and judging from the identification result whether an alarm condition is met and, if so, issuing alarm information indicating that the corresponding host is abnormal.

Optionally, the step of identifying the collected current commands includes: filtering the collected current commands using preset suspicious rules, identifying a current command hit by a suspicious rule as an abnormal command, and obtaining the alarm weight of the abnormal command hit by that suspicious rule, the alarm weight being obtained from that rule's overall hit rate on commands; and the step of judging from the identification result whether the alarm condition is met includes: judging whether the alarm condition is met according to the alarm weight of the abnormal command.

Optionally, the alarm weight of the abnormal command is obtained as follows: the alarm weight of an abnormal command hit by a suspicious rule is given by a monotonically decreasing function whose independent variable is that rule's overall hit rate on existing commands.

Optionally, all abnormal commands identified on a given host within one alarm period are collected, the alarm weights corresponding to these abnormal commands are aggregated, and whether a preset alarm condition is met is judged from the aggregated value.

Optionally, identifying the collected current commands includes: classifying the received current commands according to the training sample set of an existing classification model, obtaining the probability that a current command is a normal command and the probability that it is an abnormal command, and thereby identifying whether the current command is an abnormal command.

Optionally, the step of judging from the identification result whether the alarm condition is met includes: collecting all abnormal commands identified on a given host within one alarm period, aggregating the abnormal-command probabilities corresponding to these abnormal commands, and judging from the aggregated value whether a preset alarm condition is met.

Optionally, identifying the collected current commands includes: filtering the received current commands using preset suspicious rules, selecting the current commands hit by a suspicious rule, and outputting the alarm weight of each current command hit by a suspicious rule, the alarm weight being obtained from that rule's overall hit rate on commands; and further classifying the selected current commands according to the training sample set of an existing classification model, obtaining the probability that a current command is a normal command and the probability that it is an abnormal command, and thereby identifying whether the current command is an abnormal command.

Optionally, the method further includes: merging newly added current commands with the existing training sample set, performing machine learning on the merged set, and updating the existing training sample set used for classification.

Optionally, the step of judging from the identification result whether the alarm condition is met includes: collecting all abnormal commands identified on a given host within one alarm period, multiplying the abnormal-command probability of each abnormal command by its alarm weight to obtain the corresponding alarm index, aggregating the alarm indices of these abnormal commands, and judging from the aggregated value whether a preset alarm condition is met.

Optionally, the step of collecting the current commands transmitted by each host over the network and the identifier of the host to which each belongs includes: modifying the command interpreter (shell) of each host, adding a function that transmits the current command received by the shell and the host IP over the network to a designated device, and using that function to collect the current commands of each host and the identifier of the host to which each belongs.

Optionally, the method further includes: monitoring the events in which each host transmits its current commands and host identifier; when a newly added host is found that has not undergone the above shell modification, or on which the modification has failed, automatically logging in to that host via its IP and deploying the shell modification on it.

With the analysis device for analyzing commands executed on each host according to the present invention, in a network system comprising several hosts, the current commands transmitted by each host over the network and the identifier of the host to which each command belongs can be collected; commands that carry a certain operational risk can be effectively identified among the collected commands, determining whether a command entered on a host is abnormal or normal; and when an abnormal command has been entered on a host and the alarm condition is met, alarm information indicating that the host is abnormal is issued promptly. This addresses the adverse effect on the stable operation of a host, or even of the whole system, when dangerous commands are entered on a host because of administrator error, attacks or similar causes: dangerous commands entered on any host in the network system are alarmed in time, which improves the security of the system.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.

Brief description of the drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:

FIG. 1 shows a schematic diagram of an analysis system for analyzing commands executed on each host according to an embodiment of the present invention;

FIG. 2 shows a flowchart of an analysis method for analyzing commands executed on each host according to an embodiment of the present invention; and

FIG. 3 shows a schematic diagram of a specific application according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.

Referring to FIG. 1, FIG. 1 shows a system for analyzing commands executed on each host according to an embodiment of the present invention. The system includes several host terminals 210 and an analysis device 100 for analyzing commands executed on each host. The analysis device 100 includes a centralized data collector 110, a command analyzer 120 and an alarm device 130; each host terminal 210 includes a command sending module 2102, and each host terminal 210 is coupled to the centralized data collector 110. The analysis device and the specific implementation of each of its components are described in detail below.

A network system usually consists of multiple host terminals 210. A host terminal 210 may be a physical computer or a virtual machine running on computer equipment. The host terminals may be assigned different roles, and many kinds of commands can be run on each terminal to operate on the system, such as powering on or off, file operations, system configuration, and installing or uninstalling software. Some of the commands entered may be potentially harmful to system operation, so the analysis device 100 is needed to identify them. The first step is to transmit the current commands executed on each host, together with the identifier of the host to which each belongs, over the network to the analysis device 100. The host identifier may be the host name and/or IP address of each host in the network system. The analysis device 100 obtains the host identifier mainly to determine which host issued the currently entered command, so that if the command is risky, alarm information can be issued under certain conditions. To transmit the commands executed on each host to the analysis device 100 accurately and completely, a command sending module 2102 is provided in each host terminal 210.

First, the command sending module 2102 transmits the current commands on each host, and the identifier of the host to which each belongs, over the network to the centralized data collector 110. For example, the command sending module 2102 modifies the command interpreter (shell) of each host, adding a function that transmits the current command received by the shell and the host IP to a designated device (such as the centralized data collector 110). In UNIX-like operating systems, common shells include bash, csh and tcsh. Taking bash as an example, its add_history function can be modified by adding a call to a function talker(char*host,char*message), which transmits the current command message entered on the current host host to the centralized data collector 110. Transmission may be real-time: as soon as a command is entered on a monitored host terminal 210, it is transmitted to the centralized data collector 110. Alternatively, the commands entered on a monitored host terminal 210 may be stored in a log, shell_log, which is transmitted to the centralized data collector 110 when a certain condition is met, for example when a certain time period has elapsed or when the shell_log file reaches a certain size.
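The two transmission modes (real-time and batched by elapsed time or log size) can be sketched as a small buffer on the host side. This is an illustration under stated assumptions rather than the patented implementation: the class name `ShellLogBuffer`, the tab-separated record format, the `send` callback and the default thresholds are all hypothetical, and the network transport is deliberately left to the caller.

```python
import time
from typing import Callable, List, Optional


class ShellLogBuffer:
    """Buffers "host<TAB>command" records and flushes them in batches.

    Hypothetical helper: `send` stands in for whatever transport delivers
    records to the centralized data collector.
    """

    def __init__(self, host: str, send: Callable[[List[str]], None],
                 max_bytes: int = 4096, max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.host = host
        self.send = send
        self.max_bytes = max_bytes              # size condition for a flush
        self.max_age_seconds = max_age_seconds  # time condition for a flush
        self.clock = clock
        self.records: List[str] = []
        self.size = 0
        self.first_record_at: Optional[float] = None

    def add(self, command: str) -> None:
        """Record one command and flush if either condition is met."""
        record = f"{self.host}\t{command}"
        if self.first_record_at is None:
            self.first_record_at = self.clock()
        self.records.append(record)
        self.size += len(record.encode("utf-8"))
        too_big = self.size >= self.max_bytes
        too_old = (self.clock() - self.first_record_at) >= self.max_age_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Hand all buffered records to `send` and reset the buffer."""
        if self.records:
            self.send(self.records)
        self.records = []
        self.size = 0
        self.first_record_at = None
```

Setting `max_bytes=0` makes every call to `add` flush immediately, which approximates the real-time mode; larger values give the shell_log-style batched mode.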

Once each host terminal 210 has transmitted the commands executed on its host, and the identifier of that host, over the network to the centralized data collector 110 through the command sending module 2102, the centralized data collector 110 holds the commands currently entered on every host terminal 210 and knows on which host each was entered. Optionally, all received commands may be saved to a database, preparing the data for analysis of the current commands. The command analyzer 120 is coupled to the centralized data collector 110, and the subsequent analysis is performed mainly by the command analyzer 120.
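The optional database step can be sketched as follows. The source does not name a database or schema; SQLite, the table name `commands` and the helper names below are assumptions chosen only to keep the example self-contained.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_command_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store for received commands. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commands ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " host TEXT NOT NULL,"
        " command TEXT NOT NULL,"
        " received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def store_commands(conn: sqlite3.Connection,
                   records: Iterable[Tuple[str, str]]) -> None:
    """Save (host identifier, command) pairs in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO commands (host, command) VALUES (?, ?)", records
        )


def commands_for_host(conn: sqlite3.Connection, host: str) -> List[str]:
    """Return the commands received from one host, in arrival order."""
    rows = conn.execute(
        "SELECT command FROM commands WHERE host = ? ORDER BY id", (host,)
    )
    return [row[0] for row in rows]
```

Keeping the host identifier alongside each command is what later allows abnormal commands to be grouped per host when the alarm condition is evaluated.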

The command analyzer 120 identifies the current commands collected by the centralized data collector 110, at least identifying abnormal commands and normal commands. An abnormal command is one that may pose a potential threat to normal system operation; a normal command is one that poses no such threat. The command analyzer 120 can identify commands in several ways, and several implementations are described in detail below.

Implementation 1:

The command analyzer 120 may include a filtering module 1202. The filtering module 1202 filters the current commands received by the centralized data collector 110 using preset suspicious rules, identifies a current command hit by a suspicious rule as an abnormal command, and outputs the alarm weight of the abnormal command; the alarm weight is obtained from that rule's overall hit rate on commands. The preset suspicious rules may be generated in advance from the characteristics of common dangerous operations, and each suspicious rule includes the feature identifier of at least one dangerous operation. Depending on the actual situation there are many such feature identifiers, for example one or more of the following: adding an account; opening, modifying or deleting key attributes of sensitive files; viewing or modifying passwords in sensitive files; changing network settings; elevating user privileges; changing firewall settings; viewing system logs; compiling code; containing sensitive words; changing file permissions and attributes; shutting down or restarting; displaying the content of specific files; establishing a network connection and downloading a file from a specified address; and so on.

Suspicious rules may be implemented as regular expressions: the feature identifiers of dangerous operations are encoded in the regular expressions, and the preset regular-expression rules are matched against the collected current commands. Commands that hit a suspicious rule are filtered out as abnormal commands, while commands not hit by any suspicious rule can be regarded as normal commands. Because a single regular expression may only be able to filter commands of a specific format or with specific content, in practice it is more common to apply several regular expressions in multiple rounds of filtering: a command that hits any one rule in the suspicious rule group is treated as a suspicious command and filtered out, and a command that hits none of the rules is treated as a normal command.

During filtering, the overall hit rate of each suspicious rule can also be tracked. The overall hit rate of a suspicious rule is the number of commands the rule hits as a proportion of all commands it has checked. For example, viewing passwords can generally be understood as a possible unauthorized attempt to obtain passwords illegally, and passwords can be obtained by using certain commands to open the password file. In the Linux operating system the password file is usually stored under a specific path with a specific file name, and Linux provides commands for viewing the content of a given file, which offers a possible route for obtaining passwords illegally. For example, with sufficient privileges, executing the command cat /etc/passwd displays the password content saved in the password file passwd. To filter such commands, the following regular expressions can be used:

.*[\s\W]+passwd.*|^passwd.*

and

.*passwd.*

With these two suspicious rules in regular-expression form, all commands containing the sensitive keyword "passwd" can be filtered out.
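A minimal sketch of this rule-based filtering, using the two regular expressions quoted above with Python's `re` module. The names `SUSPICIOUS_RULES`, `first_matching_rule` and `is_abnormal` are hypothetical, and returning the index of the first matching rule is an assumption made so that a per-rule hit rate could later be tracked.

```python
import re
from typing import Optional

# The two suspicious rules quoted in the text, in the order given.
SUSPICIOUS_RULES = [
    re.compile(r".*[\s\W]+passwd.*|^passwd.*"),
    re.compile(r".*passwd.*"),
]


def first_matching_rule(command: str) -> Optional[int]:
    """Return the index of the first suspicious rule the command hits.

    Rules are tried in order (the multi-round filtering described above);
    None means no rule was hit and the command is treated as normal.
    """
    for index, rule in enumerate(SUSPICIOUS_RULES):
        if rule.search(command):
            return index
    return None


def is_abnormal(command: str) -> bool:
    """A command is abnormal if it hits any rule in the group."""
    return first_matching_rule(command) is not None
```

The first rule only matches "passwd" at the start of the command or directly after whitespace or a non-word character, so a name such as `mypasswd_backup` falls through to the broader second rule.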

假设其中一条可疑规则总共过滤了4651629条命令，并命中了其中的7915条命令，则被命中的这7915条命令就可以作为可疑命令，而这条可疑规则对应的总体命中率可以通过：该可疑规则命中的可疑命令/其检测的所有命令获得，如在本示例中，该条可疑规则的总体命中率则为：Suppose one of the suspicious rules filters a total of 4,651,629 commands and hits 7,915 of them, then the 7,915 hit commands can be regarded as suspicious commands, and the overall hit rate corresponding to this suspicious rule can pass: the suspicious Suspicious commands hit by the rule/all commands detected by it are obtained. For example, in this example, the overall hit rate of the suspicious rule is:

7915/4651629≈0.0017027915/4651629≈0.001702

Once the overall hit rate has been computed, the filtering module 1202 outputs an alarm weight for each abnormal command hit by a suspicious rule; the alarm weight can be derived from that rule's overall hit rate on commands. Specifically, the alarm weight of an abnormal command hit by a suspicious rule can be obtained from a monotonically decreasing function whose independent variable is the rule's overall hit rate on existing commands. For example, denoting the overall hit rate as Pa, the alarm weight can be obtained from the monotonically decreasing function (1-Pa)*D, where D is a constant. In the example above, the hit rate Pa of the suspicious rule is 0.001702, so that:

(1-Pa)*D=(1-0.001702)*100≈99.8

where D is taken as 100, so the alarm weight of an abnormal command hit by this suspicious rule is approximately 99.8.
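The worked example above can be reproduced with a short Python sketch. The function names `overall_hit_rate` and `alarm_weight` are illustrative assumptions and do not appear in the original description; the formula (1-Pa)*D and the value D=100 are taken from the text.

```python
def overall_hit_rate(hits: int, checked: int) -> float:
    """Overall hit rate Pa: commands hit by a rule / commands checked by it."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    return hits / checked


def alarm_weight(hit_rate: float, d: float = 100.0) -> float:
    """Monotonically decreasing function (1 - Pa) * D from the example."""
    return (1.0 - hit_rate) * d


# Figures from the example: 7,915 hits out of 4,651,629 checked commands.
pa = overall_hit_rate(7915, 4651629)
weight = alarm_weight(pa)
print(round(pa, 6), round(weight, 1))
```

Any other monotonically decreasing function of Pa could be substituted for `alarm_weight` without changing the rest of the flow.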

A monotonically decreasing function of the overall hit rate is used because, in practice, a command hit by a suspicious rule is only a command suspected of being dangerous. If a suspicious rule hits commands many times or at a high frequency, the commands it hits are probably fairly common ones. Since truly abnormal commands are a small minority in practice, a command hit by such a rule is relatively unlikely to be truly abnormal; it was most likely hit because the rule is comparatively "strict". Commands hit by that rule can therefore be regarded as less dangerous, and the abnormal commands it hits can be given a smaller alarm weight.

The alarm weight output by the filtering module 1202 serves as the basis on which the alarm device 130 obtains alarm weights; this is described in detail in the later discussion of the alarm device 130.

Implementation two:

The command analyzer 120 may include a learning module 1206 and a classification module 1204.

The learning module 1206 performs machine learning on a training sample set and then provides the classification module 1204 with the prior parameters it requires. Because the classification module 1204 can be implemented on the basis of various classification principles, such as Bayesian classification, logistic regression, partial least squares or decision trees, the learning module must provide different prior parameters depending on how the classification module 1204 is implemented. The two modules are described in detail below, taking as an example a classification module 1204 based on the Bayesian principle and a learning module 1206 that provides it with the required prior probabilities.

The learning module 1206 performs machine learning on a known training sample set. The training sample set contains a certain number of known commands, each of which is known to be abnormal or not. The fields obtained by segmenting the known commands in the training sample set can be regarded as feature words related to the commands. These feature words may be the command strings themselves, such as cat or wget, and may also include content extracted from the command's parameters. For example, segmenting the command:

wget -o http://www.sina.com/dasd/hahah/tad.tgz/usr/loca/dasd/etc/passwd

yields the following set of feature words:

{'wget','-o','http','www.sina.com','dasd','hahah','tad.tgz','usr','loca','dasd','etc','passwd','www','sina','com'}

Specifically, a regular-expression tool can be used to segment a command into feature words. For example, the regular expression

[_\$]*[a-zA-Z\d\._\-]+[^\w$/;=\-$\[\]\{\}:>&\?\.\\\s,\d'"\%<]*

can be used to split the command, and the regular expression

((\w+\.){1,6}(?:net|cn|com|gov|edu|asia|me|co))

can be used to identify URLs in the command, so that a command such as the example above can be segmented into its set of feature words.
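A minimal Python sketch of this segmentation step follows, using the two regular expressions quoted above with the standard `re` module. The function name `feature_words` is hypothetical, and splitting a recognized domain name on dots to obtain 'www', 'sina' and 'com' is an assumption made so that the result matches the feature word set listed earlier; the original text does not spell out that step.

```python
import re

# The two patterns quoted in the description, written as raw strings.
TOKEN_RE = re.compile(
    r'''[_\$]*[a-zA-Z\d\._\-]+[^\w$/;=\-$\[\]\{\}:>&\?\.\\\s,\d'"\%<]*'''
)
DOMAIN_RE = re.compile(r'((\w+\.){1,6}(?:net|cn|com|gov|edu|asia|me|co))')


def feature_words(command: str) -> set:
    """Segment a command into a set of feature words."""
    # TOKEN_RE has no capture groups, so findall returns whole matches.
    words = set(TOKEN_RE.findall(command))
    # Assumption: each recognized domain name is also split on dots.
    for match in DOMAIN_RE.finditer(command):
        words.update(match.group(1).split("."))
    return words


cmd = "wget -o http://www.sina.com/dasd/hahah/tad.tgz/usr/loca/dasd/etc/passwd"
print(sorted(feature_words(cmd)))
```

Because the result is a set, the repeated field 'dasd' appears only once.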

Because it is known whether each command in the training sample set is abnormal, the probability of an abnormal command can be obtained as (number of abnormal commands) / (total number of commands in the training sample set), and the probability of a normal command as (number of normal commands) / (total number of commands in the training sample set). In addition, by segmenting the commands in the training sample set, the probability of each feature word appearing in abnormal commands and the probability of it appearing in normal commands can also be obtained statistically. The learning module 1206 can therefore obtain all of these prior probabilities and provide them to the classification module 1204, which uses them to classify the current command to be analyzed.
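The counting performed by the learning module can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `learn_priors` and the toy samples are hypothetical, and the probability of a feature word within a class is taken here to be the fraction of that class's commands containing the word.

```python
from collections import Counter


def learn_priors(samples):
    """Estimate prior probabilities from labelled samples.

    samples: iterable of (feature_words, is_abnormal) pairs.
    Returns (class_prior, word_prob), both keyed by True (abnormal)
    and False (normal).
    """
    class_counts = Counter()
    word_counts = {True: Counter(), False: Counter()}
    for words, is_abnormal in samples:
        class_counts[is_abnormal] += 1
        # Count each word at most once per command.
        word_counts[is_abnormal].update(set(words))
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("training sample set is empty")
    class_prior = {label: class_counts[label] / total for label in (True, False)}
    word_prob = {
        label: {w: n / class_counts[label] for w, n in word_counts[label].items()}
        for label in (True, False)
    }
    return class_prior, word_prob


# Hypothetical toy training set: two abnormal and two normal commands.
toy_samples = [
    ({"cat", "passwd"}, True),
    ({"wget", "passwd"}, True),
    ({"ls"}, False),
    ({"cat", "log"}, False),
]
class_prior, word_prob = learn_priors(toy_samples)
```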

As can be seen, the classification module 1204 classifies the current command received by the centralized data collector 110 according to the training sample set of the existing classification model (specifically, using the prior probabilities that the learning module 1206 provides after performing machine learning on that training sample set). It obtains the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby identifies whether the current command is abnormal.

The classification module 1204 is described in detail below, taking the Bayesian classification method as an example.

The Bayesian classification method is a statistical classification method, that is, a class of algorithms that classify using probability and statistics. In many applications, naive Bayes classification can yield very accurate results, and the method is also easy to implement, accurate and fast. Its principle is to use the prior probability of an object and Bayes' formula to compute the object's posterior probability, that is, the probability that the object belongs to a given class, and to select the class with the largest posterior probability as the class of the object. In embodiments of the present invention, the classification module 1204 may use the Bayesian classification method to identify whether the current command is abnormal; the process is described in detail below.

Implementing the classification module 1204 with the Bayesian classification method essentially means using the commands in the training sample set whose status (abnormal or not) is known, the probabilities with which abnormal and normal commands occur, and the probabilities with which the fields obtained by segmenting the known commands occur in abnormal and normal commands, in order to obtain, for a given command containing one or more particular fields, the probability that the command is normal and the probability that it is abnormal, and thereby to determine its class. In this process the classification module 1204 is trained on the training sample set; through training it obtains the prior probabilities and thus acquires the ability to identify, by the Bayesian classification method, whether the current command is an abnormal command or a normal command.

Given a current command of unknown class, determining whether it is an abnormal or a normal command by the Bayesian classification method first requires segmenting the command, which can again be done with regular expressions. Let

x={w₁,w₂,w₃,…,w_n} be the set of feature words obtained by segmenting the current command of unknown class;

y={y₁=good,y₂=bad} be the set of classes, where y₁=good denotes the class of normal commands and y₂=bad denotes the class of abnormal commands. Next, P(y₁|x) and P(y₂|x) must be obtained, where P(y₁|x) is the probability that the current command is normal given that it contains the feature words in the set x, and P(y₂|x) is the probability that it is abnormal given that it contains those feature words. The values of P(y₁|x) and P(y₂|x) are compared and the class of the current command is determined from the result. For example, the class with the larger value may be taken as the class of the current command, or the class with the larger value may be taken only when the difference between the two values reaches a certain threshold. How P(y₁|x) and P(y₂|x) are obtained is described below.

According to the Bayesian classification method, they are obtained as follows:

P(y₁|x)=P(x|y₁)*P(y₁)/P(x)

P(y₂|x)=P(x|y₂)*P(y₂)/P(x)

Here P(x) is the same constant for both classes y₁=good and y₂=bad, so it suffices to compute P(x|y₁)*P(y₁) and P(x|y₂)*P(y₂).

The probability P(y₁) of a normal command and the probability P(y₂) of an abnormal command can be determined from the frequencies of normal and abnormal commands in the training sample set. For example, if a total of 4,651,629 commands were collected in the training sample set and 68,440 of them are abnormal, the probability P(y₂) of an abnormal command is:

68440/4651629≈0.014713

and the corresponding probability of a normal command is P(y₁)=1-P(y₂)≈0.985287.

Under the naive Bayes assumption that the feature words occur independently of one another within a class, P(x|y₁) is obtained as the product:

P(w₁|y₁)*P(w₂|y₁)*P(w₃|y₁)*…*P(w_n|y₁)

Here the terms P(w₁|y₁), P(w₂|y₁), P(w₃|y₁), …, P(w_n|y₁) denote the probabilities with which the feature words in the set x appear in normal commands; their values can be obtained by counting how often each feature word appears in the normal commands of the training sample set. P(x|y₂) is obtained in the same way as P(x|y₁), which is not repeated here. Note that when the product of P(w₁|y₁), P(w₂|y₁), P(w₃|y₁), …, P(w_n|y₁) is computed, every term lies in the interval (0,1), so the product frequently approaches 0 and may even underflow the range of floating-point numbers the computer can represent, giving a computed result of exactly 0. In that case, optionally, the product

P(w₁|y₁)*P(w₂|y₁)*P(w₃|y₁)*…*P(w_n|y₁)

can be converted into a sum of logarithms, for example:

log P(w₁|y₁)+log P(w₂|y₁)+log P(w₃|y₁)+…+log P(w_n|y₁)

The above describes a classification module 1204 implemented with the Bayesian classification method. In practice, a classification module implemented in this way learns from the training sample set and can produce very accurate classification results for the current input command; the method itself is easy to implement, accurate and fast.

Note that, besides the Bayesian classification method, the classification module 1204 may also be implemented using logistic regression, partial least squares, decision trees and the like. The training, learning and recognition processes of the classification module 1204 differ according to the method used, but the module can likewise classify the current input command accurately and identify whether it is a normal or an abnormal command. For example, in a classification module 1204 implemented with a decision tree, a decision tree model is first generated by training on the data in the training sample set. To determine the class of a current input command, the command is first segmented, the resulting feature words are fed into the decision tree model to compute which class the command belongs to, and the command is thereby determined to be normal or abnormal. The other implementations of the classification module 1204 all follow the same pattern of learning from the training sample set, producing a decision model, and then using that model to judge the current input command, so they are not described one by one here.

Moreover, in practice, whether the classification module 1204 is implemented using Bayesian classification, logistic regression, partial least squares or decision trees, its output is an approximation of the true situation, and this approximation reaches the desired accuracy only once the training sample set is large enough. In other words, the more training samples that can be collected, the more reliable the trained classification module 1204 is and the closer its output is to the actual situation. In actual use, therefore, the training sample set needs to be expanded continuously: the learning module 1206 treats newly received current commands as additional training samples, merges them with the existing training sample set and performs machine learning again, thereby updating the prior parameters it provides to the classification module 1204. This lets the classification module 1204 learn from a richer set of training samples and further improves the accuracy with which it recognizes current input commands.
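The Bayesian classification described above, including the conversion of the product into a sum of logarithms, can be sketched in Python as follows. The function names, the per-word probabilities and the `floor` value used for feature words that never appeared in training are all assumptions for illustration; only the two class priors (0.985287 and 0.014713) come from the example in the text.

```python
import math


def log_score(words, class_prior, word_prob, floor=1e-6):
    """log P(y) + sum of log P(w|y) over the feature words.

    `floor` is an assumed minimum probability for unseen words, so that
    log(0) is never evaluated. class_prior must be greater than 0.
    """
    score = math.log(class_prior)
    for w in words:
        score += math.log(max(word_prob.get(w, 0.0), floor))
    return score


def classify(words, priors, word_probs):
    """Return the class with the larger log score, plus all scores."""
    scores = {
        label: log_score(words, priors[label], word_probs[label])
        for label in priors
    }
    return max(scores, key=scores.get), scores


# Class priors from the example; the word probabilities are hypothetical.
priors = {"good": 0.985287, "bad": 0.014713}
word_probs = {
    "good": {"cat": 0.05, "passwd": 0.0005},
    "bad": {"cat": 0.2, "passwd": 0.6},
}
label, scores = classify({"cat", "passwd"}, priors, word_probs)
print(label)
```

Comparing log scores gives the same ordering as comparing P(x|y₁)*P(y₁) with P(x|y₂)*P(y₂), because the logarithm is monotonically increasing.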

Implementation three:

The command analyzer 120 may include a filtering module 1202, a classification module 1204 and a learning module 1206. The filtering module 1202 filters the commands received by the centralized data collector 110 using preset suspicious rules, outputs the commands hit by a suspicious rule to the classification module 1204, and outputs the alarm weight of each command hit by a suspicious rule; the alarm weight is derived from that rule's overall hit rate on commands. The classification module 1204 is coupled to the filtering module 1202 and, according to the training sample set of the existing classification model, further classifies the current command input from the filtering module 1202, obtaining the probability that the current command is normal and the probability that it is abnormal, and thereby identifying whether the current command is abnormal. The learning module 1206 in this implementation is similar to the learning module 1206 in implementation two: it performs machine learning on the existing training sample set and, when new commands arrive, merges them with the existing training sample set and performs machine learning again, thereby providing the classification module 1204 with the prior parameters it requires.

This implementation combines implementation one and implementation two. First, the filtering module 1202 filters the current commands received by the centralized data collector 110 using preset suspicious rules. The preset suspicious rules may be preset regular-expression rules, which are matched in batch against the collected current commands so that the abnormal commands hitting a suspicious rule are filtered out, while commands not hit by any suspicious rule can be regarded as normal commands. The filtering module also outputs the alarm weight of each current command hit by a suspicious rule, derived from that rule's overall hit rate on commands; the method of obtaining the alarm weight from the overall hit rate is described in implementation one and is not repeated here.

Further, the filtering module 1202 outputs the commands hit by a suspicious rule to the classification module 1204, which makes a further judgment on them and identifies whether the current command is a normal or an abnormal command. The classification module 1204 is implemented in a manner similar to the classification module 1204 of implementation two, so it is not described again here. In this implementation, commands hit by the suspicious rules of the filtering module 1202 are passed to the classification module 1204 for a further judgment, which makes the determination of whether the current input command is abnormal more accurate and further reduces misjudgments to a large extent.

After the command analyzer 120 has identified abnormal commands in any of the ways described above, it provides them to the alarm device 130. The alarm device 130 determines, according to the recognition result of the command analyzer 120, whether an alarm condition is met and, if so, issues alarm information indicating that the corresponding host is abnormal. Alarm information can be issued in many ways: for example, by sending an email containing the alarm information to a pre-registered email address, or by sending a message containing the alarm information to a pre-registered telephone number. As noted earlier, each command executed on a host is associated with the host on which it is executed, so when the alarm condition is met, alarm information can be issued for the host that executed the abnormal commands, allowing that host to be dealt with promptly.

In a specific implementation, the alarm device 130 may count the number of abnormal commands that occur on each host within a given time period and determine whether that number reaches a preset threshold; if so, it issues alarm information indicating that the host that executed the abnormal commands is abnormal. For example, suppose the preset rule is that alarm information is issued when more than 10 abnormal commands occur on a host within 5 minutes; if 11 abnormal commands are detected on a host within a 5-minute period, alarm information indicating that the host is abnormal is issued. Besides this approach, the alarm device 130 may be implemented in other ways in order to achieve more flexible and more precise alarms. These other ways of implementing the alarm device 130 are described below.
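The count-based alarm in this example can be sketched in Python as follows. The class name `CountAlarm`, the sliding-window interpretation of "within a time period" and the use of timestamps in seconds are assumptions; the defaults of 5 minutes and "more than 10" come from the example.

```python
from collections import defaultdict, deque


class CountAlarm:
    """Per-host count of abnormal commands within a sliding time window."""

    def __init__(self, period_seconds=300, threshold=10):
        self.period = period_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # host -> timestamps in the window

    def record(self, host, timestamp):
        """Record one abnormal command on `host`.

        Timestamps are assumed to be non-decreasing for each host.
        Returns True when more than `threshold` abnormal commands fall
        within the last `period` seconds.
        """
        window = self._events[host]
        window.append(timestamp)
        # Drop events that are older than the window.
        while window and window[0] <= timestamp - self.period:
            window.popleft()
        return len(window) > self.threshold


alarm = CountAlarm()
# Hypothetical host: 11 abnormal commands, 20 seconds apart.
results = [alarm.record("host-a", t) for t in range(0, 220, 20)]
print(results[-1])
```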

The alarm device 130 may be implemented differently depending on how the command analyzer 120 is implemented. Corresponding to implementation one of the command analyzer 120 described above, when an abnormal command occurs the alarm device 130 may determine whether the alarm condition is met according to the alarm weight, which is determined by the overall hit rate of the suspicious rule that hit the command, and issue alarm information indicating that the corresponding host is abnormal when the condition is met. In a specific implementation, the alarm device 130 may also collect all abnormal commands identified by the command analyzer 120 on a given host within one alarm period, aggregate the alarm weights corresponding to those abnormal commands, and determine from the aggregated value whether the preset alarm condition is met. For example, suppose the command analyzer 120 outputs the following abnormal commands hit by suspicious rules, with their corresponding alarm weights:

cmd001 - 99.8

cmd003 - 30.0

cmd004 - 95.3

cmd005 - 99.8

The preset alarm condition is that, within a preset time period, the sum of the alarm weights of the abnormal commands that occur reaches a preset alarm threshold. For example, the preset alarm condition may be that alarm information is issued when the sum of alarm weights within a 5-minute period reaches 1000, and the numbers of occurrences of the above abnormal commands within 5 minutes are as follows:

cmd001 - 2 times

cmd003 - 1 time

cmd004 - 3 times

cmd005 - 5 times

From the alarm weights and occurrence counts above, the sum of alarm weights within these 5 minutes is 2*99.8+1*30.0+3*95.3+5*99.8=1014.5, which exceeds the preset alarm threshold, so alarm information indicating that the corresponding host is abnormal is issued.
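The aggregation in this example can be checked with a few lines of Python. The function name `total_alarm_weight` is hypothetical; the weights, occurrence counts and the threshold of 1000 are the figures given above.

```python
# Alarm weights and occurrence counts from the example.
weights = {"cmd001": 99.8, "cmd003": 30.0, "cmd004": 95.3, "cmd005": 99.8}
counts = {"cmd001": 2, "cmd003": 1, "cmd004": 3, "cmd005": 5}


def total_alarm_weight(weights, counts):
    """Sum of (alarm weight * occurrences) over the abnormal commands."""
    return sum(weights[cmd] * n for cmd, n in counts.items())


total = total_alarm_weight(weights, counts)
should_alert = total >= 1000  # the sum "reaches" the preset threshold
print(round(total, 1), should_alert)
```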

As can be seen, the "aggregation" of the alarm weights corresponding to abnormal commands may differ according to the specific alarm method. In the example above, it may be the sum of the products of each abnormal command's number of occurrences and its alarm weight, or equivalently the direct accumulation of the alarm weights of the abnormal commands (if a command occurs several times within one alarm period, its alarm weight is accumulated that many times); alarm information is issued if the final result reaches the preset threshold. Note that the alarm weights of all abnormal commands on a given host within one alarm period are aggregated before deciding whether to raise an alarm mainly in order to reduce false alarms as far as possible. When truly dangerous commands occur, several abnormal commands often appear within a short time, so it is better to analyze together the alarm weights of all abnormal commands within a certain time (that is, within one alarm period) than to look at the alarm weight of a single abnormal command in isolation. It will therefore be understood that aggregation can be carried out in many ways: by accumulating multiple alarm weights as mentioned above, by multiplying multiple alarm weights and taking the logarithm, and so on, depending entirely on actual needs; all such feasible ways fall within the protection scope of the present invention. Moreover, because the way the alarm weight is obtained and its final numerical form may differ between implementations of the command analyzer 120, the aggregation of the alarm weights corresponding to abnormal commands may differ accordingly.

When the command analyzer 120 is implemented according to implementation two, its classification module 1204 can obtain the probabilities that the current command is a normal command and an abnormal command. In this case, the alarm device 130 may collect all abnormal commands identified by the command analyzer 120 on a given host within one alarm period, aggregate the abnormal-command probabilities corresponding to those commands, and determine from the aggregated value whether the preset alarm condition is met. For example, within a preset 5-minute period, the abnormal commands entered on a given host, the number of occurrences of each, and the probability that each is an abnormal command are as follows:

cmd001 - 2 times - 0.95

cmd003 - 1 time - 0.89

cmd004 - 3 times - 0.98

cmd005 - 5 times - 0.90

When the abnormal-command probabilities corresponding to these abnormal commands are aggregated, the sum of the products of each abnormal command's probability and its number of occurrences (or equivalently, the accumulated probabilities of the abnormal commands, with a command that occurs several times accumulated that many times) can be used as the reference value for deciding whether to raise an alarm. In this example the reference value is 2*0.95+1*0.89+3*0.98+5*0.90=10.23. If the preset alarm condition is that the reference value exceeds 10, the aggregated result meets the preset alarm condition and alarm information indicating that the corresponding host is abnormal is issued. As with the aggregation discussed earlier, the aggregation in this example can be implemented in many specific ways and can be adjusted according to the actual situation, provided that the decision whether to raise an alarm is based on the combined probabilities of multiple abnormal commands.
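The two equivalent descriptions of this aggregation, accumulating the probability once per occurrence and summing probability times occurrence count, can be compared in a short Python sketch. The event list is a hypothetical stand-in for the stream of identified abnormal commands; the probabilities and counts are those of the example.

```python
from collections import Counter

# One (command, abnormal-command probability) entry per occurrence.
events = (
    [("cmd001", 0.95)] * 2
    + [("cmd003", 0.89)] * 1
    + [("cmd004", 0.98)] * 3
    + [("cmd005", 0.90)] * 5
)

# Accumulate the probability once for every occurrence.
per_occurrence = sum(prob for _, prob in events)

# Equivalent form: probability multiplied by the number of occurrences.
occurrences = Counter(events)
per_command = sum(prob * n for (_, prob), n in occurrences.items())

should_alert = per_occurrence > 10  # the reference value exceeds 10
print(round(per_occurrence, 2), round(per_command, 2), should_alert)
```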

When the command analyzer 120 is implemented according to implementation three, the filtering module 1202 filters the current commands received by the centralized data collector 110 using preset suspicious rules and outputs the alarm weight of each current command hit by a suspicious rule, and the classification module 1204 makes a further judgment on the current commands hit by suspicious rules, identifying whether each is a normal or an abnormal command and obtaining the probabilities that it is normal and abnormal. In this case, the alarm device 130 may collect all abnormal commands identified by the command analyzer 120 on a given host within one alarm period, multiply the abnormal-command probability of each abnormal command by its alarm weight to obtain the corresponding alarm index, aggregate the alarm indices of these abnormal commands, and determine from the aggregated value whether the preset alarm condition is met. For example, within a preset 5-minute period, the abnormal commands entered on a given host, together with the abnormal-command probability, alarm weight and number of occurrences of each, are shown in Table 1:

Table 1

| Abnormal command | Abnormal command probability | Alarm weight | Alarm index | Number of occurrences |
|---|---|---|---|---|
| cmd001 | 0.95 | 99.8 | 98.41 | 2 |
| cmd003 | 0.89 | 90.0 | 80.10 | 1 |
| cmd004 | 0.98 | 95.3 | 93.39 | 3 |
| cmd005 | 0.90 | 99.8 | 89.82 | 5 |

In this case, the alarm indices of the abnormal commands may be aggregated by accumulating the alarm index of each abnormal command, an abnormal command that occurs several times being accumulated once per occurrence. In other words, the sum of the products of each abnormal command's alarm index and its number of occurrences is taken as the reference data for deciding whether to issue alarm information. For the table above, aggregating the alarm indices yields a reference value of 1006.19. If the preset alarm condition is that alarm information is issued when the reference value exceeds a preset alarm threshold of 1000, the reference value of 1006.19 exceeds the threshold and meets the preset condition, so alarm information indicating that the corresponding host is abnormal is issued.
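
The aggregation described for Table 1 can be sketched in Python as follows. This is an illustrative sketch only: the row values are those printed in Table 1, and the alarm index column is used as printed rather than recomputed from the probability and weight columns. The names `TABLE_1`, `reference_value` and `should_alarm` are hypothetical and do not appear in the disclosure.

```python
# Rows of Table 1:
# (command, abnormal probability, alarm weight, alarm index, occurrences).
# The alarm index column is taken as printed in the table.
TABLE_1 = [
    ("cmd001", 0.95, 99.8, 98.41, 2),
    ("cmd003", 0.89, 90.0, 80.10, 1),
    ("cmd004", 0.98, 95.3, 93.39, 3),
    ("cmd005", 0.90, 99.8, 89.82, 5),
]

ALARM_THRESHOLD = 1000  # preset alarm threshold used in the example


def reference_value(rows):
    """Sum of (alarm index x number of occurrences) over all abnormal commands."""
    total = sum(index * count for _cmd, _prob, _weight, index, count in rows)
    return round(total, 2)  # round away floating-point noise


def should_alarm(rows, threshold=ALARM_THRESHOLD):
    """Alarm when the reference value is above the preset threshold."""
    return reference_value(rows) > threshold


print(reference_value(TABLE_1), should_alarm(TABLE_1))
```

With the rows of Table 1 the sketch reproduces the reference value 1006.19 stated in the text, which is above the threshold of 1000.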

At this point, the system described above for analyzing commands executed on each host can analyze those commands and raise alarms effectively. To achieve closed-loop monitoring of the host terminals and improve the security of the whole network system, the system may further include a monitor 220, which monitors the deployment of the command sending module 2102 on each host. Specifically, the monitor 220 can obtain information about every host terminal deployed in the system, such as the host IP of each host terminal, and it can also learn from the centralized data collector 110 which hosts' executed commands have been received. By comparing the two, the monitor 220 can determine which hosts' executed commands have not been successfully transmitted to the centralized data collector 110.

If a host terminal on which the command sending module 2102 has already been deployed does not correctly transmit commands to the centralized data collector 110, the command sending module 2102 on that host terminal has failed; if a host terminal newly added to the system does not transmit the commands executed on it to the centralized data collector 110, the command sending module 2102 has not yet been deployed on that host terminal. When the monitor 220 detects either situation, it can handle it promptly. For example, on finding a new host without the command sending module 2102, or a host whose command sending module 2102 has failed, it can automatically log in to that host through its host IP and deploy the command sending module 2102 on it. Real-time monitoring of the command sending module 2102 on each host by the monitor 220 thus makes it possible to detect promptly both command sending modules 2102 that are not running normally and newly added hosts on which the command sending module 2102 has not been deployed, and then to repair the former or deploy the command sending module 2102 on the latter. This ensures that the whole system achieves closed-loop monitoring, discovering and resolving problems on its own, which better guarantees the accuracy of the command analysis results and of the alarms.
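
The comparison performed by the monitor 220 amounts to a set difference between the hosts known to the system and the hosts from which the collector has received commands. The following Python sketch illustrates this under stated assumptions: the function names, the notion of a list of hosts with a previously deployed module, and the example IP addresses are all hypothetical, and the automatic login step is represented only by a placeholder.

```python
def classify_silent_hosts(all_hosts, hosts_with_module, reporting_hosts):
    """Split hosts that sent no commands into 'module failed' and 'not deployed'.

    all_hosts:          IPs of every host terminal deployed in the system
    hosts_with_module:  IPs on which the sending module was deployed earlier
    reporting_hosts:    IPs from which the collector received commands
    """
    silent = set(all_hosts) - set(reporting_hosts)
    failed = sorted(silent & set(hosts_with_module))
    undeployed = sorted(silent - set(hosts_with_module))
    return failed, undeployed


def redeploy(host_ip):
    """Placeholder for logging in to host_ip and deploying the sending module."""
    return f"deploy command sending module on {host_ip}"


all_hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
hosts_with_module = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
reporting_hosts = ["10.0.0.1", "10.0.0.3"]

failed, undeployed = classify_silent_hosts(all_hosts, hosts_with_module, reporting_hosts)
actions = [redeploy(ip) for ip in failed + undeployed]
```

Both categories lead to the same corrective action here, which matches the text: the module is deployed again on a host where it failed and deployed for the first time on a new host.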

The analysis device and system provided by the embodiments of the present invention for analyzing commands executed on each host have been described above. Correspondingly, the embodiments of the present invention also provide an analysis method for analyzing commands executed on each host.

Referring to FIG. 2, the method starts at step S210, in which the current commands transmitted by each host over the network, and the identifiers of the hosts to which they belong, are first collected. To implement this collection, the command interpreter (shell) of each host may be modified by adding a function that transmits the current command received by the shell, together with the host IP, over the network to a designated device; this function is then used to collect the current command of each host and the identifier of the host to which it belongs. Step S210 may be performed by the centralized data collector 110 described above; for the related technical features, refer to the description of the centralized data collector 110 in the foregoing embodiment, which is not repeated here. In addition, the events in which each host transmits its current command and host identifier may be monitored. When a newly added host is found not to have undergone the shell modification, or the modification on a host is found to have failed, the host is logged in to automatically through its host IP and the shell modification is deployed on it. Hosts that cannot transmit commands or host identifiers normally, and newly added hosts lacking the transmission function, are thereby detected and corrected promptly, which achieves closed-loop monitoring of the hosts and improves the security of the whole network system.
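
The transmission function added to the shell is not specified further in the text. As a rough illustration, the Python sketch below shows one way a command and its host IP could be packaged and sent to a designated device. The JSON encoding, the use of UDP, and the names `encode_record`, `decode_record` and `send_record` are all assumptions made for this sketch and are not taken from the disclosure.

```python
import json
import socket


def encode_record(host_ip, command):
    """Serialise one executed command and its host identifier as UTF-8 JSON."""
    return json.dumps({"host": host_ip, "cmd": command}).encode("utf-8")


def decode_record(payload):
    """Inverse of encode_record, as the collector side might apply it."""
    record = json.loads(payload.decode("utf-8"))
    return record["host"], record["cmd"]


def send_record(host_ip, command, collector_addr):
    """Send one record to the collector as a single UDP datagram.

    collector_addr is an (ip, port) tuple for the designated device.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_record(host_ip, command), collector_addr)


payload = encode_record("10.0.0.5", "tail -f /var/log/app.log")
print(decode_record(payload))
```

A real deployment would more likely hook the shell itself, as the text describes; the sketch only shows the shape of the data that has to leave the host, namely the command text paired with the host identifier.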

After the current commands of each host and the identifiers of the hosts to which they belong have been collected in step S210, step S220 may be performed: the collected current commands are identified, and at least abnormal commands and normal commands are distinguished. The identification of the current command may likewise be implemented in several ways:

In the first implementation, the collected current commands are filtered against preset suspicious rules, each current command hit by a suspicious rule is identified as an abnormal command, and the alarm weight of each abnormal command hit by a suspicious rule is obtained. The alarm weight is derived from the overall hit rate of that suspicious rule on commands, and a suspicious rule may be a regular-expression rule. This implementation may be carried out by the command analyzer 120 of the foregoing system embodiment, specifically by the filtering module 1202; for the related technical features, refer to the foregoing description of the filtering module 1202, which is not repeated here. Likewise, besides filtering out abnormal commands according to the suspicious rules, the alarm weight of an abnormal command may be obtained through a monotonically decreasing function whose independent variable is the overall hit rate of the suspicious rule on existing commands; for the related technical features, refer to the description of the alarm weight for the filtering module 1202 in the foregoing system embodiment, which is not repeated here.
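
A minimal Python sketch of this rule-based filtering is given below. The two regular expressions, their hit rates, and the linear form of the weight function are hypothetical choices made for illustration; the text only requires that the weight be a monotonically decreasing function of the rule's overall hit rate.

```python
import re

# Hypothetical suspicious rules:
# (regular expression, overall hit rate of the rule on existing commands).
SUSPICIOUS_RULES = [
    (re.compile(r"\brm\s+-rf\s+/"), 0.002),
    (re.compile(r"\bwget\b.*\|\s*sh\b"), 0.0005),
]


def alarm_weight(hit_rate):
    """Assumed monotonically decreasing map from hit rate to a weight in [0, 100].

    A rule that rarely fires on existing commands gets a weight close to 100.
    """
    return 100.0 * (1.0 - hit_rate)


def filter_command(command):
    """Return the alarm weight of the first rule that hits, or None if no rule hits."""
    for pattern, hit_rate in SUSPICIOUS_RULES:
        if pattern.search(command):
            return alarm_weight(hit_rate)
    return None


for cmd in ("ls -l /tmp", "rm -rf /var/www", "wget http://example.invalid/a.sh | sh"):
    print(cmd, "->", filter_command(cmd))
```

The intuition behind a decreasing function is that a rule which matches a large share of everyday commands carries little information, whereas a rule that almost never matches is a strong signal when it does.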

In the second implementation, the received current command is classified according to the training sample set of an existing classification model, the probability that the current command is a normal command and the probability that it is an abnormal command are obtained, and it is thereby identified whether the current command is an abnormal command. In this implementation, the classification model may be based on a method such as Bayesian classification, logistic regression, partial least squares, or a decision tree. A classification method is first trained on the training sample set. Then, when the class of an input current command has to be determined, the current command is segmented into words, the resulting feature words are fed into the trained model, and the class to which the command belongs is computed, which determines whether the current command is a normal command or an abnormal command. To improve classification accuracy, the data in the training sample set needs to be enriched continually; newly added current commands may therefore be merged with the existing training sample set and used for machine learning, updating the training sample set used for classification. This implementation may be carried out by the command analyzer 120 of the foregoing system embodiment, specifically by the classification module 1204 and the learning module 1206, that is, the second implementation of the command analyzer 120; for the related technical features, refer to the description of the classification module 1204 in the embodiment, which is not repeated here.
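
Of the classification methods listed, Bayesian classification is the simplest to illustrate. The Python sketch below is a small multinomial naive Bayes classifier with add-one smoothing over whitespace-separated command tokens. It is a teaching sketch under stated assumptions, not the classifier of the disclosure: the class name, the tokenizer, the smoothing scheme and the six training commands are all hypothetical, and both classes must contain at least one training sample.

```python
import math
from collections import Counter


def tokenize(command):
    """Split a command line into feature words on whitespace."""
    return command.split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over command tokens."""

    def __init__(self):
        self.word_counts = {"normal": Counter(), "abnormal": Counter()}
        self.doc_counts = Counter()

    def train(self, command, label):
        """Add one labelled command; also usable to fold in new samples later."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(command))

    def probabilities(self, command):
        """Return {'normal': p, 'abnormal': q} with p + q == 1."""
        vocab = set(self.word_counts["normal"]) | set(self.word_counts["abnormal"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for word in tokenize(command):
                # +1 in the denominator reserves mass for unseen words
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        peak = max(log_scores.values())
        exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(exp.values())
        return {label: value / norm for label, value in exp.items()}


model = NaiveBayes()
for cmd in ("ls -l /home", "cd /var/log", "tail -f app.log"):
    model.train(cmd, "normal")
for cmd in ("rm -rf /", "wget http://example.invalid/x.sh", "chmod 777 /etc/passwd"):
    model.train(cmd, "abnormal")

result = model.probabilities("rm -rf /home/data")
print(result)
```

Calling `train` again with newly collected commands corresponds to the role the text assigns to the learning module 1206: the sample set grows and later classifications use the updated counts.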

The third implementation can be understood as a combination of the first two. The received current commands are first filtered against the preset suspicious rules, the current commands hit by a suspicious rule are selected, and the alarm weight of each such command is output, the alarm weight being derived from the overall hit rate of that suspicious rule on commands. The selected current commands are then further classified according to the training sample set of the existing classification model, the probability that each is a normal command and the probability that it is an abnormal command are obtained, and it is thereby identified whether the current command is an abnormal command. This yields a more precise identification of whether the current command is abnormal. This implementation may be carried out by the command analyzer 120 of the third implementation in the foregoing system embodiment; for the related technical features, refer to the descriptions of the filtering module 1202, the classification module 1204 and the learning module 1206 in the command analyzer 120, which are not repeated here.
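
The two-stage structure of this combined implementation can be sketched in Python as follows. Everything concrete in the sketch is a stand-in chosen for illustration: the single rule, its weight of 90.0, the toy classifier that returns fixed probabilities, and the 0.5 cutoff are assumptions, not values from the disclosure.

```python
import re

RULE = re.compile(r"\brm\s+-rf\b")  # hypothetical suspicious rule
RULE_WEIGHT = 90.0                  # hypothetical alarm weight of that rule


def toy_classifier(command):
    """Stand-in for a trained model: returns the probability of 'abnormal'."""
    return 0.89 if " /" in command else 0.30


def analyze(command, classify=toy_classifier, cutoff=0.5):
    """Stage 1: rule filter. Stage 2: classify only the commands that were hit."""
    if not RULE.search(command):
        return None  # not hit by any rule, so never passed to the classifier
    p_abnormal = classify(command)
    return {
        "abnormal": p_abnormal > cutoff,
        "probability": p_abnormal,
        "weight": RULE_WEIGHT,
        "alarm_index": p_abnormal * RULE_WEIGHT,
    }


for cmd in ("ls /", "rm -rf ./build", "rm -rf /data"):
    print(cmd, "->", analyze(cmd))
```

The design point the sketch tries to show is that the classifier only sees the commands already selected by the rules, so a rule hit alone does not make a command abnormal.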

After the current commands entered on each host have been classified in step S220, that is, after the abnormal commands have been identified, step S230 is performed: whether the alarm condition is met is judged according to the identification result, and if so, alarm information indicating that the corresponding host is abnormal is issued. Specifically, the number of times that commands entered on each host are identified as abnormal commands within a given period may be counted, and it may be judged whether the number of abnormal commands in that period reaches a preset threshold; if it does, alarm information indicating that the corresponding host is abnormal is issued. For example, if the preset setting is to issue alarm information when 10 or more abnormal commands occur within 5 minutes, and 11 abnormal commands are identified among the commands entered on a host within a 5-minute period, alarm information indicating that this host is abnormal is issued. Besides this alarm mode, to achieve more flexible and precise alarms, step S230 may have different implementations corresponding to the different implementations of step S220. For example, when step S220 filters out abnormal commands using the preset suspicious rules and outputs the alarm weight of each abnormal command hit by a suspicious rule, step S230 may judge from the alarm weights of the abnormal commands whether the alarm condition is met and, if so, issue alarm information. Specifically, all abnormal commands identified on a given host within one alarm period may be counted, their alarm weights aggregated, and the aggregated value used to judge whether the preset alarm condition is met. For instance, the alarm weights of every occurrence of an abnormal command are accumulated, and it is judged whether the accumulated alarm weight within the period reaches a preset threshold; if it does, alarm information indicating that the corresponding host is abnormal is issued.
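
The count-based alarm condition (10 or more abnormal commands within 5 minutes) can be sketched in Python as below. The sketch uses a sliding window per host, which is one possible reading of "time period"; the text does not say whether the period is sliding or fixed. The class name and the use of numeric timestamps in seconds, supplied in non-decreasing order, are assumptions.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60  # 5-minute alarm period from the example
MAX_ABNORMAL = 10        # alarm at 10 or more abnormal commands


class AbnormalCounter:
    """Counts abnormal commands per host inside a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_ABNORMAL):
        self.window = window
        self.limit = limit
        self.events = defaultdict(deque)  # host -> timestamps of abnormal commands

    def record(self, host, timestamp):
        """Record one abnormal command; return True if the host should be alarmed.

        Timestamps are seconds and are assumed to arrive in non-decreasing order.
        """
        times = self.events[host]
        times.append(timestamp)
        # Drop events that have fallen out of the window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        return len(times) >= self.limit


counter = AbnormalCounter()
results = [counter.record("10.0.0.7", i * 10) for i in range(11)]
print(results)
```

In this run the host produces one abnormal command every 10 seconds, so the condition is first met on the tenth command and stays met on the eleventh.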

As another example, when step S220 classifies the received current command according to the training sample set of the existing classification model, obtains the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby identifies whether the current command is an abnormal command, step S230 may be implemented by counting all abnormal commands identified on a given host within one alarm period, aggregating the abnormal-command probabilities of these abnormal commands, and judging from the aggregated value whether the preset alarm condition is met. In aggregating the probabilities, the sum of the products of each abnormal command's probability and its number of occurrences may be used as the reference data for deciding whether to raise an alarm. Specifically, the reference data may be compared with a preset alarm threshold, and if it is higher than the threshold, alarm information indicating that the corresponding host is abnormal is issued.

As a further example, step S220 may filter the received current commands against the preset suspicious rules, select the current commands hit by a suspicious rule, and output the alarm weight of each such command, and then further classify the selected current commands according to the training sample set of the existing classification model, obtain the probability that each is a normal command and the probability that it is an abnormal command, and thereby identify whether the current command is an abnormal command. Here the alarm weight may be derived from the overall hit rate of the suspicious rule on commands. In this case, step S230 may be implemented by counting all abnormal commands identified on a given host within one alarm period, multiplying the abnormal-command probability of each abnormal command by its alarm weight to obtain the corresponding alarm index, aggregating the alarm indices of these abnormal commands, and judging from the aggregated value whether the preset alarm condition is met. The aggregation may take the product of each abnormal command's alarm index and its number of occurrences and sum these products to obtain the reference data for deciding whether to issue alarm information. The reference data is then compared with the preset alarm threshold, and if it exceeds the threshold, alarm information indicating that the corresponding host is abnormal is issued.

Step S230, in its various specific implementations above, may be performed by the alarm device 130 of the foregoing system embodiment; for the related technical features, refer to the foregoing description of the alarm device 130, which is not repeated here.

The analysis device, system and method according to an embodiment of the present invention have been described in detail above. To aid understanding, a specific application example of the embodiment is given below. FIG. 3 is a schematic diagram of a specific application according to an embodiment of the present invention. In the figure, Linux/Unix/BSD Server denotes a host in the network system, and a network system may contain a number of hosts running Linux/Unix/BSD. The command interpreter (shell) of each host is modified so that it can send the commands entered (that is, send shell_log) to the Receive Server (the receiving server, corresponding to the centralized data collector 110 described above), and the Receive Server records the received shell_log in a database in the form of logs. By comparing the host IP to which each command in the database belongs with the IPs of the hosts deployed in the system, it can be determined whether every host has correctly transmitted the commands executed on it to the Receive Server, which ensures that command sending works normally on all Linux/Unix/BSD Servers. When a host fails or a new host joins the network system, the command sending module can be deployed automatically on the failed or new host.

When the commands are analyzed, an online learning function may perform machine learning on the data already in the database to produce an identification model. When a currently entered command has to be identified, the resulting model can be used to monitor and identify entered commands in real time, and an alarm is raised when an abnormal command is identified and the alarm condition is met. The alarm may be delivered by sending an e-mail containing the alarm information to a preset e-mail address, or by sending a message containing the alarm information to a preset telephone number through an SMS center.
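
As an illustration of the e-mail alarm path, the Python sketch below builds an alarm message with the standard library's `email.message.EmailMessage`. The subject and body wording, the sender address, and the function name `build_alert_email` are hypothetical; the sketch only constructs the message and does not send it.

```python
from email.message import EmailMessage


def build_alert_email(host_ip, reference_value, threshold, recipient):
    """Build (but do not send) an alarm e-mail for one abnormal host."""
    msg = EmailMessage()
    msg["Subject"] = f"Abnormal commands detected on host {host_ip}"
    msg["From"] = "command-analyzer@example.invalid"  # placeholder sender
    msg["To"] = recipient
    msg.set_content(
        f"Host {host_ip} exceeded the alarm threshold: "
        f"reference value {reference_value} > {threshold}."
    )
    return msg


alert = build_alert_email("10.0.0.2", 1006.19, 1000, "ops@example.invalid")
print(alert["Subject"])
```

Actual delivery could be done with `smtplib.SMTP(...).send_message(alert)` against a mail server of the operator's choosing; the server address and any credentials are deployment details that the text does not specify.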

The analysis device, system and method provided by the embodiments of the present invention for analyzing commands executed on each host have been described in detail above. With this analysis device, system or method, in a network system comprising a number of hosts, the current commands transmitted by each host over the network and the identifiers of the hosts to which they belong can be collected, commands carrying a certain operational risk can be effectively identified among the collected current commands, and each command entered on a host can be judged to be an abnormal command or a normal command. When the abnormal commands entered on a host meet the alarm condition, alarm information indicating that the corresponding host is abnormal is issued. This addresses the problem that dangerous commands entered on a host, whether through administrator error, hacker attack or other causes, adversely affect the stable operation of the host and even of the whole system: dangerous abnormal commands entered on any host in the network system are alarmed promptly, which improves the security of the system.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the above description. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be appreciated that, in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules of a device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of the embodiments may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the analysis device for analyzing commands executed on each host according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

The present invention discloses: A1. An analysis device for analyzing commands executed on each host, comprising:

a centralized data collector, configured to collect at least the current commands transmitted by each host terminal over the network and the identifiers of the hosts to which they belong;

a command analyzer, configured to identify the current commands collected by the centralized data collector, distinguishing at least abnormal commands and normal commands; and

an alarm device, configured to judge, according to the identification result of the command analyzer, whether an alarm condition is met and, if so, to issue alarm information indicating that the corresponding host is abnormal.

A2. The analysis device according to A1, wherein the command analyzer comprises a filtering module configured to filter the current commands collected by the centralized data collector against preset suspicious rules, to identify each current command hit by a suspicious rule as an abnormal command, and to output the alarm weight of each abnormal command hit by a suspicious rule, the alarm weight being derived from the overall hit rate of that suspicious rule on commands;

and the alarm device is specifically configured to judge whether the alarm condition is met according to the alarm weights of the abnormal commands.

A3. The analysis device according to A2, wherein the alarm weight of an abnormal command output by the filtering module is obtained as follows: the alarm weight of the abnormal command hit by a suspicious rule is obtained through a monotonically decreasing function whose independent variable is the overall hit rate of that suspicious rule on existing commands.

A4. The analysis device according to A2 or A3, wherein the alarm device is specifically configured to gather all abnormal commands identified by the command analyzer on a given host within one alarm period, to aggregate the alarm weights corresponding to those abnormal commands, and to determine from the aggregated value whether a preset alarm condition is met.
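
The per-host aggregation in A4 might look like the following sketch. The text says only that the weights are aggregated, so summation, the threshold comparison, and the event tuple layout `(timestamp, host_id, weight)` are assumptions; `hosts_to_alert` is a hypothetical name.

```python
from collections import defaultdict


def hosts_to_alert(events, period_start, period_end, threshold):
    """Return the hosts whose summed alarm weight reaches the threshold.

    events: iterable of (timestamp, host_id, weight), one per abnormal command.
    Only events with period_start <= timestamp < period_end are counted.
    """
    totals = defaultdict(float)
    for timestamp, host_id, weight in events:
        if period_start <= timestamp < period_end:
            totals[host_id] += weight
    return sorted(host for host, total in totals.items() if total >= threshold)


events = [
    (1, "10.0.0.1", 0.6),
    (2, "10.0.0.1", 0.7),
    (3, "10.0.0.2", 0.4),
    (100, "10.0.0.2", 5.0),   # falls outside the period below
]
alerted = hosts_to_alert(events, period_start=0, period_end=60, threshold=1.0)
```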

A5. The analysis device according to A1, wherein the command analyzer includes:

a classification module configured to classify the current command received by the centralized data collector based on the training sample set of an existing classification model, to obtain the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby to identify whether the current command is an abnormal command.
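
A5 does not name the classification model in this passage. As an illustration only, the sketch below assumes a naive Bayes classifier over whitespace-separated tokens with add-one smoothing; the class name `CommandClassifier` and the sample commands are hypothetical.

```python
import math
from collections import Counter


class CommandClassifier:
    """Toy naive Bayes model; an assumed stand-in for the classification model."""

    LABELS = ("normal", "abnormal")

    def __init__(self, samples):
        # samples: list of (command_text, label) pairs forming the training set
        self.doc_counts = Counter()
        self.token_counts = {label: Counter() for label in self.LABELS}
        for text, label in samples:
            self.doc_counts[label] += 1
            self.token_counts[label].update(text.split())
        self.vocab = set().union(*self.token_counts.values())

    def probabilities(self, command):
        """Return {"normal": p, "abnormal": q} with p + q == 1."""
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.LABELS:
            counts = self.token_counts[label]
            total = sum(counts.values())
            # Smoothed prior plus smoothed per-token likelihoods, in log space.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + len(self.LABELS)))
            for token in command.split():
                score += math.log((counts[token] + 1) / (total + len(self.vocab) + 1))
            log_scores[label] = score
        peak = max(log_scores.values())
        scaled = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(scaled.values())
        return {label: value / norm for label, value in scaled.items()}

    def is_abnormal(self, command):
        probs = self.probabilities(command)
        return probs["abnormal"] > probs["normal"]


samples = [
    ("ls -l", "normal"), ("cd /tmp", "normal"), ("cat readme", "normal"),
    ("wget http://x/payload", "abnormal"), ("chmod 777 payload", "abnormal"),
]
classifier = CommandClassifier(samples)
probs = classifier.probabilities("chmod 777 payload")
```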

A6. The analysis device according to A5, wherein the alarm device is specifically configured to gather all abnormal commands identified by the command analyzer on a given host within one alarm period, to aggregate the abnormal-command probabilities corresponding to those abnormal commands, and to determine from the aggregated value whether a preset alarm condition is met.

A7. The analysis device according to A1, wherein the command analyzer includes:

a filtering module configured to filter the current commands received by the centralized data collector against preset suspicious rules, to pass any current command hit by a suspicious rule to a classification module, and to output an alarm weight for that command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands; and

a classification module configured to further classify the current command supplied by the filtering module based on the training sample set of an existing classification model, to obtain the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby to identify whether the current command is an abnormal command.

A8. The analysis device according to A5 or A7, wherein the command analyzer further includes:

a learning module configured to merge newly added current commands with the existing training sample set, to perform machine learning on the merged set, and to update the existing training sample set used by the classification module.

A9. The analysis device according to A7, wherein the alarm device is specifically configured to gather all abnormal commands identified by the command analyzer on a given host within one alarm period, to multiply the abnormal-command probability of each abnormal command by its alarm weight to obtain a corresponding alarm index, to aggregate the alarm indices of those abnormal commands, and to determine from the aggregated value whether a preset alarm condition is met.
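
The alarm index of A9 is the product of a command's abnormal-command probability and its alarm weight. A minimal sketch follows, assuming that the aggregation is a sum and that the alarm condition is a fixed threshold; neither is stated in the text, and both function names are hypothetical.

```python
def host_alarm_score(abnormal_commands):
    """Sum the alarm indices (probability * weight) of one host's abnormal commands.

    abnormal_commands: iterable of (abnormal_probability, alarm_weight) pairs
    collected for a single host within a single alarm period.
    """
    return sum(probability * weight for probability, weight in abnormal_commands)


def should_alert(abnormal_commands, threshold):
    """Assumed alarm condition: the aggregated score reaches the threshold."""
    return host_alarm_score(abnormal_commands) >= threshold


period_commands = [(0.9, 2.0), (0.5, 1.0)]   # hypothetical values
score = host_alarm_score(period_commands)
```

Multiplying the two factors means a command contributes strongly only when the rule that hit it is rare and the classifier also considers it likely to be abnormal.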

B10. A system for analyzing commands executed on each host, comprising the analysis device according to any one of B1 to B9 and a plurality of host terminals;

the plurality of host terminals are configured to transmit over the network, to the centralized data collector, at least the current command on each host and the identifier of the host to which it belongs.

B11. The system according to B10, wherein the host terminal includes:

a command sending module configured to modify the command interpreter (shell) of each host by adding a function that transmits the host's current command received by the shell, together with the host IP, to the centralized data collector.
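
B11 describes a function added inside the shell itself, which in practice would be written in the shell's own implementation language. The Python sketch below only illustrates what such a function has to do: package the command with the host IP and hand it to the collector. The JSON record layout, the use of UDP, and the names `encode_record` and `send_record` are assumptions.

```python
import json
import socket
import time


def encode_record(host_ip, command, timestamp=None):
    """Serialize one executed command with the IP of the host it ran on."""
    record = {
        "host": host_ip,
        "command": command,
        "ts": time.time() if timestamp is None else timestamp,
    }
    return json.dumps(record).encode("utf-8")


def send_record(payload, collector_addr):
    """Send one record to the collector; UDP is an assumed transport."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, collector_addr)


payload = encode_record("10.0.0.5", "ls -l", timestamp=0)
decoded = json.loads(payload)
# send_record(payload, ("192.0.2.10", 5140))  # hypothetical collector address
```

A connectionless transport keeps the shell from blocking when the collector is unreachable, at the cost of possibly losing records; a deployment that must not lose commands would need a reliable transport instead.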

B12. The system according to B11, further comprising:

a monitor configured to monitor the deployment of the command sending module on each host and, on finding a newly added host on which the command sending module has not been deployed or a host on which the command sending module has failed, to log in to that host automatically using its IP address and deploy the command sending module on it.
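
One way the monitor of B12 could decide which hosts need (re)deployment is sketched below. Treating prolonged silence from a host as a failed command sending module is an assumption, as are the function name and its parameters; the automatic login and deployment step is left out.

```python
def hosts_needing_deployment(inventory, last_seen, now, max_silence):
    """Return host IPs that never reported or have been silent for too long.

    inventory: set of all known host IPs.
    last_seen: dict mapping host IP to the timestamp of its last received command.
    """
    stale = []
    for host in sorted(inventory):
        seen = last_seen.get(host)
        if seen is None or now - seen > max_silence:
            stale.append(host)
    return stale


inventory = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
last_seen = {"10.0.0.1": 100, "10.0.0.2": 10}
to_deploy = hosts_needing_deployment(inventory, last_seen, now=120, max_silence=60)
```

Silence is an imperfect signal, because an idle host also sends nothing, so a real monitor would probably combine it with a direct check on the host.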

C13. A method for analyzing commands executed on each host, comprising:

collecting the current command transmitted over the network by each host and the identifier of the host to which it belongs;

identifying the collected current commands, distinguishing at least abnormal commands from normal commands; and

determining, from the identification result, whether an alarm condition is met and, if so, issuing alarm information indicating that the corresponding host is abnormal.

C14. The method according to C13, wherein the step of identifying the collected current commands includes:

filtering the collected current commands against preset suspicious rules, identifying any current command hit by a suspicious rule as an abnormal command, and obtaining an alarm weight for that abnormal command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands;

and the step of determining from the identification result whether the alarm condition is met includes: determining whether the alarm condition is met according to the alarm weights of the abnormal commands.

C15. The method according to C14, wherein the alarm weight of an abnormal command is obtained as follows:

applying a monotonically decreasing function whose independent variable is the overall hit rate of the suspicious rule over existing commands, to obtain the alarm weight of the abnormal command hit by that suspicious rule.

C16. The method according to C14, wherein the step of determining from the identification result whether the alarm condition is met includes:

gathering all abnormal commands identified on a given host within one alarm period, aggregating the alarm weights corresponding to those abnormal commands, and determining from the aggregated value whether a preset alarm condition is met.

C17. The method according to C11, wherein identifying the collected current commands includes:

classifying the received current command based on the training sample set of an existing classification model, obtaining the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby identifying whether the current command is an abnormal command.

C18. The method according to C17, wherein the step of determining from the identification result whether the alarm condition is met includes:

gathering all abnormal commands identified on a given host within one alarm period, aggregating the abnormal-command probabilities corresponding to those abnormal commands, and determining from the aggregated value whether a preset alarm condition is met.

C19. The method according to C13, wherein identifying the collected current commands includes:

filtering the received current commands against preset suspicious rules, selecting the current commands hit by a suspicious rule, and outputting an alarm weight for each such command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands; and

further classifying the selected current commands based on the training sample set of an existing classification model, obtaining the probability that each current command is a normal command and the probability that it is an abnormal command, and thereby identifying whether the current command is an abnormal command.

C20. The method according to C17 or C19, further comprising:

merging newly added current commands with the existing training sample set, performing machine learning on the merged set, and updating the existing training sample set used for classification.

C21. The method according to C19, wherein the step of determining from the identification result whether the alarm condition is met includes:

gathering all abnormal commands identified on a given host within one alarm period, multiplying the abnormal-command probability of each abnormal command by its alarm weight to obtain a corresponding alarm index, aggregating the alarm indices of those abnormal commands, and determining from the aggregated value whether a preset alarm condition is met.

C22. The method according to C13-21, wherein the step of collecting the current command transmitted over the network by each host and the identifier of the host to which it belongs includes:

modifying the command interpreter (shell) of each host by adding a function that transmits over the network, to a designated device, the host's current command received by the shell together with the host IP, and using that function to collect the current command of each host and the identifier of the host to which it belongs.

C23. The method according to C22, further comprising:

monitoring the events in which each host transmits its current command and host identifier and, on finding a newly added host whose shell has not been modified as described above or a host on which the modification has failed, logging in to that host automatically using its IP address and deploying the shell modification on it.

Claims

1. An analysis device for analyzing commands executed on each host, comprising:

a centralized data collector configured to collect at least the current command transmitted over the network by each host terminal and the identifier of the host to which the current command belongs, wherein the current command is the command currently being input on that host terminal;

a command analyzer configured to identify the commands currently input on each host terminal, as collected by the centralized data collector, distinguishing at least abnormal commands from normal commands; and

an alarm device configured to determine, from the identification result of the command analyzer, whether an alarm condition is met and, if so, to issue alarm information indicating that the corresponding host is abnormal;

wherein the command analyzer includes a filtering module configured to filter the current commands collected by the centralized data collector against preset suspicious rules, to identify any current command hit by a suspicious rule as an abnormal command, and to output an alarm weight for that abnormal command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands;

the alarm weight of an abnormal command output by the filtering module is obtained by applying a monotonically decreasing function whose independent variable is the overall hit rate of the suspicious rule over existing commands, to obtain the alarm weight of the abnormal command hit by that suspicious rule; and

the alarm device is specifically configured to determine whether the alarm condition is met according to the alarm weights of the abnormal commands.

2. The analysis device according to claim 1, wherein the alarm device is specifically configured to gather all abnormal commands identified by the command analyzer on a given host within one alarm period, to aggregate the alarm weights corresponding to those abnormal commands, and to determine from the aggregated value whether a preset alarm condition is met.

3. The analysis device according to claim 1, wherein the command analyzer comprises:

a classification module configured to classify the current command received by the centralized data collector based on the training sample set of an existing classification model, to obtain the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby to identify whether the current command is an abnormal command.

4. The analysis device according to claim 3, wherein the alarm device is specifically configured to gather all abnormal commands identified by the command analyzer on a given host within one alarm period, to aggregate the abnormal-command probabilities corresponding to those abnormal commands, and to determine from the aggregated value whether a preset alarm condition is met.

5. The analysis device according to claim 1, wherein the command analyzer comprises:

a filtering module configured to filter the current commands received by the centralized data collector against preset suspicious rules, to pass any current command hit by a suspicious rule to a classification module, and to output an alarm weight for that command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands; and

a classification module configured to further classify the current command supplied by the filtering module based on the training sample set of an existing classification model, to obtain the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby to identify whether the current command is an abnormal command.

6. The analysis device according to claim 3 or 5, wherein the command analyzer further comprises:

a learning module configured to merge newly added current commands with the existing training sample set, to perform machine learning on the merged set, and to update the existing training sample set used by the classification module.

7. The analysis device according to claim 6, wherein the alarm device is specifically configured to gather all abnormal commands identified by the command analyzer on a given host within one alarm period, to multiply the abnormal-command probability of each abnormal command by its alarm weight to obtain a corresponding alarm index, to aggregate the alarm indices of those abnormal commands, and to determine from the aggregated value whether a preset alarm condition is met.

8. A system for analyzing commands executed on each host, comprising the analysis device according to any one of claims 1 to 7 and a plurality of host terminals;

the plurality of host terminals are configured to transmit over the network, to the centralized data collector, at least the current command on each host and the identifier of the host to which it belongs.

9. The system according to claim 8, wherein the host terminal comprises:

a command sending module configured to modify the command interpreter (shell) of each host by adding a function that transmits the host's current command received by the shell, together with the host IP, to the centralized data collector.

10. The system according to claim 9, further comprising:

a monitor configured to monitor the deployment of the command sending module on each host and, on finding a newly added host on which the command sending module has not been deployed or a host on which the command sending module has failed, to log in to that host automatically using its IP address and deploy the command sending module on it.

11. A method for analyzing commands executed on each host, comprising:

collecting the current command transmitted over the network by each host and the identifier of the host to which the current command belongs, wherein the current command is the command currently being input on that host's terminal;

identifying the collected commands currently input on each host terminal, distinguishing at least abnormal commands from normal commands; and

determining, from the identification result, whether an alarm condition is met and, if so, issuing alarm information indicating that the corresponding host is abnormal;

wherein the step of identifying the collected current commands includes:

filtering the collected current commands against preset suspicious rules, identifying any current command hit by a suspicious rule as an abnormal command, and obtaining an alarm weight for that abnormal command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands;

the alarm weight of an abnormal command is obtained as follows:

applying a monotonically decreasing function whose independent variable is the overall hit rate of the suspicious rule over existing commands, to obtain the alarm weight of the abnormal command hit by that suspicious rule;

and the step of determining from the identification result whether the alarm condition is met includes: determining whether the alarm condition is met according to the alarm weights of the abnormal commands.

12. The method according to claim 11, wherein the step of determining from the identification result whether the alarm condition is met comprises:

gathering all abnormal commands identified on a given host within one alarm period, aggregating the alarm weights corresponding to those abnormal commands, and determining from the aggregated value whether a preset alarm condition is met.

13. The method according to claim 11, wherein identifying the collected current commands comprises:

classifying the received current command based on the training sample set of an existing classification model, obtaining the probability that the current command is a normal command and the probability that it is an abnormal command, and thereby identifying whether the current command is an abnormal command.

14. The method according to claim 13, wherein the step of determining from the identification result whether the alarm condition is met comprises:

gathering all abnormal commands identified on a given host within one alarm period, aggregating the abnormal-command probabilities corresponding to those abnormal commands, and determining from the aggregated value whether a preset alarm condition is met.

15. The method according to claim 12, wherein identifying the collected current commands comprises:

filtering the received current commands against preset suspicious rules, selecting the current commands hit by a suspicious rule, and outputting an alarm weight for each such command, the alarm weight being derived from the overall hit rate of that suspicious rule over commands; and

further classifying the selected current commands based on the training sample set of an existing classification model, obtaining the probability that each current command is a normal command and the probability that it is an abnormal command, and thereby identifying whether the current command is an abnormal command.

16. The method according to claim 13 or 15, further comprising:

merging newly added current commands with the existing training sample set, performing machine learning on the merged set, and updating the existing training sample set used for classification.

17. The method according to claim 15, wherein the step of determining from the identification result whether the alarm condition is met comprises:

gathering all abnormal commands identified on a given host within one alarm period, multiplying the abnormal-command probability of each abnormal command by its alarm weight to obtain a corresponding alarm index, aggregating the alarm indices of those abnormal commands, and determining from the aggregated value whether a preset alarm condition is met.

18. The method according to any one of claims 11-15 and 17, wherein the step of collecting the current command transmitted over the network by each host and the identifier of the host to which it belongs comprises:

modifying the command interpreter (shell) of each host by adding a function that transmits over the network, to a designated device, the host's current command received by the shell together with the host IP, and using that function to collect the current command of each host and the identifier of the host to which it belongs.

19. The method according to claim 18, further comprising:

monitoring the events in which each host transmits its current command and host identifier and, on finding a newly added host whose shell has not been modified as described above or a host on which the modification has failed, logging in to that host automatically using its IP address and deploying the shell modification on it.