CN114338195A - Web traffic anomaly detection method and device based on improved isolated forest algorithm
- Publication number
- CN114338195A, CN202111658650.1A
- Authority
- CN
- China
- Prior art keywords
- web
- anomaly detection
- isolated
- log data
- isolated forest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The disclosure relates to the technical field of network security, and in particular to a web traffic anomaly detection method and device based on an improved isolated forest (isolation forest) algorithm, a storage medium, and a terminal device. The method comprises: collecting web firewall historical log data and web firewall log data to be detected; performing feature extraction on the historical log data and constructing an anomaly detection model based on an isolated forest model from the feature extraction result; and inputting the log data to be detected into the anomaly detection model to obtain an anomaly detection result. The disclosed method improves the accuracy of identifying abnormal traffic and reduces false alarms while preserving the efficiency of the original algorithm.
Description
Technical Field
The disclosure relates to the technical field of network security, in particular to a web traffic anomaly detection method based on an improved isolated forest algorithm, a web traffic anomaly detection device based on the improved isolated forest algorithm, a storage medium and a terminal device.
Background
The web firewall is the first line of defense for information security. As network technologies are rapidly updated, new attack techniques keep emerging, which challenges traditional rule-based firewalls. Traditional web intrusion detection techniques intercept intrusive accesses by maintaining a rule set. On the one hand, hard rules are easily bypassed by adaptable attackers, and a rule set built on past knowledge struggles to cope with zero-day (0day) attacks; on the other hand, as attack and defense escalate against each other, the threshold for building and maintaining defense rules is high and so is the resource cost. Anomaly detection based on the isolated forest algorithm is an alternative, but the conventional algorithm is limited in how it computes the anomaly score: it handles poorly web traffic data that contains many locally, relatively sparse points, and for the majority of web traffic data whose features are not pronounced, averaging the search depth over all trees lets low-correlation features influence the final anomaly decision.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a web traffic anomaly detection method based on an improved isolated forest algorithm, a web traffic anomaly detection device based on the improved isolated forest algorithm, a storage medium, and a terminal device, thereby overcoming, at least to some extent, the limitations that the related art places on applying the conventional isolated forest algorithm to web traffic anomaly detection.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a web traffic anomaly detection method based on an improved isolated forest algorithm, the method comprising:
collecting web firewall historical log data and web firewall log data to be detected;
performing feature extraction on the web firewall historical log data, and constructing an anomaly detection model based on an isolated forest model from the feature extraction result;
and inputting the web firewall log data to be detected into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
In an exemplary embodiment of the disclosure, performing feature extraction on the web firewall historical log data and constructing the anomaly detection model based on the isolated forest model from the feature extraction result includes:
parsing the web firewall historical log data to obtain feature data of multiple dimensions, and taking the feature data as training samples;
constructing a set of isolated trees based on the feature data;
calculating a first weight w1 according to the distances from the abnormal samples to the center of the normal samples and the distances from the normal samples to the center of the normal samples;
calculating the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix;
and calculating the anomaly score by combining the first weight w1 and the second weight coefficient w2.
In an exemplary embodiment of the present disclosure, the parsing the web firewall history log data to obtain feature data of multiple dimensions includes:
the HTTP requests in the history log data of the web firewall are generalized, and the feature data of multiple dimensions are extracted based on preset feature fields.
In an exemplary embodiment of the present disclosure, the method further comprises:
and adding a preset proportion of known abnormal samples in the feature data of each dimension to construct a training sample.
In an exemplary embodiment of the present disclosure, constructing the set of isolated trees based on the feature data includes:
constructing a corresponding set of isolated trees for each type of feature data.
In an exemplary embodiment of the present disclosure, the first weight w1 is an isolated tree weight.
In an exemplary embodiment of the present disclosure, the second weight coefficient w2 is an isolated tree path depth weight coefficient;
calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix, and calculating the second weight coefficient w2 by using the symmetric matrix, includes:
calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix;
and calculating the column-wise mean of the symmetric matrix, and taking the mean as the isolated tree path depth weight coefficient.
According to a second aspect of the present disclosure, there is provided a web traffic anomaly detection apparatus based on an improved isolated forest algorithm, the apparatus comprising:
the data acquisition module is used for acquiring historical log data of the web firewall and log data to be detected of the web firewall;
the model training module is used for extracting the characteristics of the historical log data of the web firewall so as to construct an anomaly detection model based on an isolated forest model based on the characteristic extraction result;
and the detection result output module is used for inputting the log data to be detected of the web firewall into the anomaly detection model based on the isolated forest model so as to obtain an anomaly detection result.
According to a third aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described web traffic anomaly detection method based on an improved isolated forest algorithm.
According to a fourth aspect of the present disclosure, there is provided a terminal device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the above-described web traffic anomaly detection method based on the improved isolated forest algorithm via execution of the executable instructions.
In the web traffic anomaly detection method based on the improved isolated forest algorithm provided by the present disclosure, features are extracted from the collected web firewall historical log data, and an anomaly detection model based on the improved isolated forest model is constructed from the feature extraction result, so that the model can be used to detect the web firewall log data to be detected and determine the corresponding detection result. The accuracy of identifying abnormal traffic is improved and false alarms are reduced while the efficiency of the original algorithm is preserved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 schematically illustrates a web traffic anomaly detection method based on an improved isolated forest algorithm in an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates a method for constructing an anomaly detection model based on an isolated forest model in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a web traffic anomaly detection apparatus based on an improved isolated forest algorithm in an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates the composition of a terminal device in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a storage medium in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, conventional web intrusion detection technology intercepts intrusive accesses by maintaining a rule set. On the one hand, hard rules are easily bypassed by adaptable attackers, and a rule set built on past knowledge struggles to cope with zero-day (0day) attacks; on the other hand, as attack and defense escalate against each other, the threshold for building and maintaining defense rules is high and so is the resource cost. A new generation of web traffic anomaly detection based on machine learning is expected to make up for the shortcomings of the traditional rule-set approach and to bring new development and breakthroughs to the defending side of web confrontation. Machine learning also faces challenges in web intrusion detection, the most difficult of which is the lack of labeled data. An anomaly-detection-based method can therefore build a model from a large number of normal logs and identify any log that does not conform to normal traffic as an anomaly, so that the defender copes with ever-changing attacks by holding to what does not change. Moreover, when the method is applied in a firewall system, samples determined with full confidence to be normal can be screened out in advance and passed through directly without inspection by the firewall detection engine, which reduces the firewall's resource consumption when protecting high-traffic websites and improves detection efficiency and accuracy. However, the conventional isolated forest anomaly detection algorithm is limited in how it computes the anomaly score: it handles poorly web traffic data that contains many locally, relatively sparse points, and for the majority of web traffic data whose features are not pronounced, averaging the search depth over all trees lets low-correlation features influence the final anomaly decision.
In view of the above-mentioned shortcomings and drawbacks of the prior art, the present exemplary embodiment provides a web traffic anomaly detection method based on an improved isolated forest algorithm. Referring to fig. 1, the web traffic anomaly detection method based on the improved isolated forest algorithm may include the following steps:
Step S11: collecting web firewall historical log data and web firewall log data to be detected;
Step S12: performing feature extraction on the web firewall historical log data, and constructing an anomaly detection model based on an isolated forest model from the feature extraction result;
Step S13: inputting the web firewall log data to be detected into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
In the web traffic anomaly detection method based on the improved isolated forest algorithm provided by this embodiment, features are extracted from the collected web firewall historical log data, and an anomaly detection model based on the improved isolated forest model is constructed from the feature extraction result, so that the model can be used to detect the web firewall log data to be detected and determine the corresponding detection result. The accuracy of identifying abnormal traffic is improved and false alarms are reduced while the efficiency of the original algorithm is preserved. By taking the characteristics of abnormal traffic data into account and introducing weight coefficients to improve the isolated forest algorithm, the influence of isolated trees built from low-correlation features on the result is reduced, and the identification accuracy for traffic data containing locally, relatively sparse features is improved.
Hereinafter, the steps of the web traffic anomaly detection method based on the improved isolated forest algorithm in the present exemplary embodiment will be described in more detail with reference to the drawings and the examples.
In step S11, web firewall history log data and log data to be tested of the web firewall are collected.
In the present exemplary embodiment, the above method may be performed by a server; alternatively, it may be performed by a server and a user terminal in cooperation. For example, a user may send a traffic detection request from a terminal device to the server, where the request may include web firewall log data to be detected that is collected in real time. After receiving the request, the server can create a traffic detection task. The server executes the task by first training the anomaly detection model and then using the trained model to detect, in real time, the web firewall log data collected in real time. Alternatively, in some exemplary embodiments, the method may also be executed by a terminal device with sufficient computing power.
In step S12, feature extraction is performed on the web firewall historical log data to construct an anomaly detection model based on an isolated forest model based on a feature extraction result.
In this exemplary embodiment, referring to fig. 2, the step S12 may specifically include:
Step S21: parsing the web firewall historical log data to obtain feature data of multiple dimensions, and taking the feature data as training samples;
Step S22: constructing a set of isolated trees based on the feature data;
Step S23: calculating a first weight w1 according to the distances from the abnormal samples to the center of the normal samples and the distances from the normal samples to the center of the normal samples;
Step S24: calculating the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix;
Step S25: calculating the anomaly score by combining the first weight w1 and the second weight coefficient w2.
Specifically, history logs covering a preset period may be collected. The HTTP (Hypertext Transfer Protocol) requests in the web firewall historical log data are generalized, and feature data of multiple dimensions are extracted based on preset feature fields. For example, the dimensions may be fields contained in the header of an HTTP request, such as cookie, URL, user agent, username, and referrer. For each type of feature, the specific parameters may include parameter length, parameter count, character distribution, and text similarity. The text similarity may be the similarity between the current sample and the normal samples or the abnormal samples; alternatively, the similarity may be calculated with respect to different HTTP requests.
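The feature types listed above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the generalization rule (letter runs become `A`, digit runs become `N`), the three character classes, and the use of `difflib.SequenceMatcher` for text similarity are all assumptions made for illustration, as are the function names.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlsplit


def generalize(value: str) -> str:
    """Assumed generalization rule: letter runs -> 'A', then digit runs -> 'N'."""
    value = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"[0-9]+", "N", value)


def char_distribution(value: str) -> dict:
    """Share of letters, digits, and other characters in the field value."""
    if not value:
        return {"alpha": 0.0, "digit": 0.0, "other": 0.0}
    counts = Counter(
        "alpha" if ch.isalpha() else "digit" if ch.isdigit() else "other"
        for ch in value
    )
    return {k: counts.get(k, 0) / len(value) for k in ("alpha", "digit", "other")}


def extract_features(url: str, normal_reference: str) -> dict:
    """Length, parameter count, character distribution, and similarity for one URL."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    similarity = SequenceMatcher(
        None, generalize(url), generalize(normal_reference)
    ).ratio()
    return {
        "length": len(url),
        "param_count": len(params),
        **char_distribution(url),
        "similarity_to_normal": similarity,
    }
```

Comparing generalized patterns rather than raw strings means two requests that differ only in concrete parameter values are treated as identical, which is one plausible reading of why the requests are generalized before feature extraction.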
In this example embodiment, the method further comprises: and adding a preset proportion of known abnormal samples in the feature data of each dimension to construct a training sample.
For example, in the training set, a small number of abnormal samples can be randomly added, and parameters are set to train the isolated tree.
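A minimal sketch of mixing a preset proportion of known abnormal samples into the training set might look as follows. The 5% default, the label convention (0 normal, 1 abnormal), and the function name are assumptions; the source does not state a specific proportion.

```python
import random


def build_training_set(normal, abnormal, abnormal_ratio=0.05, seed=0):
    """Return (samples, labels) in which roughly `abnormal_ratio` are known anomalies."""
    if not 0 <= abnormal_ratio < 1:
        raise ValueError("abnormal_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Choose k so that k / (len(normal) + k) is approximately abnormal_ratio.
    k = round(len(normal) * abnormal_ratio / (1 - abnormal_ratio))
    k = min(k, len(abnormal))
    picked = rng.sample(list(abnormal), k)
    data = [(x, 0) for x in normal] + [(x, 1) for x in picked]
    rng.shuffle(data)
    return [x for x, _ in data], [y for _, y in data]
```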
During training of the isolated trees, the variances of the distances from the abnormal samples and from the normal samples to the center of the normal samples may be calculated separately, and the isolated tree weight w1 is obtained from them, wherein the formula may include: [formula not reproduced in the source text]
wherein $\delta_n$ represents the variance of the distances from the normal samples to the center of the normal samples, and $\delta_a$ represents the variance of the distances from the abnormal samples to the center of the normal samples.
Specifically, the samples may be projected into a Euclidean space, in which the center of the normal samples and the distance from every other sample point to that center are calculated according to the sample distribution.
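The two variance terms can be computed as in the sketch below, which assumes the samples are already numeric vectors and takes the normal-sample center to be the arithmetic mean (an assumption). Because the formula that combines the two variances into w1 is not available in this text, the sketch deliberately stops at the variances.

```python
import math
from statistics import pvariance


def centroid(points):
    """Arithmetic mean of the points, used here as the normal-sample center."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def distance_variance(points, center):
    """Population variance of the Euclidean distances from points to center."""
    return pvariance([math.dist(p, center) for p in points])


normal_samples = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
abnormal_samples = [(1.0, 4.0), (1.0, 6.0)]

center = centroid(normal_samples)
delta_n = distance_variance(normal_samples, center)    # normal -> normal center
delta_a = distance_variance(abnormal_samples, center)  # abnormal -> normal center
```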
In this exemplary embodiment, constructing the set of isolated trees based on the feature data includes: constructing a corresponding set of isolated trees for each type of feature data. For example, separate isolated trees can be created for the URL and cookie features.
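Building one set of isolated trees per feature type can be sketched in plain Python as below. This is a simplified illustration rather than the patented construction: trees use random axis-aligned splits with a fixed depth limit, the usual leaf-size adjustment of the path length is omitted, and `build_forests`, `n_trees`, and `sample_size` are hypothetical names and defaults.

```python
import random


def build_tree(rows, rng, depth=0, max_depth=8):
    """Recursively isolate rows with random axis-aligned splits."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # nothing to split on along this dimension
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(tree, row, depth=0):
    """Number of splits traversed before the row reaches a leaf."""
    if "dim" not in tree:
        return depth
    branch = "left" if row[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def build_forests(features_by_type, n_trees=50, sample_size=64, seed=0):
    """One list of isolated trees per feature type, e.g. 'url' and 'cookie'."""
    rng = random.Random(seed)
    return {
        name: [
            build_tree(rng.sample(rows, min(sample_size, len(rows))), rng)
            for _ in range(n_trees)
        ]
        for name, rows in features_by_type.items()
    }
```

A point far from the bulk of the data tends to be separated after very few splits, so its average path length across the trees of a feature type is short; that is the property the anomaly score relies on.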
In this exemplary embodiment, calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix, and calculating the second weight coefficient w2 by using the symmetric matrix, includes:
calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix;
and calculating the column-wise mean of the symmetric matrix, and taking the mean as the isolated tree path depth weight coefficient.
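The two steps above can be sketched as follows, assuming the pairwise disagreement measure commonly used for ensemble diversity: the fraction of samples on which two trees give different binary predictions. Whether the patent uses exactly this definition cannot be confirmed from this text, and the function names are hypothetical.

```python
def disagreement_matrix(predictions):
    """predictions[t][k] is tree t's binary label for sample k.

    Returns a symmetric matrix whose (i, j) entry is the fraction of
    samples on which trees i and j disagree; the diagonal is zero.
    """
    n_trees = len(predictions)
    n_samples = len(predictions[0])
    matrix = [[0.0] * n_trees for _ in range(n_trees)]
    for i in range(n_trees):
        for j in range(i + 1, n_trees):
            differing = sum(a != b for a, b in zip(predictions[i], predictions[j]))
            matrix[i][j] = matrix[j][i] = differing / n_samples
    return matrix


def column_means(matrix):
    """Column-wise mean of the matrix: one coefficient per tree."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(n)]


preds = [
    [0, 0, 1, 1],  # tree 0
    [0, 0, 1, 1],  # tree 1, identical to tree 0
    [1, 0, 1, 0],  # tree 2, differs from both on two samples
]
dis = disagreement_matrix(preds)
w2 = column_means(dis)
```

In this toy input, tree 2 disagrees with the other two trees on half of the samples and therefore receives the largest column mean.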
For example, for two classifiers (isolated trees) $h_i$ and $h_j$, the prediction results of the two isolated trees may include: [not reproduced in the source text]
Based on the prediction results, a symmetric matrix may be calculated, and the formula may include: [formula not reproduced in the source text]
Specifically, the diversity of the isolated trees can be represented as: [formula not reproduced in the source text]
The column-wise mean of the matrix is then calculated and used as the tree path depth weight coefficient w2. The calculation formula may include: [formula not reproduced in the source text]
In this exemplary embodiment, the weight coefficients obtained from the above calculations are used to improve the anomaly score calculation of the conventional isolated forest algorithm, and the formula may include: [formula not reproduced in the source text]
where n represents the number of isolated trees and h represents the depth of the trees.
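For orientation only, the standard definitions that this passage appears to build on are given below. The first two expressions assume the usual pairwise disagreement measure and follow the text's description of a column-wise mean; the third is the anomaly score of the conventional isolation forest algorithm, in which every tree contributes equally through the unweighted mean depth $E[h(x)]$. The symbols $a, b, c, d, m$, $\mathrm{dis}_{ij}$, and $\psi$ (the subsample size per tree) are notation chosen here, not taken from the patent, and the way w1 and w2 enter the improved score is not recoverable from this text.

```latex
% Contingency counts for trees h_i and h_j over m samples (labels +1 / -1):
%   a: both +1;  b: h_i = +1, h_j = -1;  c: h_i = -1, h_j = +1;  d: both -1
%   a + b + c + d = m
\mathrm{dis}_{ij} = \frac{b + c}{m}, \qquad
\mathrm{dis}_{ij} = \mathrm{dis}_{ji}, \qquad \mathrm{dis}_{ii} = 0

% Column-wise mean over the n trees (tree path depth weight coefficient):
w_{2,j} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{dis}_{ij}

% Conventional (unweighted) isolation forest anomaly score:
s(x, \psi) = 2^{-\frac{E[h(x)]}{c(\psi)}}, \qquad
c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad
H(k) \approx \ln k + 0.5772156649
```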
In step S13, the log data to be tested of the web firewall is input into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
In this exemplary embodiment, the anomaly detection model outputs a normalized anomaly score according to input real-time data, and when the anomaly score is greater than an anomaly threshold, the real-time data is anomalous data. Data with a normalized anomaly score greater than 0.6 is generally defined as anomalous data, and the degree of anomaly of the data is proportional to the normalized anomaly score.
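The thresholding step can be sketched with the conventional isolation forest normalization, extended with an optional per-tree weight. The weighted mean depth is only an assumed stand-in for the patent's weighting, whose exact formula is not available in this text; the 0.6 default follows the threshold mentioned above, and `sample_size` must be greater than 1.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, sample_size, weights=None):
    """Normalized score in (0, 1]; shorter (weighted) mean depth gives a higher score."""
    if weights is None:
        weights = [1.0] * len(path_lengths)  # conventional equal contribution
    mean_depth = sum(w * h for w, h in zip(weights, path_lengths)) / sum(weights)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


def is_anomalous(score, threshold=0.6):
    """Flag the record when its normalized score exceeds the threshold."""
    return score > threshold
```

With equal weights, a record whose mean depth equals `average_path_length(sample_size)` scores exactly 0.5, which is why a threshold somewhat above 0.5 separates clearly short paths from typical ones.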
The web traffic anomaly detection method based on the improved isolated forest algorithm uses web firewall historical log data. First, the isolated forest algorithm randomly selects feature data of certain dimensions and constructs a set of isolated trees of a certain scale, and similarity information between the sample to be detected and the normal and abnormal samples is introduced at the child nodes of the isolated trees. A disagreement measure is then introduced into the ensemble model to calculate and set the ensemble weight coefficients of the isolated trees. Finally, a search depth weight coefficient is set by combining the similarity and the diversity of the isolated tree child nodes, improving the isolated forest algorithm so that abnormal web traffic can be identified. A model (profile) is built from a large number of normal logs, and logs that do not match normal traffic are identified as abnormal, so that the defender copes with ever-changing attacks by holding to what does not change. The method is connected to a web application firewall to filter traffic: logs identified as normal are released directly, and abnormal logs enter the threat detection engine, which effectively improves the working efficiency of the firewall and reduces resource consumption. In addition, the method takes full account of the characteristics of HTTP request data and introduces weight coefficients to improve the anomaly detection algorithm, replacing the scheme in the original algorithm in which every isolated tree in the forest contributes equally to the anomaly score, thereby reducing errors caused by low-correlation features and improving sensitivity to locally sparse data. The accuracy of identifying abnormal traffic is improved and false alarms are reduced while the efficiency of the original algorithm is preserved.
In addition, the disclosed model incorporates text similarity and judges, through the distances of the child nodes, how well each trained isolated tree identifies anomalies; it also changes how the isolated tree depth is calculated, setting the search depth weight coefficient by combining the similarity and the diversity of the isolated tree child nodes.
It is to be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to FIG. 3, this exemplary embodiment also provides a web traffic anomaly detection apparatus 30 based on an improved isolated forest algorithm, including a data acquisition module 301, a model training module 302, and a detection result output module 303, wherein:
The data acquisition module 301 may be configured to collect web firewall historical log data and web firewall log data to be detected.
The model training module 302 may be configured to perform feature extraction on the web firewall historical log data to construct an anomaly detection model based on an isolated forest model based on a feature extraction result.
The detection result output module 303 may be configured to input the log data to be detected of the web firewall into the anomaly detection model based on the isolated forest model, so as to obtain an anomaly detection result.
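How the three modules might be wired together is sketched below. The class name, the callable signatures, and the 0.6 default threshold are illustrative assumptions; the source describes the modules only functionally.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class AnomalyDetectionDevice:
    # Data acquisition module 301: returns (historical logs, logs to be detected).
    collect: Callable[[], Tuple[Sequence[str], Sequence[str]]]
    # Model training module 302: turns historical logs into a scoring function.
    train: Callable[[Sequence[str]], Callable[[str], float]]
    threshold: float = 0.6

    def run(self) -> List[Tuple[str, bool]]:
        """Detection result output module 303: (log line, is_anomalous) pairs."""
        history, pending = self.collect()
        score = self.train(history)
        return [(line, score(line) > self.threshold) for line in pending]


# Usage with stand-in modules; a real deployment would plug in log
# collection and the trained isolated forest model instead.
device = AnomalyDetectionDevice(
    collect=lambda: (["GET /a"], ["GET /a", "GET /a?x=<script>"]),
    train=lambda history: (lambda line: 0.9 if "<script>" in line else 0.1),
)
results = device.run()
```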
In this exemplary embodiment, the model training module 302 may be configured to: parse the web firewall historical log data to obtain feature data of multiple dimensions, and take the feature data as training samples; construct a set of isolated trees based on the feature data; calculate a first weight w1 according to the distances from the abnormal samples to the center of the normal samples and the distances from the normal samples to the center of the normal samples; calculate the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculate a second weight coefficient w2 by using the symmetric matrix; and calculate the anomaly score by combining the first weight w1 and the second weight coefficient w2.
In this exemplary embodiment, parsing the web firewall historical log data to obtain feature data of multiple dimensions includes: generalizing the HTTP requests in the web firewall historical log data, and extracting the feature data of multiple dimensions based on preset feature fields.
In the present exemplary embodiment, the model training module 302 may further be configured to add a preset proportion of known abnormal samples to the feature data of each dimension to construct the training samples.
In this exemplary embodiment, constructing the set of isolated trees based on the feature data includes: constructing a corresponding set of isolated trees for each type of feature data.
In the present exemplary embodiment, the first weight w1 is an isolated tree weight.
In the present exemplary embodiment, the second weight coefficient w2 is an isolated tree path depth weight coefficient; calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix, and calculating the second weight coefficient w2 by using the symmetric matrix, includes: calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix; and calculating the column-wise mean of the symmetric matrix, and taking the mean as the isolated tree path depth weight coefficient.
The specific details of each module in the web traffic anomaly detection apparatus based on the improved isolated forest algorithm are described in detail in the corresponding web traffic anomaly detection method based on the improved isolated forest algorithm, and therefore, the details are not described herein again.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
In an exemplary embodiment of the present disclosure, there is also provided a computer system capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
The terminal device 500 according to this embodiment of the present invention is described below with reference to fig. 4. The terminal device 500 shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, terminal device 500 is in the form of a general purpose computing device. The components of computer system 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 that couples the various system components including the storage unit 620 and the processing unit 610.
The storage unit stores program code that is executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The computer system 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the computer system 600, and/or with any devices (e.g., router, modem, etc.) that enable the computer system 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. The display unit 640 may also be connected through an input/output (I/O) interface. Moreover, computer system 600 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network such as the Internet) via network adapter 660. As shown, network adapter 660 communicates with the other modules of computer system 600 via bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computer system 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 5, a program product 100 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard. In this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.
Claims (10)
1. A web traffic anomaly detection method based on an improved isolated forest algorithm is characterized by comprising the following steps:
collecting historical log data of a web firewall and log data to be detected of the web firewall;
performing feature extraction on the historical log data of the web firewall, and constructing an anomaly detection model based on an isolated forest model according to the feature extraction result;
and inputting the log data to be detected of the web firewall into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
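As a rough illustration of the three steps of claim 1, the following sketch trains a plain (unweighted) isolation forest on feature vectors derived from historical logs and then scores new records. It is not the improved algorithm of the claims: the two-feature vectors, the class name `IsolationForestSketch`, and all parameter values are hypothetical, and the weights w1 and w2 are omitted.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows on a random feature and threshold; leaves record only their size."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:  # simplification: stop rather than retry another feature
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x):
    """Depth at which x reaches a leaf, plus the usual correction for leaf size."""
    depth = 0
    node = tree
    while "feature" in node:
        node = node["left"] if x[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


class IsolationForestSketch:
    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, rows):
        if len(rows) < 2:
            raise ValueError("need at least two training rows")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(rows))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [
            build_tree(rng.sample(rows, self.psi), 0, max_depth, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values closer to 1 are more anomalous."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c_factor(self.psi))


# Hypothetical features per request: [URL length, special-character count].
data_rng = random.Random(1)
historical = [[data_rng.gauss(200, 20), data_rng.gauss(5, 1)] for _ in range(300)]
forest = IsolationForestSketch().fit(historical)
normal_score = forest.score([200.0, 5.0])
outlier_score = forest.score([2000.0, 60.0])
print(normal_score, outlier_score)
```

A record far outside the range of the historical data is isolated after fewer splits, so it receives the higher score. The threshold that turns a score into an anomaly detection result is left open here.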
2. The method for detecting the web traffic anomaly based on the improved isolated forest algorithm as claimed in claim 1, wherein the step of performing feature extraction on the web firewall historical log data to construct an anomaly detection model based on an isolated forest model based on a feature extraction result comprises the following steps:
analyzing historical log data of the web firewall to obtain feature data of multiple dimensions, and taking the feature data as a training sample;
constructing an isolated tree set based on the feature data;
calculating a first weight w1 according to the distance from the abnormal sample to the center of the normal sample and the distance from the normal sample to the center of the normal sample;
calculating diversity among the isolated trees by using the incoordination quantity to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix;
and calculating the anomaly score by combining the first weight w1 and the second weight coefficient w2.
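The claim does not give formulas for w1 or for how the two weights are combined, so the following is only one plausible reading. It assumes that w1 for a tree is the ratio of the mean distance of the samples it flags as abnormal to the mean distance of its normal samples, both measured from the centroid of the normal samples, and that the anomaly score uses the standard isolation forest form with a path depth averaged under the per-tree weight w1 * w2. All function names and numbers are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def mean_distance(points, center):
    """Mean Euclidean distance from each point to the given center."""
    return sum(math.dist(p, center) for p in points) / len(points)


def tree_weight_w1(normal_points, abnormal_points):
    """Assumed w1: how far abnormal samples sit from the normal center, relative to normal spread."""
    center = centroid(normal_points)
    spread = mean_distance(normal_points, center)
    if spread == 0.0:
        raise ValueError("normal samples must not all coincide")
    return mean_distance(abnormal_points, center) / spread


def c_factor(n):
    """Standard isolation forest normalizer for a subsample of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def weighted_anomaly_score(path_lengths, w1, w2, sample_size):
    """Assumed combination: weight each tree's path length by w1 * w2, then score."""
    weights = [a * b for a, b in zip(w1, w2)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("combined weights must sum to a positive value")
    weighted_depth = sum(w * h for w, h in zip(weights, path_lengths)) / total
    return 2.0 ** (-weighted_depth / c_factor(sample_size))


# Toy data: four normal points around (1, 1) and one distant abnormal point.
normal = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
abnormal = [(5.0, 1.0)]
w1_example = tree_weight_w1(normal, abnormal)

# Two hypothetical trees: the first isolates the sample quickly and is trusted more.
weighted = weighted_anomaly_score([2.0, 6.0], w1=[3.0, 1.0], w2=[0.5, 0.5], sample_size=64)
unweighted = weighted_anomaly_score([2.0, 6.0], w1=[1.0, 1.0], w2=[1.0, 1.0], sample_size=64)
print(w1_example, weighted, unweighted)
```

With equal weights the function reduces to the ordinary isolation forest score, which makes it easy to compare the weighted and unweighted variants on the same path lengths.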
3. The method for detecting the web traffic anomaly based on the improved isolated forest algorithm as claimed in claim 2, wherein the step of analyzing the historical log data of the web firewall to obtain feature data of multiple dimensions comprises the following steps:
the HTTP requests in the history log data of the web firewall are generalized, and the feature data of multiple dimensions are extracted based on preset feature fields.
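A minimal sketch of this step, assuming that generalization means collapsing runs of letters and digits in parameter values into placeholder symbols, and that the preset feature fields are simple counts and lengths. The placeholder scheme, the field names, and the special-character set are hypothetical choices, not taken from the claims.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical set of characters treated as "special" in parameter values.
SPECIAL_CHARS = set("'\"<>;()=%-#/*\\")


def generalize(value: str) -> str:
    """Collapse letter runs to 'A' and digit runs to 'N', keeping punctuation."""
    value = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"[0-9]+", "N", value)


def extract_features(request_uri: str) -> dict:
    """Turn one request URI into a small dictionary of hypothetical feature fields."""
    parts = urlsplit(request_uri)
    # parse_qsl percent-decodes names and values.
    params = parse_qsl(parts.query, keep_blank_values=True)
    values = [v for _, v in params]
    return {
        "path_depth": parts.path.count("/"),
        "param_count": len(params),
        "max_value_length": max((len(v) for v in values), default=0),
        "special_char_count": sum(ch in SPECIAL_CHARS for ch in "".join(values)),
        "generalized_query": "&".join(f"{k}={generalize(v)}" for k, v in params),
    }


benign = extract_features("/item/view?id=42&name=alice")
suspicious = extract_features("/item/view?id=1%27%20OR%20%271%27=%271")
print(benign)
print(suspicious)
```

Generalization makes requests with the same shape compare as equal regardless of the concrete identifiers they carry, while the numeric fields can feed the isolated trees directly.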
4. The method for detecting web traffic anomaly based on the improved isolated forest algorithm according to claim 2 or claim 3, characterized in that the method further comprises the following steps:
and adding a preset proportion of known abnormal samples to the feature data of each dimension to construct a training sample.
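One way to read this step is sketched below. It assumes that the preset proportion is the share of abnormal samples in the final training set and that labels are kept alongside the samples so that abnormal and normal samples can later be told apart; both points, and the function name, are assumptions.

```python
import random


def build_training_set(normal, known_abnormal, proportion, seed=0):
    """Mix known abnormal samples into normal ones at roughly the given share.

    Returns a shuffled list of (sample, label) pairs, label 1 meaning abnormal.
    """
    if not 0.0 <= proportion < 1.0:
        raise ValueError("proportion must be in [0, 1)")
    # Solve k / (len(normal) + k) = proportion for k, then cap at what is available.
    wanted = round(proportion * len(normal) / (1.0 - proportion))
    k = min(wanted, len(known_abnormal))
    rng = random.Random(seed)
    chosen = rng.sample(known_abnormal, k)
    labeled = [(x, 0) for x in normal] + [(x, 1) for x in chosen]
    rng.shuffle(labeled)
    return labeled


# Hypothetical one-dimensional feature vectors.
normal_samples = [[float(i)] for i in range(95)]
abnormal_samples = [[1000.0 + i] for i in range(20)]
training = build_training_set(normal_samples, abnormal_samples, proportion=0.05)
print(len(training), sum(label for _, label in training))
```

Keeping the proportion small preserves the premise of isolation-based detection, namely that anomalies are few and different.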
5. The method for detecting web traffic anomaly based on improved isolated forest algorithm according to claim 2, wherein the constructing of the isolated tree set based on the feature data comprises:
and respectively constructing corresponding isolated tree sets based on the characteristic data of each type.
6. The method for detecting web traffic anomaly based on the improved isolated forest algorithm as claimed in claim 2, wherein the first weight w1 is an isolated tree weight.
7. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 2, wherein the second weight coefficient w2 is an isolated tree path depth weight coefficient;
the calculating diversity among the isolated trees by using the incoordination quantity to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix, includes:
calculating diversity among the isolated trees by using the incoordination quantity to determine a symmetric matrix;
and calculating the mean value of the symmetric matrix according to columns, and taking the mean value as an isolated tree path depth weight coefficient.
8. A web traffic anomaly detection device based on an improved isolated forest algorithm is characterized by comprising:
the data acquisition module is used for acquiring historical log data of the web firewall and log data to be detected of the web firewall;
the model training module is used for performing feature extraction on the historical log data of the web firewall, so as to construct an anomaly detection model based on an isolated forest model according to the feature extraction result;
and the detection result output module is used for inputting the log data to be detected of the web firewall into the anomaly detection model based on the isolated forest model so as to obtain an anomaly detection result.
9. A storage medium having stored thereon a computer program which, when executed by a processor, implements the web traffic anomaly detection method based on the improved isolated forest algorithm according to any one of claims 1 to 7.
10. A terminal device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the web traffic anomaly detection method based on the improved isolated forest algorithm of any one of claims 1 to 7 via execution of the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111658650.1A CN114338195B (en) | 2021-12-30 | 2021-12-30 | Web flow anomaly detection method and device based on improved isolated forest algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114338195A (en) | 2022-04-12 |
CN114338195B (en) | 2024-09-06 |
Family
ID=81019582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111658650.1A Active CN114338195B (en) | 2021-12-30 | 2021-12-30 | Web flow anomaly detection method and device based on improved isolated forest algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114338195B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108777873A (en) * | 2018-06-04 | 2018-11-09 | 江南大学 | The wireless sensor network abnormal deviation data examination method of forest is isolated based on weighted blend |
CN110958222A (en) * | 2019-10-31 | 2020-04-03 | 苏州浪潮智能科技有限公司 | Server log anomaly detection method and system based on isolated forest algorithm |
CN111431883A (en) * | 2020-03-18 | 2020-07-17 | 上海观安信息技术股份有限公司 | Web attack detection method and device based on access parameters |
US20210160266A1 (en) * | 2019-11-27 | 2021-05-27 | Telefonaktiebolaget Lm Ericsson (Publ) | Computer-implemented method and arrangement for classifying anomalies |
CN113051552A (en) * | 2019-12-27 | 2021-06-29 | 北京国双科技有限公司 | Abnormal behavior detection method and device |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117194920A (en) * | 2023-09-06 | 2023-12-08 | 万仁企业管理技术(深圳)有限公司 | Data system processing platform and processing method based on big data analysis |
CN117194920B (en) * | 2023-09-06 | 2024-05-28 | 北京酷炫网络技术股份有限公司 | Data system processing platform and processing method based on big data analysis |
CN117241306A (en) * | 2023-11-10 | 2023-12-15 | 深圳市银尔达电子有限公司 | Real-time monitoring method for abnormal flow data of 4G network |
CN117241306B (en) * | 2023-11-10 | 2024-02-06 | 深圳市银尔达电子有限公司 | Real-time monitoring method for abnormal flow data of 4G network |
CN117978461A (en) * | 2024-01-15 | 2024-05-03 | 兵器装备集团财务有限责任公司 | Abnormal login detection method and system based on isolated forest |
CN118054967A (en) * | 2024-04-01 | 2024-05-17 | 雅安数字经济运营有限公司 | Anomaly detection method, medium and system based on network security |
CN118118407A (en) * | 2024-04-30 | 2024-05-31 | 国网浙江省电力有限公司信息通信分公司 | Method and system for planning service route of optical transport network based on directional intermediate node |
CN118568481A (en) * | 2024-07-29 | 2024-08-30 | 中国移动通信集团四川有限公司 | Abnormal data determination method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114338195B (en) | 2024-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114338195B (en) | Web flow anomaly detection method and device based on improved isolated forest algorithm | |
EP4058916B1 (en) | Detecting unknown malicious content in computer systems | |
CN108616498A (en) | A kind of web access exceptions detection method and device | |
CN107451476A (en) | Webpage back door detection method, system, equipment and storage medium based on cloud platform | |
CN110765459A (en) | Malicious script detection method and device and storage medium | |
US10826920B1 (en) | Signal distribution score for bot detection | |
JP2018041442A (en) | System and method for detecting web page abnormal element | |
CN108600172A (en) | Hit library attack detection method, device, equipment and computer readable storage medium | |
CN116389235A (en) | Fault monitoring method and system applied to industrial Internet of things | |
CN114070642A (en) | Network security detection method, system, device and storage medium | |
CN112565164A (en) | Dangerous IP identification method, dangerous IP identification device and computer readable storage medium | |
CN117240632A (en) | Attack detection method and system based on knowledge graph | |
CN113918936A (en) | SQL injection attack detection method and device | |
CN110572402A (en) | internet hosting website detection method and system based on network access behavior analysis and readable storage medium | |
CN115913710A (en) | Abnormality detection method, apparatus, device and storage medium | |
CN110955890B (en) | Method and device for detecting malicious batch access behaviors and computer storage medium | |
CN113282920B (en) | Log abnormality detection method, device, computer equipment and storage medium | |
CN109918901A (en) | The method that real-time detection is attacked based on Cache | |
CN113032774B (en) | Training method, device and equipment of anomaly detection model and computer storage medium | |
EP4169223A1 (en) | Method and apparatus to detect scripted network traffic | |
Setianto et al. | Gpt-2c: A gpt-2 parser for cowrie honeypot logs | |
CN116318763A (en) | Zero-trust dynamic access control method for power distribution cloud master station | |
CN115333850B (en) | Domain name detection method, system and related equipment | |
CN116915459B (en) | Network threat analysis method based on large language model | |
Yang et al. | Mining Method of Code Vulnerability of Multi-Source Power IoT Terminal Based on Reinforcement Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |