CN114338195A - Web traffic anomaly detection method and device based on improved isolated forest algorithm - Google Patents

Info

Publication number
CN114338195A
Authority
CN
China
Prior art keywords
web
anomaly detection
isolated
log data
isolated forest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111658650.1A
Other languages
Chinese (zh)
Other versions
CN114338195B (en)
Inventor
钟良志
白冰
董康辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111658650.1A
Publication of CN114338195A
Application granted
Publication of CN114338195B
Active
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure relates to the technical field of network security, and in particular to a web traffic anomaly detection method and device based on an improved isolated forest algorithm, a storage medium and a terminal device. The method comprises the following steps: collecting historical log data of a web firewall and log data of the web firewall to be detected; performing feature extraction on the historical log data of the web firewall and constructing, from the feature extraction result, an anomaly detection model based on an isolated forest model; and inputting the log data to be detected into the anomaly detection model to obtain an anomaly detection result. While preserving the efficiency of the original algorithm, the disclosed method improves the accuracy with which abnormal traffic is identified and reduces false alarms.

Description

Web traffic anomaly detection method and device based on improved isolated forest algorithm
Technical Field
The disclosure relates to the technical field of network security, in particular to a web traffic anomaly detection method based on an improved isolated forest algorithm, a web traffic anomaly detection device based on the improved isolated forest algorithm, a storage medium and a terminal device.
Background
The web firewall is the first line of defense for information security. As network technologies evolve rapidly, new attack techniques keep emerging and challenge traditional rule-based firewalls. Traditional web intrusion detection techniques intercept intrusions by maintaining a rule set. On the one hand, hard-coded rules are easily bypassed by adaptive attackers, and a rule set built on past knowledge struggles against zero-day attacks; on the other hand, as attack and defense escalate against each other, defense rules become harder to build and maintain and consume more resources. Moreover, the conventional isolated forest anomaly detection algorithm is limited in how it calculates anomaly scores: it handles web traffic data containing many locally sparse points poorly, and for the large share of web traffic data whose features are not pronounced, averaging the search depth over all trees lets low-correlation features unduly influence the final anomaly judgment.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a web traffic anomaly detection method based on an improved isolated forest algorithm, a web traffic anomaly detection device based on the improved isolated forest algorithm, a storage medium and a terminal device, thereby overcoming, at least to some extent, the limitations of the conventional isolated forest algorithm when it is applied to web traffic anomaly detection.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a web traffic anomaly detection method based on an improved isolated forest algorithm, the method comprising:
collecting historical log data of a web firewall and log data of the web firewall to be detected;
performing feature extraction on the historical log data of the web firewall, and constructing, from the feature extraction result, an anomaly detection model based on an isolated forest model;
and inputting the log data to be detected into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
In an exemplary embodiment of the disclosure, the performing feature extraction on the web firewall historical log data to construct an anomaly detection model based on an isolated forest model based on a feature extraction result includes:
parsing the historical log data of the web firewall to obtain feature data of multiple dimensions, and taking the feature data as training samples;
constructing an isolated tree set based on the feature data;
calculating a first weight w1 according to the distances from the abnormal samples to the center of the normal samples and the distances from the normal samples to the center of the normal samples;
calculating the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix;
and calculating the anomaly score by combining the first weight w1 and the second weight coefficient w2.
In an exemplary embodiment of the present disclosure, the parsing the web firewall history log data to obtain feature data of multiple dimensions includes:
generalizing the HTTP requests in the historical log data of the web firewall, and extracting the feature data of multiple dimensions based on preset feature fields.
In an exemplary embodiment of the present disclosure, the method further comprises:
adding a preset proportion of known abnormal samples to the feature data of each dimension to construct the training samples.
In an exemplary embodiment of the present disclosure, the constructing an isolated tree set based on the feature data includes:
constructing a corresponding isolated tree set for each type of feature data.
In an exemplary embodiment of the present disclosure, the first weight w1 is an isolated tree weight.
In an exemplary embodiment of the present disclosure, the second weight coefficient w2 is an isolated tree path depth weight coefficient;
the calculating diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix, includes:
calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix;
and calculating the mean of the symmetric matrix column by column, and taking the column means as the isolated tree path depth weight coefficients.
According to a second aspect of the present disclosure, there is provided a web traffic anomaly detection apparatus based on an improved isolated forest algorithm, the apparatus comprising:
a data acquisition module, configured to collect historical log data of a web firewall and log data of the web firewall to be detected;
a model training module, configured to perform feature extraction on the historical log data of the web firewall and to construct, from the feature extraction result, an anomaly detection model based on an isolated forest model;
and a detection result output module, configured to input the log data to be detected into the anomaly detection model based on the isolated forest model so as to obtain an anomaly detection result.
According to a third aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described web traffic anomaly detection method based on an improved isolated forest algorithm.
According to a fourth aspect of the present disclosure, there is provided a terminal device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the above-described web traffic anomaly detection method based on the improved isolated forest algorithm via execution of the executable instructions.
In the web traffic anomaly detection method based on the improved isolated forest algorithm, feature extraction is performed on the collected historical log data of the web firewall, and an anomaly detection model based on the improved isolated forest model is constructed from the feature extraction result, so that the model can be used to examine the log data of the web firewall to be detected and determine the corresponding detection result. While preserving the efficiency of the original algorithm, the method improves the accuracy with which abnormal traffic is identified and reduces false alarms.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 schematically illustrates a web traffic anomaly detection method based on an improved isolated forest algorithm in an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates a method for constructing an anomaly detection model based on an isolated forest model in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a web traffic anomaly detection apparatus based on an improved isolated forest algorithm in an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates the composition of a terminal device in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a storage medium in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, conventional web intrusion detection technology intercepts intrusions by maintaining a rule set. On the one hand, hard-coded rules are easily bypassed by adaptive attackers, and a rule set built on past knowledge struggles against zero-day attacks; on the other hand, as attack and defense escalate against each other, defense rules become harder to build and maintain and consume more resources. A new generation of web traffic anomaly detection based on machine learning is expected to make up for the shortcomings of the rule-set approach and to bring new progress to the defending side of web attack and defense. Machine learning faces its own challenges in web intrusion detection, the hardest of which is the lack of labeled data. An approach based on anomaly detection can instead build a model from a large number of normal logs and identify any log that does not conform to normal traffic as anomalous, so that a fixed model of normal behavior copes with ever-changing attacks. In addition, when the method is applied in a firewall system, samples that are determined with certainty to be normal can be screened out in advance and released without passing through the firewall detection engine, which reduces the firewall's resource consumption when protecting high-traffic websites and improves its detection efficiency and accuracy. However, the conventional isolated forest anomaly detection algorithm is limited in how it calculates anomaly scores: it handles web traffic data containing many locally sparse points poorly, and for the large share of web traffic data whose features are not pronounced, averaging the search depth over all trees lets low-correlation features unduly influence the final anomaly judgment.
In view of the above-mentioned shortcomings and drawbacks of the prior art, the present exemplary embodiment provides a web traffic anomaly detection method based on an improved isolated forest algorithm. Referring to fig. 1, the web traffic anomaly detection method based on the improved isolated forest algorithm may include the following steps:
step S11, collecting historical log data of the web firewall and log data of the web firewall to be detected;
step S12, performing feature extraction on the historical log data of the web firewall, and constructing, from the feature extraction result, an anomaly detection model based on an isolated forest model;
and step S13, inputting the log data to be detected into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
In the web traffic anomaly detection method based on the improved isolated forest algorithm provided by this embodiment, feature extraction is performed on the collected historical log data of the web firewall, and an anomaly detection model based on the improved isolated forest model is constructed from the feature extraction result, so that the model can be used to examine the log data of the web firewall to be detected and determine the corresponding detection result. While preserving the efficiency of the original algorithm, the method improves the accuracy with which abnormal traffic is identified and reduces false alarms. By taking the characteristics of abnormal traffic data into account and introducing weight coefficients to improve the isolated forest algorithm, the influence of isolated trees built from low-correlation features on the result is reduced, and the identification accuracy for traffic data containing locally sparse features is improved.
Hereinafter, the steps of the web traffic anomaly detection method based on the improved isolated forest algorithm in the present exemplary embodiment will be described in more detail with reference to the drawings and the examples.
In step S11, historical log data of the web firewall and log data of the web firewall to be detected are collected.
In the present exemplary embodiment, the above-described method may be performed by a server; alternatively, it may be performed by a server and a user terminal in cooperation. For example, a user may send a traffic detection request from a terminal device to the server, where the traffic detection request may include log data of the web firewall to be detected that is collected in real time. After receiving the traffic detection request, the server may create a traffic detection task. When executing the task, the server first trains an anomaly detection model and then uses the trained model to examine, in real time, the log data collected in real time. Alternatively, in some exemplary embodiments, the above web traffic anomaly detection method based on the improved isolated forest algorithm may also be executed by a terminal device with sufficient computing power.
In step S12, feature extraction is performed on the historical log data of the web firewall, and an anomaly detection model based on an isolated forest model is constructed from the feature extraction result.
In this exemplary embodiment, referring to fig. 2, the step S12 may specifically include:
step S21, parsing the historical log data of the web firewall to obtain feature data of multiple dimensions, and taking the feature data as training samples;
step S22, constructing an isolated tree set based on the feature data;
step S23, calculating a first weight w1 according to the distances from the abnormal samples to the center of the normal samples and the distances from the normal samples to the center of the normal samples;
step S24, calculating the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix;
and step S25, calculating the anomaly score by combining the first weight w1 and the second weight coefficient w2.
Specifically, historical logs covering a preset period may be collected. The HTTP (Hypertext Transfer Protocol) requests in the historical log data of the web firewall are generalized, and feature data of multiple dimensions are extracted based on preset feature fields. For example, the dimensions may be fields of an HTTP request, such as the cookie, URL, user agent, username and referrer. The specific parameters of each type of feature may include parameter length information, parameter count information, character distribution information and text similarity information. The text similarity may be the similarity between the current sample and the normal samples or the abnormal samples; alternatively, the similarity calculation results may correspond to different HTTP requests.
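As an illustration of the feature extraction described above, the following is a minimal sketch and not the patented implementation. The function names, the generalization rule (digits to "N", letters to "A"), the use of Shannon entropy for the character distribution and the use of bigram Jaccard overlap for text similarity are all assumptions introduced here; the source does not specify them.

```python
import math
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def generalize(value: str) -> str:
    """Assumed generalization rule: digits become 'N', letters become 'A'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("N")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def char_entropy(value: str) -> float:
    """Shannon entropy (bits) of the character distribution of a string."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def jaccard_bigrams(a: str, b: str) -> float:
    """Assumed text similarity: Jaccard overlap of character bigrams."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def url_features(url: str, normal_template: str) -> dict:
    """Extract length, count, distribution and similarity features from a URL."""
    query = urlsplit(url).query
    params = parse_qsl(query, keep_blank_values=True)
    return {
        "param_count": len(params),
        "param_length": sum(len(v) for _, v in params),
        "char_entropy": char_entropy(query),
        "similarity_to_normal": jaccard_bigrams(generalize(query), normal_template),
    }


# Hypothetical request and hypothetical template of a generalized normal query.
features = url_features("http://example.com/item?id=42&name=bob", "AA=NN&AAAA=AAA")
```

The same pattern could be repeated for the cookie, user agent and referrer fields, producing one feature vector per field type.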
In this example embodiment, the method further comprises: adding a preset proportion of known abnormal samples to the feature data of each dimension to construct the training samples.
For example, a small number of abnormal samples may be randomly added to the training set, and parameters are then set to train the isolated trees.
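The mixing step can be sketched as follows. This is only an illustration: the function name, the default ratio of 0.05 and the fixed random seed are assumptions, since the source states only that a preset proportion of known abnormal samples is added.

```python
import random


def build_training_set(normal, known_abnormal, abnormal_ratio=0.05, seed=0):
    """Return shuffled (sample, is_abnormal) pairs.

    abnormal_ratio is the assumed target share of abnormal samples in the
    final training set; the source does not give a concrete value.
    """
    rng = random.Random(seed)
    # Number of abnormal samples needed to reach abnormal_ratio of the final set.
    n_abnormal = round(len(normal) * abnormal_ratio / (1.0 - abnormal_ratio))
    n_abnormal = min(n_abnormal, len(known_abnormal))
    chosen = rng.sample(known_abnormal, n_abnormal)
    labelled = [(x, False) for x in normal] + [(x, True) for x in chosen]
    rng.shuffle(labelled)
    return labelled


# Hypothetical one-dimensional feature vectors.
normal_samples = [[float(i)] for i in range(95)]
abnormal_samples = [[1000.0 + i] for i in range(20)]
training = build_training_set(normal_samples, abnormal_samples)
```

Keeping the labels alongside the samples makes it possible to compute the distances of normal and abnormal samples separately in the later weighting step.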
During training of the isolated trees, the variance of the distances from the abnormal samples to the center of the normal samples and the variance of the distances from the normal samples to that center may be calculated separately, and the isolated tree weight w1 is obtained from them. The formula may include:
Figure BDA0003449096710000071
where δ_n represents the variance of the distances from the normal samples to the center of the normal samples, and δ_a represents the variance of the distances from the abnormal samples to the center of the normal samples.
Specifically, the samples may be projected into a Euclidean space, in which the center of the normal samples and the distances from the sample points to that center are calculated according to the distribution of the samples.
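The two distance variances can be computed as in the sketch below. The formula that turns them into w1 is an image in the source and is not reproduced here, so the sketch stops at the two variances; taking the arithmetic mean of the normal samples as their center and using the population variance are assumptions.

```python
import math
from statistics import pvariance


def centroid(points):
    """Arithmetic mean of the points, used here as the normal-sample center."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def distance_variance(points, center):
    """Population variance of the Euclidean distances from points to center."""
    return pvariance([math.dist(p, center) for p in points])


# Hypothetical two-dimensional feature vectors.
normal = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
abnormal = [[1.0, 6.0], [1.0, 11.0]]

center = centroid(normal)
delta_n = distance_variance(normal, center)    # variance for normal samples
delta_a = distance_variance(abnormal, center)  # variance for abnormal samples
```

In this toy data every normal point is equally far from the center, so the normal-sample variance is zero, while the two abnormal points lie at distances 5 and 10.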
In this exemplary embodiment, the constructing an isolated tree set based on the feature data includes: constructing a corresponding isolated tree set for each type of feature data. For example, separate isolated trees may be created for the URL and for the cookie.
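A minimal sketch of building one isolated tree set per feature type follows. It uses the standard random axis-aligned split of the isolation forest algorithm, not the improved variant of this disclosure; the dictionary layout, tree count, subsample size and depth limit are assumptions chosen for illustration.

```python
import random


def build_tree(rows, rng, depth=0, max_depth=8):
    """Recursively isolate rows with random axis-aligned splits."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}  # cannot split identical values
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(tree, row, depth=0):
    """Number of splits traversed before the row reaches a leaf."""
    if "size" in tree:
        return depth
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def build_tree_sets(samples_by_type, n_trees=10, subsample=64, seed=0):
    """Build one set of isolated trees for each feature type (url, cookie, ...)."""
    rng = random.Random(seed)
    tree_sets = {}
    for feature_type, rows in samples_by_type.items():
        trees = []
        for _ in range(n_trees):
            sample = rng.sample(rows, min(subsample, len(rows)))
            trees.append(build_tree(sample, rng))
        tree_sets[feature_type] = trees
    return tree_sets


# Hypothetical feature vectors for two field types.
samples_by_type = {
    "url": [[float(i % 7), float(i % 5)] for i in range(50)],
    "cookie": [[float(i % 3)] for i in range(50)],
}
tree_sets = build_tree_sets(samples_by_type)
```

Samples that are isolated after few splits get short path lengths, which is the signal the later scoring step converts into an anomaly score.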
In this exemplary embodiment, the calculating diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating the second weight coefficient w2 by using the symmetric matrix, includes:
calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix;
and calculating the mean of the symmetric matrix column by column, and taking the column means as the isolated tree path depth weight coefficients.
For example, for classifiers h_i and h_j, the prediction results of the two isolated trees may include:
Figure BDA0003449096710000072
Based on the prediction results, a symmetric matrix may be calculated, and the formula may include:
Figure BDA0003449096710000081
Specifically, the diversity of the isolated trees can be represented as:
Figure BDA0003449096710000082
The mean of the matrix is then calculated column by column and used as the tree path depth weight coefficient w2. The calculation formula may include:
Figure BDA0003449096710000083
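The pairwise diversity computation can be sketched as follows. The formulas in the source are images and are not reproduced, so this sketch assumes the disagreement measure commonly used in ensemble learning, namely the fraction of samples on which two classifiers predict differently, and it assumes that the column mean includes the zero diagonal. The prediction encoding (1 for normal, -1 for abnormal) is also an assumption.

```python
def disagreement(pred_i, pred_j):
    """Fraction of samples on which two trees give different predictions."""
    differing = sum(1 for a, b in zip(pred_i, pred_j) if a != b)
    return differing / len(pred_i)


def disagreement_matrix(predictions):
    """Symmetric matrix of pairwise disagreement with a zero diagonal."""
    n = len(predictions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = disagreement(predictions[i], predictions[j])
    return matrix


def column_means(matrix):
    """Column-wise mean, used here as one w2 value per tree."""
    n = len(matrix)
    return [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]


# Hypothetical predictions of three trees on four samples.
predictions = [
    [1, 1, -1, -1],
    [1, -1, -1, -1],
    [1, 1, 1, -1],
]
matrix = disagreement_matrix(predictions)
w2 = column_means(matrix)
```

A tree that disagrees more often with the others receives a larger column mean, which is how diversity enters the weight.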
In this exemplary embodiment, the weight coefficients obtained from the above calculations are used to improve the anomaly score calculation of the conventional isolated forest algorithm, and the formula may include:
Figure BDA0003449096710000084
Figure BDA0003449096710000085
where n represents the number of isolated trees and h represents the depth of the trees.
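The improved scoring formulas are images in the source and are not reproduced, so the following is only a hedged sketch of how per-tree weights might enter the score. It starts from the standard isolation forest score 2^(-E[h]/c(psi)), where c(psi) normalizes by the expected path length for a subsample of size psi, and replaces the plain average of the depths by a weighted average. Multiplying w1 and w2 into a single per-tree weight is an assumption made here, not the patent's stated formula.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(psi: int) -> float:
    """c(psi): expected path length of an unsuccessful search among psi points."""
    if psi <= 1:
        return 0.0
    if psi == 2:
        return 1.0
    harmonic = math.log(psi - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (psi - 1) / psi


def weighted_anomaly_score(depths, w1, w2, psi):
    """Anomaly score using a weighted mean of per-tree depths.

    depths, w1 and w2 hold one value per isolated tree; psi is the
    subsample size used to build each tree and must be at least 2.
    """
    weights = [a * b for a, b in zip(w1, w2)]  # assumed combination rule
    total = sum(weights)
    mean_depth = sum(w * h for w, h in zip(weights, depths)) / total
    return 2.0 ** (-mean_depth / average_path_length(psi))


# Hypothetical depths of one sample in two trees, under two weightings.
score_shallow_favoured = weighted_anomaly_score([2, 10], [1.0, 1.0], [0.9, 0.1], psi=256)
score_deep_favoured = weighted_anomaly_score([2, 10], [1.0, 1.0], [0.1, 0.9], psi=256)
```

With equal weights the function reduces to the conventional score, so the weighting only changes how much each tree's depth contributes.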
In step S13, the log data of the web firewall to be detected is input into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
In this exemplary embodiment, the anomaly detection model outputs a normalized anomaly score for the input real-time data, and when the anomaly score is greater than an anomaly threshold, the real-time data is classified as anomalous. Data with a normalized anomaly score greater than 0.6 is generally treated as anomalous, and the higher the normalized anomaly score, the more anomalous the data.
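The thresholding step can be sketched as below. The 0.6 threshold comes from the text; the function name, the record identifiers and the strict "greater than" comparison applied to a score of exactly 0.6 are illustrative assumptions.

```python
ANOMALY_THRESHOLD = 0.6  # value given in the description


def classify(scores, threshold=ANOMALY_THRESHOLD):
    """Split (record_id, score) pairs into released and flagged record ids."""
    released, flagged = [], []
    for record_id, score in scores:
        (flagged if score > threshold else released).append(record_id)
    return released, flagged


# Hypothetical normalized anomaly scores for three log records.
released, flagged = classify([("a", 0.30), ("b", 0.60), ("c", 0.85)])
```

Records in the flagged list are the ones treated as anomalous; all others are treated as normal.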
In the web traffic anomaly detection method based on the improved isolated forest algorithm, the historical log data of the web firewall is used as follows. First, feature data of certain dimensions are randomly extracted by the isolated forest algorithm to construct an isolated tree set of a certain scale, and similarity information between the sample to be detected and the normal and abnormal samples is introduced at the child nodes of the isolated trees. Next, the disagreement measure from ensemble learning is introduced to calculate and set the isolated tree ensemble weight coefficients. Finally, a search depth weight coefficient is set by combining the similarity and the diversity of the isolated tree child nodes, which improves the isolated forest algorithm and enables abnormal web traffic to be identified. A model (profile) is built from a large number of normal logs, and any log that does not match normal traffic is identified as abnormal, so that a fixed model of normal behavior copes with ever-changing attacks. When the method is connected to a web application firewall to filter traffic, logs identified as normal are released directly and abnormal logs are passed to a threat detection engine, which effectively improves the working efficiency of the firewall while reducing resource consumption. In addition, the method takes the characteristics of HTTP request data fully into account and introduces weight coefficients to improve the anomaly detection algorithm, replacing the original scheme in which every isolated tree in the forest contributes equally to the anomaly score, thereby reducing the errors caused by low-correlation features and improving the sensitivity to locally sparse data. While preserving the efficiency of the original algorithm, the method improves the accuracy with which abnormal traffic is identified and reduces false alarms.
In addition, the disclosed model incorporates text similarity, and the anomaly recognition performance of each trained isolated tree is judged by the distances of its child nodes; the way the isolated tree depth is calculated is also changed, with the search depth weight coefficient set by combining the similarity and the diversity of the isolated tree child nodes.
It is to be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 3, this exemplary embodiment also provides a web traffic anomaly detection apparatus 30 based on an improved isolated forest algorithm, including a data acquisition module 301, a model training module 302 and a detection result output module 303.
The data acquisition module 301 may be configured to collect historical log data of a web firewall and log data of the web firewall to be detected.
The model training module 302 may be configured to perform feature extraction on the historical log data of the web firewall and to construct, from the feature extraction result, an anomaly detection model based on an isolated forest model.
The detection result output module 303 may be configured to input the log data to be detected into the anomaly detection model based on the isolated forest model, so as to obtain an anomaly detection result.
In this exemplary embodiment, the model training module 302 may be configured to: parse the historical log data of the web firewall to obtain feature data of multiple dimensions, and take the feature data as training samples; construct an isolated tree set based on the feature data; calculate a first weight w1 according to the distances from the abnormal samples to the center of the normal samples and the distances from the normal samples to the center of the normal samples; calculate the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculate a second weight coefficient w2 by using the symmetric matrix; and calculate the anomaly score by combining the first weight w1 and the second weight coefficient w2.
In this exemplary embodiment, the parsing the historical log data of the web firewall to obtain feature data of multiple dimensions includes: generalizing the HTTP requests in the historical log data of the web firewall, and extracting the feature data of multiple dimensions based on preset feature fields.
In the present exemplary embodiment, the model training module 302 may be further configured to add a preset proportion of known abnormal samples to the feature data of each dimension to construct the training samples.
In this exemplary embodiment, the constructing an isolated tree set based on the feature data includes: constructing a corresponding isolated tree set for each type of feature data.
In the present exemplary embodiment, the first weight w1 is an isolated tree weight.
In the present exemplary embodiment, the second weight coefficient w2 is an isolated tree path depth weight coefficient; the calculating diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix, includes: calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix; and calculating the mean of the symmetric matrix column by column, and taking the column means as the isolated tree path depth weight coefficients.
The specific details of each module in the web traffic anomaly detection apparatus based on the improved isolated forest algorithm are described in detail in the corresponding web traffic anomaly detection method based on the improved isolated forest algorithm, and therefore, the details are not described herein again.
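The division into three modules can be pictured with the sketch below. It is a structural illustration only: the class name, the callable signatures and the stub modules are hypothetical, and the stub scoring rule merely stands in for the trained anomaly detection model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Record = dict
Scorer = Callable[[Record], float]


@dataclass
class DetectionApparatus:
    # Data acquisition module 301: returns (historical logs, logs to be detected).
    acquire: Callable[[], Tuple[Sequence[Record], Sequence[Record]]]
    # Model training module 302: turns historical logs into a scoring function.
    train: Callable[[Sequence[Record]], Scorer]
    threshold: float = 0.6

    def detect(self) -> List[Tuple[Record, bool]]:
        """Detection result output module 303: flag each pending record."""
        history, pending = self.acquire()
        scorer = self.train(history)
        return [(rec, scorer(rec) > self.threshold) for rec in pending]


# Stub modules for illustration only.
def acquire_stub():
    history = [{"url_len": 20}, {"url_len": 22}]
    pending = [{"url_len": 21}, {"url_len": 400}]
    return history, pending


def train_stub(history):
    longest = max(rec["url_len"] for rec in history)
    # Placeholder score: high only for URLs far longer than any seen in training.
    return lambda rec: 0.9 if rec["url_len"] > 2 * longest else 0.1


results = DetectionApparatus(acquire_stub, train_stub).detect()
```

Because the modules are passed in as plain callables, two of them could be merged or one split further without changing the wiring.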
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
In an exemplary embodiment of the present disclosure, there is also provided a computer system capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
The terminal device 500 according to this embodiment of the present invention is described below with reference to fig. 4. The terminal device 500 shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, terminal device 500 is in the form of a general purpose computing device. The components of computer system 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, and a bus 630 that couples the various system components including the memory unit 620 and the processing unit 610.
Wherein the storage unit stores program code that is executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit bus or local bus using any of a variety of bus architectures.
The computer system 600 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the computer system 600, and/or with any device (e.g., a router, a modem, etc.) that enables the computer system 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. The display unit 640 may also be connected through an input/output (I/O) interface. Moreover, the computer system 600 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the computer system 600 via the bus 630. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer system 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions that enable a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 5, a program product 100 for implementing the above method according to an embodiment of the present invention is described. It may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. A web traffic anomaly detection method based on an improved isolated forest algorithm, characterized by comprising the following steps:
collecting historical log data of a web firewall and log data to be detected of the web firewall;
performing feature extraction on the historical log data of the web firewall, and constructing an anomaly detection model based on an isolated forest model from the feature extraction result;
and inputting the log data to be detected of the web firewall into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
2. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 1, wherein performing feature extraction on the historical log data of the web firewall and constructing the anomaly detection model based on the isolated forest model from the feature extraction result comprises:
analyzing the historical log data of the web firewall to obtain feature data of multiple dimensions, and taking the feature data as training samples;
constructing an isolated tree set based on the feature data;
calculating a first weight w1 according to the distance from the abnormal samples to the center of the normal samples and the distance from the normal samples to the center of the normal samples;
calculating the diversity among the isolated trees by using a disagreement measure to determine a symmetric matrix, and calculating a second weight coefficient w2 by using the symmetric matrix;
and calculating the anomaly score by combining the first weight w1 and the second weight coefficient w2.
3. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 2, wherein analyzing the historical log data of the web firewall to obtain feature data of multiple dimensions comprises:
generalizing the HTTP requests in the historical log data of the web firewall, and extracting the feature data of multiple dimensions based on preset feature fields.
4. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 2 or claim 3, characterized in that the method further comprises:
adding a preset proportion of known abnormal samples to the feature data of each dimension to construct the training samples.
5. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 2, wherein constructing the isolated tree set based on the feature data comprises:
constructing corresponding isolated tree sets based on the feature data of each type, respectively.
6. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 2, wherein the first weight w1 is an isolated tree weight.
7. The web traffic anomaly detection method based on the improved isolated forest algorithm according to claim 2, wherein the second weight coefficient w2 is an isolated tree path depth weight coefficient;
and wherein calculating the diversity among the isolated trees by using the disagreement measure to determine a symmetric matrix, and calculating the second weight coefficient w2 by using the symmetric matrix, comprises:
calculating the diversity among the isolated trees by using the disagreement measure to determine the symmetric matrix;
and calculating the column-wise mean of the symmetric matrix, and taking the mean as the isolated tree path depth weight coefficient.
8. A web traffic anomaly detection device based on an improved isolated forest algorithm, characterized by comprising:
a data acquisition module, configured to collect historical log data of a web firewall and log data to be detected of the web firewall;
a model training module, configured to perform feature extraction on the historical log data of the web firewall and to construct an anomaly detection model based on an isolated forest model from the feature extraction result;
and a detection result output module, configured to input the log data to be detected of the web firewall into the anomaly detection model based on the isolated forest model to obtain an anomaly detection result.
9. A storage medium having stored thereon a computer program which, when executed by a processor, implements the web traffic anomaly detection method based on the improved isolated forest algorithm according to any one of claims 1 to 7.
10. A terminal device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the web traffic anomaly detection method based on the improved isolated forest algorithm according to any one of claims 1 to 7.
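For illustration only, the weighting steps recited in claims 2 and 7 might be sketched as follows. The sketch is not part of the claims and makes several assumptions that the claims do not state: the diversity measure is read as the pairwise disagreement measure from ensemble learning (the fraction of samples on which two trees give different verdicts); each tree's verdicts are taken to be binary labels on a shared validation set; w1 and w2 are combined by multiplying them into per-tree weights for a weighted mean path depth; and the final score reuses the standard isolation forest form 2^(-E[h]/c(n)). All function and variable names are hypothetical.

```python
import math


def disagreement_matrix(votes):
    """Pairwise disagreement between trees (symmetric, zero diagonal).

    votes[t][k] is tree t's binary verdict (1 = anomalous, 0 = normal)
    for validation sample k. Entry (i, j) is the fraction of samples on
    which trees i and j give different verdicts.
    """
    n_trees = len(votes)
    n_samples = len(votes[0])
    matrix = [[0.0] * n_trees for _ in range(n_trees)]
    for i in range(n_trees):
        for j in range(i + 1, n_trees):
            differing = sum(a != b for a, b in zip(votes[i], votes[j]))
            matrix[i][j] = matrix[j][i] = differing / n_samples
    return matrix


def column_means(matrix):
    """Column-wise mean of the matrix, used here as the coefficient w2."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(n)]


def average_path_length(n):
    """Normalizer c(n) of the standard isolation forest score."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def weighted_anomaly_score(depths, w1, w2, subsample_size):
    """Score one sample from its per-tree path depths.

    Assumption: the per-tree weight is the product w1[t] * w2[t].
    subsample_size must be at least 2 so that c(n) is non-zero.
    """
    weights = [a * b for a, b in zip(w1, w2)]
    total = sum(weights)
    if total == 0:  # e.g. all trees agree everywhere: use a plain mean
        weights, total = [1.0] * len(depths), float(len(depths))
    mean_depth = sum(w * d for w, d in zip(weights, depths)) / total
    return 2.0 ** (-mean_depth / average_path_length(subsample_size))


# Toy data: 3 trees, 4 validation samples (values are made up).
votes = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
]
w2 = column_means(disagreement_matrix(votes))
w1 = [1.0, 1.0, 1.0]  # placeholder; the distance-based w1 is not modeled here
score = weighted_anomaly_score([2.0, 3.0, 4.0], w1, w2, subsample_size=256)
```

In this sketch a tree that disagrees more with the rest of the forest receives a larger w2, so its path depth contributes more to the score; shallower weighted depths push the score toward 1 (more anomalous). The distance-based first weight w1 is left as a placeholder because the claims do not give its formula.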
CN202111658650.1A 2021-12-30 2021-12-30 Web flow anomaly detection method and device based on improved isolated forest algorithm Active CN114338195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111658650.1A CN114338195B (en) 2021-12-30 2021-12-30 Web flow anomaly detection method and device based on improved isolated forest algorithm

Publications (2)

Publication Number Publication Date
CN114338195A true CN114338195A (en) 2022-04-12
CN114338195B CN114338195B (en) 2024-09-06

Family

ID=81019582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111658650.1A Active CN114338195B (en) 2021-12-30 2021-12-30 Web flow anomaly detection method and device based on improved isolated forest algorithm

Country Status (1)

Country Link
CN (1) CN114338195B (en)

Also Published As

Publication number Publication date
CN114338195B (en) 2024-09-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant