CN110290152B

CN110290152B - Firewall rule engine time complexity evaluation method based on probability weighted path

Info

Publication number: CN110290152B
Application number: CN201910651607.9A
Authority: CN
Inventors: 刘颖; 范渊; 吴永越; 郑学新; 刘韬
Original assignee: Chengdu DBAPPSecurity Co Ltd
Current assignee: Chengdu DBAPPSecurity Co Ltd
Priority date: 2019-07-18
Filing date: 2019-07-18
Publication date: 2021-10-15
Anticipated expiration: 2039-07-18
Also published as: CN110290152A

Abstract

The invention discloses a firewall rule engine time complexity evaluation method based on a probability weighted path, which comprises the steps of firstly collecting data message samples flowing through a firewall, classifying the data message samples, analyzing a rule engine, and arranging the rule engine into a decision tree; then, calibrating the path length in the decision tree, and calculating the average path length of the data message sample of each category; and finally evaluating the complexity of the rule engine. The invention abstracts the rule engine into a decision tree, describes the time complexity by using the decision path length, weights the path length by using the probability distribution of a real data message sample, can describe the time complexity of the rule engine in a real environment to the maximum extent, and evaluates the performance of the firewall rule engine. Meanwhile, interference of other components of the firewall on evaluation is eliminated, and a basis is provided for firewall engine optimization.

Description

Firewall rule engine time complexity evaluation method based on probability weighted path

Technical Field

The invention relates to the technical field, in particular to a firewall rule engine time complexity evaluation method based on a probability weighted path.

Background

The firewall rule engine is the core of the firewall, most of CPU and memory resources of the firewall are consumed in rule matching, and the traditional firewall rule evaluation depends on black box test, namely, a packet sending tool or a tester is used for constructing and sending a large number of messages to measure indexes such as throughput, delay, packet loss rate, accuracy and the like of the firewall.

Such methods measure the performance of a firewall product and are not equivalent to the performance of a fire protection engine. The performance of the firewall is affected by the rule engine, and also affected by firewall hardware, network protocol stack implementation, upper layer application forwarding implementation, and the like. Meanwhile, the packet sending tool is difficult to simulate the network flow condition in a real environment.

Disclosure of Invention

The invention aims to provide a firewall rule engine time complexity evaluation method based on a probability weighted path, which evaluates the performance of a firewall rule engine by using time complexity, can evaluate the performance without actually operating a firewall and can expect the capability of the engine even in the stage of designing the firewall.

The invention is realized by the following technical scheme: the firewall rule engine time complexity evaluation method based on the probability weighted path specifically comprises the following steps:

step F1: collecting data message samples flowing through a firewall and classifying the data message samples;

step F2: analyzing the rule engine and arranging the rule engine into a decision tree;

step F3: calibrating the path length in the decision tree;

step F4: calculating the average path length of the data message sample of each category;

step F5: the complexity of the rules engine is evaluated.

Further, in order to better implement the present invention, the step F1 specifically includes the following steps:

step F11: collecting data message samples flowing through a firewall under a real environment, classifying the data samples, and classifying the data message samples into normal, attack and suspicious data message samples;

step F12: after classification is finished, counting the frequency of each class as the probability of the class:

wherein N is_iNumber of messages representing class i, N represents total number of messagesAmount of the compound (A).

Further, in order to better implement the present invention, step F2 specifically refers to:

each non-leaf child node in the decision tree represents a rule, each edge represents a condition, and each leaf node represents a category of the data message sample.

Further, in order to better implement the present invention, step F3 specifically refers to:

the process of detecting the data message by the rule engine is a path from a root node of the decision tree to a leaf node, and only one detection path is provided for any data message; the path length in the decision tree is a computational complexity.

Further, in order to better implement the present invention, the path length specifically means: setting the execution time of each rule detection of the rule engine to be the same, defining the length of any one edge in the decision tree as 1, and setting the path length as l_i ^kIs the number of edges that pass from the root node to the kth node of class i.

Further, in order to better implement the present invention, step F4 specifically refers to:

calculate the average path length for each class:

wherein L is_iMean length, C, of the class i_iIndicating the number of paths from the root node to the leaf nodes of category i.

Further, in order to better implement the present invention, step F5 specifically refers to:

the complexity of the rules engine is evaluated using the mathematical expectation of the average path length:

the lower the mathematical expectation of the average path length, the lower the complexity of the rules engine.

The working principle is as follows:

firstly, collecting a large number of data message samples flowing through a firewall under a real environment, classifying the data message samples, and abstracting a rule engine into a decision tree by analyzing the design of a firewall rule engine, wherein the rule engine detects that the data message is abstracted into a path from a root node to a leaf node of the decision tree, and any data message has one path and only one path. The path length in the decision tree represents the time complexity of the rule engine, the average path length of each category is calculated according to the path of any data message, but the complexity of the rule engine cannot be directly described by the average path length because the categories of the data message are not uniformly distributed, and therefore the complexity of the rule engine is evaluated by calculating the mathematical expected value of the data message according to the average path length.

Compared with the prior art, the invention has the following advantages and beneficial effects:

the invention abstracts the rule engine into a decision tree, describes the time complexity by using the decision path length, weights the path length by using the probability distribution of a real data message sample, can describe the time complexity of the rule engine in a real environment to the maximum extent, and evaluates the performance of the firewall rule engine. Meanwhile, interference of other components of the firewall on evaluation is eliminated, and a basis is provided for firewall engine optimization.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of a decision tree according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.

Example 1:

the invention is realized by the following technical scheme, as shown in fig. 1-2, the firewall rule engine time complexity evaluation method based on the probability weighted path is characterized in that: the method specifically comprises the following steps:

step F3: calibrating the path length in the decision tree;

step F5: the complexity of the rules engine is evaluated.

It should be noted that, through the above improvement, the present invention provides a firewall rule engine time complexity evaluation method based on a probability weighted path, which uses the time complexity to evaluate the performance of the firewall rule engine, and can evaluate the performance without actually running the firewall, and even can expect the capability of the engine in the firewall design stage.

Other parts of this embodiment are the same as those of the above embodiment, and thus are not described again.

Example 2:

in this embodiment, further optimization is performed on the basis of the above embodiment, as shown in fig. 1 to fig. 2, the step F1 specifically includes the following steps:

wherein N is_iThe number of packets of the category i is represented, and N represents the total number of packets.

It should be noted that, through the above improvement, a large number of data packet samples flowing through the firewall under the real environment are collected, and the data packet samples are classified. The categories can be simply classified as normal, attack and suspicious, and can also be classified as normal, A attack, B attack, C attack and the like. For example, as shown in FIG. 2, leaf node c1 may represent normal, c2 represents attack, and c3 represents suspect. After classification is finished, counting the frequency of each class as the probability of the class:

The work of the rule engine is to classify the data message samples, the data message samples can be classified into different classes according to the difference of the rule engine, and how to realize the classification is the rule of the firewall engine.

Example 3:

in this embodiment, further optimization is performed on the basis of the above embodiment, as shown in fig. 1 to fig. 2, the step F2 specifically refers to:

The step F3 specifically refers to:

It should be noted that, through the above improvement, there are 6 rules in the decision tree shown in fig. 2, where r1 is a root node, r2, r3, r4, r5, and r6 are child nodes, and c1, c2, and c3 are leaf nodes. When any data message enters the engine, the first rule r1 is executed first, and r2 or r3 is selected to be executed according to the root node r1 until the data message is executed to a certain leaf node, namely a node without child nodes. As can be seen from the decision tree of fig. 2, there is one and only one path from the root node to any leaf node, for example, the only path from the root node r1 to the third leaf node c1 (counted from left to right) is: r1- > r2- > r5- > r6- > c 1.

Example 4:

the present embodiment is further optimized based on the above embodiments, and as shown in fig. 1 to fig. 2, the path length specifically refers to: setting each rule of the rule engine to detect the same execution time, defining the length of any edge in the decision tree as 1, and then the path length is

Is the number of edges that pass from the root node to the kth node of class i.

It should be noted that, with the above improvement, as shown in fig. 2, the path length from the root node r1 to the leaf node c1 is 4, i.e., there are 4 edges. If the leaf node c1 represents the 1 st category, it can be seen that there are 4 paths r1 and c1>r2->r5->r6->c1 points to the third leaf node c1 (the third c1 from left to right), i.e., the path length is

In the same way, the first leaf nodePath length of point c1

Path length of the second leaf node c1

Path length of the fourth leaf node c1

C_iIndicates the number of paths from the root node to the leaf node of the ith category, and the leaf node C1 of the first category in FIG. 2 has 4, so C₁4. In the same way, C₂＝3，C₃＝2。

Example 5:

in this embodiment, further optimization is performed on the basis of the above embodiment, as shown in fig. 1 to fig. 2, the step F4 specifically refers to:

calculate the average path length for each class:

It should be noted that, with the above improvement, since the categories of the data packet samples are not uniformly distributed, for example, in the decision tree shown in fig. 2, the number of c1 of category 1 is more than that of other categories, so the complexity of the engine cannot be directly described by the average path length, and then the mathematical expectation value of the path length is used to describe:

the lower the mathematical expectation of the average path length, the lower the complexity of the rules engine. A good rule engine should keep e (l) as low as possible, so the optimization method should keep the class paths with high probability as short as possible. The invention abstracts the rule engine into a decision tree, describes the time complexity by using the decision path length, weights the path length by using the probability distribution of a real data message sample, can describe the time complexity of the rule engine in a real environment to the maximum extent, and evaluates the performance of the firewall rule engine. Meanwhile, interference of other components of the firewall on evaluation is eliminated, and a basis is provided for firewall engine optimization.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims

1. The firewall rule engine time complexity evaluation method based on the probability weighted path is characterized by comprising the following steps: the method specifically comprises the following steps:

step F3: calibrating the path length in the decision tree;

step F5: evaluating the complexity of the rule engine;

the step F1 specifically includes the following steps:

wherein N is_iThe number of messages of the category i is represented, and N represents the total number of messages;

the step F2 specifically refers to:

each non-leaf child node in the decision tree represents a rule, each edge represents a condition, and each leaf node represents the category of the data message sample;

the step F3 specifically refers to:

the process of detecting the data message by the rule engine is a path from a root node of the decision tree to a leaf node, and only one detection path is provided for any data message; the path length in the decision tree is the complexity of the calculation;

the path length specifically means: setting each rule of the rule engine to detect the same execution time, defining the length of any edge in the decision tree as 1, and then the path length is

The number of edges which are passed from the root node to the kth node with the category i;

the step F4 specifically refers to:

calculate the average path length for each class:

wherein L is_iMean length, C, of the class i_iRepresenting the number of paths from the root node to the leaf node of the category i;

the step F5 specifically refers to: