CN115134305B

CN115134305B - Dual-core cooperation SDN big data network flow accurate classification method

Info

Publication number: CN115134305B
Application number: CN202210727496.7A
Authority: CN
Inventors: Nie Bo (聂博)
Original assignee: Hongmeng Tianlu Beijing Technology Co ltd
Current assignee: Hongmeng Tianlu Beijing Technology Co ltd
Priority date: 2022-06-25
Filing date: 2022-06-25
Publication date: 2024-01-23
Anticipated expiration: 2042-06-25
Also published as: CN115134305A

Abstract

A dual-core collaborative traffic classification model is built in a software-defined network (SDN). The model combines a traffic classification and recognition algorithm with a machine-learning dual-core model that has high recognition efficiency and can draw on large-scale data analysis. The two cores are an inverse error feedback core and a condition fitting feature core. After network traffic information is acquired, a classifier built from the two cores classifies it. This gives a multi-attribute classification method with high efficiency and accuracy that suits big-data traffic recognition environments, and training on large amounts of network data improves its accuracy further. The inverse error feedback core improves classification accuracy, the condition fitting feature core improves classification speed, and the two work together to raise overall classification efficiency and quality. The final classification accuracy reaches 99.76%, and classification efficiency is 17.35% higher than with the inverse error feedback core alone, making the method an efficient and accurate traffic classification method for SDN environments.

Description

Dual-core cooperation SDN big data network flow accurate classification method

Technical Field

The application relates to software-defined network traffic classification models, in particular to a dual-core cooperative SDN big data network flow accurate classification method, and belongs to the technical field of SDN traffic classification.

Background

In an era of explosive data growth, network speed, quality and security matter more and more. Switched networks have conventionally relied on various network protocols for self-adjustment, load balancing and congestion control. These methods are increasingly inadequate and fall far short of the operating efficiency users expect. Refining the traditional network architecture or upgrading its physical hardware, however far it goes, cannot substantially improve network quality. Against this background, the software-defined network (SDN) architecture has gradually gained acceptance and wide adoption. It uses a layered design that separates the control layer of a traditional network from its forwarding layer. Upper-layer master-slave controllers perform the control functions, and the controller maintains the topology of the whole network. Compared with a traditional network, an SDN exposes more protocol tuples, which carry more useful traffic information and make network analysis and maintenance easier.

From the traffic perspective, P2P traffic and network attacks have increased sharply in today's high-speed networks. Traffic types have become more diverse, and the demands on traffic classification and recognition have grown more stringent. Widespread randomization techniques and encrypted data have steadily weakened the traditional port-based and deep packet inspection recognition techniques, and their recognition efficiency keeps declining. A traditional network also exposes less traffic information than a software-defined network. For these reasons, the method combines a traffic classification and recognition algorithm with a machine learning algorithm that has high recognition efficiency and can draw on large-scale data analysis. The result is a multi-attribute classification method with higher efficiency and accuracy that suits this traffic recognition environment, and training on large amounts of network data improves its recognition accuracy further.

In a traditional network, traffic is a complex system formed by many interrelated and interdependent entities such as users, networks, applications and hosts. Different applications differ in traffic behavior and network characteristics, so their traffic carries different information. The explosion of network traffic has driven the development of better traffic classification and recognition algorithms, each with limitations. First, port-based classification is simple and easy to implement. However, it cannot recognize traffic that carries no port information, traffic that is encrypted, or traffic such as P2P that uses dynamically allocated port numbers, so it cannot handle today's diversified traffic. Second, payload-based identification and classification must maintain a dedicated signature table for every known application, which makes its space cost excessive for current networks. Every identification also requires a table lookup, so its time cost is significant. Despite its high accuracy, it is therefore hard to adapt to modern networks. Third, host-behavior-based identification and classification has limited ability to identify application subtypes or traffic with encrypted transport-layer headers, which greatly reduces its accuracy.
Fourth, machine-learning-based traffic identification relies on statistical characteristics of network traffic, such as port numbers, flow duration, packet inter-arrival times or packet lengths. Its accuracy and efficiency depend on how accurate and comprehensive the training set of the classification core is. Noise in the training set degrades identification efficiency and accuracy, and overfitting occurs easily during training. When such a method is used to classify network traffic, selecting a suitable, high-quality training set is therefore essential.

As network traffic explodes, classifying and identifying the traffic of software-defined networks has become increasingly important for meeting the challenges of the new architecture. Most specific classification methods in the prior art, however, remain tied to the traditional network architecture and do not consider combining machine-learning traffic classification with the software-defined network architecture. Accurate big-data traffic classification in a software-defined network addresses the following network requirements:

First, the rapid growth of network scale and user numbers: as Internet of Things and cloud computing technologies continue to develop and mature, network bandwidth keeps increasing, and network scale and user numbers are bound to grow exponentially. If networks cannot absorb this change, quality of service and security will face enormous challenges. Large-scale network paralysis could even result, with severe consequences.

Second, intelligent network development: as data mining, artificial intelligence, machine learning and related fields develop rapidly, network traffic carries more and more information about user behavior and preferences, and its value keeps rising. The wide application of these technologies puts great pressure on traffic processing and classification.

Third, network multimedia development: self-media applications such as Weibo, WeChat and WeChat Moments make heavy demands on network compatibility and high-load performance. Keeping pace with network multimedia requires a faster and better traffic classification architecture that can meet the future trend toward multimedia traffic.

Fourth, network security development: continual privacy leaks, network attacks and an endless stream of security incidents cause public alarm. Preventing, identifying, classifying and handling attack traffic is essential, and a complete security response system is part of current network development.

In summary, the prior art has several problems and shortcomings in classifying big-data network traffic. The problems and key technical difficulties addressed by the present application include:

(1) In the prior art, switched networks perform congestion control through self-adjustment and load balancing under various network protocols, and this cannot deliver the required operating efficiency. Under the traditional network architecture, network quality cannot be improved however the physical hardware is upgraded. Flexible management of the whole network is hard to achieve, and the data transmission speed and agility of network services cannot be guaranteed. Current network traffic involves many closely related entities, such as hosts, networks, applications and users, and forms a complex system in which many factors interact. Each network application has its own traffic behavior characteristics, and traffic complexity keeps growing. Because networks differ, different environments suit different network architectures. Only after sufficient network analysis can traffic be classified accurately and the network achieve high applicability and efficiency. Network attacks also keep emerging. An attack that is not captured and handled immediately can severely damage the network environment or even cause large-scale network paralysis, which makes traffic classification extremely important. SDN traffic classification therefore needs to incorporate machine learning to improve QoS and the efficiency and accuracy of big-data classification.

(2) From the traffic perspective, P2P traffic and network attacks have increased sharply in today's high-speed networks, and the demands on traffic classification and identification have grown more stringent. Widespread randomization techniques and encrypted data have steadily weakened the traditional port-based and deep packet inspection identification techniques, and their identification efficiency keeps declining. A traditional network also exposes less traffic information than a software-defined network. As traffic surges, it becomes necessary to classify and identify software-defined network traffic. Yet most specific classification methods in the prior art remain tied to the traditional network architecture and do not consider combining machine-learning traffic classification with the software-defined network architecture. To remedy this defect, the application combines a traffic classification and identification algorithm with a machine learning algorithm that has high identification efficiency and can draw on large-scale data analysis. The result is a multi-attribute classification method with high efficiency and accuracy that suits big-data traffic identification environments. Training on large amounts of network data improves classification accuracy further and better serves big-data network services.

(3) Traditional network traffic is a complex system formed by interconnected, interacting entities such as users, networks, applications and hosts. The explosion of network traffic has produced several classification and identification algorithms in the prior art. First, port-based classification cannot identify traffic that carries no port information, traffic that is encrypted, or traffic such as P2P that uses dynamically allocated port numbers, so it cannot handle today's diversified traffic. Second, payload-based identification and classification has an excessive space cost for current networks and requires a table lookup for every identification. Its time cost is therefore significant, and it is hard to adapt to modern networks. Third, host-behavior-based identification and classification has limited ability to identify application subtypes or traffic with encrypted transport-layer headers, and its identification accuracy is low.
Fourth, the accuracy and efficiency of machine-learning-based traffic identification depend on how accurate and comprehensive the training set of the classification core is. Noise in the training set degrades identification efficiency and accuracy, and overfitting occurs easily during training. The prior art cannot select a suitable, high-quality training set and lacks an efficient and accurate machine-learning traffic identification method. It also cannot balance the quality and efficiency of traffic classification and so has an obvious weak point. As a result, traffic classification is inefficient and of poor quality, and it cannot meet current demands for intelligent, multimedia and secure network development.

Disclosure of Invention

A dual-core collaborative traffic classification model is built in a software-defined network. The model combines a traffic classification and recognition algorithm with a machine-learning dual-core model that has high recognition efficiency and can draw on large-scale data analysis. The two cores are an inverse error feedback core and a condition fitting feature core. After network traffic information is obtained, a classifier built from the two cores classifies it accurately. This gives a multi-attribute classification method with high efficiency and accuracy that suits big-data traffic recognition environments, and training on large amounts of network data improves its recognition accuracy further. The inverse error feedback core improves classification accuracy, the condition fitting feature core improves classification speed, and together they raise overall classification efficiency and quality. The final classification accuracy reaches 99.76%, and classification efficiency is 17.35% higher than with the inverse error feedback core alone. The method is efficient, widely applicable and robust, and offers an efficient, accurate option for traffic recognition and classification in SDN environments.

To achieve the above technical effects, the application adopts the following technical solution:

A dual-core collaborative SDN big data network traffic accurate classification model is built in a software-defined network. The two cores are an inverse error feedback core and a condition fitting feature core. After network traffic information is acquired, a classifier built from the two cores classifies it accurately. The inverse error feedback core improves classification accuracy, the condition fitting feature core improves classification speed, and together the two cores improve the network's overall classification efficiency.

1) An SDN traffic classification method based on the inverse error feedback core, comprising: an SDN-based inverse error feedback kernel learning process, an inverse error feedback kernel slope recursion strategy based on stage iterative optimization, and an inverse error feedback kernel SDN classification training process;

The SDN controller obtains traffic information directly and preprocesses it. Noise is removed, together with records that have missing attributes and samples that cannot serve as a basis for classification, which yields a data set of discrete or continuous values. The data of each class in the original data set is then randomly split into a training set and a test set in a ratio of 8:2. Each feedback core is assigned a connection weight w and a critical value (threshold) θ, and the inverse error feedback core is trained to convergence according to formulas 1 and 2. For a training sample (x, y), let y′ be the output computed by the current feedback core node. The weights of that node are then adjusted as follows, where $x_i$ is the value of sample x on the i-th attribute of the feedback core:

$$w_i \leftarrow w_i + \mu \qquad (1)$$

$$\mu = \gamma\,(y - y')\,x_i \qquad (2)$$

Here γ ∈ (0, 1) is the training learning rate of the feedback core. When y = y′, the target output matches the current feedback core output, meaning the traffic identification result agrees with the training label, and the weight is not adjusted. When y ≠ y′, the two disagree. The difference between the target output and the computed output is then multiplied by the attribute value $x_i$ and the learning rate γ to give the adjustment μ, which updates the original connection weight.
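Formulas 1 and 2 take the form of the classic perceptron learning rule. The minimal Python sketch below applies one update to the weights of a single node. The function name `feedback_core_update`, the list-based representation and the sample values are illustrative assumptions, not the patent's implementation.

```python
from typing import List


def feedback_core_update(w: List[float], x: List[float], y: float,
                         y_pred: float, gamma: float) -> List[float]:
    """Apply formulas 1 and 2 to every attribute weight of one node.

    mu  = gamma * (y - y_pred) * x_i   (formula 2)
    w_i <- w_i + mu                     (formula 1)
    When y == y_pred the difference is 0, so the weights are unchanged.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("learning rate gamma must lie in (0, 1)")
    return [w_i + gamma * (y - y_pred) * x_i for w_i, x_i in zip(w, x)]


# Hypothetical sample: the node predicted 0 but the label is 1.
weights = [0.2, -0.1, 0.05]
sample = [1.0, 0.5, 2.0]
print(feedback_core_update(weights, sample, y=1.0, y_pred=0.0, gamma=0.1))
```

Returning a new list rather than mutating `w` keeps the previous weights available, for example when comparing training epochs.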

2) An SDN flow classification method based on a condition fitting feature kernel comprises the following steps: condition fitting feature kernel learning process and condition fitting feature kernel SDN classification training and verification process based on SDN.

Preferably, the SDN-based inverse error feedback kernel learning process uses a single-hidden-layer inverse error feedback kernel with a (d, n, 1) architecture. Let the traffic training set be $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where each example is described by d traffic attributes. The output layer has 1 node, whose value y is the traffic class determined by the inverse error feedback kernel algorithm. The input layer has d nodes, denoted $x_i$, and the hidden layer has n nodes, denoted $b_j$. The output-layer connection weights are denoted w and the hidden-layer connection weights v. The feedback core activation function is the Sigmoid function, and the learning rate of the inverse error feedback kernel is γ.

assuming that the feedback core critical value is 0, the input of the j-th feedback core of the hidden layer is:

the input of the output feedback core of the output layer is:

The variance (squared error) of the inverse error feedback kernel on $(x_i, y)$ is:
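The equations for the hidden-layer input, the output-layer input and the variance are missing from this text. Using the definitions above (inputs $x_i$, hidden outputs $b_j$, hidden weights v, output weights w, Sigmoid activation, hidden thresholds 0), a standard single-hidden-layer formulation consistent with formulas 10 and 11 is shown below. The symbols $\alpha_j$, $\beta$ and E, and the squared-error form, are assumptions and not the patent's own formulas 3 to 5.

```latex
\alpha_j = \sum_{i=1}^{d} v_{ij}\, x_i, \qquad
b_j = \frac{1}{1 + e^{-\alpha_j}}, \qquad
\beta = \sum_{j=1}^{n} w_j\, b_j, \qquad
y' = \frac{1}{1 + e^{-(\beta - \theta)}}

E = \tfrac{1}{2}\,(y' - y)^2
\quad\Longrightarrow\quad
-\gamma \frac{\partial E}{\partial w_j} = \gamma\, y'(1 - y')(y - y')\, b_j,
\qquad
-\gamma \frac{\partial E}{\partial \theta} = -\gamma\, y'(1 - y')(y - y')
```

The two derivatives on the right reproduce formulas 10 and 11, which suggests that the lost variance formula is the usual half squared error.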

Here p and k are parameters of the inverse error feedback core. The feedback core connection weights are adjusted with the slope recursion strategy. This gives the change $\mu_j$ in the output-layer connection weight from hidden node j, and the change $\mu_{ij}$ in the hidden-layer connection weight from input i to hidden node j:

That is:

Combining formulas 6, 7 and 8 gives the output-layer slope recursion rates, formulas 10 and 11:

Connection weight: $$\mu_j = \gamma\, y'(1 - y')(y - y')\, b_j \qquad (10)$$

Critical value: $$\theta_y = -\gamma\, y'(1 - y')(y - y') \qquad (11)$$

The hidden-layer slope recursion rates, formulas 12 and 13, are obtained in the same way:
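The sketch below gives one training step of a (d, n, 1) network in Python. It uses formulas 10 and 11 for the output layer. Formulas 12 and 13 are not reproduced in this text, so the hidden-layer update follows standard backpropagation, which is an assumption. The hidden thresholds stay at 0, as assumed before the hidden-layer input equation. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Sequence[float], v: Matrix, w: Sequence[float],
            theta: float) -> Tuple[List[float], float]:
    """Hidden outputs b_j (hidden thresholds fixed at 0) and the output y'."""
    n = len(w)
    b = [sigmoid(sum(v[i][j] * x[i] for i in range(len(x)))) for j in range(n)]
    y_pred = sigmoid(sum(w[j] * b[j] for j in range(n)) - theta)
    return b, y_pred


def backprop_step(x: Sequence[float], y: float, v: Matrix, w: Sequence[float],
                  theta: float, gamma: float) -> Tuple[Matrix, List[float], float]:
    """One update of a (d, n, 1) network; returns the new (v, w, theta)."""
    b, y_pred = forward(x, v, w, theta)
    g = y_pred * (1.0 - y_pred) * (y - y_pred)   # common factor of formulas 10, 11
    new_w = [w_j + gamma * g * b_j for w_j, b_j in zip(w, b)]   # formula 10
    new_theta = theta - gamma * g                                # formula 11
    # Assumed hidden-layer rule (standard backpropagation), computed with the old w.
    e = [b_j * (1.0 - b_j) * w_j * g for w_j, b_j in zip(w, b)]
    new_v = [[v[i][j] + gamma * e[j] * x[i] for j in range(len(w))]
             for i in range(len(x))]
    return new_v, new_w, new_theta
```

Every update equals −γ times the derivative of the squared error $\tfrac{1}{2}(y' - y)^2$, so a finite-difference gradient check is a simple way to verify an implementation like this one.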

When the updated weights are calculated, minimizing the result of formula 5 implicitly treats the traffic information as noise-free. To achieve the best training effect and avoid overfitting, a degree variable of the inverse error feedback kernel is added to formula 5, giving the new discriminant function, formula 14:

where λ ∈ (0, 1) sets the balance between the variance term and the degree variable.
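Formula 14 is not reproduced in this text. A common form that fits the description uses λ to weight the averaged training error against a sum of squared connection weights, which penalizes complex models. The sketch below assumes that form. The name `regularized_error` and the example numbers are hypothetical.

```python
from typing import Sequence


def regularized_error(errors: Sequence[float], weights: Sequence[float],
                      lam: float) -> float:
    """Assumed form of formula 14: lam * mean(E_k) + (1 - lam) * sum(w_i ** 2)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie in (0, 1)")
    mean_error = sum(errors) / len(errors)          # averaged training error
    weight_penalty = sum(w * w for w in weights)    # penalizes large weights
    return lam * mean_error + (1.0 - lam) * weight_penalty


# Same training error, larger weights: the penalized objective is higher.
print(regularized_error([0.1, 0.3], [0.5, -0.5], lam=0.8))
print(regularized_error([0.1, 0.3], [2.0, -2.0], lam=0.8))
```

A larger λ favors fitting the training set, and a smaller λ favors small weights. This trade-off is what limits overfitting.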

Preferably, the inverse error feedback kernel training model uses a stage iterative optimization algorithm as its slope recursion strategy. In multidimensional nonlinear programming, starting from an initial value $X^k$, the algorithm looks for a better estimate by updating $X^k$ with a step $\delta^k$ at each iteration. Taylor's formula then gives formula 15:

where the coefficient of $\delta^k$ is the Jacobian matrix of f evaluated at $X^k$. Choosing $\delta^k$ to minimize the error between the true value and the estimated value gives:

Setting the right-hand side to 0, with $e_k = Y - f(X^k)$ and M, N the corresponding coefficients of the normal equation, gives the iterative relation between $X^{k+1}$ and $X^k$, formula 19:

Repeated iteration produces a sequence of estimates. Iteration stops once the step $X^{k+1} - X^k$ falls within the termination threshold ε. Based on stage iterative optimization, the normal equation is converted into:

wherein:

When λ ≠ 0, the stage iterative optimization algorithm approaches slope recursion (the case i ≠ j) and descends faster. In the inverse error feedback kernel model, the stage iterative optimization algorithm is therefore suitable for the slope recursion performed during training.
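The update described here uses a Taylor linearization, a Jacobian matrix and normal equations damped by λ, which is the form of the Levenberg-Marquardt method. The pure-Python sketch below is written under that assumption and fits a two-parameter exponential model. The model `a * exp(b * t)`, the λ schedule (divide by 10 after a successful step, multiply by 10 after a failed one) and the function name are illustrative choices, not details from the patent.

```python
import math
from typing import List, Sequence, Tuple


def lm_fit_exponential(ts: Sequence[float], ys: Sequence[float],
                       a: float, b: float, lam: float = 1e-2,
                       eps: float = 1e-12, max_iter: int = 200) -> Tuple[float, float]:
    """Fit y = a * exp(b * t) with a Levenberg-Marquardt-style iteration."""

    def residuals(a: float, b: float) -> List[float]:
        # e_k = Y - f(X^k)
        return [y - a * math.exp(b * t) for t, y in zip(ts, ys)]

    def sse(r: List[float]) -> float:
        return sum(ri * ri for ri in r)

    for _ in range(max_iter):
        r = residuals(a, b)
        # Jacobian rows of f with respect to (a, b)
        jac = [(math.exp(b * t), a * t * math.exp(b * t)) for t in ts]
        # Damped normal equations: (J^T J + lam * I) delta = J^T e
        m11 = sum(j0 * j0 for j0, _ in jac) + lam
        m12 = sum(j0 * j1 for j0, j1 in jac)
        m22 = sum(j1 * j1 for _, j1 in jac) + lam
        g1 = sum(j0 * ri for (j0, _), ri in zip(jac, r))
        g2 = sum(j1 * ri for (_, j1), ri in zip(jac, r))
        det = m11 * m22 - m12 * m12
        da = (m22 * g1 - m12 * g2) / det
        db = (m11 * g2 - m12 * g1) / det
        if abs(da) + abs(db) < eps:      # termination: step X^{k+1} - X^k is tiny
            break
        if sse(residuals(a + da, b + db)) < sse(r):
            a, b = a + da, b + db
            lam *= 0.1                   # accepted: move toward Gauss-Newton
        else:
            lam *= 10.0                  # rejected: move toward gradient descent
    return a, b


ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]   # synthetic, noise-free data
print(lm_fit_exponential(ts, ys, a=1.5, b=0.3))
```

A large λ makes the step a short gradient-descent step, and a small λ makes it a Gauss-Newton step. Adapting λ lets the method combine the robustness of the first with the speed of the second.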

Preferably, the inverse error feedback kernel SDN classification training process is as follows:

Input: the denoised software-defined network traffic training set $D = \{(x_k, y_k)\}_{1 \le k \le n}$ and the inverse error feedback kernel learning rate γ;

Output: the optimal inverse error feedback kernel trained on the current software-defined network training set;

During data preprocessing, the algorithm randomly splits the traffic data set into a training set and a validation set in a ratio of 8:2. Once the optimal inverse error feedback classification core has been trained, it is checked against the held-out traffic validation set and its accuracy is computed, which completes the algorithm.
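The per-class 8:2 random split and the final accuracy calculation can be sketched with the Python standard library. The helper names `stratified_split` and `accuracy`, the rounding of each class's cut point and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]


def stratified_split(data: List[Sample], ratio: float = 0.8,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split each class separately so both sets keep the class proportions."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in data:
        by_class[sample[1]].append(sample)
    train: List[Sample] = []
    valid: List[Sample] = []
    for samples in by_class.values():
        samples = samples[:]              # do not shuffle the caller's list
        rng.shuffle(samples)
        cut = round(len(samples) * ratio)
        train.extend(samples[:cut])
        valid.extend(samples[cut:])
    return train, valid


def accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Fraction of validation samples whose predicted class matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


data = [((float(i),), "web") for i in range(50)] + [((float(i),), "p2p") for i in range(20)]
train, valid = stratified_split(data)
print(len(train), len(valid))
```

Splitting per class keeps rare traffic types represented in the validation set, which matters when one class dominates the capture.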

Preferably, in the SDN-based condition fitting feature kernel learning process, traffic attributes in the software-defined network are denoised and normalized into data sets of discrete or continuous attributes, which are split into a training set and a validation set. The training set is $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where m is the number of samples. On this set, a conditional independence assumption over the network traffic attributes in the software-defined network environment gives formula 22:

Before formula 22 is evaluated, the prior probability P(c) is calculated. The discrete attributes in network traffic are assumed to be independently distributed, so the conditional probability $P(x_i \mid c)$ is given by formula 23:

where $D_c$ is the set of all samples in the traffic training set whose class is c, and $D_{c,x_i}$ is the subset of $D_c$ whose samples take the value $x_i$ on the i-th attribute;
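Formulas 22 and 23 are not reproduced in this text. The conditional independence assumption and the set definitions above match the standard naive Bayes decision rule and frequency estimates. Written under that assumption, with the decision function $h(\mathbf{x})$ and the argmax form not taken from the source, they read:

```latex
h(\mathbf{x}) = \arg\max_{c}\; P(c) \prod_{i=1}^{d} P(x_i \mid c),
\qquad
P(c) = \frac{|D_c|}{|D|},
\qquad
P(x_i \mid c) = \frac{|D_{c,x_i}|}{|D_c|}
```

If any attribute value never occurs with class c in the training set, the product is 0 regardless of the other attributes. The Laplace correction in the next paragraph addresses exactly this problem.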

Preferably, because real software-defined network traffic data contains many attribute values for which $P(x_i \mid c)$ is 0, the condition fitting feature kernel algorithm smooths the discrete data with Laplace correction. The prior probability P(c) and the conditional probability $P(x_i \mid c)$ of the traffic data then become formulas 24 and 25:

where N is the number of possible classes in the training sample space D and $N_i$ is the number of possible values of the i-th attribute. With Laplace correction, the condition fitting feature kernel can handle traffic attributes in a software-defined network whose prior or conditional probabilities could not otherwise be computed, which improves data accuracy and the feasibility of the algorithm;
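Formulas 24 and 25 are not reproduced in this text. From the definitions of N and $N_i$, the standard Laplace-corrected estimates are $(|D_c| + 1)/(|D| + N)$ and $(|D_{c,x_i}| + 1)/(|D_c| + N_i)$. The sketch below assumes those forms. As a further simplification, it counts $N_i$ only over the values observed in the training set. The function name and the protocol labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def laplace_estimates(samples: List[Tuple[Sequence[Hashable], Hashable]],
                      attr: int) -> Tuple[Dict, Dict]:
    """Return (prior, cond) using assumed Laplace-corrected formulas 24, 25.

    prior[c]     = (|D_c| + 1) / (|D| + N)
    cond[(c, v)] = (|D_{c,v}| + 1) / (|D_c| + N_i)   for attribute `attr`
    """
    class_counts = Counter(c for _, c in samples)
    values = {x[attr] for x, _ in samples}          # observed values -> N_i
    pair_counts = Counter((c, x[attr]) for x, c in samples)
    n_classes, n_values = len(class_counts), len(values)
    prior = {c: (k + 1) / (len(samples) + n_classes)
             for c, k in class_counts.items()}
    cond = {(c, v): (pair_counts[(c, v)] + 1) / (class_counts[c] + n_values)
            for c in class_counts for v in values}
    return prior, cond


flows = [(("tcp",), "web"), (("tcp",), "web"), (("udp",), "p2p")]
prior, cond = laplace_estimates(flows, attr=0)
print(prior, cond[("p2p", "tcp")])   # the unseen pair still gets a non-zero value
```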

Formula 23 cannot be applied to continuous data in network traffic, such as RTT (the round-trip delay of the traffic). For continuous attributes, a probability density function replaces formula 23. Assuming the continuous data follow a normal distribution, formula 26 is obtained:

where $\mu_{c,i}$ is the mean and $\sigma_{c,i}^2$ the variance of the values of the i-th attribute over samples of class c.
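Formula 26 is the normal density with a class-specific mean and variance. The sketch below evaluates it for one continuous attribute, using the population variance to match "the variance of all values". The function name and the RTT values are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def gaussian_likelihood(x: float, values: Sequence[float]) -> float:
    """Normal density of x given the class-c values of one continuous attribute."""
    mu = mean(values)                 # mu_{c,i}
    var = pvariance(values, mu)       # sigma_{c,i}^2 (population variance)
    if var == 0.0:
        raise ValueError("variance is zero; a smoothing floor would be needed")
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


rtt_web = [10.0, 12.0, 14.0]          # hypothetical RTTs (ms) of class "web"
print(gaussian_likelihood(12.0, rtt_web), gaussian_likelihood(30.0, rtt_web))
```

A value near the class mean gets a high density and a distant value a very small one. The result is then used in place of $P(x_i \mid c)$ in the product of formula 22.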

Preferably, the condition fitting feature kernel SDN classification training and verification process is as follows:

Input: the denoised software-defined network traffic training set $D = \{(x_i, y_i)\}_{1 \le i \le n}$ and

the denoised software-defined network traffic verification set $D' = \{(x_k, y_k)\}_{1 \le k \le n}$;

Output: the classification results for all samples in the verification set D′.
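The training and verification process can be sketched end to end. The classifier estimates class priors and per-attribute likelihoods on D, then assigns each sample of D′ the class with the largest posterior. The sketch combines the assumed forms of formulas 22 and 24 to 26. The class name, the log-probabilities (used to avoid underflow when many attributes are multiplied), the variance floor and the toy flows are assumptions, not details from the patent.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance
from typing import Hashable, List, Sequence, Set, Tuple

Sample = Tuple[Sequence, Hashable]


class TrafficNaiveBayes:
    """Hypothetical sketch: Laplace-smoothed discrete attributes plus
    Gaussian continuous attributes, classified by maximum log-posterior."""

    def __init__(self, continuous: Set[int], var_floor: float = 1e-9):
        self.continuous = continuous        # indices of continuous attributes
        self.var_floor = var_floor          # guards against zero variance

    def fit(self, train: List[Sample]) -> "TrafficNaiveBayes":
        self.classes = Counter(c for _, c in train)
        self.n = len(train)
        d = len(train[0][0])
        discrete = [i for i in range(d) if i not in self.continuous]
        self.values = {i: {x[i] for x, _ in train} for i in discrete}
        self.counts = Counter((c, i, x[i]) for x, c in train for i in discrete)
        cols = defaultdict(list)
        for x, c in train:
            for i in self.continuous:
                cols[(c, i)].append(x[i])
        self.gauss = {k: (mean(v), max(pvariance(v), self.var_floor))
                      for k, v in cols.items()}
        return self

    def log_posterior(self, x: Sequence, c: Hashable) -> float:
        nc = self.classes[c]
        score = math.log((nc + 1) / (self.n + len(self.classes)))  # smoothed prior
        for i, xi in enumerate(x):
            if i in self.continuous:
                mu, var = self.gauss[(c, i)]
                score += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
            else:
                score += math.log((self.counts[(c, i, xi)] + 1)
                                  / (nc + len(self.values[i])))
        return score

    def predict(self, x: Sequence) -> Hashable:
        return max(self.classes, key=lambda c: self.log_posterior(x, c))


# Hypothetical flows: (protocol, RTT in ms) -> traffic class
train = [(("tcp", 20.0), "web"), (("tcp", 22.0), "web"), (("tcp", 24.0), "web"),
         (("udp", 80.0), "p2p"), (("udp", 90.0), "p2p"), (("tcp", 85.0), "p2p")]
valid = [(("tcp", 21.0), "web"), (("udp", 88.0), "p2p")]
model = TrafficNaiveBayes(continuous={1}).fit(train)
print([model.predict(x) for x, _ in valid])
```

Training reduces to counting plus one mean and variance per class and attribute. This is consistent with the text's observation that this core trains and classifies faster than the inverse error feedback core.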

Compared with the prior art, the innovations and advantages of the present application are:

(1) A dual-core collaborative traffic classification model is built in a software-defined network. The model combines a traffic classification and recognition algorithm with a machine-learning dual-core model that has high recognition efficiency and can draw on large-scale data analysis. The two cores are an inverse error feedback core and a condition fitting feature core. After network traffic information is obtained, a classifier built from the two cores classifies it accurately. The result is a multi-attribute classification method with high efficiency and accuracy that suits traffic recognition environments, and training on large amounts of network data improves its recognition accuracy further. The inverse error feedback core improves classification accuracy, the condition fitting feature core improves classification speed, and together they raise overall classification efficiency and quality. The final classification accuracy reaches 99.76%, and classification efficiency is 17.35% higher than with the inverse error feedback core alone. The method is efficient, widely applicable and robust.

(2) The application creatively provides an SDN traffic classification method based on the inverse error feedback core. It comprises an SDN-based inverse error feedback kernel learning process, an inverse error feedback kernel slope recursion strategy based on stage iterative optimization, and an inverse error feedback kernel SDN classification training process. This classification core recognizes traffic at a finer granularity than a classification core based on host behavior patterns. Host-behavior-based traffic recognition cannot recognize the subtypes of some specific applications. In a software-defined network, by contrast, the controller can obtain all traffic passing through the network and process the flow table entries into discrete or continuous attributes to form a training data set, which the inverse error feedback classification core then recognizes. The traffic identification method based on the inverse error feedback kernel uses the validation set to avoid overfitting effectively and achieves very high classification accuracy. Its classification efficiency, however, is slightly lower than that of traffic classification based on the condition fitting feature kernel. Dual-core cooperation therefore draws on the high classification efficiency of the condition fitting feature kernel. The resulting big-data traffic classification method is both accurate and efficient, with strong competitive advantages and value for wider application among software-defined network big-data traffic classification methods.

(3) The application creatively provides an SDN traffic classification method based on the condition fitting feature kernel. It comprises an SDN-based condition fitting feature kernel learning process and a condition fitting feature kernel SDN classification training and verification process. Once the traffic attributes in the software-defined network environment are determined, the condition fitting feature kernel classifies very efficiently and trains faster than the inverse error feedback classification core. After Laplace correction, it uses as many of the traffic attributes acquired by the controller as possible. This makes its results more convincing and credible and suits traffic identification and classification in software-defined networks. Its recognition accuracy is higher when the training set contains sufficient traffic data. With insufficient data it cannot be fully trained, and its accuracy drops. Its classification accuracy of 97.95% is slightly lower than that of the method based on the inverse error feedback core, but its classification efficiency is higher. The application coordinates the strengths of the two traffic classification cores so that they complement each other in accuracy and efficiency, which raises both accuracy and efficiency in big-data traffic classification. From the traffic classification side, this meets current demands for intelligent, multimedia and secure networks and resolves the sharp tension between the rapid growth of network scale and the rapid growth of user numbers.

Drawings

Fig. 1 is a schematic diagram of the inverse error feedback kernel for flow identification and classification.

Fig. 2 is a training flowchart of the SDN traffic identification method based on the inverse error feedback kernel algorithm.

Fig. 3 is a flowchart of traffic identification based on the condition fitting feature kernel in a software defined network.

Fig. 4 is a linear regression diagram of the inverse error feedback kernel training process.

Fig. 5 is a schematic diagram of the accuracy of the flow classification experiment based on the condition fitting feature kernel.

Detailed Description

The technical scheme of the dual-core collaborative SDN big data network flow accurate classification method provided by the application is further described below with reference to the accompanying drawings, so that a person skilled in the art can better understand the application and can implement the application.

As the volume of information in society grows enormously, demands on network transmission speed and quality of service keep rising. To move huge volumes of data quickly, network technology has matured, network scale has grown greatly, and network architectures have become complex and multifunctional; traditional architectures can no longer keep pace with growing business demands or support flexible management of the whole network. A software defined network abstracts the control-layer functions: the lower-layer devices (SDN switches) only forward data, and all control functions are performed by the upper-layer controller. This preserves data transmission speed and service agility, enables flexible and comprehensive management and control of the network, strengthens network security, improves network efficiency, and reduces operating costs.

Network traffic currently involves multiple closely related entities such as hosts, networks, applications, and users; it is a complex, multi-factor system. Each network application has its own traffic behavior characteristics, and as new applications and network protocols emerge, network traffic grows more complex. Because networks differ, different environments suit different network architectures, and a network can achieve high applicability and efficiency only if its traffic is accurately classified after sufficient analysis. Moreover, network attacks appear in endless variety, and if they are not captured and handled immediately they can severely disrupt the network environment or even paralyze large parts of the network. Network traffic classification is therefore essential.

The method applies the condition fitting feature kernel method and the inverse error feedback kernel algorithm from machine learning to SDN flow classification. Classifying SDN traffic with machine learning can improve QoS as well as classification efficiency and accuracy; experiments were conducted to verify the effectiveness and accuracy of the method.

1. SDN flow classification method based on inverse error feedback kernel

(I) SDN-based inverse error feedback kernel learning process

The SDN controller obtains flow information directly and preprocesses it to remove noise, discarding records with missing attributes and samples that cannot serve as a basis for classification. This yields a data set of discrete or continuous values. The samples of each class in the original data set are randomly divided into a training set and a testing set in an 8:2 ratio. For each feedback core, a connection weight $w$ and a critical value $\theta$ are defined, and the inverse error feedback kernel is trained to convergence according to Formulas 1 and 2. For a sample $(x, y)$ in the training set, if the current feedback core computes the output $y'$, its weights are adjusted as follows, where $x_i$ is the component of sample $x$ on the $i$-th attribute:

$$w_i \leftarrow w_i + \mu \qquad \text{(Formula 1)}$$

$$\mu = \gamma\,(y - y')\,x_i \qquad \text{(Formula 2)}$$

When $y \neq y'$, that is, when the flow identification result disagrees with the label in the training set, the difference between the target output and the computed output of the current feedback core is multiplied by the learning rate $\gamma$ (and by the attribute value $x_i$) to obtain the adjustment $\mu$, which updates the original connection weight.
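As a minimal illustration of Formulas 1 and 2, the following Python sketch applies the update to every attribute weight of one feedback core for a single sample. The function name `update_weights` and the sample values are hypothetical; the update itself follows Formula 2 as written.

```python
def update_weights(w, x, y, y_pred, gamma):
    """Apply Formulas 1-2 to each attribute weight of one feedback core.

    mu_i = gamma * (y - y_pred) * x_i   (Formula 2)
    w_i  <- w_i + mu_i                  (Formula 1)
    """
    # When y == y_pred the factor (y - y_pred) is 0, so the weights are unchanged.
    return [w_i + gamma * (y - y_pred) * x_i for w_i, x_i in zip(w, x)]


# Hypothetical sample: three attributes, target class 1, current output 0.
new_w = update_weights([0.0, 0.5, -0.2], [1.0, 2.0, 0.0], y=1, y_pred=0, gamma=0.1)
print(new_w)
```

Because the change is proportional to $x_i$, an attribute with value 0 (the third one here) keeps its weight even on a misclassified sample.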

The application uses a single-hidden-layer inverse error feedback kernel with a $(d, n, 1)$ architecture. Suppose a flow training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ is given, where each input example is described by $d$ flow attributes (including the source port number, the destination port number, and the minimum, average, and maximum packet transmission times in the network flow information). There is one output node, and $y$ denotes the flow type determined by the inverse error feedback kernel algorithm. The input layer has $d$ nodes, the $i$-th input node is denoted $x_i$, the hidden layer has $n$ nodes, and the output of the $j$-th hidden node is denoted $b_j$. The output-layer connection weights are denoted $w$ and the hidden-layer connection weights $v$; the feedback core activation function is the Sigmoid function and the learning rate is $\gamma$. This gives the flow identification and classification inverse error feedback kernel shown in Fig. 1.
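For reference, the forward pass and squared error of a single-hidden-layer sigmoid network in this notation can be sketched as follows. This is the standard error back-propagation form, given as an assumption because the document's own formulas for these quantities are not reproduced here; the hidden-node inputs $\alpha_j$, hidden critical values $\theta_j$, and output-layer input $\beta$ are introduced for illustration.

```latex
\alpha_j = \sum_{i=1}^{d} v_{ij}\,x_i, \qquad b_j = f(\alpha_j - \theta_j), \qquad
\beta = \sum_{j=1}^{n} w_j\,b_j, \qquad y' = f(\beta - \theta_y),
\qquad E = \tfrac{1}{2}\,(y' - y)^2, \qquad f(z) = \frac{1}{1 + e^{-z}}
```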

The input of the output-layer feedback core is:

The mean squared error of the inverse error feedback kernel on the sample $(x_i, y)$ is:

Here $p$ and $k$ are parameters of the inverse error feedback kernel. The feedback core connection weights are adjusted with the oblique quantity recursion strategy, giving the change $\mu_j$ of the connection weight from hidden node $j$ to the output layer and the change $\mu_{ij}$ of the connection weight from input node $i$ to hidden node $j$:

That is:

Using the Sigmoid derivative $f'(x) = f(x)\,(1 - f(x))$, we obtain:

Combining Formulas 6, 7, 8, and 9 yields the output-layer oblique quantity recursion rates, Formulas 10 and 11:

Critical value: $\theta_y = -\gamma\, y'(1 - y')(y - y')$ (Formula 11)
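The update steps can be sketched in Python under the assumption that the inverse error feedback kernel is a standard error back-propagation network with the $(d, n, 1)$ sigmoid architecture described above and per-sample updates. The names `bp_step` and `theta_h` (hidden critical values) and the toy sizes are hypothetical. In this form the output critical value changes by $-\gamma\,y'(1-y')(y-y')$, the expression given in Formula 11.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bp_step(x, y, V, theta_h, w, theta_y, gamma):
    """One per-sample update of a (d, n, 1) sigmoid network.

    x: (d,) attribute vector, y: target in [0, 1]
    V: (d, n) input-to-hidden weights v_ij, theta_h: (n,) hidden critical values
    w: (n,) hidden-to-output weights w_j, theta_y: output critical value
    Returns the updated parameters and the squared error before the update.
    """
    b = sigmoid(x @ V - theta_h)               # hidden outputs b_j
    y_pred = sigmoid(b @ w - theta_y)          # network output y'
    err = 0.5 * (y_pred - y) ** 2              # squared error on this sample
    g = y_pred * (1 - y_pred) * (y - y_pred)   # output-layer term, uses f' = f(1 - f)
    e = b * (1 - b) * w * g                    # hidden-layer terms (computed with the old w)
    w_new = w + gamma * g * b                  # change of w_j
    theta_y_new = theta_y - gamma * g          # change of theta_y: -gamma*y'(1-y')(y-y')
    V_new = V + gamma * np.outer(x, e)         # change of v_ij
    theta_h_new = theta_h - gamma * e          # change of hidden critical values
    return V_new, theta_h_new, w_new, theta_y_new, err


# Toy run on one synthetic sample (d = 5, n = 15).
rng = np.random.default_rng(0)
d, n = 5, 15
V = rng.normal(scale=0.1, size=(d, n))
theta_h = np.zeros(n)
w = rng.normal(scale=0.1, size=n)
theta_y = 0.0
x, y = rng.random(d), 1.0
errors = []
for _ in range(200):
    V, theta_h, w, theta_y, err = bp_step(x, y, V, theta_h, w, theta_y, gamma=0.5)
    errors.append(err)
print(errors[0], errors[-1])
```

The loop trains on a single synthetic sample only to show the error falling; the 205-input, 15-hidden configuration from the experiments would call the same function with `d = 205` and `n = 15`.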

When computing the updated weights, minimizing the result of Formula 5 gives the best training effect if the flow information contains no noise. However, errors in the training set can cause the inverse error feedback kernel to overfit: the training error keeps falling during training, while the test error stops falling and begins to rise. A term measuring the complexity of the inverse error feedback kernel is therefore added to Formula 5 to avoid overfitting, giving the new discriminant function of Formula 14:
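Formula 14 itself is not reproduced in this text. One common way to add a complexity term to the training error is weight decay, sketched below as an assumption rather than the document's exact discriminant function; $E_k$ is the error on the $k$-th training sample and $\alpha \in (0, 1)$ is a hypothetical trade-off coefficient.

```latex
E = \alpha\,\frac{1}{m}\sum_{k=1}^{m} E_k + (1 - \alpha)\sum_{i} w_i^{2}
```

Large connection weights are penalized, which discourages the network from fitting noise in the training set.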

(II) Inverse error feedback kernel oblique quantity recursion strategy based on level iterative optimization

In the inverse error feedback kernel training model, the level iterative optimization algorithm serves as the oblique quantity recursion strategy. In multidimensional nonlinear programming, an initial value $X^k$ is defined and an estimate better than the initial value is sought by updating $X^k$ with a step $\Delta^k$ at each iteration. By Taylor's formula, Formula 15 is obtained:

$$f(X^k + \Delta^k) \approx f(X^k) + J_{X^k}\,\Delta^k \qquad \text{(Formula 15)}$$

where $J_{X^k}$ is the value of the Jacobian matrix at $X^k$. The step is chosen to minimize the error between the true value and the estimated value, yielding:

Setting the right-hand side to 0, with $e_k = Y - f(X^k)$, gives the normal equation:

where $M$ and $N$ are the corresponding coefficients of the normal equation. This gives the iterative relation between $X^{k+1}$ and $X^k$, Formula 19:

wherein:

As $\lambda$ increases, the level iterative optimization algorithm approaches the oblique quantity recursion (gradient-descent) update, and as $\lambda$ approaches 0 it approaches the Gauss-Newton update, which descends faster near the minimum. On the inverse error feedback kernel model, the level iterative optimization algorithm is therefore suitable for the oblique quantity recursion process during training, and it is also used in the subsequent experiments.
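The level iterative optimization described above, with a Jacobian, a normal equation, and a damping parameter $\lambda$, reads like the Levenberg-Marquardt method (the routine in claim 4 is named `l_mregress`). The sketch below assumes that interpretation and assumes Formula 19 has the form $X^{k+1} = X^k + (M + \lambda I)^{-1} N$ with $M = J^{\mathsf T}J$ and $N = J^{\mathsf T}e_k$. The names `lm_step`, `lm_fit`, and the exponential test model are hypothetical.

```python
import numpy as np


def lm_step(f, jac, X, Y, lam):
    """One damped step X^{k+1} = X^k + (M + lam*I)^{-1} N (assumed form of Formula 19)."""
    e = Y - f(X)                   # e_k = Y - f(X^k)
    J = jac(X)                     # Jacobian of f at X^k
    M = J.T @ J                    # normal-equation coefficient matrix
    N = J.T @ e                    # normal-equation right-hand side
    return X + np.linalg.solve(M + lam * np.eye(len(X)), N)


def lm_fit(f, jac, X0, Y, lam=1e-2, iters=50):
    """Accept a step if it lowers the squared error, otherwise raise lam and retry."""
    X = np.asarray(X0, dtype=float)
    cost = np.sum((Y - f(X)) ** 2)
    for _ in range(iters):
        X_try = lm_step(f, jac, X, Y, lam)
        cost_try = np.sum((Y - f(X_try)) ** 2)
        if cost_try < cost:        # success: smaller lam, closer to Gauss-Newton
            X, cost, lam = X_try, cost_try, lam / 10
        else:                      # failure: larger lam, closer to a short gradient step
            lam *= 10
    return X


# Hypothetical model y = a * exp(b * t), fitted to noise-free data with a = 2, b = 0.5.
t = np.linspace(0.0, 1.0, 20)


def model(X):
    return X[0] * np.exp(X[1] * t)


def model_jac(X):
    return np.column_stack([np.exp(X[1] * t), X[0] * t * np.exp(X[1] * t)])


Y = model(np.array([2.0, 0.5]))
print(lm_fit(model, model_jac, [1.0, 0.0], Y))
```

Accepting a step shrinks $\lambda$ toward Gauss-Newton behavior, and rejecting one grows $\lambda$ toward a short gradient-descent step, which is the trade-off the paragraph above describes.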

(III) inverse error feedback kernel SDN classification training process

Fig. 2 shows the training flowchart of the SDN traffic identification method based on the inverse error feedback kernel algorithm.

The algorithm randomly divides the flow data set into a training set and a verification set in an 8:2 ratio. Once the optimal inverse error feedback classification kernel has been trained, it is verified on the held-out flow verification set and its accuracy is calculated, completing the algorithm.
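A minimal sketch of the 8:2 split, assuming (as the learning-process paragraph says) that the samples of each class are split separately so both sets keep the class mix. The function name `split_8_2`, the fixed seed, and the flow counts are illustrative choices, not part of the document.

```python
import random
from collections import defaultdict


def split_8_2(samples, labels, seed=0):
    """Randomly split the samples of each class 8:2 into training and verification sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    train, verify = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        cut = round(0.8 * len(items))      # 80% of this class goes to training
        train += [(s, label) for s in items[:cut]]
        verify += [(s, label) for s in items[cut:]]
    return train, verify


# Hypothetical data: 50 WWW flows and 10 MAIL flows.
samples = list(range(60))
labels = ["WWW"] * 50 + ["MAIL"] * 10
train, verify = split_8_2(samples, labels)
print(len(train), len(verify))  # 48 12
```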

Compared with earlier classification kernels based on host behavior patterns, this classification kernel has finer recognition granularity. Host-behavior-based traffic recognition cannot identify the subtypes of some specific applications, whereas in a software defined network the controller can obtain all traffic passing through the network and process the attributes of the flow table entries into discrete or continuous values, forming a training data set for the inverse error feedback classification kernel.

2. SDN flow classification method based on condition fitting feature kernel

(I) SDN-based condition fitting feature kernel learning process

The denoised flow attributes in the software defined network are normalized into data sets of discrete or continuous attributes, which are divided into a training set and a verification set. Define the training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $|D| = m$ is the number of samples. Under the assumption that network traffic attributes in the software defined network environment are conditionally independent, Formula 22 is obtained:
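Formula 22 is not reproduced in this text. Under attribute conditional independence, the posterior and the resulting decision rule usually take the form below (an assumption about the missing formula), where $d$ is the number of attributes:

```latex
P(c \mid \boldsymbol{x}) = \frac{P(c)\,P(\boldsymbol{x} \mid c)}{P(\boldsymbol{x})}
  = \frac{P(c)}{P(\boldsymbol{x})} \prod_{i=1}^{d} P(x_i \mid c),
\qquad
h(\boldsymbol{x}) = \arg\max_{c} \, P(c) \prod_{i=1}^{d} P(x_i \mid c)
```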

Actual software defined network traffic data contains many zero-count entries, that is, discrete attribute values for which $P(x_i \mid c)$ would be 0. To handle such data, the condition fitting feature kernel algorithm applies smoothing in the form of Laplace correction, and the calculation of the prior probability $P(c)$ and the conditional probability $P(x_i \mid c)$ of the flow data becomes Formulas 24 and 25:

where $N$ is the number of possible classes in the training set $D$ and $N_i$ is the number of possible values of the $i$-th attribute. After Laplace correction, the condition fitting feature kernel can process attributes in software defined network flow data for which the prior or conditional probability could not otherwise be computed, which improves the accuracy of the data and the feasibility of the algorithm.
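Formulas 24 and 25 are not reproduced here; the sketch assumes the usual Laplace-corrected estimates $\hat P(c) = (|D_c| + 1)/(|D| + N)$ and $\hat P(x_i \mid c) = (|D_{c,x_i}| + 1)/(|D_c| + N_i)$, with $N$ and $N_i$ as defined above. The helper names and the toy port data are hypothetical.

```python
from collections import Counter


def laplace_prior(labels, classes):
    """Laplace-corrected prior: (|D_c| + 1) / (|D| + N), N = number of possible classes."""
    counts = Counter(labels)
    N = len(classes)
    return {c: (counts[c] + 1) / (len(labels) + N) for c in classes}


def laplace_conditional(values, labels, c, v, n_values):
    """Laplace-corrected P(x_i = v | c): (|D_{c,x_i}| + 1) / (|D_c| + N_i)."""
    in_class = [x for x, label in zip(values, labels) if label == c]
    return (in_class.count(v) + 1) / (len(in_class) + n_values)


# Hypothetical destination-port attribute for three flows.
labels = ["WWW", "WWW", "MAIL"]
ports = ["80", "443", "25"]
prior = laplace_prior(labels, classes=["WWW", "MAIL", "FTP-DATA"])
print(prior)  # FTP-DATA has no samples but still gets 1/6 instead of 0
print(laplace_conditional(ports, labels, "WWW", "25", n_values=3))  # 0.2 instead of 0
```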

However, continuous attributes in network traffic, such as RTT (the round-trip time of the data flow), cannot be handled by Formula 23. For continuous attributes the application therefore replaces Formula 23 with a probability density function, assuming that the continuous data follow a normal distribution, which gives Formula 26:
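Formula 26 is likewise not reproduced; the sketch assumes the usual normal density $p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\big(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\big)$, with the class mean and standard deviation estimated from the training samples of class $c$. The RTT values are made up for illustration, and the `min_std` floor is an added safeguard for attributes with no spread.

```python
import math


def fit_gaussian(values, min_std=1e-9):
    """Class mean and standard deviation of one continuous attribute (e.g. RTT)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(math.sqrt(var), min_std)  # floor avoids division by zero


def gaussian_likelihood(x, mean, std):
    """Normal density p(x_i | c) used in place of a frequency-based estimate."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)


# Hypothetical RTT samples (ms) for one traffic class.
mean, std = fit_gaussian([10.0, 12.0, 14.0])
print(mean, std, gaussian_likelihood(13.0, mean, std))
```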

(II) training and verifying process of SDN classification of condition fitting feature kernel

From the derivation above, the flowchart of traffic identification based on the condition fitting feature kernel in the software defined network is obtained, as shown in Fig. 3.

Output: the classification results for all samples in the verification set $D'$.

Compared with the inverse error feedback kernel method, the SDN traffic identification and classification method based on the condition fitting feature kernel algorithm classifies more efficiently. When the flow attributes in the software defined network environment are determined, its classification efficiency is very high and its classification kernel trains faster than the inverse error feedback classification kernel. After Laplace correction it uses as many of the flow attributes acquired by the controller as possible, which makes its results more convincing and credible and makes it well suited to traffic identification and classification in software defined networks.

3. Experiment and result analysis

(I) Flow classification experiment I based on inverse error feedback kernel

After data processing is complete, the flow classification experiment based on the inverse error feedback kernel creates the inverse error feedback kernel with the MATLAB inverse error feedback kernel toolbox. The first task is to determine the number of nodes in the single hidden layer. The empirical Formula 27 for the number of hidden-layer nodes is used, where $m$ is the number of hidden-layer nodes, $n$ is the number of input-layer nodes, and the output layer has 1 node; for this inverse error feedback classification kernel it gives a suitable hidden-layer size of 15:
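Formula 27 is not reproduced in this text. A widely used rule of thumb for a single hidden layer is $m = \sqrt{n + l} + a$, with $l$ the number of output nodes and $a$ an integer from 1 to 10. Assuming that rule, the sketch below lists the candidate sizes for 205 inputs and 1 output; the reported choice of 15 falls at the low end of the range. The function name is hypothetical.

```python
import math


def hidden_node_candidates(n_inputs, n_outputs, a_values=range(1, 11)):
    """Rule of thumb m = sqrt(n + l) + a, rounded, for each a (assumed, not Formula 27 itself)."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in a_values]


print(hidden_node_candidates(205, 1))  # [15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
```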

After the number of hidden-layer nodes is determined, the level iterative optimization algorithm is adopted as the oblique quantity recursion algorithm, and the training attribute set and result set are fed into the constructed inverse error feedback kernel for training.

While the inverse error feedback kernel structure is being trained, the connection weights between its feedback cores and the critical value of each feedback core are adjusted and corrected automatically as training on the training set proceeds, until the structure whose computed output is closest to the target output is reached.

During training, the inverse error feedback kernel divides the training set into Train, Validation, and Test subsets in proportions of 70%, 15%, and 15%. The kernel performs regression fitting on the Train set; as the number of iterations grows, the Test set is used to test the intermediate model obtained after each training iteration, and the Validation set is used to validate the model. Splitting the input training set into these three subsets is intended to prevent overfitting during training. After each iteration the model computes the mean squared error on each of the Train, Validation, and Test subsets. If no overfitting occurs, all three errors decrease gradually as training deepens. If overfitting occurs, the Train error keeps decreasing while the Validation and Test errors stop decreasing and begin to rise; once the Validation error has risen for several consecutive iterations, training stops and the model from the iteration with the best Validation performance is kept as the final result.
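The stopping rule can be sketched as a check on the Validation error: training stops once that error has failed to improve for `max_fail` consecutive iterations, and the best iteration is kept. The value `max_fail = 6` is assumed from the six consecutive rises reported in the results, and the error sequence below is synthetic.

```python
def early_stop(val_errors, max_fail=6):
    """Return (best_iteration, stop_iteration), both 1-based.

    Training stops once the Validation error has failed to improve on its best
    value for max_fail consecutive iterations; the best iteration is kept.
    """
    best_err, best_it, fails = float("inf"), 0, 0
    for it, err in enumerate(val_errors, start=1):
        if err < best_err:
            best_err, best_it, fails = err, it, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_it, it
    return best_it, len(val_errors)  # ran out of iterations without triggering the rule


# Synthetic Validation errors: falling for 19 iterations, then rising.
val = [1.0 / k for k in range(1, 20)] + [0.06 + 0.01 * k for k in range(1, 11)]
print(early_stop(val))  # (19, 25)
```

With a minimum at iteration 19 and six rises afterwards, the rule stops at iteration 25, the same pattern the results section describes.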

After training, the network flow classification accuracy of the resulting inverse error feedback classification kernel is evaluated on the verification set: the sim function in the MATLAB inverse error feedback kernel toolbox runs the trained model on the input verification set, and the verification results are used to calculate the accuracy.

(II) flow classification experiment II based on condition fitting feature kernel

In the flow classification experiment based on the condition fitting feature kernel, after data processing the data set is loaded into memory and the prior probability $P(c)$ of each class is calculated first, with Laplace correction applied so that zero counts do not distort the final result.

The training set is then passed to the condition fitting feature kernel training function to train the classification kernel. During training, the conditional probability $P(x \mid c)$ of each data attribute given class $c$ is calculated: for discrete attributes from the Laplace-corrected frequencies, and for continuous attributes from the normal probability density. The conditional probability of each attribute and the prior probability of each class are stored in a MATLAB structure, and the structure output by the function represents the condition fitting feature kernel network flow classification kernel trained on the input training set.

After the condition fitting feature kernel network flow classification kernel is trained, its performance and classification accuracy must be verified. The verification set is passed to a verification function, and the predicted and actual results are stored in two matrices. The Bayesian conditional probability $P(c \mid x)$ of each sample in the verification set is computed, the predicted class is the one with the largest value, and accuracy is obtained by comparing the predictions with the actual classes. In practice some attribute values have very small conditional probabilities; with 205 attributes their product underflows, that is, falls below the smallest positive value MATLAB can represent, so MATLAB evaluates the conditional probability as 0. Therefore, in each loop iteration the conditional probability in the classification kernel is multiplied by 10, so that the final result is scaled by $10^{205}$.
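The underflow described above is easy to reproduce in double precision. The sketch compares the raw product, the scale-by-10 workaround the text describes, and a log-space score, which is a common alternative (not the document's method) that keeps the same argmax without rescaling. The probabilities and the helper name `nb_log_score` are hypothetical.

```python
import math


def nb_log_score(prior, cond_probs):
    """log P(c) + sum_i log P(x_i | c); its argmax over c is the argmax of P(c | x)."""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)


# Hypothetical conditional probabilities for 205 attributes under two classes.
class_a = [2e-2] * 205
class_b = [1e-2] * 205

raw_a, raw_b = math.prod(class_a), math.prod(class_b)   # both underflow to 0.0
scaled_a = math.prod(10 * p for p in class_a)           # each factor x10, overall x10^205
scaled_b = math.prod(10 * p for p in class_b)
log_a, log_b = nb_log_score(0.5, class_a), nb_log_score(0.5, class_b)
print(raw_a, raw_b, scaled_a > scaled_b, log_a > log_b)  # 0.0 0.0 True True
```

Scaling by a fixed factor only postpones underflow for a given attribute count; summing logarithms avoids it regardless of how many attributes there are.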

Finally, the experimental result graphs are drawn, namely a chart of the traffic share of each class and a chart of the classification accuracy of each class, and the overall accuracy is calculated. These are the specific steps of the network flow classification experiment based on the condition fitting feature kernel.
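Per-class and overall accuracy, as plotted in the result graphs, can be computed as below. This is an illustrative sketch with hypothetical labels, where per-class accuracy is the fraction of each actual class that is predicted correctly.

```python
from collections import defaultdict


def accuracy_report(actual, predicted):
    """Per-class accuracy (correct / samples of that actual class) and overall accuracy."""
    total, correct = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        total[a] += 1
        correct[a] += int(a == p)
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / len(actual)
    return per_class, overall


actual = ["WWW", "WWW", "MAIL", "FTP-DATA"]
predicted = ["WWW", "MAIL", "MAIL", "FTP-DATA"]
print(accuracy_report(actual, predicted))
# ({'WWW': 0.5, 'MAIL': 1.0, 'FTP-DATA': 1.0}, 0.75)
```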

(III) analysis of Experimental results

Experiments one and two use the same flow data set. After dirty data is removed and the data is denoised, the data set is divided 8:2 into a training set and a verification set: the training set trains the network flow classification kernel, and the verification set verifies its classification results. Finally, the classification kernels trained by the two algorithms are used to identify and classify the network flow data set, giving the experimental results of the flow classification experiment based on the inverse error feedback kernel and of the one based on the condition fitting feature kernel.

Overall accuracy of the two algorithms: 99.76% for the flow classification experiment based on the inverse error feedback kernel and 97.95% for the flow classification experiment based on the condition fitting feature kernel.

1. Flow classification result based on inverse error feedback kernel

The inverse error feedback kernel consists of 205 input-layer nodes, 15 hidden-layer nodes, and 1 output-layer node, and training takes about 4 minutes. As iterations increase, the overall mean squared error drops quickly over the first few iterations and then levels off, and the error of the best inverse error feedback kernel structure keeps shrinking. The mean squared error of the Train set decreases throughout; the Test set error starts to rise at the 15th iteration, and the Validation set error starts to rise after the 19th iteration and has risen 6 consecutive times by the 25th iteration. The algorithm therefore judges that the inverse error feedback kernel has begun to overfit and stops training; the point where the two dashed lines meet in the figure marks the training iteration with the best performance.

Fig. 4 shows the linear regression equations for the Train, Validation, and Test sets and the final combined linear regression equation formed during inverse error feedback kernel training. The results in the figure show that the network flow identification method based on the inverse error feedback kernel classifies actual network flow data well and, through the Validation set, effectively avoids overfitting.

2. Flow classification experimental result based on condition fitting feature kernel

Fig. 5 shows the prediction accuracy for each flow class. The prediction accuracy for the "DATABASE" and "FTP-DATA" flows is 100%, the accuracy for the "WWW" and "MAIL" flows is close to 100%, and the accuracy for the "FTP-CONTROL" flows is above 95%; together these classes make up almost 95% of the flow data set. For classes with very little flow data (such as "MULTIMEDIA" and "FTP-PASV") or none at all (such as "GAMES" and "INTERACTIVE"), the classification kernel cannot be trained sufficiently, so prediction accuracy for these classes is somewhat lower. The condition fitting feature kernel classification kernel is therefore highly accurate when enough flow data is available for training, but cannot be fully trained, and so cannot reach high accuracy, when flow data is insufficient. Its overall accuracy nevertheless reaches 97.95%, a good result. Although its classification accuracy is slightly lower than that of the flow classification method based on the inverse error feedback kernel, its classification efficiency is higher.

Comparing the two flow classification kernels, both show excellent classification efficiency and accuracy in the experiments, and they complement each other well in accuracy and efficiency. In real traffic identification environments the volume of flow data is larger and training is more thorough, so the two classification kernels are highly adaptable and complementary and have broad prospects in this field.

Claims

1. A dual-core cooperation SDN big data network flow accurate classification method, characterized in that a dual-core cooperative data flow classification model is built in a software defined network, the two cores being an inverse error feedback core and a condition fitting feature core; after network flow information is acquired, it is accurately classified by a classifier constructed from the inverse error feedback core and the condition fitting feature core, wherein the inverse error feedback core improves network flow classification accuracy, the condition fitting feature core improves network flow classification speed, and the two together improve the overall classification efficiency of the network;

the SDN controller obtains flow information directly, preprocesses and denoises it, and removes records with missing attributes and samples that cannot serve as a basis for classification, yielding a data set of discrete or continuous values; the samples of each class in the original data set are randomly divided into a training set and a testing set in an 8:2 ratio; for each feedback core, a connection weight $w$ and a critical value $\theta$ are defined and the inverse error feedback core is trained to convergence according to Formulas 1 and 2; while the inverse error feedback core structure is being trained, the connection weights between its feedback cores and the critical value of each feedback core are adjusted and corrected automatically as training on the training set proceeds, until the structure whose computed output is closest to the target output is reached; for a sample $(x, y)$ in the training set, if the current feedback core node computes the output $y'$, its weights are adjusted as follows, where $x_i$ is the component of sample $x$ on the $i$-th attribute:

$$w_i \leftarrow w_i + \mu \qquad \text{(Formula 1)}$$

$$\mu = \gamma\,(y - y')\,x_i \qquad \text{(Formula 2)}$$

2. The dual-core collaborative SDN big data network traffic precision classification method of claim 1, wherein the SDN-based inverse error feedback core learning process uses a single-hidden-layer inverse error feedback core with a $(d, n, 1)$ architecture: given a flow training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, each input example is described by $d$ flow attributes; there is one output node, and $y$ denotes the flow type determined by the inverse error feedback core algorithm; the input layer has $d$ nodes, the $i$-th input node is denoted $x_i$, the hidden layer has $n$ nodes, and the output of the $j$-th hidden node is denoted $b_j$; the output-layer connection weights are denoted $w$ and the hidden-layer connection weights $v$; the feedback core activation function is the Sigmoid function and the learning rate is $\gamma$;

the input of the output-layer feedback core is:

the mean squared error of the inverse error feedback core on the sample $(x_i, y)$ is:

that is:

Critical value: $\theta_y = -\gamma\, y'(1 - y')(y - y')$ (Formula 11)

3. The dual-core collaborative SDN big data network flow precise classification method of claim 1, wherein in the inverse error feedback core training model a level iterative optimization algorithm is adopted as the inverse error feedback core oblique quantity recursion strategy: in multidimensional nonlinear programming, an initial value $X^k$ is defined and an estimate better than the initial value is sought by updating $X^k$ with a step $\Delta^k$ at each iteration; by Taylor's formula, Formula 15 is obtained:

$$f(X^k + \Delta^k) \approx f(X^k) + J_{X^k}\,\Delta^k \qquad \text{(Formula 15)}$$

setting the right-hand side to 0, with $e_k = Y - f(X^k)$ and $M$, $N$ the corresponding coefficients of the normal equation, the iterative relation between $X^{k+1}$ and $X^k$, Formula 19, is obtained:

wherein:

4. The dual-core collaborative SDN big data network traffic precision classification method of claim 2, wherein the inverse error feedback core SDN classification training process:

the process comprises the following steps:

init();        // determine the initial connection weights and critical values of the feedback cores
do {
    countE();      // compute the error E of Formula 5
    l_mregress();  // update the connection weights and critical values by Formulas 10-13, using level iterative optimization for the oblique quantity recursion
} while (!ismin());  // repeat until Formula 14 reaches its minimum

5. The dual-core collaborative SDN big data network traffic precision classification method of claim 1, wherein the SDN-based condition fitting feature core learning process is as follows: the denoised flow attributes in the software defined network are normalized into data sets of discrete or continuous attributes, which are divided into a training set and a verification set; the training set is defined as $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $|D| = m$ is the number of samples; under the assumption that network traffic attributes in the software defined network environment are conditionally independent, Formula 22 is obtained:

wherein $D_c$ denotes the set of all samples of class $c$ in the traffic training set, and $D_{c,x_i}$ denotes the set of samples in $D_c$ that take the value $x_i$ on the $i$-th attribute.

6. The dual-core collaborative SDN big data network traffic precision classification method of claim 5, wherein actual software defined network traffic data contains many zero-count entries, that is, discrete attribute values for which $P(x_i \mid c)$ would be 0; such data is smoothed by the condition fitting feature core algorithm using Laplace correction, and the calculation of the prior probability $P(c)$ and the conditional probability $P(x_i \mid c)$ of the flow data becomes Formulas 24 and 25:

7. The dual-core collaborative SDN big data network traffic precision classification method of claim 1, wherein the condition fitting feature core SDN classification training and verification process is as follows: