CN111510422B

CN111510422B - Identity authentication method based on terminal information extension sequence and random forest model

Info

Publication number: CN111510422B
Application number: CN202010020123.7A
Authority: CN
Inventors: 段鹏飞; 石乐义; 兰茹; 宋煜枭; 侯博文; 刘祎豪; 马荣; 徐兴华
Original assignee: China University of Petroleum (East China)
Current assignee: China University of Petroleum (East China)
Priority date: 2020-01-09
Filing date: 2020-01-09
Publication date: 2021-07-09
Anticipated expiration: 2040-01-09
Also published as: CN111510422A

Abstract

The invention relates to an identity authentication method based on an end information extension sequence and a random forest model. At the client, end information extension sequences with distinctive main features are generated from a gene function F(x) with strong autocorrelation. At the server, a random forest model is trained by supervised learning on a data set consisting of the end information extension sequences generated by legal clients, and the trained model, RF-Model, classifies the end information extension sequences observed on the network, thereby authenticating legal clients. The method makes full use of the autocorrelation of the end information extension sequence, hides the main features of the data set, and makes it harder for an attacker to analyze the network data flow. The trained random forest model offers high classification accuracy and high authentication efficiency, and provides a new approach to identity authentication for secure communication systems based on end information extension sequences.

Description

Identity authentication method based on terminal information extension sequence and random forest model

Technical Field

The invention relates to an identity authentication method based on an end information extension sequence and a random forest model. It aims to authenticate legal users in complex network environments and belongs to the technical field of network security.

Background

End information extension means converting communication content or legal identity information with an end information extension algorithm, so that one piece of information is represented by a sequence made up of multiple items of end information, each of which is unrelated to the transmitted information itself. Exploiting the concealment and analysis resistance of the end information extension sequence, the client uses the end information extension algorithm to modulate the communication content or the legal identity authentication information, hides it in an end information extension sequence, and sends that sequence into a complex network. The server monitors the network data stream, identifies the end information extension sequences sent by trusted clients, and demodulates them to recover the communication content sent by the client or to authenticate the legal client.

Random forest is a highly flexible machine learning algorithm that often gives good results even without hyperparameter tuning. It can perform both classification and regression tasks and has broad application prospects. A random forest is an ensemble learning algorithm: it combines many weak classifiers (decision trees) and obtains the final result by voting (for classification) or averaging (for regression), so the overall model achieves higher accuracy and better generalization than a single tree. Random forests can handle high-dimensional feature sets, adapt well to different data sets, and accept both discrete and continuous data. Because each tree can be built independently of the others, training is easy to parallelize and learning is fast.
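The voting mechanism described above can be illustrated with a deliberately simplified Python sketch (standard library only). It is not the model used in the invention: each "tree" here is a depth-1 decision stump chosen by Gini impurity, trained on a bootstrap sample of the rows and restricted to a random subset of the features, which are the two sources of randomness in Breiman's random forest. `TinyRandomForest`, `fit_stump`, and all parameter values are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, str, str]  # (feature, threshold, label if <=, label if >)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X: List[List[float]], y: List[str], features: List[int]) -> Stump:
    """Pick the split over `features` with the lowest weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                left_label = Counter(left).most_common(1)[0][0] if left else majority
                right_label = Counter(right).most_common(1)[0][0] if right else majority
                best = (score, f, t, left_label, right_label)
    return best[1:]


class TinyRandomForest:
    """Bagged decision stumps with random feature subsets and majority voting."""

    def __init__(self, n_trees: int = 25, max_features: int = 2, seed: int = 0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = random.Random(seed)
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[str]) -> "TinyRandomForest":
        n, d = len(X), len(X[0])
        self.stumps = []
        for _ in range(self.n_trees):
            rows = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            feats = self.rng.sample(range(d), min(self.max_features, d))  # feature subset
            self.stumps.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
        return self

    def predict_one(self, row: List[float]) -> str:
        """Each stump casts one vote; the most common label wins."""
        votes = Counter(lo if row[f] <= t else hi for f, t, lo, hi in self.stumps)
        return votes.most_common(1)[0][0]
```

Production implementations such as scikit-learn's `RandomForestClassifier` grow full decision trees rather than stumps and also expose impurity-based `feature_importances_`, the kind of per-feature importance score the invention uses to single out IpId and SrcPort.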

Authentication is currently the foremost technology for cyberspace information security, and secure, reliable identity authentication is essential to keeping system services running normally. In existing secure communication systems based on end information extension sequences, the sequences generated by the client have weak analysis resistance and poor autocorrelation, while the server's identity authentication method relies too heavily on network data stream features and cannot identify and demodulate each legal user individually. To make full use of the autocorrelation of the end information extension sequence and to improve the server's per-user demodulation capability, the invention trains a random forest model on a data set made of the end information extension sequences generated by each legal user, and uses the trained model to classify the end information extension sequences sent by clients, thereby authenticating legal users. The invention fully exploits the anti-analysis capability of the end information extension sequence and models each legal client from network data stream features. This greatly improves the resistance of the end information extension sequence to interception and analysis, makes it better suited to complex network environments, and provides a new approach to identity authentication for secure communication systems based on end information extension sequences.

Disclosure of Invention

To make full use of the autocorrelation of the end information extension sequence, the invention uses the end information extension sequences generated by each legal user as a data set to train a random forest model, and uses the trained model to classify the end information extension sequences sent by clients, thereby authenticating legal users. The training set features used by the invention are the Identification field of the IP header (denoted IpId), the source port number (SrcPort), the source IP address (SrcIp), the destination port number (DstPort), and the destination IP address (DstIp) of each packet. During training, the random forest model evaluates the importance of each feature and selects the most important ones to improve prediction on the samples; the most important features are IpId and SrcPort. The invention provides a new approach to identity authentication for secure communication systems based on end information extension sequences and comprises the following steps:

(1) inputting, by the client, a gene function F(x);

(2) generating, with an end information extension sequence generation algorithm based on F(x), end information with good autocorrelation, such as IpId and SrcPort;

(3) loading the end information (IpId, SrcPort, etc.) generated in step (2) into socket data packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, and sending the sequence into the network environment through a socket;

(4) training, at the server, a random forest model on a data set consisting of the end information extension sequences generated by all legal clients, to obtain the random forest model RF-Model;

(5) monitoring, at the server, the end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by a client in the network environment, extracting the end information from each data packet, and adding it to a test set;

(6) inputting the test set obtained in step (5) into the RF-Model obtained in step (4) for classification;

(7) analyzing the classification result of step (6) to identify the legal client requesting identity authentication, and providing that client with personalized service.

In this method, the end information fields IpId, SrcPort, SrcIp, DstPort, and DstIp are used as data set features, with IpId and SrcPort as the main features. Because these two items of end information are random and concealed, an attacker cannot easily extract the main features from the network data stream formed by the end information extension sequences, which makes the stream harder to analyze. The invention generates the end information extension sequences from the gene function specific to each legal user, so the sequences have good autocorrelation; it makes full use of the correlation among the sequences and extracts their features with a random forest model, which greatly improves authentication accuracy. Because the random forest model classifies accurately and learns quickly, it improves both the accuracy and the efficiency of authenticating legal users.
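A random forest needs numeric feature vectors, so the five end information fields listed above must be encoded before training. One plausible encoding, assumed here rather than taken from the patent, keeps the 16-bit IpId and the port numbers as integers and converts each IPv4 address to its 32-bit integer value with Python's `ipaddress` module. `EndInfo`, `FEATURE_NAMES`, and `to_feature_vector` are hypothetical names.

```python
import ipaddress
from typing import List, NamedTuple

# Feature order assumed for this sketch; the patent lists the same five fields.
FEATURE_NAMES = ("IpId", "SrcPort", "SrcIp", "DstPort", "DstIp")


class EndInfo(NamedTuple):
    """Hypothetical container for the end information of one packet."""
    ip_id: int     # IpId: 16-bit Identification field of the IPv4 header
    src_port: int  # SrcPort: transport-layer source port
    src_ip: str    # SrcIp: dotted-quad IPv4 address
    dst_port: int  # DstPort: transport-layer destination port
    dst_ip: str    # DstIp: dotted-quad IPv4 address


def to_feature_vector(info: EndInfo) -> List[int]:
    """Encode one packet's end information as integers, in FEATURE_NAMES order."""
    return [
        info.ip_id,
        info.src_port,
        int(ipaddress.IPv4Address(info.src_ip)),  # e.g. 10.0.0.1 -> 167772161
        info.dst_port,
        int(ipaddress.IPv4Address(info.dst_ip)),
    ]
```

Integer-encoding the addresses lets tree splits act on address ranges; a per-octet or one-hot encoding would be an equally defensible choice.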

Drawings

To illustrate the technical solutions in the embodiments of the invention more clearly, the invention is further described below with reference to the accompanying drawings and specific embodiments:

FIG. 1 is a schematic diagram of a random forest model.

FIG. 2 is a flow chart of identity authentication based on an end information extension sequence and a random forest model.

Detailed Description

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings.

Step (1): inputting, by the client, a gene function F(x);

In step (1), the client inputs the gene function F(x), which is required to have good autocorrelation so that the end information extension sequences generated by the legal users also maintain good autocorrelation.
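The patent does not state how the autocorrelation of F(x) is measured. One common measure, assumed here, is the normalized sample autocorrelation of the sequence s_k = F(k) at lag τ: r(τ) = Σ_{k=0}^{N−τ−1} (s_k − μ)(s_{k+τ} − μ) / Σ_{k=0}^{N−1} (s_k − μ)², where μ is the mean of the N samples. The sketch below computes it; `demo_gene_function` is a hypothetical period-4 stand-in for a real gene function.

```python
from typing import Callable, List


def demo_gene_function(x: int) -> int:
    """Hypothetical gene function: repeats the pattern 1, 3, 2, 5 (period 4)."""
    return [1, 3, 2, 5][x % 4]


def sample_sequence(F: Callable[[int], float], n: int) -> List[float]:
    """Evaluate F at x = 0, 1, ..., n - 1."""
    return [F(k) for k in range(n)]


def autocorrelation(seq: List[float], lag: int) -> float:
    """Normalized sample autocorrelation r(lag); r(0) is always 1."""
    n = len(seq)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(seq))")
    mean = sum(seq) / n
    dev = [v - mean for v in seq]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    num = sum(dev[k] * dev[k + lag] for k in range(n - lag))
    return num / denom
```

For 40 samples of the period-4 demo function, `autocorrelation(seq, 4)` evaluates to 0.9: the sequence repeats exactly at lag 4, and 36 of the 40 deviation terms are paired, so the ratio is 36/40.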

Step (2): generating, with an end information extension sequence generation algorithm based on F(x), end information with good autocorrelation, such as IpId and SrcPort;
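The generation algorithm itself is not disclosed. As a purely hypothetical illustration, the sketch below derives one (IpId, SrcPort) pair from each value F(k): the low 16 bits become IpId, which must fit the 16-bit Identification field, and the remaining bits are folded into the unprivileged port range 1024 to 65535. The gene function `demo_gene_function` (an affine map modulo 2^31) and the mapping in `generate_end_info` are assumptions chosen for brevity; they are not claimed to have the autocorrelation properties the patent requires.

```python
from typing import Callable, List, Tuple

IPID_SPACE = 1 << 16              # IpId must fit the 16-bit Identification field
PORT_MIN, PORT_MAX = 1024, 65535  # assumed: stay out of the well-known port range


def demo_gene_function(x: int) -> int:
    """Hypothetical gene function: an affine map modulo 2**31 (illustration only)."""
    return (1103515245 * x + 12345) % (1 << 31)


def generate_end_info(F: Callable[[int], int], n: int) -> List[Tuple[int, int]]:
    """Derive n (IpId, SrcPort) pairs from the non-negative values F(0), ..., F(n-1)."""
    pairs = []
    for k in range(n):
        v = F(k)
        ip_id = v % IPID_SPACE                                       # low 16 bits
        src_port = PORT_MIN + (v >> 16) % (PORT_MAX - PORT_MIN + 1)  # remaining bits
        pairs.append((ip_id, src_port))
    return pairs
```

Because the mapping is deterministic, a server holding the same F(x) can regenerate exactly the pairs a legal client sends, which is what step (4) relies on when it builds a labeled data set.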

Step (3): loading the end information (IpId, SrcPort, etc.) generated in step (2) into socket data packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, and sending the sequence into the network environment through a socket;
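Setting IpId requires control over the IP header, which an ordinary socket does not provide. The minimal sketch below assumes UDP transport (the patent says only "socket data packet"); it packs an IPv4 header carrying the chosen Identification value and a UDP header carrying the chosen source port, and fills in the RFC 791 header checksum. `build_packet` and `ipv4_checksum` are hypothetical helper names, and the TTL, addresses, and ports are illustrative.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet(ip_id: int, src_port: int, src_ip: str, dst_ip: str,
                 dst_port: int, payload: bytes = b"") -> bytes:
    """Return an IPv4/UDP packet whose IpId and SrcPort carry end information."""
    udp_len = 8 + len(payload)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                # version 4, IHL 5 (20-byte header, no options)
        0,                   # DSCP/ECN
        20 + udp_len,        # total length
        ip_id,               # IpId: Identification field
        0,                   # flags / fragment offset
        64,                  # TTL
        socket.IPPROTO_UDP,  # protocol 17
        0,                   # checksum placeholder
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    checksum = struct.pack("!H", ipv4_checksum(ip_header))
    ip_header = ip_header[:10] + checksum + ip_header[12:]
    # UDP checksum 0 means "not computed", which IPv4 permits.
    udp_header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    return ip_header + udp_header + payload
```

To transmit such bytes on Linux, one would open `socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)`, which implies `IP_HDRINCL` and requires root or `CAP_NET_RAW`, and call `sendto(packet, (dst_ip, 0))`; the kernel recomputes the IP checksum and keeps a non-zero Identification value.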

Step (4): training, at the server, a random forest model on a data set consisting of the end information extension sequences generated by all legal clients, to obtain the random forest model RF-Model;

In step (4), to train the random forest model on the data set formed by the end information extension sequences of the legal clients, the server generates a labeled data set from the gene functions held by the legal clients and inputs it into the random forest model for supervised learning, obtaining the random forest model RF-Model.
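Because the server knows each legal client's gene function, it can generate training samples itself and label every sample with the client that would have produced it. In the sketch below, each legal client is represented by a hypothetical generator that returns the end information feature row the client would emit as its k-th packet; `build_labeled_dataset`, `SampleGenerator`, and the client IDs are illustrative names, not from the patent.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: maps packet index k to the feature row a client would emit.
SampleGenerator = Callable[[int], List[int]]


def build_labeled_dataset(
    generators: Dict[str, SampleGenerator], n_per_client: int
) -> Tuple[List[List[int]], List[str]]:
    """Return feature rows X and labels y; each row is labeled with its client ID."""
    X: List[List[int]] = []
    y: List[str] = []
    for client_id, generate in generators.items():
        for k in range(n_per_client):
            X.append(generate(k))
            y.append(client_id)
    return X, y
```

The resulting `X` and `y` can be passed to any random forest implementation, for example the `fit(X, y)` method of scikit-learn's `RandomForestClassifier`, to obtain a model playing the role of RF-Model.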

Step (5): monitoring, at the server, the end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by a client in the network environment, extracting the end information from each data packet, and adding it to a test set;
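Extracting the end information amounts to reading a few fixed offsets of each captured packet. The sketch below assumes the capture yields raw IPv4 packets (for example from a Linux raw socket or a pcap library, not shown) carrying TCP or UDP, whose headers both begin with the source and destination ports; `extract_end_info` is a hypothetical name, and the tuple follows the IpId, SrcPort, SrcIp, DstPort, DstIp order used for the training features.

```python
import socket
import struct
from typing import Optional, Tuple

EndInfoTuple = Tuple[int, int, str, int, str]  # (IpId, SrcPort, SrcIp, DstPort, DstIp)


def extract_end_info(packet: bytes) -> Optional[EndInfoTuple]:
    """Return the end information of a raw IPv4 TCP/UDP packet, or None."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None  # not an IPv4 packet
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes (20 without options)
    if ihl < 20 or packet[9] not in (socket.IPPROTO_TCP, socket.IPPROTO_UDP):
        return None  # malformed header, or a protocol without port fields
    if len(packet) < ihl + 4:
        return None  # truncated transport header
    (ip_id,) = struct.unpack("!H", packet[4:6])
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return ip_id, src_port, src_ip, dst_port, dst_ip
```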

Step (6): inputting the test set obtained in step (5) into the RF-Model obtained in step (4) for classification;

Step (7): analyzing the classification result of step (6) to identify the legal client requesting identity authentication, and providing that client with personalized service;

In step (7), the classification result of step (6) is analyzed, and the legal client that occurs most frequently in the classification result is taken as the final identity authentication result.
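Because every packet in the monitored sequence is classified separately, step (7) reduces to a majority vote over the per-packet labels. A minimal sketch follows; the optional `min_share` threshold for rejecting a sequence without a clear winner is an assumption not found in the patent, and with its default of 0 the function returns exactly the most frequent label.

```python
from collections import Counter
from typing import Iterable, Optional


def authenticate(predictions: Iterable[str], min_share: float = 0.0) -> Optional[str]:
    """Return the most frequent client label, or None if there is no clear winner.

    min_share is a hypothetical rejection threshold: the winning label must
    account for at least this fraction of all per-packet predictions.
    """
    counts = Counter(predictions)
    if not counts:
        return None  # nothing was classified
    client, votes = counts.most_common(1)[0]
    if votes / sum(counts.values()) < min_share:
        return None
    return client
```

For example, per-packet predictions `["c1", "c2", "c1", "c1", "c3"]` authenticate the sequence as `"c1"`.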

Claims

1. An identity authentication method based on an end information extension sequence and a random forest model, characterized by comprising the following steps:

(1) inputting, by a client, a gene function F(x);

(2) generating, with an end information extension sequence generation algorithm based on F(x), end information IpId and SrcPort with good autocorrelation;

(3) loading the end information IpId and SrcPort generated in step (2) into socket data packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, and sending the sequence into the network environment through a socket;

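(4) training, at the server side, a random forest model on a data set consisting of the end information extension sequences generated by all legal clients, to obtain a random forest model RF-Model;

(5) monitoring, at the server side, the end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by a client in the network environment, extracting the end information from each data packet, and adding it to a test set;

(6) inputting the test set obtained in step (5) into the RF-Model obtained in step (4) for classification;

(7) analyzing the classification result of step (6) to obtain the legal client requesting identity authentication and providing that client with personalized service.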

2. The identity authentication method based on an end information extension sequence and a random forest model according to claim 1, wherein a gene function F(x) is input at the client, and F(x) is required to have good autocorrelation so that the end information extension sequences generated by legal users maintain good autocorrelation.

3. The identity authentication method based on an end information extension sequence and a random forest model according to claim 1, wherein the server side trains the random forest model on a data set consisting of the end information extension sequences generated by the legal clients by generating a labeled data set from the gene functions held by the legal clients and inputting the data set into the random forest model for supervised learning, thereby obtaining the random forest model RF-Model.