CN111510422B - Identity authentication method based on terminal information extension sequence and random forest model
- Publication number
- CN111510422B (application CN202010020123.7A; also published as CN111510422A)
- Authority
- CN
- China
- Prior art keywords
- random forest
- end information
- forest model
- information extension
- identity authentication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Computer Networks & Wireless Communication (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computer Security & Cryptography (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Collating Specific Patterns (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to an identity authentication method based on an end information extension sequence and a random forest model. At the client, an end information extension sequence with pronounced main features is generated from a gene function F(x) with strong autocorrelation. At the server, a random forest model is trained by supervised learning on a data set composed of the end information extension sequences generated by legitimate clients, and the resulting model, RF-Model, classifies the end information extension sequences captured by monitoring the network, thereby authenticating the identity of legitimate clients. The method makes full use of the autocorrelation of the end information extension sequence, conceals the main features of the data set, and makes it harder for an attacker to analyze the network data stream. The trained random forest model offers high classification accuracy and high authentication efficiency, and provides a new approach to identity authentication for secure communication systems based on end information extension sequences.
Description
Technical Field
The invention relates to an identity authentication method based on an end information extension sequence and a random forest model. It aims to authenticate the identity of legitimate users in a complex network environment and belongs to the technical field of network security.
Background
End information extension means that communication content or legitimate identity information is transformed by an end information extension algorithm so that one piece of information is represented by a sequence of several items of end information, each of which, taken individually, is unrelated to the transmitted information. Exploiting the concealment and analysis resistance of the end information extension sequence, the client uses the end information extension algorithm to modulate the communication content or the identity authentication information, so that it is hidden in the end information extension sequence, and sends the sequence into a complex network. The server monitors the network data stream and identifies the end information extension sequences sent by trusted clients; by demodulating an identified sequence, it recovers the communication content sent by the client or authenticates the legitimate client.
Random forest is a relatively new and highly flexible machine learning algorithm. It gives good results in most cases even without hyperparameter tuning, can be used for both classification and regression tasks, and has broad application prospects. Random forest is an ensemble learning algorithm: it combines multiple weak classifiers and obtains the final result by voting or averaging, so that the overall model achieves higher accuracy and better generalization. It can handle high-dimensional feature data sets, adapts well to different data sets, and can process both discrete and continuous data. Because each tree can be built independently and simultaneously, the algorithm is easy to parallelize and trains quickly.
Authentication is the first line of defense in cyberspace information security, and a safe and reliable identity authentication technology is essential to keeping system services running normally. In existing secure communication systems based on the end information extension sequence, the sequences generated by the client have weak analysis resistance and poor autocorrelation, and the server's identity authentication method relies too heavily on network data stream characteristics and cannot identify and demodulate each legitimate user individually. To make full use of the autocorrelation of the end information extension sequence and improve the server's per-user demodulation capability, the invention uses the end information extension sequences generated by each legitimate user as a data set for training a random forest model, and uses the trained model to classify the end information extension sequences sent by clients, thereby authenticating legitimate users. The invention exploits the analysis resistance of the end information extension sequence and uses network data stream characteristics to model legitimate clients. This improves the sequence's resistance to interception and analysis, makes it better suited to complex network environments, and provides a new approach to identity authentication for secure communication systems based on end information extension sequences.
Disclosure of Invention
To make full use of the autocorrelation of the end information extension sequence, the invention trains a random forest model on a data set composed of the end information extension sequences generated by each legitimate user, and uses the trained model to classify the end information extension sequences sent by clients, thereby authenticating legitimate users. The training-set features used by the invention are the identification field (denoted IpId), the source port number (SrcPort), the source IP address (SrcIp), the destination port number (DstPort), and the destination IP address (DstIp), all taken from the headers of the IP data packet. During training, the random forest model evaluates the importance of each feature and selects the more important ones to improve prediction on the samples; here the more important features are IpId and SrcPort. The invention provides a new approach to identity authentication for secure communication systems based on end information extension sequences, and is characterized by the following steps:
(1) the client inputs a gene function F(x);
(2) the end information extension sequence generation algorithm generates end information such as IpId and SrcPort with good autocorrelation based on F(x);
(3) the end information generated in step (2), such as IpId and SrcPort, is loaded into socket data packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, which is sent into the network environment through a socket;
(4) the server trains a random forest model on a data set composed of the end information extension sequences generated by each legitimate client, obtaining the random forest model RF-Model;
(5) the server monitors the end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by a client in the network environment, extracts the end information from the data packets, and adds it to a test set;
(6) the test set obtained in step (5) is input into the RF-Model obtained in step (4) for classification;
(7) the classification result of step (6) is analyzed to identify the legitimate client requesting identity authentication, and personalized service is provided.
The method uses the end information IpId, SrcPort, SrcIp, DstPort and DstIp as data set features, with IpId and SrcPort as the main features. Because these two items of end information are random and concealed, it is difficult for an attacker to extract the main features from a network data stream formed by end information extension sequences, which makes the stream harder to analyze. The invention generates end information extension sequences from the gene function specific to each legitimate user, so the sequences have good autocorrelation; it exploits the correlation among the sequences and extracts their features with the random forest model, which improves authentication accuracy. The random forest model offers high classification accuracy and fast training, improving both the accuracy and the efficiency of identity authentication for legitimate users.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the invention is further described below with reference to the accompanying drawings and specific embodiments:
FIG. 1 is a schematic diagram of a random forest model.
FIG. 2 is a flow chart of identity authentication based on an end information extension sequence and a random forest model.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings.
Step (1): the client inputs a gene function F(x).
In step (1), the gene function F(x) input by the client is required to have good autocorrelation, so that good autocorrelation is maintained among the end information extension sequences generated by a legitimate user.
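The patent does not give a concrete F(x), so the following is only a minimal sketch of how a candidate gene function could be checked. The linear congruential map used as `gene_function`, the seed, and the choice of normalized circular autocorrelation as the measure are all assumptions made for illustration, not the patented function or criterion.

```python
def gene_function(x: int) -> int:
    """Hypothetical gene function F(x): a 16-bit linear congruential map (assumption)."""
    return (1103 * x + 12345) % 65536


def generate_sequence(seed: int, length: int) -> list[int]:
    """Iterate F(x) from a per-client seed to get candidate 16-bit end information values."""
    values, x = [], seed
    for _ in range(length):
        x = gene_function(x)
        values.append(x)
    return values


def autocorrelation(seq: list[int], lag: int) -> float:
    """Normalized circular autocorrelation of seq at the given lag (1.0 at lag 0)."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((v - mean) ** 2 for v in seq)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    cov = sum((seq[i] - mean) * (seq[(i + lag) % n] - mean) for i in range(n))
    return cov / var


# Inspect the autocorrelation profile of one client's sequence at small lags.
seq = generate_sequence(seed=7, length=64)
profile = [autocorrelation(seq, lag) for lag in range(4)]
```

A profile like this could be compared across candidate functions before one is assigned to a client; which profile counts as "good" depends on the demodulation scheme and is not specified here.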
Step (2): the end information extension sequence generation algorithm generates end information such as IpId and SrcPort with good autocorrelation based on F(x).
Step (3): the end information generated in step (2), such as IpId and SrcPort, is loaded into socket data packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, which is sent into the network environment through a socket.
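As a rough illustration of step (3), the sketch below places a generated IpId in the IPv4 Identification field and a generated SrcPort in a UDP header. The function name `build_packet`, the choice of UDP, and the sample addresses are assumptions; the checksums are left at zero, and actually transmitting such a packet would need a raw socket and the privileges that go with it, which this sketch does not attempt.

```python
import socket
import struct


def build_packet(ip_id: int, src_ip: str, dst_ip: str,
                 src_port: int, dst_port: int, payload: bytes = b"") -> bytes:
    """Build IPv4 + UDP headers carrying the given end information (checksums omitted)."""
    udp_len = 8 + len(payload)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,              # version 4, header length 5 x 32-bit words
        0,                         # DSCP / ECN
        20 + udp_len,              # total length
        ip_id,                     # Identification field carries IpId
        0,                         # flags and fragment offset
        64,                        # TTL
        socket.IPPROTO_UDP,        # protocol number 17
        0,                         # header checksum (not computed in this sketch)
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    udp_header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    return ip_header + udp_header + payload


# One element of the extension sequence, with hypothetical values.
extended_seq_1 = build_packet(12345, "192.168.1.10", "192.168.1.1", 40000, 443)
```

Repeating the call with successive values derived from F(x) would yield ExtendedSeq1 through ExtendedSeqN.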
Step (4): the server trains a random forest model on a data set composed of the end information extension sequences generated by each legitimate client, obtaining the random forest model RF-Model.
In step (4), the server first generates a labeled data set according to the gene function held by each legitimate client, and then inputs this data set into the random forest model for supervised learning, thereby obtaining the random forest model RF-Model.
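To make the supervised-learning step concrete, here is a small pure-Python random forest sketch: bootstrap sampling, a random feature subset at each split, Gini impurity, and majority voting. It is not the patent's implementation. The two synthetic clients, their value ranges, and all function names are assumptions, and the labeled rows here are drawn from fixed ranges rather than from real gene functions. In practice a library implementation would normally be used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=8):
    """Grow one tree; a leaf is a label string, an inner node is a tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    n_feat = len(rows[0])
    subset = rng.sample(range(n_feat), max(1, int(n_feat ** 0.5)))
    split = best_split(rows, labels, subset)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], rng, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], rng, depth + 1, max_depth))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the labeled data set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Synthetic labeled data set: rows are (IpId, SrcPort), labels are client names.
data_rng = random.Random(1)
rows, labels = [], []
for client, (id_base, port_base) in {"client_a": (1000, 20000), "client_b": (40000, 50000)}.items():
    for _ in range(30):
        rows.append((id_base + data_rng.randrange(500), port_base + data_rng.randrange(500)))
        labels.append(client)

rf_model = train_forest(rows, labels)
```

Calling `predict_forest(rf_model, row)` on a feature row extracted from captured traffic then returns the client label that most trees vote for.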
Step (5): the server monitors the end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by a client in the network environment, extracts the end information from the data packets, and adds it to a test set.
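The extraction in step (5) can be sketched as parsing the five features out of a raw IPv4 packet. How the server captures packets is not described in the source, so the capture itself is left out; `extract_end_info` and the hand-built `sample_packet` are hypothetical, and the sketch assumes IPv4 carrying TCP or UDP.

```python
import socket
import struct


def extract_end_info(packet: bytes) -> dict:
    """Pull IpId, SrcPort, SrcIp, DstPort and DstIp out of a raw IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4          # IHL is counted in 32-bit words
    if packet[9] not in (socket.IPPROTO_TCP, socket.IPPROTO_UDP):
        raise ValueError("only TCP and UDP carry port numbers")
    if len(packet) < header_len + 4:
        raise ValueError("truncated transport header")
    (ip_id,) = struct.unpack_from("!H", packet, 4)
    # TCP and UDP both start with source port, then destination port.
    src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return {
        "IpId": ip_id,
        "SrcPort": src_port,
        "SrcIp": socket.inet_ntoa(packet[12:16]),
        "DstPort": dst_port,
        "DstIp": socket.inet_ntoa(packet[16:20]),
    }


# A hand-built 28-byte IPv4/UDP packet standing in for captured traffic.
sample_packet = bytes.fromhex(
    "4500001c" "30390000" "40110000" "c0a8010a" "c0a80101" "9c4001bb" "00080000"
)
test_set = [extract_end_info(sample_packet)]
```

Each captured packet contributes one row to the test set; the string addresses would still need a numeric encoding before being fed to a classifier.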
Step (6): the test set obtained in step (5) is input into the RF-Model obtained in step (4) for classification.
Step (7): the classification result of step (6) is analyzed to identify the legitimate client requesting identity authentication, and personalized service is provided.
In step (7), when the classification result of step (6) is analyzed, the legitimate client that occurs most frequently in the classification result is taken as the final identity authentication result.
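The decision rule of step (7) amounts to a majority vote over the per-packet classification results. A minimal sketch follows; the function name `authenticate` and the client labels are hypothetical, and returning the winner's share of the votes is an addition for illustration rather than something the source requires.

```python
from collections import Counter


def authenticate(predicted_clients):
    """Return the most frequent predicted client and its share of all predictions."""
    if not predicted_clients:
        raise ValueError("no classification results to analyze")
    client, count = Counter(predicted_clients).most_common(1)[0]
    return client, count / len(predicted_clients)


# Per-packet labels as they might come out of step (6).
winner, share = authenticate(["client_a", "client_a", "client_b", "client_a"])
```

Here `winner` is `"client_a"` with a share of 0.75. A deployment might also reject the result when the share is low, but the source does not state such a threshold.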
Claims (3)
1. An identity authentication method based on an end information extension sequence and a random forest model, characterized by comprising the following steps:
(1) the client inputs a gene function F(x);
(2) the end information extension sequence generation algorithm generates end information IpId and SrcPort with good autocorrelation based on F(x);
(3) the end information IpId and SrcPort generated in step (2) is loaded into socket data packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, which is sent into the network environment through a socket;
(4) the server trains a random forest model on a data set composed of the end information extension sequences generated by each legitimate client, obtaining the random forest model RF-Model;
(5) the server monitors the end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by a client in the network environment, extracts the end information from the data packets, and adds it to a test set;
(6) the test set obtained in step (5) is input into the RF-Model obtained in step (4) for classification;
(7) the classification result of step (6) is analyzed to identify the legitimate client requesting identity authentication, and personalized service is provided.
2. The identity authentication method based on an end information extension sequence and a random forest model according to claim 1, characterized in that the gene function F(x) input at the client is required to have good autocorrelation, so that good autocorrelation is maintained among the end information extension sequences generated by a legitimate user.
3. The identity authentication method based on an end information extension sequence and a random forest model according to claim 1, characterized in that, when the server trains the random forest model on the data set composed of the end information extension sequences generated by each legitimate client, the server generates a labeled data set according to the gene function held by each legitimate client and then inputs the data set into the random forest model for supervised learning, thereby obtaining the random forest model RF-Model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010020123.7A CN111510422B (en) | 2020-01-09 | 2020-01-09 | Identity authentication method based on terminal information extension sequence and random forest model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111510422A (en) | 2020-08-07 |
CN111510422B (en) | 2021-07-09 |
Family
ID=71864620
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010020123.7A (Expired - Fee Related), CN111510422B (en) | 2020-01-09 | 2020-01-09 | Identity authentication method based on terminal information extension sequence and random forest model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111510422B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106682686A (en) * | 2016-12-09 | 2017-05-17 | 北京拓明科技有限公司 | User gender prediction method based on mobile phone Internet-surfing behavior |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9904916B2 (en) * | 2015-07-01 | 2018-02-27 | Klarna Ab | Incremental login and authentication to user portal without username/password |
CN109861957A (en) * | 2018-11-06 | 2019-06-07 | 中国科学院信息工程研究所 | A kind of the user behavior fining classification method and system of the privately owned cryptographic protocol of mobile application |
CN109660656A (en) * | 2018-11-20 | 2019-04-19 | 重庆邮电大学 | A kind of intelligent terminal method for identifying application program |
CN110245693B (en) * | 2019-05-30 | 2023-04-07 | 北京理工大学 | Key information infrastructure asset identification method combined with mixed random forest |
Also Published As
Publication number | Publication date |
---|---|
CN111510422A (en) | 2020-08-07 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2021-07-09; termination date: 2022-01-09 |