CN111510422B - Identity authentication method based on terminal information extension sequence and random forest model - Google Patents

Info

Publication number
CN111510422B
Authority
CN
China
Prior art keywords
random forest
end information
forest model
information extension
identity authentication
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202010020123.7A
Other languages
Chinese (zh)
Other versions
CN111510422A (en)
Inventor
段鹏飞
石乐义
兰茹
宋煜枭
侯博文
刘祎豪
马荣
徐兴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Application filed by China University of Petroleum East China
Priority to CN202010020123.7A
Publication of CN111510422A
Application granted
Publication of CN111510422B
Current legal status: Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Collating Specific Patterns (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an identity authentication method based on an end information extension sequence and a random forest model. At the client, an end information extension sequence with pronounced main features is generated from a gene function F(x) with strong autocorrelation. At the server, a random forest model is trained by supervised learning on a data set made up of the end information extension sequences generated by legal clients, and the trained random forest model, RF-Model, classifies the end information extension sequences captured from the network, thereby authenticating the legal clients. The method makes full use of the autocorrelation of the end information extension sequence, hides the main features of the data set, and makes it harder for an attacker to analyze the network data stream. The trained random forest model offers high classification accuracy and high authentication efficiency, and provides a new approach to identity authentication for secure communication systems based on end information extension sequences.

Description

Identity authentication method based on terminal information extension sequence and random forest model
Technical Field
The invention relates to an identity authentication method based on an end information extension sequence and a random forest model. It aims to authenticate legal users in complex network environments and belongs to the technical field of network security.
Background
End information extension means that communication content or legal identity information is converted by an end information extension algorithm into a sequence of several items of end information that together represent one piece of information, while each individual item is unrelated to the transmitted information. Relying on the concealment and analysis resistance of the end information extension sequence, the client uses the end information extension algorithm to modulate the communication content or the legal identity authentication information, so that this information is hidden in the end information extension sequence sent into the complex network. The server monitors the network data stream, identifies the end information extension sequences sent by trusted clients, and demodulates them to recover the communication content or to authenticate the legal client.
Random forest is a highly flexible machine learning algorithm that performs well in most situations even without hyperparameter tuning; it can be used for both classification and regression and has broad application prospects. It is an ensemble learning algorithm: it combines many weak classifiers (decision trees) and obtains the final result by voting or averaging, so the overall model achieves higher accuracy and better generalization. Random forests can handle high-dimensional feature sets, adapt well to different data sets, and accept both discrete and continuous data. Because each tree can be grown independently and simultaneously, the algorithm is easy to parallelize and learns quickly.
Authentication is the first line of defense for information security in cyberspace, and a secure, reliable identity authentication technology is essential to keeping system services running normally. In existing secure communication systems based on end information extension sequences, the sequences generated by the client have weak analysis resistance and poor autocorrelation, and the server's identity authentication depends too heavily on network data stream characteristics, so it cannot identify and demodulate each legal user individually. To make full use of the autocorrelation of the end information extension sequence and to improve the server's per-user demodulation capability, the invention trains a random forest model on the end information extension sequences generated by each legal user and uses the trained model to classify the sequences sent by clients, thereby authenticating legal users. The invention exploits the analysis resistance of the end information extension sequence and models each legal client from its network data stream characteristics. This greatly improves the resistance of the end information extension sequence to interception and analysis, makes it better suited to complex network environments, and provides a new approach to identity authentication for secure communication systems based on end information extension sequences.
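As a rough illustration of the bootstrap-and-vote scheme described above, the following is a minimal from-scratch random forest classifier using only the Python standard library. It is a teaching sketch, not the model used in the invention: Gini splitting, the square-root-of-d feature subset per split, the tree depth and the number of trees are common defaults assumed here. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used instead.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(X, y, max_depth, n_sub_features, rng):
    """Grow a depth-limited tree; internal nodes are dicts, leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Random subspace: this split only considers a random subset of the features.
    features = rng.sample(range(len(X[0])), n_sub_features)
    best, best_score = None, gini(y)
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            if not left or not right:
                continue
            # Weighted impurity of the two children; keep the split that lowers it most.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if score < best_score:
                best, best_score = (f, threshold, left, right), score
    if best is None:
        return majority
    f, threshold, left, right = best
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth - 1, n_sub_features, rng),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth - 1, n_sub_features, rng),
    }


def predict_tree(node, x):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=5, seed=0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_sub = max(1, int(math.sqrt(len(X[0]))))  # common default: sqrt(d) features per split
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: n rows drawn with replacement, independently for each tree.
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, n_sub, rng))
        return self

    def predict(self, x):
        """Majority vote over the predictions of the individual trees."""
        votes = Counter(predict_tree(tree, x) for tree in self.trees)
        return votes.most_common(1)[0][0]
```

Because every tree is built from its own bootstrap sample, the loop in `fit` could be parallelized across trees, which is the property the paragraph above refers to.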
Disclosure of Invention
To make full use of the autocorrelation of the end information extension sequence, the invention trains a random forest model on a data set built from the end information extension sequences generated by each legal user, and the trained model classifies the end information extension sequences sent by clients, thereby authenticating legal users. The training set features are taken from the headers of each IP packet: the identification field (denoted IpId), the source port number (denoted SrcPort), the source IP address (denoted SrcIp), the destination port number (denoted DstPort), and the destination IP address (denoted DstIp). During training, the random forest model evaluates the importance of each feature and selects the more important features to improve prediction on samples; the most important features are IpId and SrcPort. The method, which provides a new approach to identity authentication for secure communication systems based on end information extension sequences, comprises the following steps:
(1) the client inputs a gene function F(x);
(2) an end information extension sequence generation algorithm generates, from F(x), end information such as IpId and SrcPort with good autocorrelation;
(3) the end information such as IpId and SrcPort generated in step (2) is loaded into socket packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, which is sent into the network environment through a socket;
(4) the server trains a random forest model on a data set formed by the end information extension sequences generated by all legal clients, obtaining the random forest model RF-Model;
(5) the server monitors the network environment for end information extension sequences {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by clients, extracts the end information from each packet, and adds it to a test set;
(6) the test set obtained in step (5) is input into the random forest model RF-Model obtained in step (4) for classification;
(7) the classification result of step (6) is analyzed to identify the legal client requesting identity authentication, which is then provided with personalized service.
The method uses the end information IpId, SrcPort, SrcIp, DstPort and DstIp as data set features, with IpId and SrcPort as the main features. Because these two items of end information are random and concealed, an attacker cannot easily extract the main features from the network data stream formed by the end information extension sequence, which makes the stream harder to analyze. The invention generates each legal user's end information extension sequences from that user's own gene function, so the sequences have good autocorrelation; it exploits the correlation among the end information extension sequences and extracts their features with the random forest model, which greatly improves authentication accuracy. With its high classification accuracy and fast learning, the random forest model improves both the accuracy and the efficiency of authenticating legal users.
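Tree-based classifiers need numeric inputs, so the five fields have to be encoded as a feature vector before training. A minimal sketch, assuming a hypothetical `EndInfo` record and assuming that IPv4 addresses are encoded as their 32-bit integer values; the text does not state how the features are actually encoded.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EndInfo:
    """Hypothetical record of one packet's end information (field names follow the text)."""
    IpId: int      # 16-bit identification field of the IPv4 header
    SrcPort: int   # TCP/UDP source port
    SrcIp: str     # dotted-quad source address
    DstPort: int   # TCP/UDP destination port
    DstIp: str     # dotted-quad destination address

    def to_features(self):
        """Feature vector in the order [IpId, SrcPort, SrcIp, DstPort, DstIp].

        Assumption: each address is replaced by its 32-bit integer value so a tree
        can split on it like any other numeric feature.
        """
        return [
            self.IpId,
            self.SrcPort,
            int(ipaddress.IPv4Address(self.SrcIp)),
            self.DstPort,
            int(ipaddress.IPv4Address(self.DstIp)),
        ]
```

For example, `EndInfo(258, 50000, "10.0.0.1", 80, "10.0.0.2").to_features()` gives `[258, 50000, 167772161, 80, 167772162]`.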
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the present invention is further described below with reference to the accompanying drawings and specific embodiments:
FIG. 1 is a schematic diagram of a random forest model.
FIG. 2 is a flow chart of identity authentication based on an end information extension sequence and a random forest model.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings.
Step (1): the client inputs a gene function F(x).
In step (1), the client inputs the gene function F(x), which is required to have good autocorrelation so that the end information extension sequences generated by each legal user retain good autocorrelation.
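The text does not define how "good autocorrelation" is measured. One common measure, assumed here, is the sample autocorrelation at lag k, r_k = sum over t of (x_t - m)(x_{t+k} - m) divided by the sum over t of (x_t - m)^2, where m is the mean of the sequence. The helper below computes it and could be used to check a candidate F(x) output; it is an illustrative sketch, not part of the patented method.

```python
def autocorrelation(seq, lag):
    """Sample autocorrelation r_k of a numeric sequence at the given lag (r_0 == 1)."""
    n = len(seq)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(seq))")
    mean = sum(seq) / n
    denom = sum((x - mean) ** 2 for x in seq)
    if denom == 0:
        raise ValueError("a constant sequence has undefined autocorrelation")
    # Numerator pairs each value with the one `lag` positions later.
    num = sum((seq[t] - mean) * (seq[t + lag] - mean) for t in range(n - lag))
    return num / denom
```

For the alternating sequence `[1, 2, 1, 2]` this gives -0.75 at lag 1 and 0.5 at lag 2, reflecting its period of 2.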
Step (2): an end information extension sequence generation algorithm generates, from F(x), end information such as IpId and SrcPort with good autocorrelation.
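Neither F(x) nor the generation algorithm is disclosed, so the sketch below substitutes a hypothetical keyed linear congruential map for F(x) purely to show the data flow from F(x) to (IpId, SrcPort) pairs. `GeneFunction`, its parameters, and the mapping onto the 16-bit identification field and the dynamic port range 49152 to 65535 are all assumptions. A linear congruential generator is not cryptographically secure and is used here only because it is short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneFunction:
    """Hypothetical per-client gene function F(x): a keyed linear congruential map.

    The parameters a, c and seed play the role of the client's secret; the real
    F(x) of the method is not specified in the text.
    """
    a: int
    c: int
    seed: int
    modulus: int = 2 ** 32

    def __call__(self, x):
        # x_{n+1} = (a * x_n + c) mod m
        return (self.a * x + self.c) % self.modulus


def generate_end_info(f, n):
    """Iterate F from its seed and derive n (IpId, SrcPort) pairs."""
    pairs = []
    state = f.seed
    for _ in range(n):
        state = f(state)
        ip_id = state & 0xFFFF                       # low 16 bits -> IPv4 identification field
        src_port = 49152 + ((state >> 16) % 16384)   # high bits -> dynamic port range 49152..65535
        pairs.append((ip_id, src_port))
    return pairs
```

Because the sequence is fully determined by the client's parameters, a server that holds the same gene function can regenerate it, which is what makes labeled training data possible.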
Step (3): the end information such as IpId and SrcPort generated in step (2) is loaded into socket packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, which is sent into the network environment through a socket.
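A chosen SrcPort can be set with an ordinary socket by binding to that port before sending, but setting IpId generally means building the IPv4 header yourself and sending it through a raw socket, which needs elevated privileges. The sketch below only builds such a 20-byte header (no options) with the standard `struct` module and computes the RFC 791 header checksum; `build_ipv4_header` is a hypothetical helper, and actually sending the packet is left out.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 checksum: one's complement of the one's-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):  # IPv4 headers are always a multiple of 4 bytes
        total += (header[i] << 8) + header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(ip_id, src_ip, dst_ip, payload_len,
                      protocol=socket.IPPROTO_UDP, ttl=64, flags_fragment=0):
    """Build an IPv4 header whose identification field carries the chosen IpId value."""
    version_ihl = (4 << 4) | 5  # version 4, header length 5 x 32-bit words
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len,  # version/IHL, TOS, total length
        ip_id, flags_fragment,             # identification, flags + fragment offset
        ttl, protocol, 0,                  # checksum is zero while it is computed
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

Re-running `ipv4_checksum` over a finished header returns 0, which is a quick way to validate it.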
Step (4): the server trains a random forest model on a data set formed by the end information extension sequences generated by all legal clients, obtaining the random forest model RF-Model.
In step (4), to train the random forest model on the data set formed by the end information extension sequences generated by the legal clients, the server first generates a labeled data set from the gene functions held by the legal clients, and then feeds this data set to the random forest model for supervised learning, obtaining the random forest model RF-Model.
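The labeled data set in step (4) can be pictured as rows of end information tagged with the identity of the client whose gene function produced them. In the sketch below, each gene function is assumed (hypothetically) to be any callable mapping an index x to an (IpId, SrcPort) pair; the client names and functions in the usage example are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: x -> (IpId, SrcPort)
GeneFn = Callable[[int], Tuple[int, int]]


def build_labeled_dataset(gene_functions: Dict[str, GeneFn], samples_per_client: int):
    """Regenerate each legal client's end information from its gene function and label it."""
    X: List[List[int]] = []
    y: List[str] = []
    for client_id, f in sorted(gene_functions.items()):  # sorted for a reproducible row order
        for x in range(samples_per_client):
            ip_id, src_port = f(x)
            X.append([ip_id, src_port])
            y.append(client_id)  # the label is the legal client's identity
    return X, y


clients = {
    "alice": lambda x: ((1000 + 7 * x) % 65536, 50000 + x % 100),
    "bob": lambda x: ((30000 + 13 * x) % 65536, 60000 + x % 100),
}
X, y = build_labeled_dataset(clients, 3)
```

The resulting `X` and `y` have the shape expected by a standard classifier, for example scikit-learn's `RandomForestClassifier.fit(X, y)`, whose fitted `feature_importances_` attribute is one way to carry out the feature-importance evaluation mentioned in the disclosure.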
Step (5): the server monitors the network environment for end information extension sequences {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by clients, extracts the end information from each packet, and adds it to a test set.
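Extracting the end information in step (5) amounts to parsing the IPv4 header and the first four bytes of the TCP or UDP header. The sketch below assumes it receives raw bytes starting at the IPv4 header; packet capture itself (for example via libpcap or a raw socket) is out of scope, and `extract_end_info` is a hypothetical helper name.

```python
import socket
import struct


def extract_end_info(packet: bytes) -> dict:
    """Extract IpId, SrcPort, SrcIp, DstPort and DstIp from a raw IPv4 TCP/UDP packet."""
    if len(packet) < 20:
        raise ValueError("packet is shorter than a minimal IPv4 header")
    (version_ihl, _tos, _total_len, ip_id, _flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (version_ihl & 0x0F) * 4  # header length in bytes, including any options
    if protocol not in (socket.IPPROTO_TCP, socket.IPPROTO_UDP):
        raise ValueError("only TCP and UDP packets carry port numbers")
    if len(packet) < ihl + 4:
        raise ValueError("truncated transport header")
    # Source and destination ports are the first two 16-bit fields of both TCP and UDP.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return {
        "IpId": ip_id,
        "SrcPort": src_port,
        "SrcIp": socket.inet_ntoa(src),
        "DstPort": dst_port,
        "DstIp": socket.inet_ntoa(dst),
    }
```

Each returned record becomes one row of the test set.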
Step (6): the test set obtained in step (5) is input into the random forest model RF-Model obtained in step (4) for classification.
Step (7): the classification result of step (6) is analyzed to identify the legal client requesting identity authentication, which is then provided with personalized service.
In step (7), the classification result of step (6) is analyzed, and the legal client that occurs most frequently in the result is taken as the final identity authentication result.
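The decision rule in step (7) is a majority vote over the per-packet predictions. The sketch below implements it; the optional `min_share` threshold, which rejects a sequence when no client clearly dominates, is a hypothetical extension not described in the text, and its default of 0.0 reproduces the plain "most frequent client" rule.

```python
from collections import Counter


def authenticate(predictions, min_share=0.0):
    """Return the legal client predicted most often for a captured sequence, or None."""
    if not predictions:
        return None
    client, count = Counter(predictions).most_common(1)[0]
    # Hypothetical extension: refuse authentication when the winner's share is too small.
    if count / len(predictions) < min_share:
        return None
    return client
```

For example, `authenticate(["alice", "bob", "alice"])` returns `"alice"`.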

Claims (3)

1. An identity authentication method based on an end information extension sequence and a random forest model, characterized by comprising the following steps:
(1) the client inputs a gene function F(x);
(2) an end information extension sequence generation algorithm generates, from F(x), end information IpId and SrcPort with good autocorrelation;
(3) the end information IpId and SrcPort generated in step (2) is loaded into socket packets to obtain an end information extension sequence {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN}, which is sent into the network environment through a socket;
(4) the server trains a random forest model on a data set formed by the end information extension sequences generated by all legal clients, obtaining the random forest model RF-Model;
(5) the server monitors the network environment for end information extension sequences {ExtendedSeq1, ExtendedSeq2, …, ExtendedSeqN} sent by clients, extracts the end information from each packet, and adds it to a test set;
(6) the test set obtained in step (5) is input into the random forest model RF-Model obtained in step (4) for classification;
(7) the classification result of step (6) is analyzed to identify the legal client requesting identity authentication, which is then provided with personalized service.
2. The identity authentication method based on the terminal information extension sequence and the random forest model as claimed in claim 1, wherein in step (1) the client inputs a gene function F(x) that is required to have good autocorrelation, so that the end information extension sequences generated by legal users retain good autocorrelation.
3. The identity authentication method based on the terminal information extension sequence and the random forest model as claimed in claim 1, wherein in step (4), when the server trains the random forest model on the data set formed by the end information extension sequences generated by the legal clients, the server generates a labeled data set from the gene functions held by the legal clients and inputs this data set into the random forest model for supervised learning, thereby obtaining the random forest model RF-Model.
CN202010020123.7A 2020-01-09 2020-01-09 Identity authentication method based on terminal information extension sequence and random forest model Expired - Fee Related CN111510422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010020123.7A CN111510422B (en) 2020-01-09 2020-01-09 Identity authentication method based on terminal information extension sequence and random forest model

Publications (2)

Publication Number Publication Date
CN111510422A CN111510422A (en) 2020-08-07
CN111510422B CN111510422B (en) 2021-07-09

Family

ID=71864620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010020123.7A Expired - Fee Related CN111510422B (en) 2020-01-09 2020-01-09 Identity authentication method based on terminal information extension sequence and random forest model

Country Status (1)

Country Link
CN (1) CN111510422B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682686A (en) * 2016-12-09 2017-05-17 北京拓明科技有限公司 User gender prediction method based on mobile phone Internet-surfing behavior

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9904916B2 (en) * 2015-07-01 2018-02-27 Klarna Ab Incremental login and authentication to user portal without username/password
CN109861957A (en) * 2018-11-06 2019-06-07 中国科学院信息工程研究所 Fine-grained user behavior classification method and system for private encrypted protocols of mobile applications
CN109660656A (en) * 2018-11-20 2019-04-19 重庆邮电大学 Method for identifying application programs on an intelligent terminal
CN110245693B (en) * 2019-05-30 2023-04-07 北京理工大学 Key information infrastructure asset identification method combined with mixed random forest

Also Published As

Publication number Publication date
CN111510422A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN105871832A (en) Network application encrypted traffic recognition method and device based on protocol attributes
Fu et al. Semi-supervised specific emitter identification method using metric-adversarial training
Wang et al. Multilevel identification and classification analysis of Tor on mobile and PC platforms
CN110460502B (en) Application program flow identification method under VPN based on distributed feature random forest
CN111953669B (en) Tor flow tracing and application type identification method and system suitable for SDN
Yang et al. Research on network traffic identification based on machine learning and deep packet inspection
CN110798314B (en) Quantum key distribution parameter optimization method based on random forest algorithm
Aceto et al. Traffic classification of mobile apps through multi-classification
CN114866486B (en) Encryption traffic classification system based on data packet
Lin et al. A novel multimodal deep learning framework for encrypted traffic classification
CN115086055B (en) Detection device and method for encrypting malicious traffic of android mobile device
Wang et al. Characterizing application behaviors for classifying p2p traffic
Zhang et al. An automatic and efficient malware traffic classification method for secure Internet of Things
Liang et al. FECC: DNS tunnel detection model based on CNN and clustering
CN111510422B (en) Identity authentication method based on terminal information extension sequence and random forest model
Altschaffel et al. Statistical pattern recognition based content analysis on encrypted network: Traffic for the teamviewer application
Ma et al. A Multi-Perspective Feature Approach to Few-Shot Classification of IoT Traffic
CN112383488B (en) Content identification method suitable for encrypted and non-encrypted data streams
Zheng et al. Detecting malicious tls network traffic based on communication channel features
Qin et al. MUCM: multilevel user cluster mining based on behavior profiles for network monitoring
Tavallaee et al. Online classification of network flows
Sajeev et al. LASER: A novel hybrid peer to peer network traffic classification technique
CN102833255B (en) Skype speech flow extraction method based on time-frequency analysis
CN111371727A (en) Detection method for NTP protocol covert communication
Zhang et al. Wi-Fi device identification based on multi-domain physical layer fingerprint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210709

Termination date: 20220109