
NIDS-CNNLSTM Network Intrusion Detection Classification Model Based On Deep Learning


Received 25 January 2023, accepted 5 March 2023, date of publication 9 March 2023, date of current version 15 March 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3254915

NIDS-CNNLSTM: Network Intrusion Detection Classification Model Based on Deep Learning
JIAWEI DU 1, KAI YANG 1, YANJING HU 2, AND LINGJIE JIANG 3
1 School of Computer Science, Xijing University, Xi’an, Shaanxi 710123, China
2 School of Cryptographic Engineering, Engineering University of PAP, Xi’an, Shaanxi 710086, China
3 School of Electronic Information, Xijing University, Xi’an, Shaanxi 710123, China

Corresponding author: Kai Yang (sydeny-001@163.com)


This work was supported by the High-level Talents Special Fund Project of Xijing University in 2022, “Research on Industrial Internet of Things Intrusion Detection Technology Based on Deep Learning” (XJ22B04).

ABSTRACT Intrusion detection is the core topic of network security, and the intrusion detection algorithm
based on deep learning has become a research hotspot in network security. In this paper, a network intrusion
detection classification model (NIDS-CNNLSTM) based on deep learning is constructed for the wireless
sensing scenario of the Industrial Internet of Things (IIoT) to effectively distinguish and identify network
traffic data and ensure the security of the equipment and operation of the IIoT. NIDS-CNNLSTM combines
the powerful learning ability of long short-term memory neural networks in time series data, learns and
classifies the features selected by the convolutional neural network, and verifies the applicability based on
binary classification and multi-classification scenarios. The model is trained using KDD CUP99, NSL_KDD,
and UNSW_NB15 classic datasets. The verification accuracy and training loss on the three datasets all
show good convergence and level, and the accuracy rate is high when classifying various types of traffic.
The overall performance of NIDS-CNNLSTM has been significantly improved compared with the models
proposed in previous studies. The effectiveness shows a high detection rate and classification accuracy and
a low false alarm rate through experimental results. It is more suitable for large-scale and multi-scenario
network data in the IIoT.

INDEX TERMS Network intrusion detection, deep learning, convolutional neural network, long short-term
memory neural network.

The associate editor coordinating the review of this manuscript and approving it for publication was Daniel Augusto Ribeiro Chaves.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
24808 VOLUME 11, 2023
J. Du et al.: NIDS-CNNLSTM: Network Intrusion Detection Classification Model Based on Deep Learning

I. INTRODUCTION

The Industrial Internet of Things (IIoT) has brought new opportunities for global development. However, it also brings security risks, such as leakage of core industrial data and illegal manipulation of interconnected terminals. Therefore, the security of the IIoT is facing significant challenges. IIoT security protection mainly adopts active-passive defense modes, such as industrial firewall technology and intrusion detection technology. Firewall technology is a passive defense technology that cannot prohibit the transmission of files and programs with threatening codes, such as viruses or worms. Therefore, intrusion detection technology is introduced to make up for the deficiency of firewall technology. Intrusion detection technology [1], [2] is an active network security protection measure based on traditional static protection. It realizes real-time protection against internal and external intrusions through real-time network monitoring. It is proactive, real-time, and dynamic. Therefore, as a network security protection measure, intrusion detection technology has become a research hotspot in IIoT security. Essentially, intrusion detection uses a classifier to distinguish between normal and abnormal data in the data stream and to raise an alarm on attack behavior. This classifier [3] can be based on Bayesian methods [4], decision trees [5], neural networks [6], or support vector machines [7]. In recent years, research on intrusion detection and classification algorithms [8] has been mainly divided into two categories: those based on traditional machine learning [9], [10], [11] and those based on deep learning [12], [13], [14]. Facing the increasingly complex IIoT environment, researchers have proposed various intrusion detection models for different network attacks and

applied machine learning algorithms to intrusion detection models. These models have achieved specific results but could still be improved. Deep learning has strong feature extraction and learning classification capabilities. It is a new field of machine learning in recent years and has attracted more scholars' attention at home and abroad. Intrusion detection algorithms based on deep learning have solved many challenging problems.

Xiao et al. [15] adopted an auto-encoder (AE) to reduce the dimension of the data to decrease the interference of redundant features, and a convolutional neural network (CNN) was adopted to identify the intrusion information. Staudemeyer [16] introduced long short-term memory (LSTM) into intrusion detection, explored the correlation of the temporal domain of intrusion information, and effectively reduced the rate of false positives. Zhang et al. [17] proposed an intelligent grid intrusion detection model that combines the genetic algorithm (GA) and extreme learning machine (ELM). The model retains the advantages of the ELM, and the GA is introduced to ensure the optimal parameters of the model. Vinayakumar et al. [18] connected CNN and LSTM. They showed a serial CNN-LSTM intrusion detection system model to extract high-level feature representations representing the abstract form of low-level feature sets of network traffic connections. Yao et al. [19] proposed an AMI intrusion detection model based on the cross-layer feature fusion of CNN and LSTM to obtain comprehensive features with multi-domain characteristics based on the KDD Cup 99 and NSL_KDD datasets. Yang and Wang [20] adopted an improved CNN to identify intrusion information. The CNN is improved to extract features across layers, and feature fusion is used to obtain comprehensive features. Liu and Zhang [21] used CNN to identify intrusions and improved the model's accuracy through data augmentation techniques. Shen et al. [22] applied an ELM to intrusion detection, which improved the model's detection speed and generalization ability. Halbouni et al. [23] established a hybrid intrusion detection system model by using the ability of a CNN to extract spatial features and an LSTM network to extract temporal features. Batch normalization and dropout layers were added to the model to improve its performance. Thaseen and Kumar [24] proposed an intrusion detection model using chi-square feature selection and multi-class support vector machines (SVM). The parameter adjustment technology is used to optimize the radial basis function kernel parameters. A multi-class support vector machine is constructed to reduce the training and testing time and improve the individual classification accuracy of network attacks. Sahu et al. [25] proposed a deep-learning model to solve the intrusion classification problem effectively. It classifies benign and malicious traffic on intrusion datasets using LSTM and Fully Connected Network (FCN) deep learning methods to classify multi-class attack patterns more accurately. Ikram et al. [26] utilized an ensemble of different deep neural network (DNN) models, such as multilayer perceptron (MLP), backpropagation network (BPN), and LSTM, stacked to build a robust anomaly detection model. XGBoost combines the results of each deep learning model to achieve higher accuracy, utilizing deep learning techniques to identify intrusions with maximum accuracy and reduce false positive rates.

Based on existing research, this paper constructs a network intrusion detection classification model based on deep learning (NIDS-CNNLSTM) to further improve the detection model's detection rate and classification accuracy. The model employs two deep learning algorithms: CNN [27], [28] and LSTM [29], [30], [31]. CNN and LSTM extract the spatiotemporal features of network traffic data, which can more effectively identify intrusion information in the IIoT. NIDS-CNNLSTM is evaluated on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets; all three datasets contain rich samples and cover all possible types of attacks in Industrial IoT. Selecting these three classic network intrusion detection public datasets as benchmark datasets is the basis for fair comparison and verification of calculation methods. Training multiple models in comparative experiments can better evaluate the overall performance of intrusion detection models [32], [33]. The applicability of the model is verified in binary and multi-classification scenarios: NIDS-CNNLSTM showed better verification accuracy and training loss in both scenarios, improving the intrusion detection rate and classification accuracy. Compared with models proposed in previous studies, the model's overall performance has been significantly improved, further demonstrating the model's effectiveness.

The organization of the article is as follows: the first part is the introduction, which introduces the background and significance of the research in this field and the related work at home and abroad; the second part is the preliminary knowledge, which outlines the related theory of the intrusion detection model and neural networks; the third part is the algorithm model architecture, which introduces the construction process of the NIDS-CNNLSTM model in detail, mainly including data collection, data processing, the CNN-LSTM model, and decision-making judgment; the fourth part is the simulation experiment, including the experimental environment, evaluation indicators, model performance, and comparative experiments; the fifth part is the summary and the future work.

FIGURE 1. General intrusion detection model.


II. PRELIMINARY KNOWLEDGE

A. INTRUSION DETECTION MODEL
Intrusion detection is the detection of intrusion behaviors. It collects and analyzes network behaviors, security logs, audit data, and other information available in the network and at several critical points in the computer system, and it checks whether there are behaviors that violate security policies or signs of attack in the network or system. The flowchart of the general intrusion detection model is shown in Figure 1. The model proposed in this paper is based on the general intrusion detection model.

B. CONVOLUTIONAL NEURAL NETWORKS
CNN is a practical algorithm for deep learning. It is a feedforward neural network with convolution calculation and deep structure. It is often designed to process multi-dimensional array data. It can accurately extract the local correlation of features and improve the accuracy of feature extraction. A CNN consists of an input layer, a hidden layer, and an output layer. The input layer is used to receive normalized array data. The hidden layer includes a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer uses a convolution kernel for feature extraction, and its feature maps pass through an excitation (activation) layer. After the convolutional layer performs feature extraction, the output feature map is passed to the pooling layer. The pooling layer performs feature selection and information filtering: it performs down sampling, sparsely processes the feature map, reduces the amount of data computation, dramatically reduces the parameter magnitude, and effectively avoids overfitting. The fully connected layer is usually placed at the tail of the CNN, where it refits the features to minimize the loss of feature information. The output layer directly outputs the classification result of each feature. The CNN structure is shown in Figure 2.

FIGURE 2. Model structure diagram.

CNN is divided into one-dimensional, two-dimensional, and three-dimensional convolution. The one-dimensional CNN selected in this paper is mainly applied to time series data. Assuming that the $l$th layer is a convolutional layer, the calculation formula of one-dimensional convolution is shown in (1).

$$x_k^l = f\left(\sum_{i=1}^{N} x_i^{l-1} \times w_{ik}^l + b_k^l\right) \tag{1}$$

where $x_k^l$ represents the $k$th convolution map of layer $l$, $f$ represents the activation function, $N$ represents the number of input convolution maps, $\times$ is the convolution operation, $w_{ik}^l$ is the weight of the $k$th convolution kernel of layer $l$ for the $i$th operation, and $b_k^l$ is the offset of the $k$th convolution kernel corresponding to layer $l$.

This paper uses the maximum pooling method for the pooling layer; the calculation formula is shown in (2).

$$\hat{x}_k^l = \max\left(x_k^l : x_{k+r-1}^l\right) \tag{2}$$

where $\hat{x}_k^l$ represents the maximum value from vector $x_k^l$ to $x_{k+r-1}^l$. For the sequence $x$, repeating the max-pooling operation on the continuous vector whose window is $r$ yields the largest feature sequence.

In the last layer of the CNN, the softmax function is used to calculate the probability of each output class. The probability is calculated by dividing the exponential of the class score by the sum of the exponentials of all the scores:

$$\mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j}^{c} \exp(y_j)} \tag{3}$$

The loss function is:

$$H(y', y) = -\sum_{i} y'_i \log\left(\mathrm{softmax}(y_i)\right) \tag{4}$$

The loss quantifies how well the calculated probability predicts the actual class. For classification probabilities, this is done by the categorical cross-entropy loss function. The two vectors of the predicted class ($y'$) and the actual class ($y$) are used to output the total loss, and the cross-entropy loss is computed as the sum of the negative log-likelihoods of the class probabilities, expressed as the function $H(y', y)$.

C. LONG SHORT-TERM MEMORY NEURAL NETWORKS
CNN mainly analyzes the internal features of a single data packet and lacks the extraction and analysis of the association between sequences. Therefore, the model constructed with the LSTM network will have a better training effect on network intrusion detection. Due to its ability to maintain long-term memory, LSTM is gradually being applied to various network intrusion detection models to solve the problem of gradient disappearance in recurrent neural networks. LSTM is a time-recurrent neural network. Based on the traditional recurrent neural network (RNN), the LSTM unit replaces the neurons in the RNN. An input gate, an output gate, and a forget gate are introduced to control what is input, what is output, and which past information is forgotten; they regulate the information allowed to pass through, maintain memory over long sequences, and solve the long-term dependence problem of the RNN.

FIGURE 3. Internal structure diagram of hidden layer of LSTM network.
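The gate mechanism described above can be sketched as a single LSTM time step in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the standard LSTM update, in which the forget, input, and output gates act on the concatenated vector [h_{t-1}, x_t]. The helper names `sigmoid`, `affine`, `lstm_step`, and `init_params`, the layout of the parameter dictionary, and the random initialization are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function; squashes a gate pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, z):
    """Return W·z + b for a row-major matrix W (list of rows) and vectors b, z."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step.

    `p` is a hypothetical parameter dict with keys W_f, W_i, W_c, W_o
    (weights) and b_f, b_i, b_c, b_o (biases).
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidate state
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]             # hidden state
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (an illustrative choice)."""
    rng = random.Random(seed)
    p = {}
    for gate in "fico":
        p["W_" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        p["b_" + gate] = [0.0] * hidden_size
    return p


# Usage: run a short sequence of 3-feature inputs through a 2-unit cell.
params = init_params(input_size=3, hidden_size=2)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([0.5, -1.0, 0.25], [1.0, 0.0, -0.5]):
    h, c = lstm_step(x, h, c, params)
```

Because the hidden state is a sigmoid-gated tanh of the cell state, every component of `h` stays strictly between -1 and 1, while the cell state `c` is what carries information across time steps.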
1) FORGET GATE
The forget gate changes with the context and forgets the information that needs to be forgotten. The output of the forget gate passes through a sigmoid function, so its value ranges between 0 and 1, and it is multiplied by the cell state at the previous moment: 0 means that the information of that bit is completely forgotten, and 1 means that it is completely retained. The calculation formula is shown in (5).

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{5}$$

2) INPUT GATE
The input gate supplements the information the new cell state needs as much as possible. The output of the input gate is a sigmoid function with a value range of 0∼1, which is multiplied by the current candidate cell state. The calculation formulas are shown in (6) and (7).

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{6}$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{7}$$

Then the old and new state information can be merged to form the final new cell state. The calculation formula is shown in (8).

$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \tag{8}$$

3) OUTPUT GATE
The output is obtained by passing the final cell state through the Tanh function. The output of the output gate is a sigmoid function with a value between 0 and 1, which selects which information can be output. The calculation formulas are shown in (9) and (10).

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{9}$$

$$h_t = O_t \times \tanh\left(C_t\right) \tag{10}$$

The internal structure of the hidden layer of the LSTM network is shown in Figure 3.

In Figure 3, $f_t$ is the output signal of the forget gate, whose value determines the forget ratio of the memory unit $c$; $i_t$ is the output signal of the input gate, and its value determines how much of the current input information is input into the memory unit $c$; $\tilde{C}_t$ is the preparatory information to be input into the memory unit $c$, and its value is multiplied by $i_t$ to get the information in the memory unit $c$; $O_t$ is the output signal of the output gate, and its value determines the proportion of the memory unit $c$ output to the current state $h$; $\hat{c}_t$ is the preparatory information that will be output to the hidden layer state $h$, and its value is multiplied by $O_t$ to get the information in $h$. The memory unit $C_t$ at time $t$ has been screened by the forget gate and the input gate, and the hidden state $h_t$ is then obtained through the screening of the output gate.

III. ALGORITHM MODEL ARCHITECTURE
The flowchart of NIDS-CNNLSTM proposed in this paper is shown in Figure 4. First, the original network intrusion detection dataset is input into the CNN-LSTM model after the data preprocessing operation. Then, after the composition and evaluation of the model, the decision classification is finally carried out.

FIGURE 4. NIDS-CNNLSTM flowchart.

From the NIDS-CNNLSTM flow chart, the architecture of the model implementation mainly includes four parts:


data collection, data processing, CNN-LSTM model, and decision-making judgment.

A. DATA COLLECTION
The input function is implemented at the bottom of the model, receiving network intrusion datasets from KDD Cup99, NSL_KDD, and UNSW_NB15. KDD CUP99 is often used in intrusion detection research to provide a unified performance evaluation benchmark, test the quality of intrusion detection algorithms, and lay the foundation for research on intelligent intrusion detection systems. In order to make deep learning algorithms better implementable on KDD Cup99, the NSL_KDD dataset was created, which is more effective for accurately evaluating different learning techniques due to the removal of redundant data. The UNSW_NB15 dataset is a comprehensive network attack traffic dataset widely used in anomaly intrusion detection, which simulates the real attack environment as much as possible. Compared with the KDD CUP99 and NSL_KDD datasets, it is more suitable for the research of intrusion detection systems. The UNSW-NB15 dataset is considered to be a reliable dataset for evaluating existing and novel IDS methods. The KDD Cup99, NSL_KDD, and UNSW_NB15 datasets represent the most classic and latest network attacks. The models are trained on the three datasets to better evaluate their effectiveness.

In the KDD CUP99 dataset, each network connection is marked as normal or abnormal, and the types of anomalies are subdivided into four categories: DoS, R2L, U2R, and Probing. The NSL_KDD dataset deletes the duplicate records in KDD Cup99, reduces the amount of data, contains the basic records and data characteristics of the KDD Cup99 dataset, and identifies the same attack categories as the KDD Cup99 dataset. The UNSW_NB15 dataset contains normal network connections and nine attack types: Fuzzer, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms.

In order to better evaluate the applicability and effectiveness of the model proposed in this paper, and to avoid chance results, binary classification and multi-classification experiments were set up on the three sets of datasets. The datasets were uniformly labeled: marked as normal and abnormal for binary classification experiments, and marked as multiple traffic types for multi-classification experiments. The specific divisions are shown in Table 1.

TABLE 1. Dataset partition.

B. DATA PROCESSING
1) DATA LOADING
The loaded data is stored in a CSV file in PCAP format. The details of each dataset are read using the Pandas package, after which all null and duplicate values are cleaned.

2) DATA ENCODING
Deep neural networks process numerical values, so the traffic labels in the read dataset are uniformly encoded into numerical types, and the One-Hot Encoder is used to encode the values of the label column.

3) DATA SCALING
The numerically encoded data is normalized, because the data read from the CSV file has different standard deviations and average values, and the difference will affect the learning efficiency. The input data is scaled using the Standard Scaler, resulting in a mean of zero and a standard deviation of one. Library


standard scalers are used to normalize datasets according to sklearn preprocessing.

C. CNN-LSTM MODEL
1) COMPONENTS
The CNN-LSTM model mainly consists of two parts: the CNN network and the LSTM network. First, the preprocessed data is input into the two-layer CNN network, feature selection is performed on the traffic data, and the global average pooling layer is selected to replace the fully connected layer. Data feature extraction and dimensionality reduction are realized through convolution and pooling operations, and the feature matrix is the output. The feature vector is then input into the double-layer unidirectional LSTM network, whose powerful time series learning ability is used to learn and classify the features selected by the CNN network. The forget gate, input gate, and output gate in the LSTM network adjust their parameters through continuous iterative training on a large amount of data. Then, from the data extracted by the CNN network, the time-fitting relationship between the data is learned, and the effective dynamic modeling of the input and output data of the forecast time series is carried out. Finally, the CNN-LSTM model fits the trained data and outputs the predicted value through the fully connected neural network.

2) INTERNAL STRUCTURE
Figure 5 shows the internal structure of the CNN-LSTM network, taking as an example the CNN-LSTM model trained on the KDD CUP99 dataset in a multi-classification scenario. In Figure 5, the preprocessed data is first flattened into a 1-dimensional array. The data is then mapped from low-dimensional to high-dimensional through two layers of 3 × 3 convolution operations, giving feature maps with a height × width of 6 × 6 and 64 and 128 channels, respectively; the data represented by the different dimensions have different meanings. After the maximum pooling operation, with a pooling kernel of size 2 × 2, the height and width of the feature map become 3 × 3 while the number of channels remains unchanged, which shows that the maximum pooling operation only changes the size of the feature map without changing its dimension. The critical features extracted by the CNN are then used as input data to the LSTM model; a two-layer one-way LSTM is selected, and the output dimension is set to 64. Two fully connected layers follow. The first fully connected layer flattens the height and width and converts the data information from the height and width dimensions to the depth dimension; it has 256 input nodes and 64 output nodes. The second fully connected layer performs the classification output; it has 64 input nodes, and its number of output nodes is determined by the final number of classes required. In this model, the number of classes is 6, so the number of output nodes is 6. Finally, the output results are normalized by the Softmax function, so that each output probability is mapped to between 0 and 1 and the probabilities of all categories sum to 1. The hyperparameter settings of the experiment in this paper are: Batch Size 1024, Learning Rate 0.001, Optimizer Adam, and Epoch 50. The CNN-LSTM model summary is shown in Table 2.

FIGURE 5. Internal structure of CNN-LSTM model.

TABLE 2. Model summary.

3) IMPROVEMENT PART
• Global average pooling layer instead of fully connected layer
In the CNN part, the convolutional layer before the connection layer is responsible for the feature extraction of the data. After acquiring the features, the traditional method is to connect the fully connected layer and perform activation classification, but the fully connected layer adds to the training and testing computation: its large number of parameters reduces the speed, and if the number of parameters is too large, the model overfits easily. The idea of global average pooling is to replace the fully connected layer with global average pooling, use the pooling layer to reduce dimensionality, and retain the


spatial information or semantic information extracted by the previous convolutional layers and pooling layers. Therefore, the effect improvement is more evident in practical applications, and global average pooling places no limit on the input size. The fully connected layer expands the convolutional layer into a vector and classifies each feature map. Global average pooling combines these two processes into one, as shown in Figure 6.

FIGURE 6. Fully connected layer and global average pooling layer.

Figures 6 (a) and (b) show the process of the fully connected layer and the global average pooling layer, respectively. Global average pooling dramatically reduces the number of network parameters, which is equivalent to regularizing the entire network structure to prevent over-fitting of the model. It removes the black-box features of the fully connected layer and directly gives each channel an actual category meaning. At the end of the convolution layer, as many feature maps are output as there are categories, and the average value of each map is directly calculated to obtain the result. Finally, the Softmax function is used for classification.

• Selection of two layers of LSTM
With the number of iterations increased to 50, the accuracy and loss function of one-layer LSTM and two-layer LSTM are compared, as shown in Figure 7 and Figure 8.

FIGURE 7. One layer of LSTM accuracy and two layers of LSTM accuracy.

Figures 7 (a) and (b) show the accuracy of one layer of LSTM and the accuracy of two layers of LSTM, respectively. Figures 8 (a) and (b) show the loss function of one layer of LSTM and the loss function of two layers of LSTM, respectively. The accuracy curve and loss function of one layer of LSTM are jagged, with apparent oscillations. Although the results were not significantly affected, this casts doubt on the reliability of such an unstable result. In contrast, the accuracy curve of the two-layer LSTM is smoother, and the obtained results are more stable and reliable.

The number of LSTM layers should be kept manageable when processing time series data. An increase in the number of layers brings an exponential increase in time overhead and memory overhead, and the gradient between layers then disappears. When the number of layers exceeds three, the gradient disappearance between layers becomes very obvious, resulting in a slowdown of the update iteration of the LSTM layer close to the input layer and a sharp drop in the convergence effect and efficiency. In the face of a large amount of data, more is needed than enlarging the neurons of one layer. Currently, two layers of LSTM can compress the data into highly “condensed” data. Adding more deep layers can only bring about the loss of information in


information compression and the disappearance of gradients in the training process. LSTM solves the problem of RNN long-distance dependence and gradient disappearance mainly within each layer, not between layers, so it will be difficult to train if there are too many LSTM layers, and the feature extraction ability of each layer of LSTM is already strong. Therefore, this paper chooses a two-layer unidirectional LSTM in constructing the NIDS-CNNLSTM model.

FIGURE 8. One layer of LSTM loss function and two layers of LSTM loss function.

D. DECISION-MAKING JUDGMENT
After obtaining the classification results, the attack traffic can be judged. The detection results are further used to update the knowledge database to improve the system's detection capability. To find the CNN and LSTM architecture that classifies the input data with the best performance, networks with different structures are designed and implemented and then executed multiple times to determine the best network structure.

IV. SIMULATION EXPERIMENT
A. EXPERIMENTAL ENVIRONMENT
The experimental environment of this paper is shown in Table 3 below.

TABLE 3. Experimental environment.

B. EVALUATION INDICATORS
The confusion matrix is often used in intrusion detection technology as an index to evaluate classification performance. The confusion matrix is a visual display tool to evaluate the advantages and disadvantages of partition models and differences. It contains four elements: TP is the number of normal samples of normal traffic, and FN is the number of normal samples in normal traffic. The number of abnormal samples classified as different abnormal traffic; FP is the number of normal samples classified as different abnormal traffic; TN is the number of abnormal samples classified as normal traffic. The confusion matrix is shown in Table 4.

TABLE 4. Confusion matrix.

In the process of intrusion detection, indicators such as accuracy (ACC), precision (PR), detection rate (DR), F1 value (F1), and false positive rate (FPR) are usually used to evaluate the effect of the model. ACC refers to the ratio of correctly classified samples to the total number of samples, and its calculation formula is shown in (11).

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

PR represents the percentage of correctly classified samples of normal traffic among the samples predicted to be normal traffic, and its calculation formula is shown in (12).

$$PR = \frac{TP}{TP + FP} \tag{12}$$

DR is defined as the ratio between the number of correctly identified abnormal samples and the predicted number of abnormal samples. The DR reflects the ability of the model to identify attacks, which is an essential indicator in IDSs, and


FIGURE 9. NIDS-CNNLSTM model validation accuracy and training loss in the binary classification.

its calculation formula is shown in (13).

$$\mathrm{DR} = \frac{TP}{TP + FN} \tag{13}$$

F1 is the harmonic mean of PR and DR, and its calculation formula is shown in (14).

$$\mathrm{F1} = \frac{2 \times \mathrm{PR} \times \mathrm{DR}}{\mathrm{PR} + \mathrm{DR}} \tag{14}$$

FPR is defined as the ratio of misidentified abnormal samples to the predicted number of normal samples, and its calculation formula is shown in (15).

$$\mathrm{FPR} = \frac{FP}{TP + FP} \tag{15}$$

Multiple metrics are often used simultaneously in intrusion detection research to comprehensively evaluate a model. In this work, ACC, PR, F1, DR, and FPR were selected to assess the performance of the proposed NIDS-CNNLSTM model in multiple experiments.

C. MODEL PERFORMANCE

In order to thoroughly verify the classification performance of the NIDS-CNNLSTM model, this paper conducts experiments in binary classification and multi-classification scenarios. The training sets of the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets are used to train the classifier, the validation sets are used to optimize and adjust the parameters of the classifier, and the test sets are used to calculate the error rate of the optimized classifier and test its performance. The model activation function is set to sigmoid, the learning rate is 0.001, and epochs = 50. The fitting curve of validation accuracy and training loss, the confusion matrix, and the classification accuracy of each component are obtained.

• Binary classification experiments

In the binary classification experiment, after 50 epochs, the fitting curves of the validation accuracy and training loss of the NIDS-CNNLSTM model on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets can be obtained, as shown in Figure 9.

Figures 9(a) and (b) respectively show the relationship between the validation accuracy and the epoch and between the training loss and the epoch in the NIDS-CNNLSTM model. As the number of training epochs increases, the validation accuracy gradually increases and stabilizes. The validation accuracies on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets are 0.974, 0.99, and 0.94, respectively. At the same time, the training loss gradually decreases and finally stabilizes. The training losses on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets are 0.038, 0.029, and 0.125, respectively. The validation accuracy and training loss fitting curves show good convergence, indicating that the model’s structural design and parameter settings are reasonable.

The confusion matrix is mainly used to represent classification accuracy. It uses a tabular diagram with the predicted results on the horizontal axis and the actual results on the vertical axis to display the classification performance of the algorithm visually. From the confusion matrix, the ACC, PR, DR, F1, and FPR of the normal and abnormal traffic of the NIDS-CNNLSTM model in the three sets of datasets can be obtained. The experimental confusion matrix is shown in Figure 10.

Figures 10(a), (b), and (c) respectively show the evaluation parameters of normal and abnormal traffic in the three sets of datasets obtained from the confusion matrix, from which the ACC, PR, DR, F1, and FPR of the NIDS-CNNLSTM model are derived. The ACC, PR, DR, F1, and FPR of the model on the KDD CUP99 dataset are 0.97, 0.935, 0.98, 0.96, and 0.00086, respectively. The ACC, PR, DR, F1, and FPR of the model on the NSL_KDD dataset are 0.99, 0.99, 0.99, 0.99, and 0.0145, respectively. The ACC, PR, DR, F1, and FPR of the model on the UNSW_NB15 dataset are 0.94, 0.93, 0.935, 0.935, and 0.0397, respectively.

In the binary classification experiment, the classification accuracy of the NIDS-CNNLSTM model for each traffic class was also obtained.

FIGURE 10. Confusion matrix in the binary classification.
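
As a minimal sketch of how the indicators in (11)-(14) follow from the four cells of a binary confusion matrix such as those in Figure 10, the Python below tallies TP, TN, FP, and FN from label sequences and evaluates each ratio. The function names `count_outcomes` and `binary_metrics`, the toy labels, and the choice of 1 as the positive (attack) class are illustrative assumptions, not part of the paper. FPR is computed with the conventional denominator FP + TN, which is also an assumption: with the denominator TP + FP as printed in (15), FPR would equal 1 - PR.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(counts):
    """Evaluate ACC, PR, DR, F1, and FPR from a dict of TP/TN/FP/FN counts."""
    tp, tn, fp, fn = counts["TP"], counts["TN"], counts["FP"], counts["FN"]
    acc = _ratio(tp + tn, tp + tn + fp + fn)  # equation (11)
    pr = _ratio(tp, tp + fp)                  # equation (12)
    dr = _ratio(tp, tp + fn)                  # equation (13)
    f1 = _ratio(2 * pr * dr, pr + dr)         # equation (14)
    fpr = _ratio(fp, fp + tn)                 # conventional FPR (assumption)
    return {"ACC": acc, "PR": pr, "DR": dr, "F1": f1, "FPR": fpr}


# Toy labels (hypothetical): 1 = abnormal traffic, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(count_outcomes(y_true, y_pred)))
```

Zero denominators are mapped to 0.0 here so that an empty class does not raise an error; whether that convention suits a given evaluation is a design choice.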

FIGURE 11. The classification accuracy of each component in the binary classification.
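
The per-class classification accuracy plotted in Figure 11 can be read off a confusion matrix. The sketch below assumes that rows hold the actual class and columns the predicted class, following the axis convention described for the confusion matrix, and that the accuracy of a class is the fraction of its actual samples that land on the diagonal. That definition, the helper names, and the toy labels are assumptions; the paper does not spell out how the per-class value is computed.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where entry [i][j] counts actual class i predicted as class j."""
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_class_accuracy(matrix, labels):
    """Fraction of each actual class that was predicted correctly (diagonal / row sum)."""
    accuracy = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        accuracy[label] = matrix[i][i] / row_total if row_total else 0.0
    return accuracy


# Toy labels (hypothetical) for the two traffic classes.
labels = ["Normal", "Abnormal"]
y_true = ["Normal", "Normal", "Normal", "Normal", "Abnormal", "Abnormal"]
y_pred = ["Normal", "Normal", "Normal", "Abnormal", "Abnormal", "Abnormal"]
matrix = confusion_matrix(y_true, y_pred, labels)
print(per_class_accuracy(matrix, labels))
```

Because each class is normalized by its own sample count, a class with few samples can show a low value even when the overall ACC is high.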

The classification accuracy of normal and abnormal traffic for the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets is shown in Figure 11.

As shown in Figure 11, in the binary classification scenario, the model’s classification accuracy for Normal and Abnormal traffic in the three datasets is greater than 0.9. The model has a relatively balanced classification accuracy for the Normal and Abnormal traffic in the KDD CUP99 dataset. The model has the highest classification accuracy of Normal and Abnormal traffic in the NSL_KDD dataset, reaching 0.99. The classification accuracy of the model for Normal in the UNSW_NB15 dataset is comparatively low, mainly due to the imbalance of the dataset and the limited number of training samples for Normal. It can be concluded that the NIDS-CNNLSTM model proposed in this paper performs well in classifying the two kinds of traffic in the binary classification scenario.

• Multi-classification experiments

In the multi-classification experiment, after 50 epochs, the fitting curves of the validation accuracy and training loss of the NIDS-CNNLSTM model on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets can be obtained, as shown in Figure 12.

Figures 12(a) and (b) show the relationship between the validation accuracy and the epoch and between the training loss and the epoch in the NIDS-CNNLSTM model, respectively. As the number of training epochs increases, the validation accuracy gradually increases and stabilizes. The validation accuracies on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets are 0.9705, 0.992, and 0.829, respectively. At the same time, the training loss gradually decreases and finally stabilizes. The training losses on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets are 0.045, 0.022, and 0.408, respectively. The validation accuracy and training loss fitting curves show good convergence, indicating that the model’s structural design and parameter settings are reasonable.

From the confusion matrix, the ACC, PR, DR, F1, and FPR of the NIDS-CNNLSTM model in various types of traffic in the three sets of datasets can be obtained. The experimental confusion matrix is shown in Figure 13.

Figures 13(a), (b), and (c) respectively show the evaluation parameters of various types of traffic in the three sets of datasets obtained from the confusion matrix, from which the ACC, PR, DR, F1, and FPR of the NIDS-CNNLSTM model are derived. The ACC, PR, DR, F1, and FPR of the model on the KDD CUP99 dataset are 0.9705, 0.825, 0.843, 0.825, and 0.0059, respectively. On the NSL_KDD dataset, the ACC, PR, DR, and F1 of the model are 0.992, 0.918, 0.9, and 0.908, respectively.

FIGURE 12. NIDS-CNNLSTM model validation accuracy and training loss in the multi-classification.

FIGURE 13. Confusion matrix in the multi-classification.
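
For multi-class confusion matrices such as those in Figure 13, one common way to obtain a single PR, DR, F1, and FPR is to treat each class one-vs-rest and average the per-class values (macro-averaging). The paper does not state which aggregation it uses, so the sketch below is an assumption, as are the function name `macro_metrics` and the 3 × 3 toy matrix; rows are actual classes and columns are predicted classes, and FPR again uses the conventional denominator FP + TN.

```python
def macro_metrics(matrix):
    """One-vs-rest ACC, PR, DR, F1, and FPR, macro-averaged over the classes.

    matrix[i][j] is the number of samples of actual class i predicted as class j.
    """
    n = len(matrix)
    if n == 0:
        raise ValueError("matrix must contain at least one class")
    total = sum(sum(row) for row in matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    pr_sum = dr_sum = f1_sum = fpr_sum = 0.0
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp       # actual class k, predicted as another class
        fp = column_sums[k] - tp       # another class, predicted as class k
        tn = total - tp - fn - fp      # everything that does not involve class k
        pr = ratio(tp, tp + fp)
        dr = ratio(tp, tp + fn)
        pr_sum += pr
        dr_sum += dr
        f1_sum += ratio(2 * pr * dr, pr + dr)
        fpr_sum += ratio(fp, fp + tn)

    correct = sum(matrix[k][k] for k in range(n))
    return {
        "ACC": ratio(correct, total),
        "PR": pr_sum / n,
        "DR": dr_sum / n,
        "F1": f1_sum / n,
        "FPR": fpr_sum / n,
    }


# Toy 3-class matrix (hypothetical counts).
toy_matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]
print(macro_metrics(toy_matrix))
```

Under macro-averaging, the averaged F1 is generally not the harmonic mean of the averaged PR and DR, so (14) holds per class rather than for the aggregate values.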

The FPR of the model on the NSL_KDD dataset is 0.0029. The ACC, PR, DR, F1, and FPR of the model on the UNSW_NB15 dataset are 0.829, 0.631, 0.49, 0.495, and 0.0014, respectively.

In the multi-classification experiment, the NIDS-CNNLSTM model obtained the classification accuracy of six types of traffic (Normal, DoS, Probing, R2L, U2R, and Uncertain) for the KDD CUP99 dataset. For the NSL_KDD dataset, the classification accuracy of five types of traffic (Normal, DoS, Probing, R2L, and U2R) is obtained. For the UNSW_NB15 dataset, the classification accuracy of ten types of traffic (Normal, Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms) is obtained, as shown in Figure 14.

As shown in Figure 14, in the multi-classification scenario, the classification accuracy distribution of the model for various types of traffic in the three sets of datasets is reasonable. The model has a good classification effect on Normal, DoS, R2L, and Uncertain in the KDD CUP99 dataset, but the classification effect of Probing and U2R needs improvement, because the relatively small data size leaves the model with insufficient training samples. The model has a low classification accuracy of 0.78 for U2R in the NSL_KDD dataset, and the classification accuracy rates of the Normal, DoS, Probing, and R2L traffic types are all greater than 0.96. The model’s classification accuracy for nine types of attack traffic and one type of Normal traffic in the UNSW_NB15 dataset reflects the data distribution. The UNSW_NB15 dataset is more comprehensive than the KDD CUP99 and NSL_KDD datasets. It combines normal activities and synthetic attack behaviors and can simulate the existing network environment. The NIDS-CNNLSTM model proposed in this paper performs well in classifying various traffic in multi-classification scenarios.

The binary classification and multi-classification experiments demonstrate the effectiveness of the NIDS-CNNLSTM model, which is accurate and reliable.

D. COMPARATIVE EXPERIMENTS

To further evaluate the effectiveness of the NIDS-CNNLSTM model, the proposed model is compared with methods proposed in previous studies on the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets. The hyperparameter settings in the comparison experiment are consistent with the experiments in this paper, and the ACC, DR, and FPR of the models

FIGURE 14. The classification accuracy of each component in the multi-classification.

TABLE 5. Performance comparison.

are compared, respectively. The performance improvement effect of the NIDS-CNNLSTM model is obtained, and the comparison results are shown in Table 5.

The comparison results in Table 5 show that the proposed NIDS-CNNLSTM model has significantly improved performance in terms of ACC, DR, and FPR compared with the models proposed in previous studies. Considering various evaluation indicators comprehensively, the model proposed in this paper effectively improves the detection rate and accuracy of the intrusion detection model and dramatically reduces the false positive rate. This result was also confirmed on different datasets. The final results may vary because of the randomness of the training set and test set selection; however, they still indicate that the proposed model is superior to the existing models proposed in previous studies. It can also adapt with greater confidence to network attack traffic in different scenarios. In the intrusion detection scenario, the efficiency is maximized while ensuring accuracy, and the overall performance of the intrusion detection system is improved.

V. SUMMARY

This paper proposes NIDS-CNNLSTM to solve the problems of low detection rate and classification accuracy and high

false detection rate of traditional intrusion detection models in the IIoT. In NIDS-CNNLSTM, a two-layer CNN and a two-layer unidirectional LSTM are improved and stacked. The ability of CNN to extract spatial features and of LSTM to extract temporal features is used. The model is evaluated using the KDD CUP99, NSL_KDD, and UNSW_NB15 datasets, and the validation accuracy and training loss on the three datasets show good convergence. The model’s applicability has been verified based on binary classification and multi-classification scenarios. When classifying various types of traffic, it achieves high accuracy, and the results are reasonable. Compared with the existing models proposed in previous studies, NIDS-CNNLSTM significantly improves the classification accuracy and detection rate and reduces the false detection rate. It is an intrusion detection classification model that is superior to existing models. In future work, we will further address the imbalance of the dataset, improve the classification accuracy of small sample traffic, and continue to enhance the model’s overall performance.

NIDS-CNNLSTM can effectively support the edge wireless perception information security of the ubiquitous IIoT, thereby safeguarding the safe operation of key infrastructure related to the national economy and people’s livelihood, such as the industrial Internet and new infrastructure, and avoiding major information security accidents. This paper has important theoretical significance and extensive practical application value.

JIAWEI DU received the B.E. degree from Xijing University, Xi’an, Shaanxi, China, in 2021, where she is currently pursuing the master’s degree in electronic information. Her research interest includes network and information security.

KAI YANG received the B.S. and Ph.D. degrees in computer science and technology from Xidian University, in 2005 and 2011, respectively. He is an Associate Professor with the School of Computer, Xijing University, Xi’an, Shaanxi, China. Since 2011, he has been a Lecturer and an Associate Professor with the Engineering University of PAP. His research interest includes network and information security.

YANJING HU received the B.S. degree in computer science and technology from Xi’an University of Technology, in 2009, and the Ph.D. degree in computer science and technology from Xidian University, in 2017. He is an Associate Professor with the School of Cryptographic Engineering, Engineering University of PAP, Xi’an, Shaanxi, China. Since 2003, he has been a Lecturer and an Associate Professor with the Engineering University of PAP. His research interest includes network and information security.

LINGJIE JIANG received the bachelor’s degree in engineering from Xijing University, Xi’an, Shaanxi, China, in 2021, where he is currently pursuing the master’s degree in electronic information. His research interests include deep learning and object detection.
