Received: 23 April 2023 | Revised: 10 January 2024 | Accepted: 21 March 2024
DOI: 10.1049/cit2.12352
CAAI Transactions on Intelligence Technology
ORIGINAL RESEARCH

SACNN‐IDS: A self‐attention convolutional neural network for intrusion detection in industrial internet of things

Mimonah Al Qathrady1 | Safi Ullah2 | Mohammed S. Alshehri3 | Jawad Ahmad4 | Sultan Almakdi3 | Samar M. Alqhtani1 | Muazzam A. Khan2,5 | Baraq Ghaleb4

1 Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
2 Department of Computer Science, Quaid‐i‐Azam University, Islamabad, Pakistan
3 Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
4 School of Computing, Engineering and the Built Environment, Edinburgh Napier University, Edinburgh, UK
5 ICESCO Chair Big Data Analytics and Edge Computing, Quaid‐i‐Azam University, Islamabad, Pakistan

Correspondence
Jawad Ahmad.
Email: J.Ahmad@napier.ac.uk

Funding information
Deputy for Research and Innovation ‐ Ministry of Education, Kingdom of Saudi Arabia, Grant/Award Number: NU/IFC/02/SERC/‐/31; Institutional Funding Committee at Najran University, Kingdom of Saudi Arabia

Abstract
Industrial Internet of Things (IIoT) is a pervasive network of interlinked smart devices that provide a variety of intelligent computing services in industrial environments. Several IIoT nodes handle confidential data (such as medical, transportation, and military data) and are reachable targets for hostile intruders due to their openness and varied structure. Intrusion Detection Systems (IDS) based on Machine Learning (ML) and Deep Learning (DL) techniques have received significant attention. However, existing ML and DL‐based IDS still face a number of obstacles that must be overcome. For instance, the existing DL approaches necessitate a substantial quantity of data for effective performance, which is not feasible on low‐power and low‐memory devices. Imbalanced and scarce data can also lead to low performance in existing IDS. This paper proposes a self‐attention convolutional neural network (SACNN) architecture for the detection of malicious activity in IIoT networks and an appropriate feature extraction method to extract the most significant features. The proposed architecture has a self‐attention layer to calculate the input attention and convolutional neural network (CNN) layers to process the assigned attention features for prediction. The performance evaluation of the proposed SACNN architecture has been done with the Edge‐IIoTset and X‐IIoTID datasets. These datasets encompass the behaviours of contemporary IIoT communication protocols, the operations of state‐of‐the‐art devices, various attack types, and diverse attack scenarios.

KEYWORDS
deep learning, internet of things, security
1 | INTRODUCTION

Researchers have shown a great deal of interest in the Internet of Things (IoT), which is considered to be one of the most advanced technologies. The proliferation of IoT applications has led several industries, including agriculture, health, smart security, air and water pollution, transport, smart cities, and smart homes, to implement IoT features [1, 2]. Industrial IoT is a subcategory of IoT that refers to the expansion of IoT into industrial sectors such as factory floors and warehouses. Industrial Internet of Things (IIoT) consists of interlinked smart machinery, real‐time analytics systems, and intelligent services that process the data produced by those machines [3–6]. For instance, an IIoT‐managed inventory system can handle ordering supplies just before they run out of stock, significantly simplifying the task of maintaining inventory and freeing up employees to do other duties [7].
Abbreviations: ACC, accuracy; AE, autoencoder; AUC, area under the curve; CNN, convolutional neural network; DL, deep learning; DNN, deep neural network; GRU, gated recurrent units; IDS, intrusion detection system; IIoT, industrial internet of things; IoT, internet of things; LR, linear regression; LSTM, long short‐term memory; ML, machine learning; MLP, multi‐layer perceptron; NB, naïve Bayes; P, precision; R, recall; SACNN, self‐attention convolutional neural network; SVM, support vector machine.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2024 The Author(s). CAAI Transactions on Intelligence Technology published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.
CAAI Trans. Intell. Technol. 2024;1–14. wileyonlinelibrary.com/journal/cit2

2
- QATHRADY ET AL.
Nevertheless, despite the IIoT paradigm's unquestionable benefits, these benefits are accompanied by critical security flaws [8]. For example, numerous IIoT applications deal with sensitive and confidential data, such as medical, transportation or military systems, making them attractive targets for adversarial intruders [9–11]. Additionally, IIoT devices are usually poorly designed, with many overlooked security aspects, rendering them vulnerable to reprogramming and manipulation. This could have a profound impact on the IIoT system, resulting in significant economic and reputational losses. To overcome such security concerns, numerous Intrusion Detection Systems (IDS) in which Machine Learning (ML) and Deep Learning (DL) play vital roles have been developed [12–15]. However, existing ML and DL‐based IDS still face a number of obstacles that must be overcome. First, state‐of‐the‐art IDS have difficulty detecting multi‐class category and multi‐class sub‐category attacks efficiently. Second, existing DL approaches require an extremely large volume of data to train the models efficiently, placing a high burden on the limited storage and computing resources of IIoT devices. Third, existing IDS techniques primarily address binary classification with balanced datasets, rendering them inefficient for multi‐class learning with imbalanced datasets.

This paper proposes a self‐attention convolutional neural network (SACNN) architecture and a suitable feature extraction technique to identify malicious activity in IIoT networks, addressing the limitations of current IDS approaches. The proposed system handles the imbalance issue of input data by dividing input features into equal sizes of vectors and processing them in parallel. Parallel processing of these vectors increases the learning rate and improves the detection performance of malicious activities in IIoT networks. The proposed architecture has a self‐attention layer to calculate the input attention and CNN layers to process the assigned attention features for prediction. The fundamental benefit of CNN over other DL algorithms is its ability to capture the importance of features [16]. Moreover, CNN operates with fewer parameters, resulting in faster performance [17]. The performance assessment of the proposed SACNN architecture has been done with the Edge‐IIoTset and X‐IIoTID datasets. Edge‐IIoTset contains 14 attacks associated with IoT and IIoT communication protocols that are classified into five classes: DoS/DDoS, Information gathering, Man in the middle, Injection, and Malware attacks [18]. On the other hand, the X‐IIoTID dataset comprises real‐time IIoT network traffic data, encompassing contemporary IIoT communication protocol behaviours, state‐of‐the‐art device operations, a wide range of attack types, diverse attack scenarios, and various attack protocols. Moreover, several ML and DL algorithms were tested in the same environment and compared with the SACNN architecture. In summary, this article makes the following contributions:

• A novel self‐attention convolutional neural network (SACNN) approach is proposed for the detection of malicious activities in IIoT networks. It focuses on multi‐class categories and multi‐class sub‐categories of attacks. A self‐attention layer is used to divide input features into equal sizes of heads and process them in parallel to calculate the attention value for each head, and CNN layers are used to process the calculated attention and predict the communication activities in the network.
• A new feature extraction approach based on the extra tree classifier (ETC) is adopted to enable more efficient extraction of the most significant features.
• An extensive empirical evaluation has been conducted using both heavy and light versions of the dataset to showcase the effectiveness of the SACNN model in comparison to state‐of‐the‐art methods.

This paper is organised as follows: Section 2 overviews the latest related works pertaining to IDS in IIoT‐based networks. Section 3 describes the pre‐processing steps and our proposed SACNN model, highlighting its main features. Section 4 thoroughly discusses experiments and performance evaluation results. Finally, Section 5 contains the conclusion of the paper.

2 | RELATED WORK

Several experts have been actively dedicated to enhancing the security of IIoT networks. Significant research efforts have recently been directed towards developing more efficient DL‐based models for intrusion detection. Li et al. [19] designed a multi‐CNN fusion paradigm for the identification of cyberattacks in IIoT network communication. They evaluated the designed paradigm with the NSL‐KDD dataset. Bovenzi et al. [20] introduced a multimodal deep autoencoder (M2‐DAE) method for identifying cyberattacks in IoT network communications. The model's performance was evaluated using the Bot‐IoT dataset, and it achieved an F1‐score of 99% on the utilised dataset.

Abdel‐Basset et al. [21] presented a forensics‐based DL framework for identifying malicious attacks in IIoT traffic. The presented framework was tested in a fog computing environment, and the Bot‐IIoT dataset was used to prove the efficiency of the presented framework. Kasongo [22] proposed a genetic algorithm (GA) for attribute extraction and a random forest method to detect intrusions in IIoT network communication. They utilised the UNSW‐NB15 dataset to assess the effectiveness of the model.

Liu et al. [23] proposed a variational autoencoder (VAE) paradigm for intrusion detection utilising a conditional balancing strategy. The VAE paradigm was evaluated using the CSE‐CIC‐IDS2018 dataset. Telikani et al. [24] designed a combined architecture of stacked autoencoders and CNNs for malicious activity detection. They utilised the ToN‐IoT and UNSW‐NB15 datasets to assess the efficacy of the model. Zhang et al. [25] adopted an IDS based on the graph neural network (GNN) to detect cyberattacks in IIoT network communications. They utilised Mississippi cyberattack datasets to validate the proposed method. Khan et al. [26] designed a DAE IDS based on LSTM networks to distinguish between normal and malicious traffic in IIoT networks. To analyse the effectiveness of the proposed system, the UNSW‐NB15 dataset was utilised. Le et al. [27]
presented an extreme gradient boosting (XGBoost) paradigm for detecting malicious activities in IIoT networks, focusing on imbalanced datasets. The X‐IIoTDS and TON_IoT datasets were utilised to assess the efficacy of the presented algorithm. Li et al. [28] proposed a hybrid architecture of CNN and Bi‐directional long short‐term memory (BiLSTM) for the classification of cyberattacks in IIoT networks. The proposed architecture was analysed with the NSL‐KDD dataset.

Altunay et al. [29] introduced a hybrid DL model known as CNN‐LSTM for detecting cyberattacks in industrial IoT. They assessed the CNN‐LSTM model using the UNSW‐NB15 and X‐IIoTID datasets, achieving detection accuracies of 92.9% and 99.8%, respectively. Lilhore et al. [30] developed an Optimised CNN‐LSTM architecture for detecting suspicious network flow in industrial IoT. They evaluated the proposed system using the ToN‐IoT and UNSW‐NB15 datasets. The designed model achieved precision rates of 92.7% and 94.25% for the utilised datasets, respectively. Wang et al. [31] presented a combined ResNet, Transformer, and BiLSTM (Res‐TranBiLSTM) algorithm for the detection of malicious activities in IIoT. The model was evaluated using the NSL‐KDD and CIC‐IDS2017 datasets, and they addressed the data imbalance issue by applying the SMOTE method. The results of the presented model demonstrated detection accuracies of 90.99%, 99.15%, and 99.56% on the utilised datasets, respectively.

Table 1 provides an overview of the related work on cyberattack prediction in IIoT. The analysis of the relevant literature reveals that most studies have focused on a limited number of attacks due to data imbalance issues in the datasets. As a result, when these systems are confronted with a diverse range of attack classes, they face challenges in achieving precise detection outcomes. Furthermore, we have observed that most papers concentrate on multi‐class classification within major categories and do not explore the subcategories of attacks. Moreover, we noted that all the related papers worked with extensive datasets, without considering lightweight data suitable for low‐memory devices. This paper addresses these issues by considering a higher number of attack classes using both extensive and lightweight data. Additionally, for performance improvement, this paper introduces a novel DL model called SACNN, designed to address the dataset imbalance issue for a limited and diverse number of attack classes.
TABLE 1 Related work overview of cyberattacks detection in IIoT.
Papers | Years | Method | Dataset | Evaluation metrics | Average finding score (in %) | No. of attacks | Multi‐class category | Multi‐class sub‐category | Light data
[19] | 2020 | Multi‐CNN | NSL‐KDD | Accuracy, precision, recall, F1‐Score | 86.95, 89.56, 87.25, 88.41 | 4 | ✓ | ✗ | ✗
[20] | 2020 | M2‐DAE | Bot‐IoT | F1‐score | 99.7 | 3 | ✓ | ✗ | ✗
[21] | 2021 | Forensics‐DL | Bot‐IIoT, UNSW‐NB15 | Accuracy, precision, recall, F1‐Score | 98.93, 97.52, 98.1, 97.82 | 5, 9 | ✓ | ✗ | ✗
[22] | 2021 | GA | UNSW‐NB15 | Accuracy, precision, recall, F1‐Score | 77.64, 83.09, 77.64, 80.27 | 9 | ✓ | ✗ | ✗
[23] | 2022 | VAE | CSE‐CIC‐IDS2018 | Accuracy, precision, recall, F1‐Score | 98.57, 91.33, 82.18, 84.03 | 6 | ✓ | ✗ | ✗
[24] | 2022 | SAE‐CNN | ToN‐IoT, UNSW‐NB15 | Precision, recall, F1‐Score | 93.35, 97.6, 95.2 | 9, 9 | ✓ | ✗ | ✗
[25] | 2022 | GNN | Mississippi | Accuracy, precision, recall, F1‐Score | 97.2, 98, 90, 93 | 7 | ✓ | ✗ | ✗
[26] | 2022 | DAE | UNSW‐NB15 | Accuracy, precision, recall, F1‐Score | 97.95, 98, 96.63, 97.89 | 9 | ✗ | ✗ | ✗
[27] | 2022 | XGBoost | X‐IIoTDS, TON_IoT | Precision, recall, F1‐Score | 9.91, 99.84, 99.88 | 9, 7 | ✓ | ✗ | ✗
[28] | 2022 | CNN‐BiLSTM | NSL‐KDD | Accuracy, Detection rate, Precision | 96.3, 97.1, 98.9 | 4 | ✓ | ✗ | ✗
[29] | 2023 | CNN‐LSTM | UNSW‐NB15, X‐IIoTID | Accuracy, precision, recall, F1‐Score | 96, 96.06, 96.09, 96.07 | 9, 9 | ✓ | ✗ | ✗
[30] | 2023 | OCNN‐LSTM | ToN_IoT, UNSW‐NB15 | Accuracy, precision, recall, F1‐Score | 93.56, 93.48, 53.7, 50.86 | 7, 9 | ✓ | ✗ | ✗
[31] | 2023 | Res‐TranBiLSTM | NSL‐KDD, CIC‐IDS2017 | Accuracy, precision, recall, F1‐Score | 95.07, 95.27, 95.04, 94.02 | 4, 6 | ✓ | ✗ | ✗
This study | 2023 | SACNN | Edge‐IIoTset, X‐IIoTID | Accuracy, precision, recall, F1‐Score | 99.62, 99.44, 99.11, 99.27 | 14, 9 | ✓ | ✓ | ✓
3 | THE PROPOSED INTRUSION DETECTION SYSTEM

This section overviews our proposed approach for detecting cyberattacks within IIoT networks. It also describes the preliminary steps required, including data preparation, feature extraction, normalisation, and data splitting.

3.1 | Datasets description

Edge‐IIoTset and X‐IIoTID are renowned datasets that are used by several researchers in the field of ML and DL‐based IDS. These datasets contain IoT and IIoT traffic samples generated by a real‐world testbed deployment consisting of seven interconnected layers, including cloud computing, network functions virtualisation, blockchain network, fog computing, software‐defined networking, and edge computing, in addition to IoT and IIoT perception layers. More than ten types of devices were used to generate the data, including soil and water monitoring, temperature, and humidity sensors, among other IoT devices. Edge‐IIoTset contains 14 attacks associated with IoT and IIoT communication protocols that are classified into five classes: DoS/DDoS, Information gathering, Man in the middle, Injection, and Malware attacks. Data were collected from network packets in the form of pcap files, which were converted to CSV using the Zeek and TShark tools [18]. This dataset is available in two versions: heavy and light. The heavy version contains 2,219,201 instances, while the light version contains 157,800 instances. The X‐IIoTID dataset was created by monitoring a real‐time IIoT network, which encompassed the behaviours of contemporary IIoT communication protocols, the operations of state‐of‐the‐art devices, various attack types, and diverse attack scenarios, as well as several attack protocols [32]. The dataset consists of 65 input features and a total of 820,834 instances. Among these instances, 421,417 are categorised as normal, while the remaining 399,417 instances correspond to different attack types. A full breakdown of these datasets is given in Table 2.

3.2 | Preprocessing techniques

This section describes the preprocessing procedures. In this experiment, three preprocessing procedures were used: data preparation, feature extraction, and normalisation.

3.2.1 | Data preparation

This is the first step in the preprocessing stage; it addresses the problem of missing values and converts categorical features into numeric features. There are no null values in the Edge‐IIoTset dataset. The utilised dataset contains categorical attributes that include numerous data categories. We considered using a one‐hot encoder to map categorical attributes to numeric values; however, this mechanism requires a large amount of memory and introduces significant latency [33]. As a result, we instead used the label encoder mechanism for the conversion task. In this method, each label is assigned a distinct numeric value based on alphabetical order, which does not require additional memory.
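The difference between the two encoders discussed in Section 3.2.1 can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the helper names and the sample `protocol_column` values are hypothetical and chosen only to show that label encoding keeps a single column, while one‐hot encoding adds one column per distinct category.

```python
def fit_label_mapping(values):
    """Assign each distinct category an integer code in alphabetical order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    """Replace every category with its integer code (one column in, one column out)."""
    return [mapping[value] for value in values]


def one_hot_encode(values, mapping):
    """Expand every category into a 0/1 vector with one slot per distinct category."""
    rows = []
    for value in values:
        row = [0] * len(mapping)
        row[mapping[value]] = 1
        rows.append(row)
    return rows


# Hypothetical categorical column; these values are not taken from the datasets.
protocol_column = ["tcp", "udp", "icmp", "tcp", "http"]

mapping = fit_label_mapping(protocol_column)
label_encoded = label_encode(protocol_column, mapping)
one_hot_encoded = one_hot_encode(protocol_column, mapping)

print(mapping)                  # {'http': 0, 'icmp': 1, 'tcp': 2, 'udp': 3}
print(label_encoded)            # [2, 3, 1, 2, 0]
print(len(one_hot_encoded[0]))  # 4
```

With many distinct categories the one‐hot width grows accordingly, which is the memory cost the paragraph above refers to. The integer codes carry an alphabetical order that has no semantic meaning for the categories.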
TABLE 2 A detailed presentation of datasets.

Heavy version of Edge‐IIoTset
Category | Instances | Sub category | Instances
Normal | 1,615,643 | Normal | 1,615,643
DDoS | 337,977 | UDP | 121,568
 | | ICMP | 116,436
 | | TCP | 50,062
 | | HTTP | 49,911
Injection | 104,752 | XSS | 15,915
 | | SQL | 51,203
 | | Uploading | 37,634
Malware | 85,940 | Password | 50,153
 | | Backdoor | 24,862
 | | Ransomware | 10,925
Scanning | 73,675 | Port | 22,564
 | | Fingerprinting | 1,001
 | | Vulnerability | 50,110
MITM | 1,214 | MITM | 1,214

Light version of Edge‐IIoTset
Category | Instances | Sub category | Instances
Normal | 24,301 | Normal | 24,301
DDoS | 49,396 | DDoS_UDP | 14,498
 | | DDoS_ICMP | 14,090
 | | DDoS_TCP | 10,247
 | | DDoS_HTTP | 10,561
Injection | 30,632 | XSS | 10,052
 | | SQL | 10,311
 | | Uploading | 10,269
Malware | 31,109 | Password | 9,989
 | | Backdoor | 10,195
 | | Ransomware | 10,925
Scanning | 21,148 | Port | 10,071
 | | Fingerprinting | 1,001
 | | Vulnerability | 10,076
MITM | 1,214 | MITM | 1,214

X‐IIoTID
Class | Instances
Normal | 421,417
RDOS | 141,261
Reconnaissance | 127,590
Weaponization | 67,260
Exfiltration | 22,134
Lateral Movement | 31,596
Tampering | 5,122
C&C | 2,863
Crypto Ransomware | 458
Exploitation | 1,133
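The class imbalance visible in Table 2 can be quantified with a short calculation. The sketch below uses only the heavy‐version category counts listed in the table; the variable names are illustrative, and the ratio of the largest to the smallest class is just one simple way to summarise imbalance.

```python
# Category counts for the heavy version of Edge-IIoTset, copied from Table 2.
heavy_counts = {
    "Normal": 1_615_643,
    "DDoS": 337_977,
    "Injection": 104_752,
    "Malware": 85_940,
    "Scanning": 73_675,
    "MITM": 1_214,
}

total = sum(heavy_counts.values())

# Share of each category in the whole dataset.
shares = {name: count / total for name, count in heavy_counts.items()}

# Ratio between the most and the least frequent category.
imbalance_ratio = max(heavy_counts.values()) / min(heavy_counts.values())

for name, share in shares.items():
    print(f"{name:<10} {share:8.3%}")
print(f"total instances: {total}")
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

The total reproduces the 2,219,201 instances reported for the heavy version, and the normal class outnumbers the MITM class by more than a thousand to one.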
3.2.2 | Feature extraction

Feature extraction is the process of reducing a high‐dimensional dataset into a low‐dimensional dataset with the goal of selecting the most significant features. The key benefit of feature extraction is to improve the effectiveness of the classification model by avoiding overfitting, requiring less processing power, and using memory more efficiently [34–37]. In our proposed technique, we opted to use the extra‐tree classifier (ETC) method to extract the most significant features due to its many advantages over the other extraction methods [38]. The ETC operates on the gain value that represents the impact of an attribute on the output category.

The ETC is an ensemble learning technique that combines the outcomes of numerous uncorrelated decision trees, which form a ‘forest’, to produce its classification results. In this method, each decision tree is built from the original training sample. At each test node, a random subset of k features from the feature set is provided to each tree. Consequently, each decision tree independently selects the optimal feature for data splitting using Equation (1).

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \cdot \mathrm{Entropy}(S_v) \tag{1}$$

Here, Gain(S, A) is the information gain obtained by splitting the dataset S on attribute A. Values(A) is the set of distinct values that attribute A can take. $S_v$ is the subset of examples in S for which attribute A has value v, $|S_v|$ represents the number of examples in subset $S_v$, and $|S|$ denotes the total number of examples in the dataset. The final information gain for each feature is computed as an average over all decision trees in the ETC. The goal is to identify features that consistently provide high information gain across multiple trees. Attributes with a total information gain greater than 0 were extracted, while attributes with an information gain of 0 had no impact on the output. After the filtering process, 54 attributes with a gain value greater than 0 were extracted. The remaining 7 attributes, which have a gain value of 0, were eliminated.

3.2.3 | Normalisation and splittings

Normalisation is a method of rescaling data into a common range. For ML and DL classifiers, there is no need to rescale the dataset if the scales of the attributes do not vary much. Attributes with different ranges affect the effectiveness of the classifiers [39, 40]. The Edge‐IIoTset dataset has attributes with varied ranges that need to be normalised. In our proposed approach, the min–max normalisation technique is used to scale the attributes between 0 and 1, as represented in Equation (2). To verify the efficacy of the SACNN, we divided the normalised form of the dataset into two portions, one containing 80% of the data for training and the other 20% for testing purposes. Stratified sampling is applied so that each class keeps the same proportion in both sets.

$$X_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{2}$$

3.3 | The proposed SACNN architecture

The proposed SACNN architecture consists of a self‐attention layer and convolutional neural network (CNN) layers, as shown in Figure 1. The self‐attention layer divides the input features into vectors of equal size called heads. It processes all heads in parallel and computes the attention value for each head. CNN layers are used to process the computed attention and predict the attack class; Algorithm 1 outlines the details as pseudocode. The proposed SACNN approach operates on inputs of shape (instances_set, attributes, 1), where ‘instances_set’ represents the batch size, ‘attributes’ represents the number of input features, and ‘1’ represents the individual input instance. In the proposed architecture, self‐attention splits the input features into eight equal vectors (heads) [41]. The head size is calculated in Equation (3), where $H_s$ is the size of the head, $I_s$ is the total number of input attributes, and $N_h$ is the number of splitting heads. The self‐attention layer computes the attention based on queries (Q), keys (K), and values (V). Q, K, and V are given in Equation (4), Equation (5), and Equation (6), respectively, where X represents the input vector and W represents the weight. The attention of each head is computed using Equation (7), where $d_q$ is the length of Q. All the computed attentions of the heads are combined to generate the output of the self‐attention layer, as expressed in Equation (8). An add and norm layer was used to handle the vanishing gradient issue [42].

$$H_s = \left\lceil \frac{I_s}{N_h} \right\rceil \tag{3}$$

$$Q = X \times W_Q \tag{4}$$

$$K = X \times W_K \tag{5}$$

$$V = X \times W_V \tag{6}$$

$$Z_i = \mathrm{softmax}\left(\frac{Q \times K^{T}}{\sqrt{d_q}}\right) \times V \tag{7}$$

$$Z = Z_i\,(1, \ldots, n) \tag{8}$$

Algorithm 1 Proposed SACNN algorithm
Require: Input data vector X
Ensure: Output predicted probabilities Y
1: function SELFATTENTION(X)
2: Q, K, V ← LinearTransform(X) ▹ Linear transformations
3: H ← Attention_score(Q, K, V) ▹ Self attention
4: return H
5: end function
6: function CNNLAYERS(Hnorm)
7: HConvo ← Convo(Hnorm) ▹ Convolutional layers
8: HCNN ← MaxPool(HConvo) ▹ Max pooling layer
9: return HCNN
10: end function
11: function DENSELAYERS(HCNN)
12: Ddense ← ReLU(HCNN) ▹ Dense ReLU layer
13: Y ← Softmax(Ddense) ▹ Softmax layer for predictions
14: return Y
15: end function
16: H ← SELFATTENTION(X)
17: Hnorm ← Add_&_Norm(H + X) ▹ Add and layer normalisation
18: HCNN ← CNNLAYERS(Hnorm)
19: Y ← DENSELAYERS(HCNN)
20: return Y

FIGURE 1 Flow diagram of the proposed architecture.

The self‐attention layer computes the significance of heads, where each head consists of multiple features. In the next stage, we pass the output of the self‐attention layer to the CNN layers for network intrusion detection. The fundamental benefit of a CNN is its ability to capture the importance of features. Moreover, a CNN operates with fewer parameters than other DL algorithms, resulting in faster performance [43]. A CNN is typically made up of convolutional layers, pooling layers, and fully connected layers [44]. In the proposed architecture, two 1D convolutional layers with a kernel size of three and 26 filters, a max‐pooling layer with a pool size of four, a flatten layer, and two fully connected layers were utilised. The convolutional layer highlights the significance of the features and diminishes the noise [45]. The utilised convolutional layers are expressed in Equation (9) and Equation (10), where the input to the CNN is represented by $x_k$, $s_i$ signifies the neurons of the previous layer, $w_{ik}$ signifies the kernel, and $b_k$ depicts the bias. The output of the convolution operation, to which the ReLU activation is applied, is denoted by $y_k$. The yield of the convolution operations passes into a max‐pooling operation that converges to the most prominent features, as expressed in Equation (11). The flatten mechanism is utilised to transform the yield of the max‐pooling operation into a 1D vector that passes into a ReLU dense layer. The final layer of the approach is the softmax dense layer that produces the output, as demonstrated in Equation (12).

$$x_k = b_k + \sum_{i=1}^{N} (s_i, w_{ik}) \tag{9}$$

$$y_k = \max(0, x_k) \tag{10}$$

$$s_k = \max_{i \in \Re} y_k \tag{11}$$

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \tag{12}$$

3.3.1 | Hyperparameters

Hyperparameters are established prior to the training of neural networks, enabling the DL models to learn from the training data. These hyperparameters play a crucial role in the training process and can significantly impact the performance of the model. Table 3 presents the most important hyperparameters utilised in the proposed SACNN model.

TABLE 3 Utilised hyperparameters in the proposed SACNN.
Optimiser | Learning rate | Loss function | Batch size | Epochs
Adam | 0.001 | Sparse categorical cross‐entropy | 32 | 6

3.4 | Experimental setup

The experiments of our proposed approach were carried out in a Jupyter notebook on a machine with an Intel Core i5 8th generation processor and 24‐GB RAM, running the Windows 11 Pro 64‐bit operating system and Python 3.8. Several Python libraries were utilised, including Keras, TensorFlow, Pandas, Scikit Learn, and NumPy.

4 | PERFORMANCE ASSESSMENT

This section covers a comprehensive assessment of the proposed approach. Edge‐IIoTset datasets were used in this experiment to assess the effectiveness of the proposed approach. Experiments were conducted for multi‐class categories and multi‐class sub‐categories, and outcomes of the SACNN approach are assessed in comparison to other ML and DL methods.

4.1 | Evaluation metrics

Several metrics are used in this study to assess the efficacy of our proposed classification approach, including accuracy, macro‐precision, macro‐recall, and macro F1‐score.

• Accuracy (ACC) refers to the percentage of correctly predicted instances out of all the predictions made by the classification model, and it is calculated as in Equation (13), where α, β, γ, and δ represent true positive, true negative, false positive, and false negative, respectively.

$$ACC = \frac{\alpha + \beta}{\alpha + \beta + \gamma + \delta} \tag{13}$$

• Precision (P) refers to the ratio between the True Positives and all the instances classified as positive. In our model, it refers to the number of true instances classified as abnormal out of all instances classified as abnormal by the model, as given in Equation (14).

$$P = \frac{\alpha}{\alpha + \gamma} \tag{14}$$

• Recall (R) refers to the ratio between the True Positives and all the positive instances. In our model, it refers to the number of true instances classified as abnormal out of all abnormal instances, as given in Equation (15).

$$R = \frac{\alpha}{\alpha + \delta} \tag{15}$$

Note that normal samples are designated as Negative, whereas abnormal samples are designated as Positive in this study. There are situations in which the precision and recall measures conflict, and thus they should be carefully investigated. Several researchers use the F1 score, which is the harmonic mean of precision and recall, as shown in Equation (16).

$$F1\ Score = \frac{2 \times (P \times R)}{P + R} \tag{16}$$

The area under the curve (AUC) is a metric that quantifies the area beneath the receiver operating characteristic (ROC) curve. The ROC curve is generated by graphing the true positive rate (TPR) against the false positive rate (FPR) across various classification thresholds. TPR and FPR are determined using Equation (17) and Equation (18), respectively.

$$TPR = \frac{\alpha}{\alpha + \delta} \tag{17}$$

$$FPR = \frac{\gamma}{\gamma + \beta} \tag{18}$$
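As a minimal sketch of how the metrics in Equations (14)–(16) extend to the macro‐averaged, multi‐class setting, the following code counts α, β, γ, and δ per class in a one‐vs‐rest manner and takes the unweighted mean over classes. This is an illustration under stated assumptions rather than the evaluation code used in the study: the function names and the toy labels are hypothetical, and overall accuracy is taken as the fraction of correctly predicted instances.

```python
def one_vs_rest_counts(y_true, y_pred, label):
    """Return (alpha, beta, gamma, delta) = (TP, TN, FP, FN) treating `label` as positive."""
    alpha = beta = gamma = delta = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == label and truth == label:
            alpha += 1          # true positive
        elif pred != label and truth != label:
            beta += 1           # true negative
        elif pred == label:
            gamma += 1          # false positive
        else:
            delta += 1          # false negative
    return alpha, beta, gamma, delta


def safe_div(numerator, denominator):
    """Avoid division by zero for classes that never occur or are never predicted."""
    return numerator / denominator if denominator else 0.0


def macro_scores(y_true, y_pred):
    """Macro precision, recall, and F1 (unweighted class means) plus overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        alpha, _beta, gamma, delta = one_vs_rest_counts(y_true, y_pred, label)
        p = safe_div(alpha, alpha + gamma)       # Equation (14)
        r = safe_div(alpha, alpha + delta)       # Equation (15)
        f1 = safe_div(2 * p * r, p + r)          # Equation (16)
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return {
        "accuracy": safe_div(correct, len(y_true)),
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
        "macro_f1": sum(f1_scores) / len(labels),
    }


# Toy labels for illustration only; they are not drawn from the datasets.
y_true = ["normal", "normal", "ddos", "ddos", "mitm", "mitm"]
y_pred = ["normal", "ddos", "ddos", "ddos", "mitm", "normal"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Because macro averaging gives every class the same weight, a rare class such as MITM influences the score as much as the normal class, which matters on imbalanced datasets.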
4.2 | Results and discussion

The Adam optimiser, the sparse categorical cross‐entropy loss function, and a batch size of 32 were used in this study. The model was trained for six epochs. A fivefold cross‐validation approach was also used for training and testing the proposed architecture.

4.2.1 | Results on multi‐class categories of Edge‐IIoTset

Table 4 shows the performance evaluation results of the proposed SACNN approach on the Edge‐IIoTset dataset multi‐class categories for both the heavy and light versions with varying numbers of CNN layers. It is evident from the table that the optimum intrusion detection performance of the SACNN approach was achieved with two convolutional layers and one max pooling layer on both heavy and light versions of the dataset. All of the evaluation results were optimal when these layers were used.

TABLE 4 Results of the proposed SACNN on Edge‐IIoTset multi‐class categories.
Convolutional layers | Max pooling layers | Heavy P | Heavy R | Heavy F1‐score | Heavy ACC | Heavy AUC | Light P | Light R | Light F1‐score | Light ACC | Light AUC
1 | 1 | 0.9974 | 0.9976 | 0.9975 | 0.9994 | 0.9998 | 0.991 | 0.994 | 0.9923 | 0.9927 | 0.9998
2 | 1 | 0.9979 | 0.998 | 0.9979 | 0.9995 | 0.9999 | 0.9945 | 0.9955 | 0.995 | 0.9949 | 0.9999
2 | 2 | 0.9948 | 0.9947 | 0.9947 | 0.9987 | 0.9991 | 0.9891 | 0.9895 | 0.9893 | 0.9892 | 0.9987
4 | 2 | 0.9961 | 0.9965 | 0.9963 | 0.9991 | 0.9995 | 0.9919 | 0.9904 | 0.9911 | 0.9914 | 0.9989

4.2.2 | Results on multi‐class sub‐categories of Edge‐IIoTset

Table 5 shows the performance evaluation results of the proposed SACNN approach on the Edge‐IIoTset dataset multi‐class sub‐categories for both the heavy and light versions, again with varying numbers of CNN layers. Similar to the results on the multi‐class categories, the best overall performance of the proposed approach was achieved with two convolutional layers and one max pooling layer. This configuration gave the best value for every metric except recall on the light version, which was marginally higher (0.9896) with four convolutional and two max pooling layers.

TABLE 5 Results of the proposed SACNN on Edge‐IIoTset multi‐class sub‐categories.
Convolutional layers | Max pooling layers | Heavy P | Heavy R | Heavy F1‐score | Heavy ACC | Heavy AUC | Light P | Light R | Light F1‐score | Light ACC | Light AUC
1 | 1 | 0.9948 | 0.9889 | 0.9916 | 0.9994 | 0.9995 | 0.9906 | 0.9818 | 0.9856 | 0.9897 | 0.9993
2 | 1 | 0.9966 | 0.9911 | 0.9937 | 0.9995 | 0.9999 | 0.9921 | 0.9876 | 0.9897 | 0.9927 | 0.9998
2 | 2 | 0.9934 | 0.9867 | 0.9898 | 0.9993 | 0.9991 | 0.9802 | 0.9733 | 0.976 | 0.982 | 0.9995
4 | 2 | 0.9936 | 0.9846 | 0.9888 | 0.9992 | 0.9987 | 0.9792 | 0.9896 | 0.9837 | 0.9914 | 0.9997

4.2.3 | Results on X‐IIoTID

Table 6 presents the performance evaluation results of the proposed SACNN approach on the X‐IIoTID dataset for multi‐class classification, with different numbers of CNN layers. Similar to the findings in the Edge‐IIoTset results, the best overall performance of the proposed approach was achieved with two convolutional layers and one max pooling layer. This configuration gave the best recall, F1‐score, accuracy, and AUC; precision was marginally higher (0.9922) with four convolutional and two max pooling layers.

TABLE 6 Results of the proposed SACNN on X‐IIoTID multi‐class.
Convolutional layers | Max pooling layers | P | R | F1‐score | ACC | AUC
1 | 1 | 0.9788 | 0.9697 | 0.9728 | 0.9917 | 0.9978
2 | 1 | 0.9911 | 0.9835 | 0.9871 | 0.9981 | 0.9995
2 | 2 | 0.9425 | 0.9534 | 0.9457 | 0.9899 | 0.9982
4 | 2 | 0.9922 | 0.9378 | 0.9601 | 0.9972 | 0.9987

4.2.4 | Discussion

The optimal results for the utilised datasets on multi‐class categories and multi‐class sub‐categories were achieved with two 1D convolutional layers and a max pooling layer. It is important to note that these datasets had imbalanced class distributions. The average accuracy of the proposed DL approach reached 99.66% for multi‐class categories and multi‐class sub‐categories on both the heavy and light versions of the Edge‐IIoTset dataset. Moreover, it achieved an accuracy of 99.72% on the X‐IIoTID dataset. To ensure that the proposed
QATHRADY ET AL.
- 9
model did not suffer from overfitting or underfitting issues, we Meanwhile, Figure 6 illustrates the training and validation
analysed its training and validation performance. Figures 2–5 performance on the X‐IIoTID dataset. Notably, the training
shows the training and validation performance for the and validation performance demonstrate consistent results,
selected optimal layers of the SACNN model when applied to providing evidence that the proposed SACNN model does not
both the heavy and light versions of the Edge‐IIoTset dataset. suffer from overfitting or underfitting issues.
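The layer‐selection argument in this section can be checked mechanically. The following is a minimal sketch, not the authors' code: it copies the values of Table 6 and the accuracies of the two‐convolutional, one‐pooling configuration from Tables 4 and 5 into plain Python structures (all variable names are illustrative assumptions), reports which configuration scores highest on each metric, and averages the four Edge‐IIoTset accuracies.

```python
# Values copied from Table 6 (X-IIoTID).
# Keys are (convolutional layers, max pooling layers).
METRICS = ("P", "R", "F1", "ACC", "AUC")
table6 = {
    (1, 1): (0.9788, 0.9697, 0.9728, 0.9917, 0.9978),
    (2, 1): (0.9911, 0.9835, 0.9871, 0.9981, 0.9995),
    (2, 2): (0.9425, 0.9534, 0.9457, 0.9899, 0.9982),
    (4, 2): (0.9922, 0.9378, 0.9601, 0.9972, 0.9987),
}

# For each metric, pick the layer configuration with the highest score.
best_per_metric = {
    name: max(table6, key=lambda cfg, i=i: table6[cfg][i])
    for i, name in enumerate(METRICS)
}

# ACC of the (2, 1) configuration from Tables 4 and 5:
# categories heavy/light, then sub-categories heavy/light.
edge_acc = [0.9995, 0.9949, 0.9995, 0.9927]
mean_edge_acc = sum(edge_acc) / len(edge_acc)

for name, cfg in best_per_metric.items():
    print(f"{name}: best with {cfg[0]} conv / {cfg[1]} max-pool layers")
print(f"mean Edge-IIoTset accuracy: {mean_edge_acc:.4f}")
```

With these table values, two convolutional layers and one max pooling layer score highest on recall, F1‐score, accuracy, and AUC, while precision is marginally higher for the four‐convolutional, two‐pooling configuration. The mean of the four Edge‐IIoTset accuracies is about 0.9966, which is in line with the 99.66% average quoted in the discussion.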
FIGURE 2 Training and validation performance of the proposed self‐attention convolutional neural network on the heavy version of Edge‐IIoTset multi‐class categories.
FIGURE 3 Training and validation performance of the proposed self‐attention convolutional neural network on the light version of Edge‐IIoTset multi‐class categories.
FIGURE 4 Training and validation performance of the proposed self‐attention convolutional neural network on the heavy version of Edge‐IIoTset multi‐class sub‐categories.
FIGURE 5 Training and validation performance of the proposed self‐attention convolutional neural network on the light version of Edge‐IIoTset multi‐class sub‐categories.
FIGURE 6 Training and validation performance of the proposed self‐attention convolutional neural network on X‐IIoTID dataset.
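The fivefold cross‐validation that produces the training and validation sets is not spelled out in this section. A minimal sketch of one common variant, stratified splitting, is given below. The use of stratification, the function name `stratified_k_fold`, the seed, and the toy labels are all assumptions made for illustration; they are not details taken from the paper or from either dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k folds.

    Samples of each class are shuffled and dealt round-robin into the
    folds, so every fold keeps roughly the original class proportions.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val


# Hypothetical, deliberately imbalanced toy labels (not real dataset rows).
labels = ["normal"] * 40 + ["ddos"] * 10 + ["mitm"] * 5
splits = list(stratified_k_fold(labels, k=5))

for fold_no, (train, val) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} training / {len(val)} validation samples")
```

Stratifying is one reasonable choice when class distributions are imbalanced, because it keeps rare classes represented in every validation fold; a plain shuffled split would also be a valid reading of "fivefold cross‐validation".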
4.3 | Comparison with existing Machine Learning and DL methods

We also compared the performance of our proposed approach with that of several other ML and DL classifiers, including CNN, gated recurrent units (GRU), LSTM, autoencoder (AE), deep neural network (DNN), multi‐layer perceptron (MLP), linear regression (LR), Naive Bayes (NB), and support vector machine (SVM), on the same datasets and using the same experimental settings. We applied the same preprocessing steps to prepare the data for all DL and ML‐based IDS as for the proposed SACNN. This consistency is crucial for a fair comparison with other ML and DL techniques: if different data preparation methods were used for each model, it would be difficult to determine whether any differences in performance were due to the model's design or to the data preprocessing. For all DL‐based models, we employed the Adam optimiser with sparse categorical cross‐entropy loss and trained them for six epochs with a batch size of 32. The implementations of the DL‐based models GRU, LSTM, CNN, AE, and DNN include 2, 2, 4, 6, and 4 hidden layers, respectively. In the CNN, the convolutional layers consist of 32 filters with a kernel size of 3 and use "same" padding, while the max pooling layer has a pool size of 4.

Tables 7 and 8 summarise the evaluation results for the respective models on the multi‐class categories of the Edge‐IIoTset dataset, for the heavy and light versions respectively. On the light version, the proposed SACNN model outperforms all other models in terms of accuracy, precision, recall, and F1‐score. On the heavy version, it achieves the highest precision and F1‐score and ties with AE for the highest accuracy (0.9995), while AE attains a marginally higher recall (0.9981 against 0.998). Similarly, Tables 9 and 10 present the evaluation results for the respective models on the multi‐class sub‐categories of the Edge‐IIoTset dataset, for the heavy and light versions. Here the proposed model surpasses its counterparts in terms of accuracy, precision, and F1‐score on both versions, although AE reaches a higher recall on both versions and DNN a higher recall on the light version. In addition, Table 11 displays the evaluation results for the same models applied to the X‐IIoTID dataset. The results in this table also favour the proposed model, which shows higher accuracy, precision, recall, and F1‐score than the other models.

It is worth noting that the advantage of our proposed model over the other models was obtained using an imbalanced dataset with a smaller number of instances. It is also noticeable that while other models did not perform well using the light
TABLE 7 Results comparison of the proposed SACNN with other models on the heavy version of Edge‐IIoTset multi‐class categories.
Algorithm P R F1‐score ACC AUC Training time (in sec) Test time (in sec)
LSTM [29] 0.9921 0.9957 0.9938 0.9983 0.9991 2689 171
GRU [46] 0.987 0.9887 0.9878 0.9983 0.9993 2514 202
CNN [29] 0.995 0.9938 0.9944 0.9986 0.9994 83 12
AE [47] 0.9952 0.9981 0.9966 0.9995 0.9998 98 12
DNN [48] 0.9944 0.9838 0.9889 0.9988 0.9996 81 10
MLP [49] 0.9937 0.9952 0.9944 0.9989 0.9997 104 1
LR [48] 0.9161 0.9111 0.9127 0.9859 0.9958 68 1
NB [48] 0.8767 0.8371 0.831 0.9422 0.9775 2 1
SVM [48] 0.9817 0.9855 0.9841 0.9954 0.9976 19,975 677
Proposed SACNN 0.9979 0.998 0.9979 0.9995 0.9999 181 18
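The CNN baseline in the table above is described in Section 4.3 as using 32 filters, a kernel size of 3, "same" padding, and a pool size of 4. The sketch below works through what those settings imply for tensor lengths and parameter counts. It is an illustration under stated assumptions, not the authors' implementation: a convolution stride of 1, pooling without padding and with a stride equal to the pool size, a single input channel, and a hypothetical input length of 64 features are all assumed here.

```python
def conv1d_same_length(n_in, stride=1):
    """Output length of a 1D convolution with 'same' padding."""
    return -(-n_in // stride)  # ceiling division


def max_pool1d_length(n_in, pool_size, stride=None):
    """Output length of unpadded 1D max pooling (stride defaults to pool_size)."""
    stride = pool_size if stride is None else stride
    return (n_in - pool_size) // stride + 1


def conv1d_params(kernel_size, in_channels, filters):
    """Trainable parameters of a 1D convolution: weights plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters


N_FEATURES = 64  # hypothetical input length, not taken from the paper

after_conv = conv1d_same_length(N_FEATURES)             # 'same' padding keeps the length
after_pool = max_pool1d_length(after_conv, pool_size=4)  # pooling divides it by 4

params_first = conv1d_params(kernel_size=3, in_channels=1, filters=32)
params_next = conv1d_params(kernel_size=3, in_channels=32, filters=32)

print(f"length after convolution: {after_conv}")
print(f"length after max pooling: {after_pool}")
print(f"parameters, first conv layer: {params_first}")
print(f"parameters, following conv layer: {params_next}")
```

Under these assumptions the convolution preserves the sequence length and only the pooling layer shrinks it, which is why the pool size largely determines how many values reach the classifier head.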
TABLE 8 Results comparison of the proposed SACNN with other models on the light version of Edge‐IIoTset multi‐class categories.
Algorithm P R F1‐score ACC AUC Training time (in sec) Test time (in sec)
LSTM [29] 0.9675 0.9711 0.9692 0.9722 0.9949 162 12
GRU [46] 0.88 0.8584 0.8683 0.8952 0.9684 178 15
CNN [29] 0.9826 0.9827 0.9826 0.9816 0.9986 8 1
AE [47] 0.9926 0.9937 0.9931 0.9933 0.9993 7 1
DNN [48] 0.9915 0.9922 0.9918 0.9923 0.9989 6 1
MLP [49] 0.7932 0.8004 0.7968 0.9576 0.9642 46 0.5
LR [48] 0.9442 0.9468 0.9451 0.9403 0.9952 5 0.2
NB [48] 0.8198 0.7557 0.7454 0.7292 0.9683 1 0.4
SVM [48] 0.9727 0.9724 0.9725 0.9695 0.9962 110 15
Proposed SACNN 0.9945 0.9955 0.995 0.9949 0.9999 13 1
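How much each model loses when moving from the heavy to the light version can be read directly from the ACC columns of Tables 7 and 8. The sketch below is only a convenience for that comparison: the numbers are copied from the two tables, and the dictionary names are illustrative assumptions.

```python
# ACC columns copied from Table 7 (heavy version) and Table 8 (light version).
heavy_acc = {
    "LSTM": 0.9983, "GRU": 0.9983, "CNN": 0.9986, "AE": 0.9995,
    "DNN": 0.9988, "MLP": 0.9989, "LR": 0.9859, "NB": 0.9422,
    "SVM": 0.9954, "SACNN": 0.9995,
}
light_acc = {
    "LSTM": 0.9722, "GRU": 0.8952, "CNN": 0.9816, "AE": 0.9933,
    "DNN": 0.9923, "MLP": 0.9576, "LR": 0.9403, "NB": 0.7292,
    "SVM": 0.9695, "SACNN": 0.9949,
}

# Accuracy lost when training on the light version instead of the heavy one.
drop = {model: heavy_acc[model] - light_acc[model] for model in heavy_acc}

# Print models from the smallest to the largest accuracy drop.
for model in sorted(drop, key=drop.get):
    print(
        f"{model:6s} heavy={heavy_acc[model]:.4f} "
        f"light={light_acc[model]:.4f} drop={drop[model]:.4f}"
    )
```

On these figures the proposed SACNN shows the smallest drop (0.0046), with AE and DNN close behind (0.0062 and 0.0065), while GRU and NB lose more than ten percentage points.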
TABLE 9 Results comparison of the proposed SACNN with other models on the heavy version of Edge‐IIoTset multi‐class sub‐categories.
Algorithm P R F1‐score ACC AUC Training time (in sec) Test time (in sec)
LSTM [29] 0.9883 0.9911 0.9896 0.9991 0.9997 3096 211
GRU [46] 0.9414 0.9774 0.9547 0.9977 0.9994 3219 239
CNN [29] 0.9887 0.9836 0.9859 0.9988 0.9996 86 12
AE [47] 0.9829 0.9942 0.9879 0.9984 0.9991 103 12
DNN [48] 0.9944 0.9838 0.9889 0.9988 0.9995 81 10
MLP [49] 0.995 0.9845 0.9893 0.9992 0.9989 137 1
LR [48] 0.9297 0.9023 0.91 0.9962 0.9972 114 1
NB [48] 0.9582 0.9337 0.9404 0.9962 0.996 2 3
SVM [48] 0.9836 0.98 0.9817 0.9985 0.9992 432 241
Proposed SACNN 0.9966 0.9911 0.9937 0.9995 0.9999 187 19

TABLE 10 Results comparison of the proposed SACNN with other models on the light version of Edge‐IIoTset multi‐class sub‐categories.
Algorithm P R F1‐score ACC AUC Training time (in sec) Test time (in sec)
LSTM [29] 0.9473 0.9502 0.9485 0.9676 0.9841 156 11
GRU [46] 0.8424 0.8477 0.8439 0.9337 0.9493 167 13
CNN [29] 0.9824 0.9647 0.9718 0.9835 0.9978 6 1
AE [47] 0.9781 0.9914 0.9837 0.9918 0.9986 7 1
DNN [48] 0.9805 0.9895 0.9844 0.99 0.9989 6 1
MLP [49] 0.9254 0.9103 0.9153 0.9474 0.9924 52 0.1
LR [48] 0.9759 0.9529 0.9614 0.9757 0.9987 9 0.2
NB [48] 0.9071 0.8776 0.8801 0.9184 0.9951 0.5 1
SVM [48] 0.9882 0.9772 0.982 0.9879 0.9983 21 11
Proposed SACNN 0.9921 0.9876 0.9897 0.9927 0.9998 13 2
TABLE 11 Results comparison of the proposed SACNN with other models on X‐IIoTID multi‐class.
Algorithm P R F1‐score ACC AUC Training time (in sec) Test time (in sec)
LSTM [29] 0.9821 0.9022 0.9344 0.9915 0.9951 710 61
GRU [46] 0.9652 0.945 0.9523 0.9948 0.9984 780 47
CNN [29] 0.9818 0.9624 0.9706 0.9903 0.9992 42 7
AE [47] 0.9887 0.9796 0.9836 0.9976 0.9998 33 6
DNN [48] 0.9804 0.9761 0.9781 0.988 0.9978 28 6
MLP [49] 0.9779 0.9692 0.9723 0.9926 0.9959 235 0.2
LR [48] 0.9526 0.9164 0.9301 0.9764 0.9921 39 0.1
NB [48] 0.5068 0.8145 0.4832 0.4858 0.7861 2 1
SVM [48] 0.9632 0.9671 0.9647 0.9818 0.9937 42,956 374
Proposed SACNN 0.9911 0.9835 0.9871 0.9981 0.9995 259 53
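The trade‐off between training cost and detection quality in the table above can be made explicit with a small check. The sketch below copies the F1‐score and training‐time columns of Table 11 and asks whether any model that trains faster than the proposed SACNN also matches its F1‐score. The variable names are illustrative assumptions, and F1‐score is used here as one possible summary of detection performance.

```python
# (F1-score, training time in seconds) copied from Table 11 (X-IIoTID).
table11 = {
    "LSTM": (0.9344, 710),
    "GRU": (0.9523, 780),
    "CNN": (0.9706, 42),
    "AE": (0.9836, 33),
    "DNN": (0.9781, 28),
    "MLP": (0.9723, 235),
    "LR": (0.9301, 39),
    "NB": (0.4832, 2),
    "SVM": (0.9647, 42956),
    "SACNN": (0.9871, 259),
}

ref_f1, ref_time = table11["SACNN"]

# Models that train faster than the proposed SACNN.
faster = sorted(name for name, (_, t) in table11.items() if t < ref_time)

# Of those, models that also reach at least the same F1-score.
dominating = [name for name in faster if table11[name][0] >= ref_f1]

# The strongest of the faster models, for reference.
best_faster = max(faster, key=lambda name: table11[name][0])

print("faster to train than SACNN:", ", ".join(faster))
print("faster and at least as good on F1:", dominating or "none")
print(f"best faster model: {best_faster} (F1 = {table11[best_faster][0]:.4f})")
```

On the X‐IIoTID figures, six models train faster than SACNN but none reaches its F1‐score; the closest is AE, which trains in 33 s against 259 s at an F1‐score of 0.9836 against 0.9871.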
version compared to the heavy version, the proposed model achieved comparable results under both dataset variants. This demonstrates that our proposed model can detect attacks with high accuracy even when trained on a small subset of the dataset.

5 | CONCLUSION

This paper proposes a SACNN architecture for the detection of malicious activity in IIoT networks, together with an appropriate feature extraction method to extract the most significant features. The proposed architecture has a self‐attention layer to calculate the input attention and CNN layers to process the attention‐weighted features for prediction. The performance of the proposed SACNN architecture was evaluated on the Edge‐IIoTset and X‐IIoTID datasets. The proposed approach achieved an average accuracy of 99.66% for multi‐class categories and multi‐class sub‐categories on both versions of the Edge‐IIoTset dataset. Moreover, it achieved an accuracy of 99.72% on the X‐IIoTID dataset. The SACNN model successfully addressed the issue of imbalanced and limited data and improved the intrusion detection performance in IIoT networks. We compared the performance of the proposed approach with other classifiers to validate its efficacy. The proposed approach achieved higher detection performance than the other classifiers for multi‐class categories and multi‐class sub‐categories on both datasets.

The proposed model improves the performance of cyberattack detection in IIoT networks. Additionally, the proposed SACNN demonstrates effective functionality on low‐power and low‐memory devices, with a processing time that is moderate compared to the other models. While some other models may offer quicker training and testing times, they fail to match the detection performance of the proposed model. Furthermore, the proposed model has the potential for additional compression and optimisation, which could reduce detection time and further enhance performance.

ACKNOWLEDGEMENTS
The authors would like to acknowledge the support of the Deputy for Research and Innovation ‐ Ministry of Education, Kingdom of Saudi Arabia for this research through grant (NU/IFC/02/SERC/‐/31) under the Institutional Funding Committee at Najran University, Kingdom of Saudi Arabia.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The datasets used in this research were obtained from publicly available sources: [Edge‐IIoTset: https://ieee-dataport.org/documents/edge-iiotset-new-comprehensive-realistic-cyber-security-dataset-iot-and-iiot-applications and https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot, X‐IIoTID: https://ieee-dataport.org/documents/x-iiotid-connectivity-and-device-agnostic-intrusion-dataset-industrial-internet-things].

ORCID
Jawad Ahmad https://orcid.org/0000-0001-7495-2248

REFERENCES
1. Kaur, B., et al.: Internet of things (iot) security dataset evolution: challenges and future directions. IoT 22, 100780 (2023). https://doi.org/10.1016/j.iot.2023.100780
2. Anand, A., Singh, A.: A hybrid optimization‐based medical data hiding scheme for industrial internet of things security. IEEE Trans. Ind. Inf. 19(1), 1051–1058 (2022). https://doi.org/10.1109/tii.2022.3164732
3. Haq, M.I.U., et al.: Robust graph‐based localization for industrial internet of things in the presence of flipping ambiguities. CAAI Transactions on Intelligence Technology 8(4), 1140–1149 (2023). https://doi.org/10.1049/cit2.12203
4. Tang, S., et al.: Computational intelligence and deep learning for next‐generation edge‐enabled industrial iot. IEEE Transactions on Network Science and Engineering 10(5), 2881–2893 (2022). https://doi.org/10.1109/tnse.2022.3180632
5. Abdel‐Basset, M., et al.: Deep‐ifs: intrusion detection approach for industrial internet of things traffic in fog environment. IEEE Trans. Ind. Inf. 17(11), 7704–7715 (2020). https://doi.org/10.1109/tii.2020.3025755
6. Shi, Y., et al.: Joint online optimization of data sampling rate and preprocessing mode for edge‐cloud collaboration enabled industrial iot. IEEE Internet Things J. 9(17), 16402–16417 (2022). https://doi.org/10.1109/jiot.2022.3150386
7. Hassan, M.M., et al.: Increasing the trustworthiness in the industrial iot networks through a reliable cyberattack detection model. IEEE Trans. Ind. Inf. 16(9), 6154–6162 (2020). https://doi.org/10.1109/tii.2020.2970074
8. Bovenzi, G., et al.: Network anomaly detection methods in iot environments via deep learning: a fair comparison of performance and robustness. Comput. Secur. 128, 103167 (2023). https://doi.org/10.1016/j.cose.2023.103167
9. Rejeb, A., et al.: The internet of things (iot) in healthcare: taking stock and moving forward. IoT 22, 100721 (2023). https://doi.org/10.1016/j.iot.2023.100721
10. Wani, A., Khaliq, R.: Sdn‐based intrusion detection system for iot using deep learning classifier (idsiot‐sdl). CAAI Transactions on Intelligence Technology 6(3), 281–290 (2021). https://doi.org/10.1049/cit2.12003
11. Liang, W., et al.: Variational few‐shot learning for microservice‐oriented intrusion detection in distributed industrial iot. IEEE Trans. Ind. Inf. 18(8), 5087–5095 (2021). https://doi.org/10.1109/tii.2021.3116085
12. Samanta, R.K., et al.: Scope of machine learning applications for addressing the challenges in next‐generation wireless networks. CAAI Transactions on Intelligence Technology 7(3), 395–418 (2022). https://doi.org/10.1049/cit2.12114
13. Xingxin, C., Xin, Z., Gangming, W.: Research on online fault detection tool of substation equipment based on artificial intelligence. J. King Saud Univ. Sci. 34(6), 102149 (2022). https://doi.org/10.1016/j.jksus.2022.102149
14. Mirsky, Y., et al.: Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection (2018). arXiv preprint arXiv:1802.09089
15. Gao, Z.J., Pansare, N., Jermaine, C.: Declarative parameterizations of user‐defined functions for large‐scale machine learning and optimization. IEEE Trans. Knowl. Data Eng. 31(11), 2079–2092 (2018). https://doi.org/10.1109/tkde.2018.2873325
16. Al‐Turaiki, I., Altwaijry, N.: A convolutional neural network for improved anomaly‐based network intrusion detection. Big Data 9(3), 233–252 (2021). https://doi.org/10.1089/big.2020.0263
17. Aldweesh, A., Derhab, A., Emam, A.Z.: Deep learning approaches for anomaly‐based intrusion detection systems: a survey, taxonomy, and open issues. Knowl. Base Syst. 189, 105124 (2020). https://doi.org/10.1016/j.knosys.2019.105124
18. Ferrag, M.A., et al.: Edge‐iiotset: a new comprehensive realistic cyber security dataset of iot and iiot applications for centralized and federated learning. IEEE Access 10, 40281–40306 (2022). https://doi.org/10.1109/access.2022.3165809
19. Li, Y., et al.: Robust detection for network intrusion of industrial iot based on multi‐cnn fusion. Measurement 154, 107450 (2020). https://doi.org/10.1016/j.measurement.2019.107450
20. Bovenzi, G., et al.: A hierarchical hybrid intrusion detection approach in iot scenarios. In: GLOBECOM 2020‐2020 IEEE Global Communications Conference, pp. 1–7. IEEE (2020)
21. Abdel‐Basset, M., et al.: Deep‐ifs: intrusion detection approach for industrial internet of things traffic in fog environment. IEEE Trans. Ind. Inf. 17(11), 7704–7715 (2021). https://doi.org/10.1109/tii.2020.3025755
22. Kasongo, S.M.: An advanced intrusion detection system for iiot based on ga and tree based algorithms. IEEE Access 9, 113199–113212 (2021). https://doi.org/10.1109/access.2021.3104113
23. Liu, C., et al.: Intrusion detection system after data augmentation schemes based on the vae and cvae. IEEE Trans. Reliab. 71(2), 1000–1010 (2022). https://doi.org/10.1109/tr.2022.3164877
24. Telikani, A., et al.: Industrial iot intrusion detection via evolutionary cost‐sensitive learning and fog computing. IEEE Internet Things J. 9(22), 23260–23271 (2022). https://doi.org/10.1109/jiot.2022.3188224
25. Zhang, Y., et al.: Intrusion detection of industrial internet‐of‐things based on reconstructed graph neural networks. IEEE Transactions on Network Science and Engineering 10(5), 2894–2905 (2022). https://doi.org/10.1109/tnse.2022.3184975
26. Khan, I.A., et al.: Enhancing iiot networks protection: a robust security model for attack detection in internet industrial control systems. Ad Hoc Netw. 134, 102930 (2022). https://doi.org/10.1016/j.adhoc.2022.102930
27. Le, T.‐T.‐H., Oktian, Y.E., Kim, H.: Xgboost for imbalanced multiclass classification‐based industrial internet of things intrusion detection systems. Sustainability 14(14), 8707 (2022). https://doi.org/10.3390/su14148707
28. Li, A., Yi, S.: Intelligent intrusion detection method of industrial internet of things based on cnn‐bilstm. Secur. Commun. Network. 2022, 1–8 (2022). https://doi.org/10.1155/2022/5448647
29. Altunay, H.C., Albayrak, Z.: A hybrid cnn+lstm‐based intrusion detection system for industrial iot networks. Engineering Science and Technology, an International Journal 38, 101322 (2023). https://doi.org/10.1016/j.jestch.2022.101322
30. Lilhore, U.K., et al.: Hidm: hybrid intrusion detection model for industry 4.0 networks using an optimized cnn‐lstm with transfer learning. Sensors 23(18), 7856 (2023). https://doi.org/10.3390/s23187856
31. Wang, S., Xu, W., Liu, Y.: Res‐tranbilstm: an intelligent approach for intrusion detection in the internet of things. Comput. Network. 235, 109982 (2023). https://doi.org/10.1016/j.comnet.2023.109982
32. Al‐Hawawreh, M., Sitnikova, E., Aboutorab, N.: X‐iiotid: a connectivity‐agnostic and device‐agnostic intrusion data set for industrial internet of things. IEEE Internet Things J. 9(5), 3962–3977 (2021). https://doi.org/10.1109/jiot.2021.3102056
33. Dahouda, M.K., Joe, I.: A deep‐learned embedding technique for categorical features encoding. IEEE Access 9, 114381–114391 (2021). https://doi.org/10.1109/access.2021.3104357
34. Sarhan, M., et al.: Feature extraction for machine learning‐based intrusion detection in iot networks. Digital Communications and Networks 10(1), 205–216 (2022). https://doi.org/10.1016/j.dcan.2022.08.012
35. Uddin, M.F., et al.: Proposing enhanced feature engineering and a selection model for machine learning processes. Appl. Sci. 8(4), 646 (2018). https://doi.org/10.3390/app8040646
36. Globerson, A., Tishby, N.: Sufficient dimensionality reduction. J. Mach. Learn. Res. 3(Mar), 1307–1331 (2003)
37. Adeel, A., et al.: Entropy‐controlled deep features selection framework for grape leaf diseases recognition. Expet Syst. 39(7), e12569 (2022). https://doi.org/10.1111/exsy.12569
38. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006). https://doi.org/10.1007/s10994-006-6226-1
39. Chiba, Z., et al.: A novel architecture combined with optimal parameters for back propagation neural networks applied to anomaly network intrusion detection. Comput. Secur. 75, 36–58 (2018). https://doi.org/10.1016/j.cose.2018.01.023
40. Larriva‐Novo, X., et al.: An iot‐focused intrusion detection system approach based on preprocessing characterization for cybersecurity datasets. Sensors 21(2), 656 (2021). https://doi.org/10.3390/s21020656
41. Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
42. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
43. Scarpa, G., et al.: A cnn‐based fusion method for feature extraction from sentinel data. Rem. Sens. 10(2), 236 (2018). https://doi.org/10.3390/rs10020236
44. Riyaz, B., Ganapathy, S.: A deep learning approach for effective intrusion detection in wireless networks using cnn. Soft Comput. 24(22), 17265–17278 (2020). https://doi.org/10.1007/s00500-020-05017-0
45. Zhang, H., et al.: An effective convolutional neural network based on smote and Gaussian mixture model for intrusion detection in imbalanced dataset. Comput. Network. 177, 107315 (2020). https://doi.org/10.1016/j.comnet.2020.107315
46. Ansari, M.S., Bartoš, V., Lee, B.: Gru‐based deep learning approach for network intrusion alert prediction. Future Generat. Comput. Syst. 128, 235–247 (2022). https://doi.org/10.1016/j.future.2021.09.040
47. Aygun, R.C., Yavuz, A.G.: Network anomaly detection with stochastically improved autoencoder based models. In: 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), pp. 193–198. IEEE (2017)
48. Vinayakumar, R., et al.: Deep learning approach for intelligent intrusion detection system. IEEE Access 7, 41525–41550 (2019). https://doi.org/10.1109/access.2019.2895334
49. Rosay, A., Carlier, F., Leroux, P.: Mlp4nids: an efficient mlp‐based network intrusion detection for cicids2017 dataset. In: Machine Learning for Networking: Second IFIP TC 6 International Conference, MLN 2019, Paris, France, December 3–5, 2019, Revised Selected Papers 2, pp. 240–254. Springer (2020)

How to cite this article: Qathrady, M. A., et al.: SACNN‐IDS: a self‐attention convolutional neural network for intrusion detection in industrial internet of things. CAAI Trans. Intell. Technol. 1–14 (2024). https://doi.org/10.1049/cit2.12352
