A R T I C L E  I N F O

Keywords: Adversarial Machine Learning; Intrusion Detection Systems; Artificial Intelligence; Cybersecurity; Multi-Armed Bandits

A B S T R A C T

The rise of Adversarial Machine Learning (AML) attacks presents a significant challenge to Intrusion Detection Systems (IDS) and their ability to detect threats. To address this issue, we introduce Apollon, a novel defense system that can protect IDS against AML attacks. Apollon utilizes a diverse set of classifiers to identify intrusions and employs Multi-Armed Bandits (MAB) with Thompson sampling to dynamically select the optimal classifier, or ensemble of classifiers, for each input. This approach prevents attackers from learning the IDS behavior and generating adversarial examples that can evade its detection. We evaluate Apollon on several of the most popular and recent datasets and show that it can successfully detect attacks without compromising its performance on traditional network traffic. Our results suggest that Apollon is a robust defense system against AML attacks in IDS.

* Corresponding author.
E-mail addresses: antoniopaya@outlook.com (A. Paya), sergioarroni@outlook.com (S. Arroni), garciavicente@uniovi.es (V. García-Díaz), albertogomez@uniovi.es (A. Gómez).
https://doi.org/10.1016/j.cose.2023.103546
Received 30 March 2023; Received in revised form 24 September 2023; Accepted 17 October 2023
Available online 20 October 2023
0167-4048/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
A. Paya, S. Arroni, V. García-Díaz et al. Computers & Security 136 (2024) 103546
2. Background
Fig. 1. Adversarial Machine Learning (AML) attacks by ART (Nicolae et al., 2018).

In this section, we provide an overview of the background knowledge required to understand the rest of this work. This includes an introduction to the concepts of Adversarial Machine Learning and adversarial examples, as well as an introduction to Intrusion Detection Systems and the challenges they face.

2.1. Intrusion Detection Systems

Intrusion Detection Systems (IDS) are security tools designed to identify and prevent unauthorized access, misuse, and malicious activities in computer networks (Mukherjee et al., 1994). IDS play a critical role in protecting networks from various types of cyber threats, including viruses, malware, and intrusions. IDS operate by monitoring network traffic and analyzing it for suspicious behavior or patterns. There are two main types of IDS: Network-based Intrusion Detection Systems (NIDS) and Host-based Intrusion Detection Systems (HIDS) (Pharate et al., 2015).

NIDS monitor network traffic and analyze packets to identify potential security threats. They can detect a wide range of network-based attacks, such as port scans, denial-of-service attacks, and data exfiltration, and can be deployed at various points within the network, such as at the perimeter, within the LAN, or at critical junctions.

HIDS, on the other hand, monitor the activity on individual hosts, such as servers or workstations. While they can also analyze network traffic specific to the host, their unique strength lies in the ability to inspect system-specific activities, including file system modifications and system call behaviors. This makes HIDS particularly suitable for detecting malware infections, as these often manifest as changes at the host level. Furthermore, HIDS can detect attacks that may not be visible to NIDS, such as attacks that occur within encrypted traffic or that originate from within the network.

From this point on, to simplify the presentation, we will refer to Network Intrusion Detection Systems simply as Intrusion Detection Systems.

Intrusion Detection Systems are an essential component of a comprehensive network security strategy. They provide an additional layer of protection beyond firewalls, antivirus software, and other security tools. By detecting and alerting administrators to potential security threats, IDS help organizations respond quickly and effectively to cyber attacks.

Machine Learning (ML) has emerged as a powerful technique for improving the accuracy and effectiveness of Intrusion Detection Systems (Abdallah et al., 2022; Maseer et al., 2021; Thakkar and Lohiya, 2020). ML algorithms can analyze large volumes of network data and identify patterns that may be indicative of security threats. ML-based IDS can learn from past network activity to identify and flag potential security threats in real time, even when the attacks are novel or previously unseen. They can also adapt and improve over time as they learn from new data and feedback from security analysts.

2.2. Adversarial Machine Learning attacks

Adversarial Machine Learning (AML) attacks refer to a set of techniques used to undermine the accuracy, integrity, or security of machine learning (ML) models (Huang et al., 2011). AML attacks can be launched by malicious actors with different objectives, such as stealing sensitive information, manipulating decision-making processes, or compromising the confidentiality and privacy of ML systems (see Fig. 1). They can target a wide range of ML models, including deep neural networks, support vector machines, decision trees, and others. The success of an AML attack depends on various factors, such as the type and quality of the target ML model, the sophistication of the attack technique, and the attacker's level of knowledge and resources. According to the taxonomy of the attack, AML attacks can be classified into evasion attacks, poisoning attacks, extraction attacks, and inference attacks (De Cristofaro, 2020).

2.2.1. Evasion attacks

Evasion attacks are a type of AML attack in which the attacker manipulates the input data so that the ML model misclassifies it, without changing the underlying characteristics of the data (Biggio et al., 2013). Evasion attacks are typically launched against classification models, such as those used for image recognition or spam detection, and they can be crafted using various techniques, including gradient-based methods, evolutionary algorithms, or gray/black-box attacks. The goal of an evasion attack is to create an adversarial example, i.e., a modified version of the original input that is similar to the original but is misclassified by the ML model. Evasion attacks pose a significant threat to the security and robustness of ML systems, especially in domains such as malware detection, Intrusion Detection Systems, and fraud detection, where accurate classification is critical.

2.2.2. Poisoning attacks

Data poisoning attacks in AML involve manipulating the training data of an ML model to introduce biases or cause it to learn incorrect patterns (Koh, 2018). Poisoning attacks can be launched at different stages of the ML pipeline, including data collection, preprocessing, and training. The goal of a poisoning attack is to compromise the integrity and accuracy of the ML model by introducing malicious data into the training dataset, which can cause the model to learn incorrect patterns and make incorrect predictions. Poisoning attacks can be mounted in a variety of ways, such as by injecting adversarial examples into the
training data, manipulating the distribution of the training data, or introducing outliers into the dataset.

2.2.3. Extraction attacks

Model extraction attacks in AML are a type of attack in which the attacker aims to extract the details of an ML model without direct access to it (Chen, 2020). This is achieved by leveraging the output of the target ML model to infer its underlying structure, architecture, or parameters. Model extraction attacks can be launched through different channels, such as querying the model with carefully crafted inputs or observing its behavior in response to various inputs. The goal of a model extraction attack is to steal the target model's intellectual property or use it for malicious purposes such as deploying counterfeit models, stealing sensitive data, or reverse engineering proprietary algorithms. Model extraction attacks can be particularly effective against black-box models, where the attacker does not have access to the model's internal structure or parameters.

2.2.4. Inference attacks

In AML inference attacks, an attacker attempts to glean confidential information about the input data utilized by the ML model by scrutinizing the model's output (Yeom, 2017). Inference attacks can be launched against a wide range of ML models, including deep neural networks, decision trees, and support vector machines, among others. The goal of an inference attack is to obtain access to private or confidential information about the input data, such as personal characteristics, financial transactions, or medical records, without having direct access to the data itself. Inference attacks can be launched through different channels, such as analyzing the output distribution of the model, measuring its response time to different inputs, or exploiting the model's decision boundaries.

2.3. Multi-Armed Bandits (MAB)

The Multi-Armed Bandit (MAB) problem is a classic problem in probability theory and Machine Learning, in which an agent has to allocate a limited set of resources among competing choices with uncertain rewards (Kuleshov and Precup, 2014). The agent faces a trade-off between exploiting the choices with the highest expected rewards given the current information, and exploring new choices that may yield higher rewards in the future.

The MAB problem has many practical applications in various domains, such as clinical trials, adaptive routing, financial portfolio design, and online advertising. Several algorithms have been proposed to solve it, such as optimistic initialization (Machado et al., 2014), the upper confidence bound (UCB) (Carpentier et al., 2011), and Thompson sampling (Agrawal and Goyal, 2012). These algorithms differ in how they balance exploration and exploitation, and in how they estimate the expected reward of each choice.

Thompson sampling is a Bayesian approach that maintains a probability distribution over the unknown reward distribution of each choice and selects actions by sampling from these distributions. Specifically, at each timestep, Thompson sampling samples a reward from each distribution, chooses the action associated with the highest sampled reward, and updates its beliefs about the reward distributions based on the observed reward. This approach has been shown to be effective in many applications and has a strong theoretical justification in terms of minimizing regret.

Thompson sampling has gained popularity in recent years due to its ability to balance exploration and exploitation in a principled way (Park and Faradonbeh, 2021). By sampling from the probability distributions over the reward distributions, it encourages exploration of all choices while still favoring those with higher expected rewards. Additionally, the Bayesian framework allows prior knowledge about the reward distributions to be incorporated, which can be especially useful in scenarios with limited data.

The MAB problem is intricately connected to Reinforcement Learning (RL). In RL, an intelligent agent endeavors to learn a strategy, known as a policy, that maximizes its total reward throughout its interaction with an environment. Over the past few years, RL has exhibited remarkable success in diverse domains; notably, it has found significant utility in demand forecasting (Ramos et al., 2022a,b).

In the context of our work, we use the MAB algorithm to select the best IDS classifier for each network traffic request. This is similar to how MAB is used in demand forecasting to select the most suitable forecasting model or to determine the best hyperparameters for a given forecasting model. By using MAB, we can balance the trade-off between exploiting the classifiers with the highest expected accuracy given the current information, and exploring new classifiers that may yield higher accuracy in the future.

3. Related work

This section presents a summary of the main datasets for IDS, along with the corresponding classifiers and their performance metrics; our proposed approach makes use of these classifiers. Furthermore, we explore the typical types of AML attacks and defenses used in this domain.

3.1. Intrusion Detection Systems datasets

IDS datasets play a crucial role in assessing and gauging the performance of IDSs. These datasets contain labeled instances of regular and anomalous network traffic that are used to train and assess the precision and efficiency of IDSs. A range of datasets with diverse strengths and features is available. This section examines some of the most prevalent ones, highlighting their essential qualities and applications.

3.1.1. CIC-IDS-2017

A highly utilized IDS dataset in contemporary literature is CIC-IDS2017 (Sharafaldin et al., 2018a), which was developed by the Canadian Institute for Cybersecurity (CIC) in a simulated enterprise network environment, gathering network traffic data for five consecutive days. The dataset emulates the actions of 25 users and comprises nearly 80 significant attributes (Ring et al., 2019). Notably, it has an 83% to 17% benign-to-malicious instance ratio, so benign traffic represents a significant portion of the dataset. CIC-IDS2017 is considered an accurate depiction of normal traffic distribution in a network and can be utilized individually or combined with other datasets (Shroff et al., 2022).

3.1.2. CSE-CIC-IDS-2018

The CSE-CIC-IDS2018 dataset was developed using AWS resources in a simulated enterprise network environment in 2018 (Sharafaldin et al., 2018b). It covers seven distinct attack categories and comprises nearly 79 important features. With over 450 devices, including servers, computers, and other tools, the dataset is notably large and realistic (Pujari et al., 2022). It is akin to CIC-IDS2017, analyzing bidirectional flow packet data, but with more significant features and greater comprehensiveness. Hence, it is widely used in the literature for assessing and benchmarking IDSs (Pujari et al., 2022).

3.1.3. CIC-DDoS-2019

The CIC-DDoS-2019 dataset was created to address the lack of representation of all DDoS (Distributed Denial of Service) attack subtypes in existing datasets (Sharafaldin et al., 2019). Although the dataset includes simulated network traffic, it strives to present realistic benign data. It features 13 types of DDoS attacks and over 80 significant features. However, it is severely imbalanced, with 50,006,249 DDoS attack records and just 56,863 benign traffic records, making it challenging to train a model on both data types (Ring et al., 2019). As a result, experts suggest using this dataset in conjunction with other datasets (Shroff et al., 2022), such as CIC-IDS-2017 or CSE-CIC-IDS-2018, to train a more robust model.
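To make the scale of this imbalance concrete, it can be quantified directly from the record counts quoted above (a quick illustrative calculation, not part of the original evaluation):

```python
# Record counts reported for CIC-DDoS-2019 (Sharafaldin et al., 2019).
ddos_records = 50_006_249
benign_records = 56_863

total = ddos_records + benign_records
imbalance_ratio = ddos_records / benign_records  # attack records per benign record
benign_fraction = benign_records / total         # share of benign traffic

print(f"imbalance ratio ~ {imbalance_ratio:.0f}:1")
print(f"benign traffic ~ {benign_fraction:.4%} of the dataset")
```

With these counts there are roughly 879 attack records per benign record, i.e., benign traffic makes up only about 0.11% of the dataset, which is why the literature recommends pairing it with more balanced datasets.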
Table 1
Performance of the IDS classifiers on the selected datasets. For each dataset, the columns give Accuracy / F1 Score / AUC (%).

Classifier | CIC-IDS-2017          | CSE-CIC-IDS-2018      | CIC-DDoS-2019         | References
LR         | 92.96 / 90.87 / 91.50 | 87.96 / 88.99 / 81.54 | 91.72 / 87.27 / 90.23 | Thiyam and Dey (2023); Akshay Kumaar et al. (2022)
FNN        | 99.61 / 99.57 / 99.83 | 93.00 / 92.00 / 100.00 | 95.55 / 95.50 / 95.63 | Huang and Lei (2020); Thiyam and Dey (2023); Wu et al. (2022)
RF         | 99.79 / 99.78 / 99.98 | 92.00 / 94.00 / 100.00 | 99.86 / 99.78 / 99.82 | Pujari et al. (2022); Huang and Lei (2020); Abdulhammed et al. (2019); Maseer et al. (2021); Faker and Dogdu (2019); Thiyam and Dey (2023)
DT         | 99.62 / 99.57 / 99.56 | 88.00 / 91.00 / 100.00 | 99.87 / 99.78 / 99.80 | Pujari et al. (2022); Thiyam and Dey (2023); Huang and Lei (2020); Maseer et al. (2021)
RTIDS      | 99.35 / 99.17 / 98.83 | - / - / -             | 98.58 / 98.48 / 98.66 | Wu et al. (2022)
SVM        | 96.97 / 96.99 / 98.98 | 61.00 / 66.00 / 100.00 | 94.02 / 94.98 / 94.24 | Pujari et al. (2022); Huang and Lei (2020); Maseer et al. (2021); Faker and Dogdu (2019); Wu et al. (2022); Sahoo et al. (2020)
3.1.4. Discarded datasets

This project did not make use of several other IDS datasets, including Darpa 1998/99 (Mahoney and Chan, 2003), KDD 99 (Tavallaee et al., 2009), and NSL-KDD (Tavallaee et al., 2009). These datasets are no longer commonly employed for evaluating and benchmarking IDS due to their outdated nature. Created in the late 1990s and early 2000s, they do not accurately represent the current landscape of network threats and behaviors. The KDD 99 dataset, in particular, has been criticized for its high false positive rate and lack of realism, thereby limiting its usefulness in assessing the performance of modern IDS (Hugh, 2000; Tobi and Duncan, 2018).

3.2. Intrusion Detection Systems ML-classifiers

ML-classifiers have emerged as a promising alternative to traditional IDS for detecting network attacks, owing to the limitations of traditional IDS in dealing with the complex and dynamic nature of cyber-attacks. In the current digital era, the number and sophistication of malware threats are constantly growing, posing a serious challenge to network security. It is therefore essential to have reliable and effective IDS in place to protect network systems from potential damage.

Several studies have evaluated the performance of various ML-classifiers in detecting network attacks, using datasets such as CIC-IDS-2017, CSE-CIC-IDS-2018, and CIC-DDoS-2019. Table 1 provides a summary of the most commonly used ML-classifiers and their scores, including accuracy, F1 Score, and AUC, for each of the aforementioned datasets. The ML-classifiers used in these studies are Logistic Regression (LR) (Wright, 1995), Fuzziness based Neural Networks (FNN) (Ashfaq et al., 2017), Random Forests (RF) (Cutler et al., 2012), Decision Trees (DT) (Rokach and Maimon, 2005), Robust transformer based Intrusion Detection System (RTIDS) (Wu et al., 2022), and Support Vector Machines (SVM) (Suthaharan and Suthaharan, 2016).

These classifiers were selected due to their wide adoption and proven effectiveness in diverse Machine Learning applications. Nonetheless, research continues with the development of new and enhanced classifiers, such as LSTM-FCNN (Sahu et al., 2022) or DCNNBiLSTM (Hnamte and Hussain, 2023), further extending the landscape of ML-IDS. By examining the results presented in Table 1, researchers can gain a comprehensive understanding of the performance of these classifiers on the specific datasets, enabling informed decisions when choosing the most appropriate algorithm for their requirements. Detailed explanations of how each result was obtained can be found in the papers listed in the References column, which provide comprehensive insights into the methodologies employed and further analysis of the performance of each ML-classifier on the respective datasets.

Accuracy, F1 Score, and AUC are three common metrics used to evaluate the performance of machine learning models. Accuracy measures the proportion of correct predictions made by a model out of the total number of predictions and is defined in Equation (1),

    accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.

In the context of IDS, a true positive is a malicious network flow that is correctly identified as malicious by the IDS. Conversely, a true negative is normal network traffic that is correctly identified as non-malicious. It is worth noting that in many real-world scenarios the volume of normal traffic significantly outweighs malicious traffic; if the positive class in our metrics referred to normal traffic, the values would naturally be higher due to its prevalence. In our evaluations, however, the positive class specifically denotes malicious traffic, ensuring that our metrics provide a balanced and accurate representation of the IDS's performance.

F1 Score is a weighted average of precision and the detection rate (DR) (also known as recall or sensitivity in the traditional ML literature), where precision measures the proportion of true positives out of all predicted positives, and the detection rate measures the proportion of true positives out of all actual positives. The F1 Score is defined in Equation (2),

    F1 Score = 2 * (precision * DR) / (precision + DR)    (2)

where precision is defined in Equation (3) and DR in Equation (4).

    precision = TP / (TP + FP)    (3)

    DR = TP / (TP + FN)    (4)

The Area Under the ROC Curve (AUC) is a widely used performance metric in ML and binary classification tasks. It quantifies the discriminative power of a classification model by measuring the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative instance. The ROC curve plots the true positive rate (DR) against the false positive rate (1 - specificity) for various classification thresholds, where specificity is defined in Equation (5).

    specificity = TN / (TN + FP)    (5)

The AUC represents the integral of this curve and ranges from 0 to 1, where a value of 1 indicates a perfect classifier and a value of 0.5 suggests a random or ineffective classifier. Higher AUC values indicate better performance in distinguishing between positive and negative instances. The AUC is a popular evaluation metric because it is robust to class imbalance and provides a concise summary of the model's overall performance.
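As a quick sanity check, Equations (1)-(5) can be computed directly from confusion-matrix counts; the AUC line below uses the rank-statistic formulation described in the text. The counts and scores are made up purely for illustration:

```python
# Illustrative confusion-matrix counts (positive class = malicious traffic).
TP, TN, FP, FN = 90, 950, 50, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)          # Eq. (1)
precision   = TP / (TP + FP)                           # Eq. (3)
DR          = TP / (TP + FN)                           # Eq. (4), detection rate (recall)
f1_score    = 2 * (precision * DR) / (precision + DR)  # Eq. (2)
specificity = TN / (TN + FP)                           # Eq. (5)

# AUC as the probability that a randomly chosen positive instance is scored
# above a randomly chosen negative one (ties count 1/2), from toy scores.
pos_scores = [0.9, 0.8, 0.75, 0.3]
neg_scores = [0.7, 0.4, 0.2, 0.1]
pairs = [(p, n) for p in pos_scores for n in neg_scores]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(accuracy, precision, DR, f1_score, specificity, auc)
```

With these counts, accuracy is 1040/1100 ≈ 0.945, F1 is exactly 0.75, and the toy AUC is 14/16 = 0.875.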
Among the classifiers in Table 1, Random Forest (Zhang et al., 2008) and Decision Trees (Amor et al., 2004) are found to be some of the most effective classifiers for detecting network attacks. Random Forest has gained popularity due to its ability to handle large datasets and its robust performance even when the data contains noise or missing values. Decision Trees are also preferred because of their simplicity and interpretability: they enable clear visualization of the decision-making process, making them useful for understanding the factors that contribute to the classification results.

Overall, these classifiers have demonstrated strong performance in the field of IDS and are frequently used by researchers and practitioners. Their effectiveness in detecting intrusions and classifying network traffic makes them valuable tools for maintaining the security and integrity of computer networks.

3.3. Adversarial Machine Learning attacks

ML-based IDSs can learn from data and adapt to new situations, unlike traditional systems that rely on predefined rules. However, ML-based IDSs also face a new threat from the use of Artificial Intelligence (AI) in the form of Adversarial Machine Learning (AML), where attackers use sophisticated techniques to manipulate or subvert ML models. These attacks are attractive to cyber attackers because they can be challenging to detect and prevent. Furthermore, as AI techniques gain popularity in cybersecurity, attackers are incentivized to develop ever more sophisticated adversarial attacks to evade detection.

Adversarial Machine Learning attacks can be classified as white-box attacks or gray/black-box attacks, depending on the level of knowledge the attacker possesses about the target model.

3.3.1. White-box attacks

White-box attacks are a powerful category of AML attacks that can pose challenges not only to IDS but to any Machine Learning model. Their potency derives from the complete knowledge they assume about the target model and training data, which allows the attacker to craft intricate attacks that can evade system defenses.

Admittedly, such thorough knowledge of the system and its vulnerabilities is often unattainable, rendering white-box attacks unrealistic in many practical scenarios. Thus, while white-box attacks are potent in theory, they are seldom observed in practice. Common white-box attack methods, applicable to ML models in general and proven to be effective against IDS, include the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014a), DeepFool (Moosavi-Dezfooli et al., 2016), the Carlini & Wagner attack (C&W) (Carlini and Wagner, 2017), the Jacobian-based Saliency Map Attack (JSMA) (Papernot et al., 2016), the Basic Iterative Method (BIM) (Kurakin et al., 2016), and Projected Gradient Descent (PGD) (Madry et al., 2017).

3.3.2. Gray/black-box attacks

Gray/black-box attacks are comparatively more practical, as they do not require knowledge of the target model. Yet the effectiveness of these attacks may be limited by the attacker's restricted knowledge, leading to more generic and potentially less effective techniques.

A prevalent tool used to mount such attacks is the Generative Adversarial Network (GAN) (Goodfellow et al., 2020), a machine learning model that can produce adversarial examples capable of evading detection systems. GANs consist of two neural networks: a generator and a discriminator. The generator is trained to produce synthetic data that resembles the real data, while the discriminator is trained to differentiate between real and synthetic data.

In the context of black-box and gray-box attacks, the generator network can produce adversarial examples that are specifically designed to evade the target system's defenses. The attacker may have access to the target system's model scores, or only to a binary output indicating whether the input was accepted or rejected; this information can be utilized to guide the training of the generator network, enhancing its ability to produce effective adversarial examples.

Recent gray/black-box attacks proven to be effective against IDS include attackGAN (Zhao et al., 2021), DIGFuPAS (Duy et al., 2021), IDSGAN (Lin et al., 2022), VulnerGAN (Liu et al., 2022), the ZOO attack (Chen et al., 2017), the Boundary attack (Chen and Jordan, 2019), and the HopSkipJump attack (HSJA) (Chen et al., 2020). ZOO is a score-based attack that estimates gradients to create adversarial traffic in gray/black-box settings. The Boundary attack and HSJA are decision-based attacks that only use binary feedback to craft adversarial inputs. IDSGAN, attackGAN, and DIGFuPAS are gray/black-box attacks that employ the Wasserstein-GAN to generate adversarial traffic. Wasserstein-GAN (W-GAN) (Gulrajani et al., 2017) is a GAN variant that trains the generator network with a different objective function, the Wasserstein distance, which measures the distance between two probability distributions and has desirable properties such as smoothness and continuity. Some recent W-GANs use a Gradient Penalty to improve training convergence.

3.4. Adversarial Machine Learning defenses

Due to the increasing number of AML attacks, researchers have developed several defense mechanisms to mitigate their impact. In the IDS domain, these defenses can be classified into three categories (Alotaibi and Rassam, 2023): preprocessing defenses, adversarial training defenses, and adversarial detection defenses.

It is worth briefly placing our proposed system, Apollon, in the context of these established defenses. Rather than replacing or competing with these mechanisms, Apollon works in tandem with them, dynamically selecting and optimizing the utilization of unaltered, pre-existing models.

3.4.1. Preprocessing defenses

To mitigate the impact of adversarial perturbations, researchers have devised preprocessing techniques that apply carefully planned transformations to the input data before it is fed into the model. These transformations are designed to reduce the model's vulnerability to adversarial perturbations and enhance its robustness.

One example of a preprocessing defense is Stochastic Transformation-based Defenses (Kou et al., 2019). This technique applies random transformations to the input data, such as rotations, translations, and scaling, before feeding it into the model. By introducing randomness into the input data, the model becomes less susceptible to adversarial perturbations designed to exploit specific features of the input.

Another example is Gradient Masking (Athalye et al., 2018), which modifies the gradients of the model during training to make it more difficult for an attacker to compute the gradients needed to generate adversarial examples. This is achieved by adding noise to the gradients or by clipping them to a certain range.

Nevertheless, sophisticated adversarial attacks have exposed the inadequacy of these defense mechanisms. Their primary shortcoming is rooted in their approach: they tend to "confound" or confuse adversaries instead of outright eradicating the presence of adversarial examples (Xu et al., 2020). While they might momentarily disrupt or delay an attacker, they do not provide a long-term solution or a foolproof barrier against these threats.

3.4.2. Adversarial training defenses

Adversarial training is a widely researched topic within the realm of visual computing. Goodfellow et al. (2014b) demonstrated that by retraining a neural network with a dataset comprising both original
arate version of each classifier for each cluster. While this may seem to
add computational complexity without directly improving traditional
performance metrics such as accuracy or detection rate, it plays a cru-
cial role in increasing the robustness of our system against adversarial
attacks.
This layer enriches the overall diversity of our system, ensuring the
existence of multiple models trained on different clusters of data. This
added variation serves to complicate an attacker’s task when trying to
replicate the behavior of our system through black/gray box attacks. It
essentially increases the system’s unpredictability, forming a key part
of Apollon’s robust defense against such attempts.
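The per-cluster training scheme described in this section can be illustrated with a small self-contained sketch. It uses a deliberately minimal one-dimensional K-Means and a trivial majority-label "classifier" so that it runs without any ML libraries; Apollon itself clusters full feature vectors and trains the real classifiers of Table 1. All names and data below are hypothetical:

```python
import random
from collections import Counter

def kmeans_1d(values, k, iters=20, seed=0):
    """Minimal 1-D K-Means: returns centroids and a cluster id per value."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # random initial centroids
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

class MajorityClassifier:
    """Stand-in classifier: predicts the majority label of its training data."""
    def fit(self, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self):
        return self.label

# Toy 'feature': packet size; label: 0 = benign, 1 = malicious.
sizes  = [60, 62, 64, 1400, 1420, 1440]
labels = [0, 0, 0, 1, 1, 1]

centroids, assign = kmeans_1d(sizes, k=2)
# One dedicated classifier per cluster, trained only on that cluster's data.
per_cluster = {
    c: MajorityClassifier().fit([y for y, a in zip(labels, assign) if a == c])
    for c in set(assign)
}

# Route a new request to its nearest cluster, then use that cluster's model.
new_size = 1410
cluster = min(range(len(centroids)), key=lambda c: abs(new_size - centroids[c]))
print(per_cluster[cluster].predict())  # prints 1: the large-packet cluster is malicious
```

The point of the sketch is the routing structure, not the models: each cluster owns its own classifier instances, so an attacker probing the system sees different decision behavior depending on which cluster a request falls into.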
The traffic requests are clustered using the K-Means algorithm
(Sinaga and Yang, 2020), which is a popular clustering algorithm that
is used in many Machine Learning applications. The K-Means algorithm
works by randomly selecting 𝑘 points as the initial centroids, and then
iteratively updating the centroids until the clusters converge. In Apol-
lon, we use the K-Means algorithm to cluster the network traffic requests
based on their features, ensuring that requests with similar features are grouped together. These features are the same ones used by the classifiers to evaluate the requests, ensuring that the clustering is based on the same information that the classifiers use to make their decisions.

Each cluster has its dedicated version of every classifier, trained exclusively on the traffic requests in that cluster. This specific tuning to distinct traffic patterns significantly contributes to the system's complexity and enhances its defense against adversarial learning.

When a new network traffic request arrives at Apollon, it is classified into the appropriate cluster based on its features. The Multi-Armed Bandits algorithm then selects the optimal set of classifiers for that cluster, factoring in the performance of each classifier within that specific cluster. The selected classifier or set of classifiers then evaluates the request to determine whether it is benign or malicious.

Fig. 3. Apollon Multi-Armed Bandits algorithm with multiple classifiers.

…multiple 'best' classifiers, the decision-making process remains robust and effective.

Our choice of Bagging as the ensemble method stems from its inherent properties, which align well with our objectives. Bagging works by generating multiple versions of a predictor from bootstrapped samples of the training set and then aggregating their predictions. This process inherently reduces variance, making the ensemble less sensitive to the idiosyncrasies of individual classifiers. In the context of our MAB-based system, where ties among classifiers indicate closely matched performance, Bagging offers a natural way to harness the collective strength of these classifiers without introducing undue bias.
The use of clustering in Apollon, together with the individual train-
Algorithm 1 Apollon Thompson Sampling. ing of each classifier on each cluster, allows the Multi-Armed Bandit
Intit 𝑆𝑖 = 0 y 𝐹𝑖 = 0 for each arm 𝑖 algorithm to generate multiple probability distributions for each classi-
for 𝑡 = 1, 2, … do
fier, depending on the type of request received. This blend of strategies
For each arm 𝑖, sample 𝜃𝑖 of Beta distribution (𝑆𝑖 + 1, 𝐹𝑖 + 1)
Choose the arm 𝐼𝑡 that maximizes 𝜃𝑖 amplifies the challenge for potential attackers to identify the responding
Observe the reward 𝑋𝑡 of the arm 𝐼𝑡 . classifier, thereby reducing the probability of successful system imita-
if 𝑋𝑡 = 1 then tion.
Increment 𝑆𝐼𝑡 by one
else 4.4. Apollon limitations
Increment 𝐹𝐼𝑡 by one
end if While Apollon offers a novel and robust approach to defending
end for against adversarial attacks in Intrusion Detection Systems, it is crucial
to discuss its limitations to provide a well-rounded understanding of its
By using a Multi-Armed Bandits algorithm, Apollon can dynamically applicability, strengths, and areas for future improvement.
select the optimal classifier or set of classifiers for each network traf-
fic request, making the system more responsive to new types of attacks. • Increased Model Training Time: One of the notable limitations
The use of Thomson Sampling ensures that the system is balanced be- of Apollon is the increased computational time required during the
tween exploration and exploitation, improving the overall attacks de- model training phase. Although Apollon is designed to be compu-
tection rate of the classification. tationally efficient during the prediction phase—where it merely
The primary reason for our choice of MAB over traditional ensem- samples from pre-defined distributions to select an appropriate
ble algorithms is its dynamic adaptability. In the context of defending model—the training phase is more computationally intensive. This
against AML attacks in IDS, the threat landscape is constantly evolving. is because Apollon needs to generate these distributions for each
MAB provides us with the flexibility to adapt over time, allowing us model in the pool, which can be a time-consuming process. This
to explore and exploit different classifiers based on their historical per- limitation is particularly relevant in scenarios where rapid model
formance. This dynamic selection mechanism ensures that our system training and deployment are crucial.
remains robust even as adversaries adapt their strategies. Traditional • Model Pool Diversity: The second limitation pertains to the diver-
ensemble methods, while powerful, operate on a more static combina- sity of the model pool. The efficacy of Apollon is intrinsically linked
tion of models and might not be as agile in responding to changing to the diversity and quality of the models it has at its disposal. A
adversarial tactics. pool that lacks diversity in terms of the types of models, their archi-
Fig. 3 shows the diagram of the MAB algorithm with multiple clas- tectures, or their training data may not fully exploit the potential
sifiers. of the Multi-Armed Bandits mechanism. This could result in sub-
optimal performance and may reduce Apollon’s overall robustness
against a wide array of adversarial attacks.
4.3. Traffic requests clustering
These limitations offer avenues for future research and development
The final layer of the Apollon defense system involves clustering the to further enhance the real-world applicability and effectiveness of Apol-
network traffic requests based on their features, and then training a sep- lon.
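The Thompson Sampling loop of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative toy assuming only NumPy; the class name, the reward definition, and the simulated per-arm accuracies are ours, not part of the Apollon implementation:

```python
import numpy as np

class ThompsonSamplingSelector:
    """Toy Beta-Bernoulli Thompson Sampling over a pool of classifiers (arms)."""

    def __init__(self, n_arms, seed=42):
        self.successes = np.zeros(n_arms)  # S_i in Algorithm 1
        self.failures = np.zeros(n_arms)   # F_i in Algorithm 1
        self.rng = np.random.default_rng(seed)

    def select(self):
        # Sample theta_i ~ Beta(S_i + 1, F_i + 1) and pick the arm maximizing it.
        theta = self.rng.beta(self.successes + 1, self.failures + 1)
        return int(np.argmax(theta))

    def update(self, arm, reward):
        # reward = 1 when the chosen classifier labeled the request correctly.
        if reward == 1:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Simulated pool: arm 2 is correct 90% of the time, the others 50%.
true_acc = [0.5, 0.5, 0.9]
mab = ThompsonSamplingSelector(n_arms=3)
rng = np.random.default_rng(0)
for _ in range(2000):
    arm = mab.select()
    mab.update(arm, int(rng.random() < true_acc[arm]))
```

Over time the sampler concentrates its pulls on the best-performing arm while still occasionally exploring the others, which is the exploration/exploitation balance referred to above.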
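The cluster-then-classify routing described in Section 4.3 can be sketched as follows. This is a minimal illustration assuming scikit-learn; the synthetic features, toy labels, and the choice of a Decision Tree per cluster are ours, not the Apollon implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Two synthetic traffic profiles standing in for flow-feature vectors.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
y = (X[:, 0] + X[:, 1] > 0.0).astype(int)  # toy benign/malicious labels

# Step 1: cluster the requests on the same features the classifiers use.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Step 2: train a dedicated classifier per cluster, only on that cluster's traffic.
per_cluster = {}
for c in range(2):
    mask = kmeans.labels_ == c
    per_cluster[c] = DecisionTreeClassifier(random_state=42).fit(X[mask], y[mask])

# Step 3: route a new request to its cluster's own classifier.
request = np.array([[5.1, 4.9]])
cluster = int(kmeans.predict(request)[0])
verdict = int(per_cluster[cluster].predict(request)[0])
```

In Apollon the per-cluster pool would contain every classifier and the MAB would pick among them; here a single model per cluster keeps the routing logic visible.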
We designed two scenarios to assess the performance of our proposed solution. In the first scenario, we tested our solution on three datasets: CIC-IDS-2017, CSE-CIC-IDS-2018, and CIC-DDoS-2019. These datasets comprise network traffic and web attacks and are the most popular and widely used in the evaluation of IDSs. Additionally, they are generated to be as similar as possible to real-world network traffic and attacks. Therefore, they are well suited to evaluating the performance of our solution in a traditional IDS environment. However, it is important to note that real-world network traffic and attacks are constantly evolving and becoming more complex, and as such, these datasets may not be representative of the current state of network traffic and attacks.

These datasets comprise flow features in a CSV format, which were used as inputs to our Machine Learning models. Feature selection was undertaken prior to model training. For the CIC-DDoS-2019 dataset, we followed the feature selection process described in (Thiyam and Dey, 2023). For the CSE-CIC-IDS-2018 dataset, we referred to the methodology outlined in (Pujari et al., 2022). Lastly, for the CIC-IDS-2017 dataset, we adhered to the approach recommended in (Faker and Dogdu, 2019). This systematic feature selection enabled us to optimize the performance of our models, enhancing their accuracy and reducing overfitting and training time.

Due to limited computing resources, we opted to use a representative subset of the datasets instead of the complete data to efficiently train and test our models. For CIC-DDoS-2019 the whole dataset was used, while for CSE-CIC-IDS-2018 the subset from 02-15-2018 was used. Finally, for the CIC-IDS-2017 dataset, the following subsets of data were selected:

• Friday WorkingHours Afternoon DDoS
• Friday WorkingHours Afternoon PortScan
• Friday WorkingHours Morning
• Monday WorkingHours
• Thursday WorkingHours Afternoon Infilteration
• Thursday WorkingHours Morning WebAttacks
• Tuesday WorkingHours

We compared the results of our solution with the classifiers extracted from related work. In the second scenario, we used several gray/black-box Adversarial Machine Learning attacks to evaluate the ability of our solution to defend against such attacks. In this scenario, the test data exclusively comprise attack instances. Conversely, in the first scenario, the test data are extracted from the respective dataset and incorporate a mix of benign and malicious instances.

The classifier scores used to compare our solution may differ from those reported in related work due to several factors. Firstly, the machine on which the training is performed may differ from that of related work. This can impact the speed and efficiency of the training process, which in turn can affect the final accuracy of the classifiers. Secondly, the pre-processing of the data may differ between our solution and related work. Pre-processing techniques can greatly impact the quality of the data and hence the performance of the classifiers.

5.1.1. Traditional network traffic and attacks

In this test scenario, we utilized the CIC-IDS-2017, CSE-CIC-IDS-2018, and CIC-DDoS-2019 datasets to train a set of classifiers to compare with our proposed solution. Our goal was to assess the performance of our solution on traditional network traffic and web attacks.

We utilized the default hyperparameters of the classifiers because the objective of this scenario is not to maximize the performance of these classifiers but rather to compare them with our solution. The classifiers used in this scenario are the following:

• Multilayer Perceptron (MLP): hidden_layer_sizes = (32), max_iter = 200
• Random Forest (RF): n_estimators = 100
• Decision Trees (DT)
• Naive Bayes (NB)
• Logistic Regression (LR)

5.1.2. Adversarial Machine Learning attacks

In this test scenario, we used several gray/black-box Adversarial Machine Learning attacks to evaluate the ability of our solution to defend against such attacks. To simplify the evaluation process, we used the CIC-IDS-2017 dataset to train the classifiers and our proposed solution because it is the most popular dataset. We compare the accuracy and the detection rate of the classifiers and our proposed solution against the gray/black-box Adversarial Machine Learning attacks. As mentioned before, we only test with gray/black-box attacks because white-box attacks are not realistic in real-world scenarios.

The attacks used in this scenario are the following:

• Zeroth-order optimization attack (ZOO) (Chen et al., 2017)
• HopSkipJump attack (HSJA) (Chen et al., 2020)
• W-GAN based attacks (Lin et al., 2022)

We opted for these particular attacks because they encompass a broad spectrum of potential Adversarial Machine Learning evasion strategies and are among the most widely used and successful.

The adversarial network traffic used in this scenario was generated by modifying real attack traffic while preserving the key characteristics of the attacks. This method ensures that the adversarial traffic maintains the legitimate semantics of the original traffic, an approach adopted from the methodology of Usama et al. (Usama et al., 2019).

5.2. Results

Throughout the development of the evaluation we decided to set a seed so that the results can be replicated: seed = 42.

Before training the classifiers, a common pre-processing step was performed on the data from all datasets. This step is essential in standardizing the datasets, ensuring that we work uniformly with each of them. To achieve this, a combination of sklearn functions such as RobustScaler and Normalizer was utilized. To make features less sensitive to outliers, the RobustScaler subtracts the median and adjusts the data based on the quantile range. The Normalizer then rescales each input sample to have unit norm.

In addition to standardizing the datasets, additional steps were taken to further prepare the data for training our classifiers. We avoided ex-

1 https://github.com/antonioalfa22/apollon.
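The pre-processing chain described above can be sketched with scikit-learn directly. The toy feature matrix is ours; the two transformers are the ones named in the text:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer, RobustScaler

# Illustrative feature matrix standing in for flow features from the CSV files;
# the last row contains an outlier in the second feature.
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [3.0, 5000.0]])

# RobustScaler centers each feature on its median and scales by the
# interquartile range, reducing the influence of outliers; Normalizer then
# rescales each sample (row) to unit L2 norm.
pre = make_pipeline(RobustScaler(), Normalizer())
X_pre = pre.fit_transform(X)
```

After the pipeline, every row has unit norm regardless of the outlier's magnitude, which is the uniformity property the text relies on.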
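The evaluation below relies on Walk-Forward Cross-Validation. In scikit-learn this expanding-window scheme is available as TimeSeriesSplit; a minimal sketch (the sample data is ours):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # samples in chronological order

# Each successive fold trains on a longer prefix of the past and tests on the
# block immediately after it, so the model never sees future data.
splits = list(TimeSeriesSplit(n_splits=3).split(X))
windows = [(train.tolist(), test.tolist()) for train, test in splits]
```

Every training window ends strictly before its test window begins, which is exactly the no-leakage guarantee discussed next.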
By using Walk-Forward Cross-Validation, we ensure that the results obtained are representative of the model's ability to predict future data points based on past observations, eliminating the risk of inadvertently training the model on future data. This approach provides a more realistic and robust evaluation of the model's performance on time-series data.

The results of our experiments in the two evaluation scenarios are presented below.

5.2.1. Traditional network traffic and attacks

In this scenario, we trained the selected classifiers on the CIC-IDS-2017, CSE-CIC-IDS-2018, and CIC-DDoS-2019 datasets. Additionally, we trained our proposed solution on the same datasets and with the same classifiers. To train Apollon, we use the K-Means (Likas et al., 2003) algorithm to cluster the data into 2 clusters. For each cluster, we train the selected classifiers and we update the Multi-Armed Bandits algorithm with the results.

The results obtained from the experiments are presented in Tables 2, 3, and 4.

Accuracy 0.9998 0.9990 0.9999 0.9999 0.9970 0.9996
Detection rate 0.8985 0.8709 0.9932 0.9999 0.7930 0.9691
F1 0.9186 0.7148 0.9957 0.9991 0.8541 0.8521
AUC 0.9817 0.9753 0.9999 0.9999 0.9796 0.9691

In our experiments across different datasets, the Multi-Armed Bandits algorithm showed a preference for specific models or combinations thereof. For instance, in the CIC-IDS-2017 dataset, the most frequently chosen combination by MAB was Random Forest (RF) and Decision Tree (DT), accounting for approximately 40% of the selections, followed by RF alone at 28%. In the CIC-DDoS-2019 dataset, DT was the choice in 60% of the selections. Similarly, in the CSE-CIC-IDS-2018 dataset, RF was chosen 42% of the time, followed by the combination of RF and DT at approximately 27%. Intriguingly, the models or combinations selected by MAB are generally those that individually exhibit high performance. This observation underscores the efficacy of MAB in dynamically selecting optimal models, thereby enhancing the robustness and adaptability of our IDS.

Based on these results, our solution demonstrates high detection rate and accuracy scores, comparable to the classifiers chosen for comparison. It is important to mention that across all the datasets, Apollon achieves neither the best nor the worst scores. This is because Apollon internally selects from the same classifiers. Hence, the highest score that Apollon can achieve is limited to the best classifier's maximum score, while it can never perform as poorly as the worst classifier since it has other, better options.

Our results demonstrate that even with the integration of new security mechanisms, Apollon can still provide high accuracy and detection rate scores in traditional network traffic classification environments. Therefore, Apollon is capable of preserving the fundamental functionality of an IDS.

5.2.2. Adversarial Machine Learning attacks

In this scenario, we launched three types of Adversarial Machine Learning attacks on the selected classifiers and against our solution. These attacks are the Zeroth-order optimization attack (ZOO), the HopSkipJump attack (HSJA), and W-GAN based attacks.

The classifiers and the Apollon implementation used as targets of the attacks are the ones trained in the previous environment with the CIC-IDS-2017 dataset.

Starting with the Zeroth-order optimization attack (ZOO), we used the open source implementation provided by ART (Nicolae et al., 2018) and created a Classifier class so that Apollon can be used as a model. The attack was launched with the following parameters for each classifier:

• classifier: the Classifier class instance with the classifier to be attacked.
• targeted: True.
• learning_rate: 0.01.
• max_iter: 100.

The attack was launched against the classifiers and the results are shown in Table 5. The most frequently chosen combination was RF and DT, making up approximately 39% of the selections, followed by RF alone at 28%. According to the findings, the attack was effective in every scenario, resulting in reduced detection rates across all classifiers. Nonetheless, even though the Apollon implementation's accuracy and […]
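ZOO-style attacks rely only on query access to the target's scores, estimating gradients by finite differences instead of backpropagation. The following is a self-contained toy sketch of that zeroth-order idea against a made-up linear scorer; it is not the ART implementation and does not use the exact configuration listed above:

```python
import numpy as np

def zoo_like_attack(score_fn, x, step=0.05, lr=0.5, iters=200):
    """Toy zeroth-order evasion: approximate the gradient of the 'malicious'
    score by coordinate-wise finite differences and descend it, using only
    black-box queries to score_fn."""
    x_adv = x.astype(float).copy()
    for _ in range(iters):
        grad = np.zeros_like(x_adv)
        for i in range(x_adv.size):
            e = np.zeros_like(x_adv)
            e[i] = step
            grad[i] = (score_fn(x_adv + e) - score_fn(x_adv - e)) / (2 * step)
        x_adv -= lr * grad
        if score_fn(x_adv) < 0.5:  # crossed the decision threshold
            break
    return x_adv

# "Malicious" probability of a toy linear model: sigmoid(w . x + b).
w, b = np.array([1.0, 2.0]), -0.5
score = lambda x: 1.0 / (1.0 + np.exp(-(x @ w + b)))

x0 = np.array([1.0, 1.0])        # originally scored as malicious (score > 0.5)
x_adv = zoo_like_attack(score, x0)
```

Because the loop never touches the model's internals, the same procedure works against any classifier that exposes scores, which is why defenses that randomize which model answers each query (as Apollon does) complicate it.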
[…] attacks are highly similar, with the same models being chosen more frequently than others.

Apollon's improved accuracy and detection rates under AML attacks can be attributed to the inclusion of an uncertainty component in its model selection process. This renders it difficult to train a model solely based on its responses.

The experimental results of the scenario reveal that our solution exhibits greater robustness in comparison to the other classifiers used individually.

However, we also observed that while our solution effectively reduces the effectiveness of the attacks, it does not completely nullify them. This means that there is still room for improvement in terms of enhancing the solution's robustness to further strengthen its resistance against such attacks.

In particular, if we were to generate the attacks with more time, such as by increasing the number of iterations or epochs, it is likely that the effectiveness of these attacks against our solution would increase. It is important to note that we are realistic about the limitations of our proposal. While crafting an impenetrable defense is nearly impossible, our solution aims to significantly increase the time and resources required for a potential attacker to successfully execute an attack. In real-world scenarios, this increase in time and computational cost can render an attack unfeasible or economically unviable, thereby serving as a deterrent and adding an extra layer of security.

Table 5
Results of the ZOO AML attack.
Metrics MLP NB RF DT LR Apollon
Accuracy 0.7200 0.7300 0.7250 0.7400 0.5200 0.9304
Detection rate 0.5260 0.5500 0.5460 0.5780 0.0420 0.8772
F1 0.5329 0.5550 0.5505 0.5815 0.0457 0.8835
AUC 0.7300 0.7400 0.7350 0.7500 0.5300 0.9400

Table 6
Results of the HopSkipJump AML attack.
Metrics MLP NB RF DT LR Apollon
Accuracy 0.5002 0.4301 0.5001 0.5051 0.5900 0.7550
Detection rate 0.0000 0.0000 0.0000 0.0100 0.1800 0.5260
F1 0.0000 0.0000 0.0000 0.0133 0.2091 0.5607
AUC 0.5002 0.4907 0.5000 0.5121 0.6200 0.7760

Table 7
Results of the W-GAN based AML attack.
Metrics MLP NB RF DT LR Apollon
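The metrics reported in Tables 5–7 can be reproduced from a classifier's predictions with scikit-learn; detection rate corresponds to recall on the attack class. The labels and scores below are toy values of ours, purely to show the computation:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

# Toy ground truth (1 = attack), hard predictions, and probability scores.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.1, 0.7, 0.8, 0.9, 0.4, 0.2, 0.6, 0.3])

acc = accuracy_score(y_true, y_pred)
detection_rate = recall_score(y_true, y_pred)  # fraction of attacks detected
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)           # uses scores, not hard labels
```

Note that AUC is computed from the continuous scores, while accuracy, detection rate, and F1 are computed from the thresholded predictions.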
[…] Arroni: Investigation, Software, Visualization, Writing – original draft. Vicente García-Díaz: Conceptualization, Validation, Writing – original draft, Writing – review & editing. Alberto Gómez: Conceptualization, Validation, Writing – original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

We have shared the link to the GitHub repository with the data in the article.

References

Abdallah, E.E., Otoom, A.F., et al., 2022. Intrusion detection systems using supervised machine learning techniques: a survey. Proc. Comput. Sci. 201, 205–212.
Abdulhammed, R., Faezipour, M., Musafer, H., Abuzneid, A., 2019. Efficient network intrusion detection using PCA-based dimensionality reduction of features. In: 2019 International Symposium on Networks, Computers and Communications (ISNCC), pp. 1–6.
Agrawal, S., Goyal, N., 2012. Analysis of Thompson sampling for the multi-armed bandit problem. In: Conference on Learning Theory. JMLR Workshop and Conference Proceedings.
Akshay Kumaar, M., Samiayya, D., Vincent, P.M.D.R., Srinivasan, K., Chang, C.-Y., Ganesh, H., 2022. A hybrid framework for intrusion detection in healthcare systems using deep learning. Front. Public Health 9.
Alotaibi, A., Rassam, M.A., 2023. Adversarial machine learning attacks against intrusion detection systems: a survey on strategies and defense. Future Internet 15 (2), 62.
Amor, N.B., Benferhat, S., Elouedi, Z., 2004. Naive Bayes vs decision trees in intrusion detection systems. In: Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 420–424.
Ashfaq, R.A.R., Wang, X.-Z., Huang, J.Z., Abbas, H., He, Y.-L., 2017. Fuzziness based semi-supervised learning approach for intrusion detection system. Inf. Sci. 378, 484–497.
Athalye, A., Carlini, N., Wagner, D., 2018. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: International Conference on Machine Learning. PMLR, pp. 274–283.
Bai, T., Luo, J., Zhao, J., Wen, B., Wang, Q., 2021. Recent advances in adversarial training for adversarial robustness. arXiv preprint arXiv:2102.01356.
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., Roli, F., 2013. Evasion attacks against machine learning at test time. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23–27, 2013, Proceedings, Part III. Springer, pp. 387–402.
Carlini, N., Wagner, D., 2017. Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE, pp. 39–57.
Carpentier, A., Lazaric, A., Ghavamzadeh, M., Munos, R., Auer, P., 2011. Upper-confidence-bound algorithms for active learning in multi-armed bandits. In: Algorithmic Learning Theory: 22nd International Conference, ALT 2011, Espoo, Finland, October 5–7, 2011, Proceedings. Springer, pp. 189–203.
Chen, J., Jordan, M.I., 2019. Boundary attack++: query-efficient decision-based adversarial attack. arXiv preprint arXiv:1904.02144.
Chen, J., Jordan, M.I., Wainwright, M.J., 2020. HopSkipJumpAttack: a query-efficient decision-based attack. In: 2020 IEEE Symposium on Security and Privacy (SP). IEEE.
Chen, K., 2020. Stealing deep reinforcement learning models for fun and profit. In: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security.
Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.-J., 2017. ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26.
Cutler, A., Cutler, D.R., Stevens, J.R., 2012. Random forests. In: Ensemble Machine Learning: Methods and Applications, pp. 157–175.
De Cristofaro, E., 2020. An overview of privacy in machine learning. arXiv preprint arXiv:2005.08679.
Duy, P.T., Khoa, N.H., Nguyen, A.G.-T., Pham, V.-H., et al., 2021. DIGFuPAS: deceive IDS with GAN and function-preserving on adversarial samples in SDN-enabled networks. Comput. Secur. 109, 102367.
Faker, O., Dogdu, E., 2019. Intrusion detection using big data and deep learning techniques. In: Proceedings of the 2019 ACM Southeast Conference, ACM SE '19, New York, NY, USA. Association for Computing Machinery, pp. 86–93.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2020. Generative adversarial networks. Commun. ACM 63 (11), 139–144.
Goodfellow, I.J., Shlens, J., Szegedy, C., 2014a. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Goodfellow, I.J., Shlens, J., Szegedy, C., 2014b. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C., 2017. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 30.
He, K., Kim, D.D., Asghar, M.R., 2023. Adversarial machine learning for network intrusion detection systems: a comprehensive survey. IEEE Commun. Surv. Tutor. 25 (1), 538–566.
Hnamte, V., Hussain, J., 2023. DCNNBiLSTM: an efficient hybrid deep learning-based intrusion detection system. Telemat. Inform. Rep. 10, 100053.
Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.D., 2011. Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58.
Huang, S., Lei, K., 2020. IGAN-IDS: an imbalanced generative adversarial network towards intrusion detection system in ad-hoc networks. Ad Hoc Netw. 105, 102177.
Koh, P.W., 2018. Stronger data poisoning attacks break data sanitization defenses. Mach. Learn. 111, 1–47.
Kou, C., Lee, H.K., Chang, E.-C., Ng, T.K., 2019. Enhancing transformation-based defenses using a distribution classifier. arXiv preprint arXiv:1906.00258.
Kuleshov, V., Precup, D., 2014. Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028.
Kurakin, A., Goodfellow, I., Bengio, S., 2016. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236.
Lee, T.-H., Ullah, A., Wang, R., 2020. Bootstrap aggregating and random forest. In: Macroeconomic Forecasting in the Era of Big Data: Theory and Practice, pp. 389–429.
Likas, A., Vlassis, N., Verbeek, J.J., 2003. The global k-means clustering algorithm. Pattern Recognit. 36 (2), 451–461.
Lin, Z., Shi, Y., Xue, Z., 2022. IDSGAN: generative adversarial networks for attack generation against intrusion detection. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp. 79–91.
Liu, G., Zhang, W., Li, X., Fan, K., Yu, S., 2022. VulnerGAN: a backdoor attack through vulnerability amplification against machine learning-based network intrusion detection systems. Sci. China Inf. Sci. 65 (7), 1–19.
Machado, M.C., Srinivasan, S., Bowling, M., 2014. Domain-independent optimistic initialization for reinforcement learning. arXiv preprint arXiv:1410.4604.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A., 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
Mahoney, M.V., Chan, P.K., 2003. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. In: International Workshop on Recent Advances in Intrusion Detection. Springer, pp. 220–237.
Maseer, Z.K., Yusof, R., Bahaman, N., Mostafa, S.A., Foozy, C.F.M., 2021. Benchmarking of machine learning for anomaly based intrusion detection systems in the CICIDS2017 dataset. IEEE Access 9, 22351–22370.
McHugh, J., 2000. Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Trans. Inf. Syst. Secur. 3 (4), 262–294.
Moosavi-Dezfooli, S.-M., Fawzi, A., Frossard, P., 2016. DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582.
Mukherjee, B., Heberlein, L.T., Levitt, K.N., 1994. Network intrusion detection. IEEE Netw. 8 (3), 26–41.
Nicolae, M.-I., Sinn, M., Tran, M.N., Buesser, B., Rawat, A., Wistuba, M., Zantedeschi, V., Baracaldo, N., Chen, B., Ludwig, H., Molloy, I., Edwards, B., 2018. Adversarial Robustness Toolbox v1.2.0. CoRR arXiv:1807.01069.
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A., 2016. The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, pp. 372–387.
Park, H., Faradonbeh, M.K.S., 2021. Analysis of Thompson sampling for partially observable contextual multi-armed bandits. IEEE Control Syst. Lett. 6, 2150–2155.
Pharate, A., Bhat, H., Shilimkar, V., Mhetre, N., 2015. Classification of intrusion detection system. Int. J. Comput. Appl. 118 (7).
Pujari, M., Pacheco, Y., Cherukuri, B., Sun, W., 2022. A comparative study on the impact of adversarial machine learning attacks on contemporary intrusion detection datasets. SN Comput. Sci. 3 (5), 1–12.
Ramos, D., Faria, P., Gomes, L., Campos, P., Vale, Z., 2022a. Selection of features in reinforcement learning applied to energy consumption forecast in buildings according to different contexts. Energy Rep. 8, 423–429.
Ramos, D., Faria, P., Gomes, L., Campos, P., Vale, Z., 2022b. A learning approach to improve the selection of forecasting algorithms in an office building in different contexts. In: Progress in Artificial Intelligence: 21st EPIA Conference on Artificial Intelligence, EPIA 2022, Lisbon, Portugal, August 31–September 2, 2022, Proceedings. Springer, pp. 271–281.
Ring, M., Wunderlich, S., Scheuring, D., Landes, D., Hotho, A., 2019. A survey of network-based intrusion detection data sets. Comput. Secur. 86, 147–167.
Rokach, L., Maimon, O., 2005. Decision trees. In: Data Mining and Knowledge Discovery Handbook, pp. 165–192.
Sahoo, K.S., Tripathy, B.K., Naik, K., Ramasubbareddy, S., Balusamy, B., Khari, M., Burgos, D., 2020. An evolutionary SVM model for DDoS attack detection in software defined networks. IEEE Access 8, 132502–132513.
Sahu, S.K., Mohapatra, D.P., Rout, J.K., Sahoo, K.S., Pham, Q.-V., Dao, N.-N., 2022. An LSTM-FCNN based multi-class intrusion detection using scalable framework. Comput. Electr. Eng. 99, 107720.
Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A., 2018a. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: ICISSP, vol. 1, pp. 108–116.
Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A., 2018b. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: International Conference on Information Systems Security and Privacy.
Sharafaldin, I., Lashkari, A.H., Hakak, S., Ghorbani, A.A., 2019. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST), pp. 1–8.
Shroff, J., Walambe, R., Singh, S.K., Kotecha, K., 2022. Enhanced security against volumetric DDoS attacks using adversarial machine learning. Wirel. Commun. Mob. Comput. 2022.
Sinaga, K.P., Yang, M.-S., 2020. Unsupervised k-means clustering algorithm. IEEE Access 8, 80716–80727.
Suthaharan, S., 2016. Support vector machine. In: Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning, pp. 207–235.
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A., 2009. A detailed analysis of the KDD Cup 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6.
Thakkar, A., Lohiya, R., 2020. A review of the advancement in intrusion detection datasets. Proc. Comput. Sci. 167, 636–645.
Thiyam, B., Dey, S., 2023. Efficient feature evaluation approach for a class-imbalanced dataset using machine learning. In: International Conference on Machine Learning and Data Engineering. Proc. Comput. Sci. 218, 2520–2532.
Tobi, A.M.A., Duncan, I., 2018. KDD 1999 generation faults: a review and analysis. J. Cyber Secur. Technol. 2 (3–4), 164–200.
Usama, M., Asim, M., Latif, S., Qadir, J., et al., 2019. Generative adversarial networks for launching and thwarting adversarial attacks on network intrusion detection systems. In: 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC). IEEE, pp. 78–83.
Wang, Y., Sun, T., Li, S., Yuan, X., Ni, W., Hossain, E., Poor, H.V., 2023. Adversarial attacks and defenses in machine learning-powered networks: a contemporary survey. arXiv preprint arXiv:2303.06302.
Wright, R.E., 1995. Logistic regression.
Wu, Z., Zhang, H., Wang, P., Sun, Z., 2022. RTIDS: a robust transformer-based approach for intrusion detection system. IEEE Access 10, 64375–64387.
Xu, H., Ma, Y., Liu, H.-C., Deb, D., Liu, H., Tang, J.-L., Jain, A.K., 2020. Adversarial attacks and defenses in images, graphs and text: a review. Int. J. Autom. Comput. 17, 151–178.
Yeom, S., 2017. Privacy risk in machine learning: analyzing the connection to overfitting. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282.
Zhang, J., Zulkernine, M., Haque, A., 2008. Random-forests-based network intrusion detection systems. IEEE Trans. Syst. Man Cybern., Part C, Appl. Rev. 38 (5), 649–659.
Zhao, S., Li, J., Wang, J., Zhang, Z., Zhu, L., Zhang, Y., 2021. AttackGAN: adversarial attack against black-box IDS using generative adversarial networks. Proc. Comput. Sci. 187.
Zizzo, G., Hankin, C., Maffeis, S., Jones, K., 2019. Adversarial machine learning beyond the image domain. In: Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1–4.

Antonio Payá González is a Ph.D. student developing his thesis on defense systems for Software-Defined Perimeters and Intrusion Detection Systems. He has a Master's degree in Web Engineering from the University of Oviedo. He is currently working for TheNextPangea S.L. on topics related to Machine Learning and Artificial Intelligence applied to cybersecurity.

Sergio Arroni del Riego is a student of Software Engineering at the University of Oviedo, passionate about Machine Learning, Artificial Intelligence and Cybersecurity, among other software fields. He is currently working at the Foundation of the University of Oviedo, researching new advances in the fields of Artificial Intelligence and Machine Learning.

Vicente García-Díaz is an Associate Professor in the Department of Computer Science at the University of Oviedo, Spain. He is a Software Engineer with a PhD in Computer Science. He has a Master's in Occupational Risk Prevention and the qualification of University Expert in Blockchain Application Development. He is part of the editorial and advisory boards of several indexed journals and conferences and has been editor of several special issues in books and indexed journals. He has supervised 100+ academic projects and published 100+ research papers in journals, conferences, and books. His teaching interests are primarily in the design and analysis of algorithms and the design of domain-specific languages. His current research interests include decision support systems, health informatics and eLearning.

Alberto Gómez Gómez works for the Department of Business Administration at the School of Industrial Engineering of the University of Oviedo, Spain. His teaching and research initiatives focus on the areas of Production Management, Applied Artificial Intelligence and Information Systems. He has published in several national and international journals, including the Journal of the Operational Research Society; Artificial Intelligence for Engineering Design, Analysis and Manufacturing; International Journal of Foundations of Computer Science; European Journal of Operational Research; International Journal of Production Economics; Engineering Applications of Artificial Intelligence; and Concurrent Engineering: Research and Applications.
128–133.