Article

An Efficient Flow-Based Anomaly Detection System for Enhanced Security in IoT Networks

Department of Information Science, College of Humanities and Social Sciences, King Saud University, Riyadh P.O. Box 11451, Saudi Arabia
Sensors 2024, 24(22), 7408; https://doi.org/10.3390/s24227408
Submission received: 25 September 2024 / Revised: 21 October 2024 / Accepted: 19 November 2024 / Published: 20 November 2024
(This article belongs to the Special Issue IoT Cybersecurity)

Abstract

The growing integration of Internet of Things (IoT) devices into various sectors like healthcare, transportation, and agriculture has dramatically increased their presence in everyday life. However, this rapid expansion has exposed new vulnerabilities within computer networks, creating security challenges. These IoT devices, often limited by their hardware constraints, lack advanced security features, making them easy targets for attackers and compromising overall network integrity. To counteract these security issues, Behavioral-based Intrusion Detection Systems (IDS) have been proposed as a potential solution for safeguarding IoT networks. While Behavioral-based IDS have demonstrated their ability to detect threats effectively, they encounter practical challenges due to their reliance on pre-labeled data and the heavy computational power they require, limiting their practical deployment. This research introduces the IoT-FIDS (Flow-based Intrusion Detection System for IoT), a lightweight and efficient anomaly detection framework tailored for IoT environments. Instead of employing traditional machine learning techniques, the IoT-FIDS focuses on identifying unusual behaviors by examining flow-based representations that capture standard device communication patterns, services used, and packet header details. By analyzing only benign traffic, this network-based IDS offers a streamlined and practical approach to securing IoT networks. Our experimental results reveal that the IoT-FIDS can accurately detect most abnormal traffic patterns with minimal false positives, making it a feasible security solution for real-world IoT implementations.

1. Introduction

The rapid expansion of the Internet of Things (IoT) is evident in the increasing number of devices, users, and applications across various industries and use cases [1,2]. The IoT has significantly integrated into daily life, from healthcare with wearable technology and remote patient monitoring to smart city initiatives leveraging IoT-based sensors for traffic and environmental data [3,4]. These examples underscore IoT’s broad influence across sectors [4,5].
However, this deep integration also brings substantial security concerns. IoT devices often lack standardization and advanced security features due to resource limitations, making them susceptible to cyberattacks [6,7,8]. The varied nature of IoT devices, compounded by the absence of a unified technology or protocol framework, increases the risk of unauthorized access or disruption of critical infrastructure [7,9]. The expanding security risks in IoT ecosystems highlight the need for advanced security measures to safeguard IoT networks.
Among the diverse solutions, Behavioral-based Intrusion Detection Systems (IDS) have emerged as effective tools for enhancing IoT network defenses [10,11]. Behavioral-based IDS continuously monitor network operations to detect anomalies or unusual activities, identifying potential intrusions or attacks by analyzing behavioral patterns. The system plays a critical role in promptly alerting administrators to any suspicious behavior, helping to mitigate security threats [12,13].
Various types of Behavioral-based IDS have been developed, each employing unique methods for safeguarding IoT devices. For instance, Signature-based IDS (SIDS) rely on predefined patterns of known attacks to detect malicious activity [14,15]. In contrast, Anomaly-based IDS monitor deviations from established norms in network behavior, making them effective for identifying new or unknown threats [16,17]. Additionally, AI-driven IDS use machine learning algorithms to continuously refine detection capabilities [18,19], ensuring robust protection against evolving cyber threats.
Despite these advancements, Behavioral-based IDS still face critical challenges. Many approaches depend on large, labeled datasets containing both normal and malicious traffic, which are not only difficult to compile but also resource-intensive to process [20,21]. Additionally, machine-learning-based IDS often require significant computational power, making them impractical for resource-constrained IoT devices [22,23].
To address these issues, this paper introduces the IoT-FIDS (Flow-based Intrusion Detection System for IoT), a lightweight and efficient anomaly detection framework tailored for IoT environments. Rather than relying on traditional machine learning techniques, IoT-FIDS uses flow-based representations to analyze network traffic patterns, focusing on normal traffic to detect anomalies. This method reduces computational overhead and simplifies the detection process, making it suitable for real-world IoT implementations.
Our research demonstrates that the IoT-FIDS efficiently detects abnormal behaviors by examining network traffic without relying on pre-labeled attack data. This novel approach profiles device behavior using flow-based analysis of network packets, providing a streamlined solution for intrusion detection in resource-limited IoT devices. The IoT-FIDS effectively identifies malicious traffic patterns, ensuring robust network security while maintaining minimal false positives. This study aims to address the following pivotal research questions:
RQ1: How effective is the IoT-FIDS in accurately and precisely detecting anomalies and intrusions in IoT environments?
RQ2: Is the IoT-FIDS lightweight enough to serve as a practical and feasible security solution for real-world IoT deployments?
The remainder of this paper is organized as follows. Section 2 introduces the main topic, discusses its importance, and provides the necessary background information. Section 3 explains the methods used to generate flow-based representations and to detect malicious packets. Section 4 covers the data-preprocessing steps, detailing how we processed the datasets, selected relevant packet headers, and optimized flow-based representations for the analysis. Section 5 offers a comprehensive analysis of the experiments we conducted and the results we achieved. Lastly, Section 6 addresses the limitations of our study, suggests future research directions, and concludes with our final remarks.

2. Background

To provide a comprehensive understanding of Behavioral-based IDS, this section explores their various types and characteristics. We first examine their deployment environments, then analyze the methodologies they employ, such as Anomaly-based Intrusion Detection Systems (AIDS) and Signature-based Intrusion Detection Systems (SIDS). Lastly, we focus on advanced approaches leveraging machine learning.

2.1. Deployment Environments of Behavioral-Based IDS

Behavioral-based IDS are essential in contemporary security frameworks, continuously monitoring network and system activities to detect unauthorized access or malicious behavior [20,24]. These systems analyze network traffic and operations to identify anomalies or threats. They can be deployed in two key environments: Host-based IDS (HIDS) and Network-based IDS (NIDS).
HIDS focus on protecting individual devices by monitoring their specific activities. Each host examines system logs, oversees file integrity, and observes application behaviors to detect signs of unauthorized access [22,25]. Farrukh [26] introduced SENIDS, a Self-Evolving Network-based IDS that uses artificial neural networks to enhance real-time threat detection for IoT devices. On the other hand, NIDS monitor network traffic as it passes through routers, switches, and firewalls, searching for patterns that match identified attack signatures or exhibit irregular behaviors [27,28].
Recent advancements have focused on enhancing IDS accuracy while reducing complexity. Farrukh et al. [26] introduced a NIDS combining machine learning and the Arithmetic Optimization Algorithm (AOA), achieving up to 99% accuracy with reduced feature complexity. Hybrid systems that merge HIDS and NIDS offer more robust security, effectively correlating device-specific and network-level visibility to detect malicious patterns [29,30].

2.2. Signature-Based vs. Anomaly-Based Behavioral-Based IDS

Behavioral-based IDS methodologies are typically categorized into Signature-based IDS (SIDS) and Anomaly-based IDS (AIDS). SIDS operate by maintaining a repository of known cyberattack signatures. When incoming traffic matches an existing signature, an alert is triggered [29,31]. Anju and Krishnamurthy [32] introduced CBSigIDS, a blockchain-enabled framework that updates attack signatures in a decentralized IoT environment. AIDS, in contrast, establish a baseline of normal network behavior and flag deviations from this baseline as potential threats [29,33].
Diro et al. [34] presented an AIDS leveraging kernel PCA for feature reduction and kernel ELM for classification, efficiently distinguishing between normal and malicious traffic. Hybrid systems that combine SIDS and AIDS provide stronger defenses by using signature matching for known attacks and anomaly detection for new, unknown threats [32,35]. Otoum and Nayak [36] proposed AS-IDS, a framework for IoT environments that combines both methods. It filters network traffic, preprocesses data, and uses a lightweight neural network and Deep Q-learning to detect both familiar and novel threats.

2.3. Machine-Learning-Based Anomaly Detection

Anomaly-based IDS (AIDS) can be classified into statistical-based, knowledge-based, and machine-learning-based systems. Statistical-based IDS model normal behavior through statistical data, flagging deviations as intrusions [37]. Knowledge-based IDS compare network traffic against predefined benign patterns [14,34,36].
Machine-learning-based IDS learn from traffic patterns to detect novel threats. They adaptively identify attacks that were previously unseen, making them highly effective for modern cybersecurity challenges [29,33]. Intrusion detection approaches are generally categorized into three primary types: supervised, unsupervised, and semi-supervised methods [38,39].
Supervised methods depend on labeled data to classify traffic as normal or malicious. DeMedeiros et al. [40] applied optimization techniques to enhance detection efficiency while minimizing training time. Unsupervised methods, which do not involve labeled data, analyze patterns to detect outliers and potential threats [21,31]. Nassif et al. [41] developed an unsupervised anomaly detection model using flow-based analysis, designed to detect DDoS attacks in IoT networks.
Semi-supervised methods combine aspects of both supervised and unsupervised techniques, typically using a small set of labeled data and a larger set of unlabeled data. Shen et al. [42] and Khan et al. [43] introduced SS-Deep-ID, a semi-supervised deep learning model that integrates multiscale residual temporal convolutional layers and traffic attention mechanisms, improving anomaly detection by prioritizing critical data points.

2.4. Comparative Analysis of Flow-Based IDS Methods

The primary IDS methodologies—SIDS, AIDS, and machine-learning-based IDS—each offer distinct advantages and limitations. SIDS provide high accuracy for known attacks but are ineffective against new or evolving threats and require constant signature updates. AIDS, while capable of detecting unknown attacks, often suffer from high false-positive rates and increased computational demands, making them challenging for real-time deployment. Machine-learning-based IDS adapt to evolving threats and can provide high detection accuracy, but they come with significant resource consumption, require extensive training data, and often involve retraining as new threats emerge.
Our proposed IoT-FIDS addresses many of these limitations by employing a lightweight, flow-based approach that is specifically tailored for resource-constrained IoT environments. Unlike the methods in [1,44], which rely on complex machine learning models or optimization algorithms (such as neural networks optimized with GSA in [44]), the IoT-FIDS does not require computationally expensive training or optimization phases. The IoT-FIDS reduces the need for constant updates (as required by SIDS), lowers false-positive rates (compared to AIDS), and avoids the computational burden typically associated with machine learning models.
  • In contrast to [44], which uses a neural network optimized with GSA for flow-based anomaly detection, the IoT-FIDS uses simpler flow-based profiling without the need for iterative optimization. This makes the IoT-FIDS more suitable for real-time intrusion detection in IoT environments where computational resources are limited.
  • Compared to [1], which utilizes machine learning for flow-based anomaly detection in SDN, the IoT-FIDS is better suited for IoT environments because it does not require the extensive training and feature selection processes that machine learning models need.
Table 1 provides a detailed comparison of these IDS methodologies, highlighting their advantages, disadvantages, and how the IoT-FIDS overcomes many of their limitations, particularly in terms of computational efficiency and suitability for IoT environments.
The IoT-FIDS provides a more streamlined and efficient solution by leveraging flow-based profiling without requiring machine learning or optimization algorithms, as seen in methods like those in [1,44]. Its suitability for real-time IoT environments, where computational power is limited, makes it a practical choice for intrusion detection in modern networks. By addressing the limitations of SIDS, AIDS, and machine-learning-based methods, the IoT-FIDS ensures low computational overhead and adaptability to evolving threats.

3. Methodology

In line with Baz [45] and Zohourian [46], our approach centers on converting each packet into a unique signature that encapsulates its essential features, using the TCP/IP stack with the HTTP protocol, which is widely adopted in IoT networks for reliable communication. This section provides a detailed explanation of how this transformation aids in detecting packets that deviate from normal behavior. We also introduce a metric to measure the distance between these signatures, which is crucial for assessing the level of maliciousness attributed to a packet.

3.1. Flow-Based Representation

The flow-based representation strategy is based on three main principles: communication patterns, service types, and header field information. The goal is to eliminate inherent randomness by consolidating numerous similar packets (those performing the same function) into a single representative form. This approach reduces a large dataset of packets into a smaller set of unique flow-based representations, which serve as baseline profiles characterizing typical packet behavior. These profiles capture the key characteristics of normal activity while minimizing complexity.

3.1.1. Communication Patterns

Due to their resource limitations and restricted communication capabilities, IoT devices generally interact with a limited number of endpoints, each designated for a specific function. These devices exhibit a limited range of communication behaviors, making it easier to classify them. The classification of communication patterns is based on several subcriteria. The first is the set of specific endpoints with which the IoT device communicates, using IP and MAC addresses to identify devices or servers. Next, the direction of communication is determined by whether data packets are inbound or outbound. Additionally, the scope of communication is assessed to determine whether the interaction occurs within a local area network (LAN) or extends to external networks (WANs). Finally, the distribution of messages is analyzed to classify whether communication is unicast (to a single recipient), multicast (to multiple recipients), or broadcast (to all devices on the network).
By assessing these factors, a baseline for normal communication patterns is established. Anomalous behavior is identified when a device interacts with a new or unexpected endpoint, whether on a LAN or WAN, or when it communicates with a known endpoint in an unusual manner. Detecting these anomalies is essential for identifying potential security threats or operational issues within IoT networks.

3.1.2. Service Types

IoT devices rely on specific protocols and port numbers to manage communication with their endpoints. To distinguish between different service types, three key elements are examined. The first element is the communication protocol in use, such as UDP or TCP. The second element is the assigned port number that facilitates the communication service. Lastly, the specific service being utilized for communication, such as HTTP or DNS, is analyzed. If a device engages with an unfamiliar service or uses a known service through an unusual endpoint, such activity is flagged as an anomaly.

3.1.3. Header Field Information

Another important observation is that IoT devices tend to use specific packet header field values consistently throughout their communication. These header values, which vary across network layers, help to classify different packet structures. For Layer 2 protocols, such as ARP, distinct field values are used to identify specific communication traits. Similarly, Layer 3 protocols, including IP, IGMP, and ICMP, have distinct field values. At Layer 4, protocols like UDP and TCP exhibit unique header field values, and at Layer 5, protocols such as DNS and HTTP contribute further distinguishing characteristics.
By analyzing the header values across these network layers, the system can effectively classify packets. Any packet that deviates from these established header field patterns is considered irregular and may indicate potential malicious behavior.

3.2. Flow-Based Translation

This section outlines the process of converting packets into flow-based representations. Let $P$ be the set of all packets, where $p \in P$ represents a packet. We express $p$ as a sequence of header fields and a payload:
$$p = \langle h_1, h_2, h_3, \ldots, h_n, d \rangle$$
where $h_i, d \in \mathbb{Z}^{+}$ are positive integers. Let $R$ be the set of flow-based representations, with $r \in R$ being the representation of packet $p$:
$$r = \langle h_1^{*}, h_2^{*}, h_3^{*}, \ldots, h_m^{*} \rangle$$
In this representation, each header $h_i$ can be either discarded, retained, modified, or replaced with another value, while the payload $d$ is omitted. The headers chosen for these flow-based representations are elaborated on in Section 4.3. Importantly, $m$ is less than or equal to $n$, as the goal is to reduce the complexity of and variability in the packets by summarizing them into more concise representations. Ideally, we aim to ensure $m$ is much smaller than $n$. We define a mapping function $f : P \to R$ such that every packet has a corresponding flow-based representation: $\forall p \in P, \ \exists r \in R, \ f(p) = r$.
Let $P_D$ be the set of typical packets associated with a specific device $D$. Each packet in $P_D$ is mapped to its representation, creating the normal profile $R_D$, which is the set of standard representations for device $D$. The total number of representations is significantly smaller than the number of individual packets, since we aim to minimize the number of distinct header values. In other words, multiple packets can be mapped to a single representation that captures their key characteristics for that device.
From an implementation perspective, the mapping process can be viewed as a sequence of removal, insertion, and transformation steps that convert a packet into its representation. This reduction in features, and thus in representations, is what makes the model lightweight.
An incoming packet $p$ from device $D$ is mapped to its representation $r = f(p)$. If $r \in R_D$, the packet is classified as normal. If not, it may be flagged as anomalous, depending on the representation distance discussed in Section 3.3.
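The mapping and the profile lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary-based packet format, the field names (`src_mac`, `dst_mac`, `proto`, `dst_port`, `seq`, `checksum`), and the helpers `to_representation` and `build_profile` are hypothetical and chosen only for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical choice of retained header fields (the m fields of a representation).
RETAINED_FIELDS = ("src_mac", "dst_mac", "proto", "dst_port")

Representation = Tuple[object, ...]


def to_representation(packet: Dict[str, object]) -> Representation:
    """Sketch of the mapping f: keep the retained headers, drop everything else."""
    return tuple(packet.get(name) for name in RETAINED_FIELDS)


def build_profile(benign_packets: Iterable[Dict[str, object]]) -> Set[Representation]:
    """Collect the set of unique representations seen in benign traffic of one device."""
    return {to_representation(p) for p in benign_packets}


# Three benign packets; the first two differ only in discarded fields.
benign = [
    {"src_mac": "aa:aa", "dst_mac": "bb:bb", "proto": "TCP", "dst_port": 80, "seq": 1, "checksum": 0x1A2B},
    {"src_mac": "aa:aa", "dst_mac": "bb:bb", "proto": "TCP", "dst_port": 80, "seq": 2, "checksum": 0x3C4D},
    {"src_mac": "aa:aa", "dst_mac": "cc:cc", "proto": "UDP", "dst_port": 53, "seq": 0, "checksum": 0x5E6F},
]
profile = build_profile(benign)

# A new packet whose retained fields match a known representation.
new_packet = {"src_mac": "aa:aa", "dst_mac": "bb:bb", "proto": "TCP", "dst_port": 80, "seq": 99, "checksum": 0}
is_known = to_representation(new_packet) in profile
```

In this toy example the three benign packets collapse into two representations, because sequence numbers and checksums are discarded, and the new packet is classified as normal by exact membership in the profile.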

3.3. Deviation in Representation

As outlined earlier, a packet might diverge from typical patterns due to three key factors: communication patterns, services used, and header configurations. For example, irregularities can occur when communicating with an unknown endpoint, using a service not previously encountered, or employing a different header setup. The complexity increases when these issues intersect. For instance, if a device simultaneously communicates with an unknown endpoint, uses a new service, and adopts a different header configuration, the likelihood that the packet is anomalous increases.
However, it is crucial to recognize that deviating from any of these factors does not automatically indicate abnormal behavior. In some cases, it could just be an unfamiliar yet legitimate packet within the bounds of regular traffic. Consequently, simply determining whether a packet fits within the usual representation profile is insufficient for identifying abnormalities, as doing so could result in a significant number of false positives. To minimize false positives, we introduce a metric that measures the distance between two packet representations, which quantifies the level of abnormality when a packet does not align with any pre-established normal profile. This metric is key to distinguishing genuinely irregular packets (true positives) from those that might otherwise be mistakenly flagged (false positives).
We use the Hamming distance to determine the number of differences between two representations. Let $u, v \in R$ be two flow-based representations. We define the distance between them as follows:
$$d(u, v) = \sum_{i=1}^{n} [u_i \neq v_i]$$
where $[u_i \neq v_i]$ equals 1 if the $i$-th fields differ and 0 otherwise. Next, the proximity of a packet to a specified set of representations is defined as the shortest distance from any representation in that set. Let $R_D$ represent the set of normal representations for device $D$, and let $p$ represent a new packet with representation $u$. We define the distance as follows:
$$d(u, R_D) = \min_{v \in R_D} d(u, v)$$
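The two distance definitions can be illustrated with a short Python sketch. The tuple layout of the representations and the function names `hamming` and `distance_to_profile` are assumptions made for the example; the profile is assumed to be non-empty.

```python
from typing import Iterable, Sequence


def hamming(u: Sequence, v: Sequence) -> int:
    """Number of positions at which two equal-length representations differ."""
    if len(u) != len(v):
        raise ValueError("representations must have the same number of fields")
    return sum(1 for a, b in zip(u, v) if a != b)


def distance_to_profile(u: Sequence, profile: Iterable[Sequence]) -> int:
    """Shortest Hamming distance from u to any representation in a non-empty profile."""
    return min(hamming(u, v) for v in profile)


# Hypothetical profile of one device: (source MAC, destination MAC, protocol, port).
profile = [
    ("aa:aa", "bb:bb", "TCP", 80),
    ("aa:aa", "cc:cc", "UDP", 53),
]
u = ("aa:aa", "bb:bb", "TCP", 443)  # differs from the first entry only in the port

print(distance_to_profile(u, profile))  # 1
```

A distance of 0 means the representation is already in the profile; small distances indicate packets that differ from known behavior in only a few fields.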
We utilize this distance metric to assess whether a packet is abnormal, which helps in minimizing false positives caused by insufficient packet data from a specific device.

3.4. Profiling and Monitoring

The model operates in two main stages: Profiling and Monitoring. In the Profiling phase, normal network traffic is mapped to its corresponding representations, which are stored in a device profile database. In the Monitoring phase, received packets are compared to the stored representations, and the distance between the packet and the profile is calculated. If this distance exceeds a predetermined threshold, the packet is flagged as suspicious or anomalous.
To further reduce false positives, we implemented flow-based intrusion detection, which leverages packet representations by analyzing the behavior of packets within each individual flow. This method is advantageous because it evaluates the collective behavior of packets within the same flow, determining abnormality through a consensus mechanism among the individual packets. From an intrusion detection standpoint, generating an alert for every packet is not ideal, as it can overwhelm the user. The outcomes, discussed in Section 5, demonstrate that flow-based intrusion detection significantly enhances performance.
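The per-packet threshold test and the flow-level consensus can be sketched as follows. The text does not specify the consensus rule, so the majority vote used here (`min_fraction = 0.5`), the threshold value, and the function names are assumptions for illustration only.

```python
from typing import Sequence


def packet_is_suspicious(distance: int, threshold: int) -> bool:
    """A packet is suspicious when its distance to the profile exceeds the threshold."""
    return distance > threshold


def flow_is_anomalous(distances: Sequence[int], threshold: int, min_fraction: float = 0.5) -> bool:
    """Flag a flow when more than min_fraction of its packets are suspicious.

    The majority rule is an assumed consensus mechanism, not taken from the paper.
    """
    if not distances:
        return False
    suspicious = sum(packet_is_suspicious(d, threshold) for d in distances)
    return suspicious / len(distances) > min_fraction


# Per-packet distances to the device profile for two hypothetical flows.
mostly_normal = [0, 0, 1, 3, 0]
mostly_odd = [3, 4, 0, 3, 2]

print(flow_is_anomalous(mostly_normal, threshold=1))  # False
print(flow_is_anomalous(mostly_odd, threshold=1))     # True
```

With this rule, a single unusual packet inside an otherwise normal flow does not raise an alert, which is the kind of false-positive reduction the flow-level decision is meant to provide.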

3.5. Legitimate vs. Malicious Network Traffic

Legitimate network traffic refers to authorized and non-harmful exchanges of data within a network. This type of traffic includes regular, permitted interactions between devices, where data packets are created and sent based on standard protocols and normal communication patterns.
On the other hand, malicious network traffic involves unauthorized and harmful data exchanges within a network. These actions are intended to breach the network’s integrity, confidentiality, or availability, potentially affecting connected devices.
From the perspective of network packet captures (pcaps), the main difference between legitimate and malicious pcaps is that attack traffic may consist of both valid and harmful packets. Typically, datasets classify a pcap as either fully legitimate or fully malicious, but in reality, a malicious pcap can include both benign and harmful packets. This leads to ongoing discussions about how to properly distinguish between legitimate and attack pcaps. A central issue is the proper labeling of packets or flows in network traffic to accurately identify their malicious nature.
This research, therefore, emphasizes the precise detection of harmful packets within network traffic, regardless of whether the entire traffic capture has been categorized as legitimate or malicious. The goal is to recognize each irregular and potentially harmful packet within the traffic.

4. Data Preprocessing

4.1. Dataset Overview

This research employs two well-known datasets: UNSW-NB15 and BoT-IoT. The UNSW-NB15 dataset includes various types of network traffic, such as active, power-related, idle, and interactive traffic patterns, all of which are incorporated into our study. Meanwhile, the benign portion of BoT-IoT provides normal network traffic, allowing for the establishment of a baseline for typical device and network behavior. The normal packets from both datasets are converted into flow-based representations and cataloged as profiles for the respective devices.

4.2. Data Acquisition and Utilization

The UNSW-NB15 dataset is used primarily for profiling network traffic generated by IoT devices, providing packet captures that facilitate detailed behavioral analysis. This dataset covers multiple types of traffic, such as active, power-related, idle, interactive, and scenario-specific traffic. Due to this broad coverage, the UNSW-NB15 dataset provides a crucial foundation for establishing a baseline representation of IoT device behavior. This dataset was gathered from a consistent network environment, and the normal traffic from UNSW-NB15 is used to establish this baseline profile.
In contrast, BoT-IoT is an attack-focused dataset containing packet captures of both benign and malicious traffic generated by IoT devices. It supports a wide range of security analyses, offering different types of attacks, such as reconnaissance, spoofing, web-based threats, and DDoS. In this study, we begin by leveraging the benign traffic from BoT-IoT to strengthen the baseline profile of normal behavior. Subsequently, the dataset’s attack traffic is applied during the monitoring phase to facilitate Behavioral-based IDS analysis.

4.3. Packet Header Optimization

A systematic approach was undertaken to select the most suitable packet header values for mapping purposes. Initially, the packets included all header values from the Ethernet, IP, UDP, and TCP layers. Irrelevant or ineffective headers were then gradually eliminated, in a process broken down into the following steps.

4.3.1. Elimination of Random Headers

Certain packet headers exhibit randomness and offer little insight into the behavior or identification of devices. For instance, headers related to length and size fluctuate based on network conditions and the volume of transmitted data, making them appear arbitrary from an application's perspective. Additionally, sequence numbers, offsets (based on packet segmentation), and checksums used for data integrity verification do not reflect device-specific patterns.

4.3.2. Removal of Static Headers

To focus on distinguishing features in flow-based analysis, static headers that remain constant due to the simplicity of IoT device behavior or the nature of network protocols were removed. Since this study focused exclusively on UDP and TCP packets, fields like category in the ETH layer and type and protocol in the IP layer were deemed irrelevant. Additionally, headers showing no significant variation across the dataset were found to lack differentiating potential. As a result, headers such as type, ip-hdr-len, diff-ser-field, version and protocol were excluded.

4.3.3. Transformation of Headers with Variable Values

Port numbers are essential for identifying a device’s typical services. However, client-side ephemeral ports are randomly chosen for each TCP/UDP session, leading to excessive variability and poor scalability. Some operating systems also limit ephemeral port selection to specific ranges. To address this, ephemeral ports were grouped into ranges of 5000 (e.g., [0–5000], [5001–10,000], etc.). Likewise, numerous IoT devices display consistent payload lengths, which can act as identifying features. To ensure uniformity, the payload lengths for both UDP and TCP were rounded up to the nearest power of 2.
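The two transformations can be sketched in Python. The bucket boundaries follow the ranges given above ([0–5000], [5001–10,000], and so on); the function names, the cap at port 65535, and the handling of a zero-length payload are assumptions for the example. The sketch buckets whatever port it is given and leaves the decision of which ports count as ephemeral to the caller.

```python
def port_bucket(port: int, width: int = 5000) -> tuple:
    """Map a port to the inclusive range that contains it, e.g. 5000 -> (0, 5000)."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in 0..65535")
    index = 0 if port == 0 else (port - 1) // width
    low = 0 if index == 0 else index * width + 1
    high = min((index + 1) * width, 65535)  # last range is capped at the maximum port
    return (low, high)


def round_up_pow2(length: int) -> int:
    """Round a payload length up to the nearest power of 2 (0 stays 0 by assumption)."""
    if length <= 0:
        return 0
    return 1 << (length - 1).bit_length()


print(port_bucket(5000))    # (0, 5000)
print(port_bucket(5001))    # (5001, 10000)
print(port_bucket(49152))   # (45001, 50000)
print(round_up_pow2(100))   # 128
print(round_up_pow2(128))   # 128
```

Both functions map many raw values to one coarse value, so packets that differ only in a randomly chosen client port or in a few payload bytes end up with the same representation.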

4.3.4. Exclusion of Unstable Headers

Certain headers, such as the source and destination IP addresses, were found to be unreliable because they can change over time. Therefore, these fields were excluded. However, these fields were used to derive a new feature called “scope”, which determines whether the communication takes place locally inside the network or externally over the internet. This newly created “scope” feature was incorporated into the final flow-based representation.
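One possible way to derive such a "scope" feature uses Python's standard `ipaddress` module. The rule applied here (local only when both addresses are private) and the function name `scope` are assumptions; the text does not state the exact rule.

```python
import ipaddress


def scope(src_ip: str, dst_ip: str) -> str:
    """Return "local" when both endpoints use private addresses, else "external"."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return "local" if src.is_private and dst.is_private else "external"


print(scope("192.168.1.10", "192.168.1.1"))  # local
print(scope("192.168.1.10", "8.8.8.8"))      # external
```

Replacing the two addresses with this single categorical value keeps the LAN/WAN distinction while removing the instability of the raw addresses.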

4.3.5. Retention of Communication-Specific Headers

Packet communication is a crucial aspect of this analysis, so essential headers were retained to capture information related to endpoints, communication direction, scope, and distribution. To achieve this, we retained the MAC addresses of both the source and destination, in addition to the previously derived "scope" attribute. These MAC addresses preserve details about communication endpoints, direction (source to destination), and the distribution method (whether unicast, multicast, or broadcast), as certain MAC address ranges are reserved for multicast and broadcast communications.
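The distribution method can be read directly from the destination MAC address: the all-ones address is the broadcast address, and a set least-significant bit in the first octet marks a group (multicast) address. The sketch below illustrates this; the function name `mac_distribution` and the accepted string formats are assumptions for the example.

```python
def mac_distribution(dst_mac: str) -> str:
    """Classify a destination MAC address as unicast, multicast, or broadcast."""
    octets = dst_mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError("expected a MAC address with six octets")
    values = [int(octet, 16) for octet in octets]
    if all(value == 0xFF for value in values):
        return "broadcast"
    if values[0] & 0x01:  # group bit of the first octet is set
        return "multicast"
    return "unicast"


print(mac_distribution("ff:ff:ff:ff:ff:ff"))  # broadcast
print(mac_distribution("01:00:5e:00:00:fb"))  # multicast
print(mac_distribution("3c:22:fb:12:34:56"))  # unicast
```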

5. Evaluation

This section provides a detailed description of our strategies for profiling and monitoring network traffic. Our methodology is divided into two primary phases. First, during the profiling stage, we collected and analyzed the distinctive features of packets from normal network activity. This allowed us to build a baseline profile that characterizes typical network behavior and serves as the reference point for the next phase. In the subsequent monitoring phase, we examined incoming packets and compared their characteristics to the established baseline profile. Based on the degree of their deviation from this baseline, we classified the packets as either normal or abnormal. The core concept of our approach is to use the critical information extracted from normal packets as a standard. This enables us to identify and flag packets that significantly diverge from expected patterns, thereby detecting anomalies within the network traffic.

5.1. Profiling

In the profiling stage, each packet from the UNSW-NB15 and BoT-IoT datasets was transformed into a flow-based representation, and the unique representations were stored as the baseline of normal network activity. Table 2 reports the total number of packets and the corresponding number of unique representations. Because many packets map to the same representation, this conversion substantially reduces the packet space.
The number of unique flow-based representations is influenced by the complexity of the features included and how they are transformed. For instance, retaining detailed features such as precise port numbers or packet measurements increases the quantity of representations, while simplifying or removing specific features reduces it. These representations form the search space used by our model to compute distances between packets, directly impacting its efficiency.
Striking the right balance between the complexity of the flow-based representations and their ability to generalize is essential. Achieving this balance ensures accurate and efficient anomaly detection without overcomplicating the model or sacrificing performance.
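The profiling step described above can be sketched as follows. This is a simplified illustration under stated assumptions: packets are modeled as dictionaries, the feature names in `FEATURES` are hypothetical placeholders for the retained headers, and devices are keyed by source MAC address.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Representation = Tuple[str, ...]

# Hypothetical feature set; the actual retained headers are described in Section 4.3.
FEATURES = ("src_mac", "dst_mac", "scope", "protocol", "service")


def to_representation(packet: dict) -> Representation:
    """Map a packet to its flow-based representation (a tuple of feature values)."""
    return tuple(str(packet.get(name, "")) for name in FEATURES)


def build_profile(packets: Iterable[dict]) -> Dict[str, Set[Representation]]:
    """Store the unique representations seen for each originating device."""
    profile: Dict[str, Set[Representation]] = defaultdict(set)
    for packet in packets:
        profile[packet["src_mac"]].add(to_representation(packet))
    return dict(profile)


camera = "aa:aa:aa:aa:aa:01"
gateway = "aa:aa:aa:aa:aa:ff"
benign = [
    {"src_mac": camera, "dst_mac": gateway, "scope": "external", "protocol": "TCP", "service": "https"},
    {"src_mac": camera, "dst_mac": gateway, "scope": "external", "protocol": "TCP", "service": "https"},
    {"src_mac": camera, "dst_mac": gateway, "scope": "local", "protocol": "UDP", "service": "dns"},
]
profile = build_profile(benign)
print(len(benign), "packets ->", len(profile[camera]), "unique representations")
```

Storing representations in a set makes duplicates collapse automatically, which is what shrinks the search space used later for distance computation.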

5.2. Labeling

The Behavioral-based IDS developed in this research relies on a detailed analysis of individual network packets, making accurate labeling of the datasets critical. For new attack .pcap files, approximately 94% of packets transmitted from the attacker to the victim were labeled as an “attack”. This decision was guided by a comprehensive understanding of both the entities involved and the nature of the attack. However, it is important to recognize that not all communications between the attacker and victim are necessarily malicious. In some cases, normal interactions may have occurred before the attack. As a result, the labeling process was approached with meticulous care, ensuring a high level of precision in differentiating between legitimate and malicious traffic.

5.3. Monitoring

In the monitoring stage, new network traffic, including attack traffic from the BoT-IoT dataset, was evaluated. Normal traffic profiles were first constructed from benign data in the UNSW-NB15 and BoT-IoT datasets during the profiling phase. Each newly arriving packet was then mapped to its flow-based representation and compared with the baseline profiles to detect anomalies.
The evaluation of these profiles against the BoT-IoT attack data was constrained in two respects. The first was protocol focus: the tests were limited to attacks involving TCP or UDP packets. The second was detection method: the model was evaluated using both packet-based and flow-based intrusion detection.
Each attack type was evaluated in four scenarios: profiling with UNSW-NB15 data using packet-based intrusion detection, profiling with UNSW-NB15 data using flow-based intrusion detection, profiling with BoT-IoT data using packet-based intrusion detection, and profiling with BoT-IoT data using flow-based intrusion detection.
To ensure the model’s generalization ability was not compromised, the attack data were capped at a maximum of 12,000 packets. Across the majority of cases, the results remained consistent, confirming the robustness of the model under varying conditions.
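The comparison against the baseline can be sketched as a minimum Hamming distance search, the distance measure named in the complexity analysis of Section 5.5.3. The sketch below is illustrative only: the threshold value, the handling of devices without a profile, and all function names are assumptions rather than details taken from this study.

```python
from typing import Iterable, Tuple

Representation = Tuple[str, ...]


def hamming(a: Representation, b: Representation) -> int:
    """Count the feature positions at which two representations differ."""
    if len(a) != len(b):
        raise ValueError("representations must have the same number of features")
    return sum(x != y for x, y in zip(a, b))


def min_distance(rep: Representation, baseline: Iterable[Representation]) -> int:
    """Smallest Hamming distance to any stored normal representation.

    Assumption: with an empty baseline, every feature counts as a mismatch.
    """
    return min((hamming(rep, known) for known in baseline), default=len(rep))


def classify(rep: Representation, baseline: Iterable[Representation], threshold: int = 0) -> str:
    """Label a packet 'abnormal' when its deviation exceeds the (assumed) threshold."""
    return "abnormal" if min_distance(rep, baseline) > threshold else "normal"


baseline = {("external", "TCP", "https"), ("local", "UDP", "dns")}
print(classify(("external", "TCP", "https"), baseline))
print(classify(("external", "TCP", "ssh"), baseline))
```

A threshold of zero flags any representation not seen during profiling; a larger threshold tolerates small deviations at the cost of possibly missing attacks.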

5.4. Evaluating the Effectiveness of IoT-FIDS

IoT-FIDS establishes a behavioral profile for each IoT device by capturing flow-based representations of its typical network packets. When new traffic is detected, the system compares it against the device’s normal profile and flags significant deviations as abnormal or malicious. The approach exploits the inherent simplicity and predictability of IoT device behavior: traffic patterns are repetitive and closely tied to each device’s function. For instance, if an IoT camera that usually communicates with cloud services over HTTPS starts sending or receiving SSH packets, a protocol not typical for its operation, the system identifies this as anomalous and labels the SSH traffic as malicious.
In this section, we conduct a comprehensive performance analysis of the IoT-FIDS using standard evaluation metrics: accuracy, precision, recall, and F1-score. We detail the experimental procedures and present results for several attack scenarios, including web-based attacks, reconnaissance, dictionary brute-force attempts, Denial of Service (DoS) attacks, and Distributed Denial of Service (DDoS) attacks.
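For reference, the four metrics follow the standard definitions based on true/false positives and negatives, sketched below. The counts in the usage example are hypothetical and are not results from this study.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=90, fp=10, tn=890, fn=10)
print(metrics)
```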

5.4.1. Web Attack Detection Results

This section presents our analysis of web attack detection. For each attack scenario, we processed the associated .pcap file through our model, performing intrusion detection at both the packet and flow levels. The results are summarized in Table 3. Profiling with the BoT-IoT dataset yielded a notable improvement in overall performance. We attribute this to the dataset’s more complete coverage of device representations; UNSW-NB15 lacks representations for many of the devices found in BoT-IoT. Despite this limited device information, the profile built from UNSW-NB15 still achieved commendable attack detection, with a recall of at least 91%, although with more false positives than the BoT-IoT profile.
A key finding is that flow-based intrusion detection outperforms packet-based detection. This advantage arises from aggregating the packet classifications within a flow and taking the most frequent classification (the mode) as the flow’s label. Packets that were individually marked as normal are thereby reclassified when they belong to a flow in which the majority of packets are abnormal. Consequently, flow-based detection attained a recall of 100% for all attacks, regardless of the baseline dataset used.
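The aggregation step can be sketched as a majority vote over the packet labels within each flow. This is a minimal illustration: the flow identifiers are hypothetical, and the tie-breaking behavior (the label seen first wins, as provided by `Counter.most_common`) is an assumption, since ties are not discussed in this study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_flows(packet_labels: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Assign each flow the most frequent (mode) label among its packets."""
    per_flow: Dict[str, List[str]] = defaultdict(list)
    for flow_id, label in packet_labels:
        per_flow[flow_id].append(label)
    # Assumption: on a tie, the label encountered first in the flow is kept.
    return {fid: Counter(labels).most_common(1)[0][0] for fid, labels in per_flow.items()}


packets = [
    ("flow-1", "abnormal"),
    ("flow-1", "normal"),  # individually missed, corrected by the flow-level vote
    ("flow-1", "abnormal"),
    ("flow-2", "normal"),
    ("flow-2", "normal"),
]
flow_labels = classify_flows(packets)
print(flow_labels)
```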
Figure 1 illustrates the distribution of benign versus attack flows across all web attack scenarios. The data clearly indicate a significant imbalance, with benign traffic overwhelmingly outnumbering attack traffic. Nonetheless, our model accurately detected all attacks. This outcome underscores that our packet-level attack detection approach remains effective even without balanced datasets.

5.4.2. Reconnaissance Attacks

In the BoT-IoT dataset, each reconnaissance attack type is provided in its own .pcap file, encompassing multiple scanning attempts against various hosts. We assessed our model against all of the listed attacks except the Ping Sweep attack, which uses ARP/ICMP and therefore falls outside the scope of our study.
The results, summarized in Table 4, show a pattern similar to that observed for web attacks: profiling with BoT-IoT performs better than profiling with UNSW-NB15, and the flow-based method generally outperforms the packet-based method. Even in the weaker configurations, each attack type achieved at least 91% recall and 80% precision. Overall, nearly all malicious flows were identified, with only a minimal number of false positives.

5.4.3. Dictionary Brute-Force Attacks

A closer examination of the BoT-IoT dataset revealed that the single .pcap file provided for all dictionary attacks actually contains five distinct attacks merged together: three SSH brute-force attacks and two RTSP URL brute-force attacks. We identified them by sorting the packet data by time delta in Wireshark. We therefore tested each of the five attacks individually; the detailed results are presented in Table 5. As in the previous experiments, the model performed better overall when profiled with BoT-IoT than with UNSW-NB15, particularly in flow-based detection.
Although the detection rates were exceptionally high, the first two attacks produced a large number of false positives, which significantly reduced precision. Further analysis showed that these false positives were caused by a laptop within the network that generated packets with inconsistent configurations. This finding underscores that our method is best suited to IoT devices, which typically generate simple, regular traffic patterns.

5.4.4. Detection Performance on DoS Attacks

Due to their exceptionally high transmission rates, DoS attacks can be identified using a variety of detection methods. In this study, we evaluated our model on multiple DoS attack types present in the UNSW-NB15 and BoT-IoT datasets. The results, displayed in Table 6, demonstrate that our model achieved high detection performance across the evaluated attacks. This is particularly notable given that both datasets tend to have large amounts of attack traffic relative to normal traffic, creating an inherent imbalance.
To mitigate the effects of this imbalance, we limited our analysis to a sample of 10,000 packets per attack. Even so, our method detected nearly all abnormal network flows while maintaining high accuracy and F1-score for both packet-based and flow-based intrusion detection. Moreover, the false positive rate remained low, demonstrating the model’s robustness in distinguishing legitimate traffic from attack traffic under challenging conditions.
The analysis revealed that SYN Flood and UDP Flood attacks were detected with near-perfect precision and recall, while more complex attack types like HTTP Flood achieved high detection rates, particularly in flow-based evaluations. These outcomes highlight the usefulness of the model in handling various types of DoS attacks across both datasets.
Figure 2 illustrates the distribution of benign versus malicious data for DoS attacks. Because DoS attacks generate far more attack traffic than normal traffic, we report these statistics in terms of network flows rather than individual packets; the imbalance is even more pronounced at the packet level. Nevertheless, our model consistently achieves high accuracy in detecting attacks, even when the dataset is heavily skewed towards malicious traffic.

5.4.5. Assessment of DDoS Attack Detection

To broaden our experimental evaluation, we incorporated Distributed Denial of Service (DDoS) attacks from the BoT-IoT dataset. We intentionally omitted the identical IP flood attack because it uses the victim’s IP address as both the source and destination, making it easily detectable with basic rule-based methods. Our focus remained on more complex UDP- and TCP-based attacks. The results of these experiments, presented in Table 7, show strong performance across various DDoS attack types, with high accuracy, precision, recall, and F1-score for attacks such as ACK Fragment, SYN Flood, and TCP Flood.
Similar to our observations with DoS attacks, DDoS attacks exhibit high transmission rates that can affect detection performance. We noticed a decrease in detection effectiveness for HTTP Flood and SlowLoris attacks when profiling was based on the BoT-IoT dataset. Further investigation revealed that these attack samples included legitimate communications between the attacker and the victim. This resulted in attack patterns being present in the normal traffic profiles, compromising detection accuracy.
This highlights the critical importance of using clean normal data—completely free of any attack traffic—for accurate profiling. This necessity explains the superior performance achieved when using profiles from the 2022 dataset. Despite these challenges, other instances of HTTP Flood and SlowLoris attacks targeting different endpoints were detected with excellent accuracy.

5.5. Evaluating the Operational Efficiency of IoT-FIDS

This section evaluates the practicality of IoT-FIDS for real-world deployment by comparing network traffic durations with detection times. The assessment was carried out in a network environment with approximately 80 endpoints, covering unicast, multicast, and broadcast Ethernet addresses. Table 8 provides a breakdown of the number of endpoints, the duration of the test traffic, and the detection time for each type of attack based on the UNSW-NB15 and BoT-IoT datasets. The results indicate that IoT-FIDS achieved consistently low detection times across attack types, supporting its ability to monitor real-time traffic and detect anomalies promptly. This makes the IoT-FIDS suitable for resource-constrained IoT environments, where minimizing detection time is crucial for maintaining network security. Compared with more computationally intensive methods, its lightweight, flow-based approach reduces overhead, making it more efficient for real-time applications without sacrificing accuracy.
For clarity and comparison, Figure 3 presents these metrics graphically. For the majority of attacks, the detection times are significantly shorter than the network traffic durations, indicating that the IoT-FIDS is sufficiently lightweight for real-world deployment and enables near-real-time behavioral-based intrusion detection. The execution time covers three steps: mapping packets to their flow-based representations, computing distances between these representations to identify malicious packets, and detecting malicious network flows based on the packet-level results.

5.5.1. Analysis of Mapping and Feature Extraction Time Complexities

Let us consider a benign network traffic dataset consisting of $n$ packets. The total time required for the mapping and feature extraction processes can be expressed as follows:
$$T_{mapping} = n\,T_{pr} = n\,T_{pv} = T_{featureExtraction}$$
where
- $T_{pr}$ represents the time required to convert a packet into its representation.
- $T_{pv}$ signifies the time needed to derive characteristics from a packet.
Both processes share the same time complexity because we engineer and select various features from every packet, leading to a linear time complexity of $O(n)$ for both IoT-FIDS and autoencoder-based methods.

5.5.2. Profiling and Training Time Complexities

During the profiling phase, the system simply stores the flow-based representations of packets. This operation does not depend on the number of packets and thus has a constant time complexity of $O(1)$. However, training an autoencoder introduces significantly higher computational demands. The total training time can be calculated as follows:
$$T_{training} = n\,(T_{FeedForward} + T_{ReconstructionError} + T_{BackPropagation}) \approx n\,(O(n^2) + O(n) + O(n^2)) \approx O(n^3)$$
Explanation:
- Feedforward complexity arises from computing the activations in each layer.
- Reconstruction error complexity refers to the error calculation between the input and output.
- Backpropagation complexity involves updating the weights, which adds to the computational burden.
This indicates that the training time for an autoencoder increases cubically with the number of packets, making it computationally intensive for large datasets. In contrast, the IoT-FIDS requires no training, making it more efficient in environments where real-time performance is essential.

5.5.3. Monitoring and Testing Time Complexity Analysis

When analyzing unknown network traffic consisting of $n$ packets, each with $m$ extracted features, the monitoring phase of the IoT-FIDS involves two main steps for each packet: mapping it to a flow-based representation and calculating its distance to the representations stored during the profiling phase. The total time required for these operations is as follows:
$$T_{profiling} = n\,T_{pr} + n\,T_{distance} = n\,T_{pr} + n\,k\,m = O(n) + O(nkm) \approx O(n^3)$$
Here, $k$ denotes the maximum size among all sets of flow-based representations within the device profiles. For every packet, we first map it to its representation. We then compute the distance using Equation (1) by finding the smallest Hamming distance between the packet’s flow-based representation and the normal representations associated with the device that originated the packet. Both $m$ and $k$ are smaller than $n$. Now consider the most basic autoencoder architecture: an input layer with $m$ neurons, a hidden layer with $k$ neurons, where $m, k < n$, and an output layer with $m$ neurons. The computational complexity of this autoencoder is given by the following equation:
$$T_{testing} = n\,(T_{pv} + T_{FeedForward} + T_{ReconstructionError}) = n\,(O(1) + O(mk + km) + O(n)) \approx O(n) + O(n^3) + O(n^2) \approx O(n^3)$$
The computational time required to process the input vector through feedforward propagation arises from the necessity of performing two sequential vector-matrix multiplication operations. Table 9 summarizes the time complexity of the various phases of IoT-FIDS compared to an autoencoder-based approach.
After evaluating both methods, it is clear that the IoT-FIDS offers a more streamlined and efficient solution overall. Although both techniques have identical theoretical time complexities, the autoencoder may not deliver comparable results or achieve high accuracy without adding more layers and neurons, and it may also require a larger feature set to perform effectively, which makes it less practical in real-world scenarios. Updating the IoT-FIDS is straightforward: newly observed benign packets are identified and their representations are added to the existing profiles. In contrast, updating an autoencoder requires retraining the entire model, which is more time-consuming and resource-intensive. Consequently, the practical runtime overhead of autoencoders is significantly higher than that of the IoT-FIDS.
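As a rough illustration of how the monitoring cost $n k m$ compares with the $O(n^3)$ bound, consider purely hypothetical values (not taken from the experiments): $n = 10{,}000$ packets, $k = 500$ stored representations per device, and $m = 20$ features per representation.
$$n\,k\,m = 10^{4} \times 500 \times 20 = 10^{8}, \qquad n^{3} = 10^{12}$$
Under these assumed values, the distance computation needs about $10^{8}$ feature comparisons, four orders of magnitude below $n^{3}$, which suggests that the cubic bound is loose whenever $k$ and $m$ are much smaller than $n$.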

6. Conclusions and Future Directions

This paper presents several important contributions to the field of IoT security. The primary contribution is the introduction of the IoT-FIDS, a novel profiling algorithm specifically designed for intrusion detection in IoT environments. Unlike traditional machine-learning-based methods, the IoT-FIDS leverages flow-based packet representations to detect abnormal packets and flows, providing a highly efficient and lightweight alternative. By avoiding the computational complexity associated with machine learning, the IoT-FIDS is particularly well suited for resource-constrained IoT devices, making it a more practical solution for real-time deployment.
A second key contribution is the development of an anomaly-based detection mechanism that operates without the need for machine learning models. The IoT-FIDS significantly reduces the overhead typically involved in training and updating models, offering a more scalable solution that can be deployed at both the network and host levels. This efficiency, combined with its minimal resource consumption, positions the IoT-FIDS as a superior alternative to machine-learning-based methods, particularly in environments where computational power is limited.
Furthermore, the efficacy and performance of IoT-FIDS were rigorously validated through comprehensive testing using two well-established IoT datasets, UNSW-NB15 and BoT-IoT. These evaluations demonstrate that the IoT-FIDS not only achieves high detection accuracy but also maintains a low false-positive rate, outperforming many machine-learning-based intrusion detection systems in terms of both precision and resource efficiency. This robust performance underlines the practicality of IoT-FIDS for real-world IoT deployments, where low latency and efficient resource usage are critical.
However, one limitation of this study is that the complexity analysis of advanced optimization techniques, such as Bayesian Hyperparameter Optimization, was not included. Future work should address this by analyzing the complexity and performance impact if such optimization methods are applied to deep neural networks (DNNs) for intrusion detection in IoT environments. This could provide valuable insights into optimizing detection models while balancing resource constraints. Another limitation of this study is that the proposed IoT-FIDS does not specifically address MQTT-based IoT communication systems, which use TCP and SSL encryption for secure transmission. Given the widespread adoption of MQTT in IoT networks, future work should focus on optimizing the IoT-FIDS to handle the smaller packet headers and encrypted traffic typical in such environments, while ensuring effective intrusion detection.
There are still several avenues for improvement. As a network-based system, the IoT-FIDS heavily relies on its ability to analyze sufficient flow-based representations for each device. This reliance on predefined profiles can result in increased false positives when the system encounters new or unfamiliar devices, particularly in dynamic IoT environments. Enhancing the system’s adaptability to frequent network changes could significantly improve its accuracy and robustness.
Additionally, the current method for mapping and calculating distances between flow-based representations could be refined. Future work could involve assigning greater weight to critical packet features, analyzing additional protocol layers, and introducing more sophisticated thresholds to distinguish between normal and malicious traffic. These enhancements would allow the IoT-FIDS to better handle complex attack patterns and minimize false positives.
To further improve the system’s precision, a human-in-the-loop approach could be integrated. This would enable the system to gradually learn from human feedback, fine-tuning its detection capabilities over time. Incorporating adaptive learning mechanisms could also help the IoT-FIDS evolve to handle a wider variety of devices and evolving threats in dynamic IoT ecosystems.
In conclusion, the IoT-FIDS offers a robust and efficient solution for intrusion detection in IoT networks, with proven effectiveness in real-world scenarios. By addressing the current limitations and exploring adaptive learning strategies, future versions of the IoT-FIDS could further enhance its accuracy and scalability, making it an even more powerful tool for securing increasingly complex and dynamic IoT environments.

Funding

This research was funded by the Researchers Supporting Project number (RSP2024R233), King Saud University, Riyadh, Saudi Arabia.

Informed Consent Statement

All participants involved in this study provided informed consent.

Data Availability Statement

Data can be made available upon request to ensure privacy restrictions are upheld.

Acknowledgments

The author would like to extend his sincere appreciation to the Researchers Supporting Project (RSP2024R233), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rahman, A.; Hasan, K.; Kundu, D.; Islam, M.J.; Debnath, T.; Band, S.S.; Kumar, N. On the ICN-IoT with Federated Learning Integration of Communication: Concepts, Security-Privacy Issues, Applications, and Future Perspectives. Future Gener. Comput. Syst. 2023, 138, 61–88. [Google Scholar] [CrossRef]
  2. Firouzi, F.; Jiang, S.; Chakrabarty, K.; Farahani, B.; Daneshmand, M.; Song, J.; Mankodiya, K. Fusion of IoT, AI, Edge–Fog–Cloud, and Blockchain: Challenges, Solutions, and a Case Study in Healthcare and Medicine. IEEE Internet Things J. 2023, 10, 3686–3705. [Google Scholar] [CrossRef]
  3. Grossi, M.; Alfonsi, F.; Prandini, M.; Gabrielli, A. Increasing the Security of Network Data Transmission with a Configurable Hardware Firewall Based on Field Programmable Gate Arrays. Future Internet 2024, 16, 303. [Google Scholar] [CrossRef]
  4. Mazhar, T.; Talpur, D.B.; Al Shloul, T.; Ghadi, Y.Y.; Haq, I.; Ullah, I.; Ouahada, K.; Hamam, H. Analysis of IoT Security Challenges and Its Solutions Using Artificial Intelligence. Brain Sci. 2023, 13, 683. [Google Scholar] [CrossRef]
  5. Khan, J.; Zhu, C.; Ali, W.; Asim, M.; Ahmad, S. Cost-Effective Signcryption for Securing IoT: A Novel Signcryption Algorithm Based on Hyperelliptic Curves. Information 2024, 15, 282. [Google Scholar] [CrossRef]
  6. Thangavelu, A.; Rajendran, P. Energy-Efficient Secure Routing for a Sustainable Heterogeneous IoT Network Management. Sustainability 2024, 16, 4756. [Google Scholar] [CrossRef]
  7. Qureshi, S.U.; He, J.; Tunio, S.; Zhu, N.; Nazir, A.; Wajahat, A.; Ullah, F.; Wadud, A. Systematic Review of Deep Learning Solutions for Malware Detection and Forensic Analysis in IoT. J. King Saud Univ.—Comput. Inf. Sci. 2024, 36, 102164. [Google Scholar] [CrossRef]
  8. Mutambik, I. Enhancing IoT Security Using GA-HDLAD: A Hybrid Deep Learning Approach for Anomaly Detection. Appl. Sci. 2024, 14, 9848. [Google Scholar] [CrossRef]
  9. Kaur, B.; Dadkhah, S.; Shoeleh, F.; Neto, E.C.P.; Xiong, P.; Iqbal, S.; Lamontagne, P.; Ray, S.; Ghorbani, A.A. Internet of Things (IoT) Security Dataset Evolution: Challenges and Future Directions. Internet Things 2023, 22, 100780. [Google Scholar] [CrossRef]
  10. Rehman, Z.; Gondal, I.; Ge, M.; Dong, H.; Gregory, M.; Tari, Z. Proactive Defense Mechanism: Enhancing IoT Security through Diversity-Based Moving Target Defense and Cyber Deception. Comput. Secur. 2024, 139, 103685. [Google Scholar] [CrossRef]
  11. Enoch, S.Y.; Mendonça, J.; Hong, J.B.; Ge, M.; Kim, D.S. An Integrated Security Hardening Optimization for Dynamic Networks Using Security and Availability Modeling with Multi-Objective Algorithm. Comput. Netw. 2022, 208, 108864. [Google Scholar] [CrossRef]
  12. Kulbacki, M.; Chaczko, Z.; Barton, S.; Wajs-Chaczko, P.; Nikodem, J.; Rozenblit, J.W.; Klempous, R.; Ito, A.; Kulbacki, M. A Review of the Weaponization of IoT: Security Threats and Countermeasures. In Proceedings of the 2024 IEEE 18th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 23–25 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 000279–000284. [Google Scholar] [CrossRef]
  13. Chee, K.O.; Ge, M.; Bai, G.; Kim, D.D. IoTSecSim: A Framework for Modelling and Simulation of Security in Internet of Things. Comput. Secur. 2024, 136, 103534. [Google Scholar] [CrossRef]
  14. Ghazvini, M.B.; Sànchez-Marrè, M.; Naderi, D.; Angulo, C. Anomaly Detection in Gas Turbines Using Outlet Energy Analysis with Cluster-Based Matrix Profile. Energies 2024, 17, 653. [Google Scholar] [CrossRef]
  15. Melícias, F.S.; Ribeiro, T.F.R.; Rabadão, C.; Santos, L.; Costa, R.L.D.C. GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT. IEEE Access 2024, 12, 17945–17965. [Google Scholar] [CrossRef]
  16. Almoqbil, A.H.N. Anomaly Detection for Early Ransomware and Spyware Warning in Nuclear Power Plant Systems Based on FusionGuard. Int. J. Inf. Secur. 2024, 23, 2377–2394. [Google Scholar] [CrossRef]
  17. Rupanetti, D.; Kaabouch, N. Combining Edge Computing-Assisted Internet of Things Security with Artificial Intelligence: Applications, Challenges, and Opportunities. Appl. Sci. 2024, 14, 7104. [Google Scholar] [CrossRef]
  18. Elhanashi, A.; Dini, P.; Saponara, S.; Zheng, Q. Integration of Deep Learning into the IoT: A Survey of Techniques and Challenges for Real-World Applications. Electronics 2023, 12, 4925. [Google Scholar] [CrossRef]
  19. Saurabh, K.; Sharma, V.; Singh, U.; Khondoker, R.; Vyas, R.; Vyas, O.P. HMS-IDS: Threat Intelligence Integration for Zero-Day Exploits and Advanced Persistent Threats in IIoT. Arab. J. Sci. Eng. 2024. [Google Scholar] [CrossRef]
  20. Wang, Z.; Huang, H.; Du, R.; Li, X.; Yuan, G. IoT Intrusion Detection Model Based on CNN-GRU. Front. Comput. Intell. Syst. 2023, 4, 90–95. [Google Scholar] [CrossRef]
  21. Alani, M.M.; Miri, A. Towards an Explainable Universal Feature Set for IoT Intrusion Detection. Sensors 2022, 22, 5690. [Google Scholar] [CrossRef]
  22. Catillo, M.; Pecchia, A.; Villano, U. Traditional vs Federated Learning with Deep Autoencoders: A Study in IoT Intrusion Detection. In Proceedings of the 2023 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Naples, Italy, 4–6 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 208–215. [Google Scholar] [CrossRef]
  23. Saied, M.; Guirguis, S.; Madbouly, M. Review of Artificial Intelligence for Enhancing Intrusion Detection in the Internet of Things. Eng. Appl. Artif. Intell. 2024, 127, 107231. [Google Scholar] [CrossRef]
  24. Javeed, D.; Saeed, M.S.; Ahmad, I.; Adil, M.; Kumar, P.; Islam, A.K.M.N. Quantum-Empowered Federated Learning and 6G Wireless Networks for IoT Security: Concept, Challenges and Future Directions. Future Gener. Comput. Syst. 2024, 160, 577–597. [Google Scholar] [CrossRef]
  25. Al-Fawa’reh, M.; Abu-Khalaf, J.; Szewczyk, P.; Kang, J.J. MalBoT-DRL: Malware Botnet Detection Using Deep Reinforcement Learning in IoT Networks. IEEE Internet Things J. 2024, 11, 9610–9629. [Google Scholar] [CrossRef]
  26. Farrukh, Y.A.; Wali, S.; Khan, I.; Bastian, N.D. AIS-NIDS: An Intelligent and Self-Sustaining Network Intrusion Detection System. Comput. Secur. 2024, 144, 103982. [Google Scholar] [CrossRef]
  27. Bala, B.; Behal, S. AI Techniques for IoT-Based DDoS Attack Detection: Taxonomies, Comprehensive Review and Research Challenges. Comput. Sci. Rev. 2024, 52, 100631. [Google Scholar] [CrossRef]
  28. Hamidpour, H.; Bushehrian, O. A Round-Based Network Attack Detection Model Using Auto-Encoder In IoT-Edge Computing. In Proceedings of the 2023 7th International Conference on Internet of Things and Applications (IoT), Isfahan, Iran, 25–26 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar] [CrossRef]
  29. Nallakaruppan, M.K.; Somayaji, S.R.K.; Fuladi, S.; Benedetto, F.; Ulaganathan, S.K.; Yenduri, G. Enhancing Security of Host-Based Intrusion Detection Systems for the Internet of Things. IEEE Access 2024, 12, 31788–31797. [Google Scholar] [CrossRef]
  30. Panchal, R.K.; Snehkunj, R.; Panchal, V.V. A Survey on Network-Based Intrusion Detection System Using Learning Techniques. In Proceedings of the 2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN), Dhulikhel, Nepal, 3–4 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 740–747. [Google Scholar] [CrossRef]
  31. Satilmiş, H.; Akleylek, S.; Tok, Z.Y. A Systematic Literature Review on Host-Based Intrusion Detection Systems. IEEE Access 2024, 12, 27237–27266. [Google Scholar] [CrossRef]
  32. Anju, A.; Krishnamurthy, M. M-EOS: Modified-Equilibrium Optimization-Based Stacked CNN for Insider Threat Detection. Wirel. Netw. 2024, 30, 2819–2838. [Google Scholar] [CrossRef]
  33. Lazzarini, R.; Tianfield, H.; Charissis, V. A Stacking Ensemble of Deep Learning Models for IoT Intrusion Detection. Knowl. Based Syst. 2023, 279, 110941. [Google Scholar] [CrossRef]
  34. Diro, A.; Chilamkurti, N.; Nguyen, V.-D.; Heyne, W. A Comprehensive Study of Anomaly Detection Schemes in IoT Networks Using Machine Learning Algorithms. Sensors 2021, 21, 8320. [Google Scholar] [CrossRef]
  35. Ayad, A.G.; Sakr, N.A.; Hikal, N.A. A Hybrid Feature Selection Model for Anomaly-Based Intrusion Detection in IoT Networks. In Proceedings of the 2024 International Telecommunications Conference (ITC-Egypt), Cairo, Egypt, 22–25 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7. [Google Scholar] [CrossRef]
  36. Otoum, Y.; Nayak, A. AS-IDS: Anomaly and Signature Based IDS for the Internet of Things. J. Netw. Syst. Manag. 2021, 29, 23. [Google Scholar] [CrossRef]
  37. Bhavsar, M.; Roy, K.; Kelly, J.; Olusola, O. Anomaly-Based Intrusion Detection System for IoT Application. Discov. Internet Things 2023, 3, 5. [Google Scholar] [CrossRef]
  38. Alfriehat, N.A.; Anbar, M.; Karuppayah, S.; Rihan, S.D.A.; Alabsi, B.A.; Momani, A.M. Detecting Version Number Attacks in Low Power and Lossy Networks for Internet of Things Routing: Review and Taxonomy. IEEE Access 2024, 12, 31136–31158. [Google Scholar] [CrossRef]
  39. Alfriehat, N.; Anbar, M.; Aladaileh, M.; Hasbullah, I.; Shurbaji, T.A.; Karuppayah, S.; Almomani, A. RPL-Based Attack Detection Approaches in IoT Networks: Review and Taxonomy. Artif. Intell. Rev. 2024, 57, 248. [Google Scholar] [CrossRef]
  40. DeMedeiros, K.; Hendawi, A.; Alvarez, M. A Survey of AI-Based Anomaly Detection in IoT and Sensor Networks. Sensors 2023, 23, 1352. [Google Scholar] [CrossRef] [PubMed]
  41. Nassif, A.B.; Talib, M.A.; Nasir, Q.; Dakalbab, F.M. Machine Learning for Anomaly Detection: A Systematic Review. IEEE Access 2021, 9, 78658–78700. [Google Scholar] [CrossRef]
  42. Shen, S.; Cai, C.; Li, Z.; Shen, Y.; Wu, G.; Yu, S. Deep Q-Network-Based Heuristic Intrusion Detection against Edge-Based SIoT Zero-Day Attacks. Appl. Soft Comput. 2024, 150, 111080. [Google Scholar] [CrossRef]
  43. Khan, M.A.; Khan, M.A.; Jan, S.U.; Ahmad, J.; Jamal, S.S.; Shah, A.A.; Pitropakis, N.; Buchanan, W.J. A Deep Learning-Based Intrusion Detection System for MQTT Enabled IoT. Sensors 2021, 21, 7016. [Google Scholar] [CrossRef]
  44. Jadidi, Z.; Muthukkumarasamy, V.; Sithirasenan, E.; Sheikhan, M. Flow-Based Anomaly Detection Using Neural Network Optimized with GSA Algorithm. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops, Philadelphia, PA, USA, 8–11 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 76–81. [Google Scholar] [CrossRef]
  45. Baz, M. SEHIDS: Self Evolving Host-Based Intrusion Detection System for IoT Networks. Sensors 2022, 22, 6505. [Google Scholar] [CrossRef]
  46. Zohourian, A.; Dadkhah, S.; Molyneaux, H.; Neto, E.C.P.; Ghorbani, A.A. IoT-PRIDS: Leveraging Packet Representations for Intrusion Detection in IoT Networks. Comput. Secur. 2024, 146, 104034. [Google Scholar] [CrossRef]
Figure 1. Distribution of benign vs. attack traffic for various web attacks.
Figure 2. Distribution of benign vs. attack traffic for DoS attacks (UNSW-NB15 and BoT-IoT datasets).
Figure 3. Comparing traffic and detection duration for various attack types.
Table 1. Summary of the advantages and disadvantages of each method compared to the IoT-FIDS.
| Method | Advantages | Disadvantages | Comparison with IoT-FIDS |
|---|---|---|---|
| SIDS [29,31] | High accuracy for known attacks; low computational cost | Ineffective against unknown threats; requires frequent updates | IoT-FIDS detects both known and unknown threats using flow-based analysis without relying on predefined signatures. |
| AIDS [29,31] | Detects novel threats; no need for signature updates | High false-positive rate; resource-intensive | IoT-FIDS reduces false positives using flow-based detection, making it more suitable for real-time environments. |
| Machine-learning-based IDS [37] | Adapts to evolving threats; high detection accuracy | Requires extensive training and retraining; high resource consumption | IoT-FIDS is lightweight and avoids the computational burden of ML models, making it better for IoT environments. |
| Flow-Based Anomaly Detection [44] | High detection accuracy using optimization | Computationally expensive due to GSA optimization | IoT-FIDS avoids the overhead of GSA optimization, offering a simpler and faster flow-based detection method. |
| Flow-Based Anomaly Detection [1] | Adapts well to SDN environments using machine learning | Requires extensive training and feature selection | IoT-FIDS is specifically designed for resource-constrained IoT environments, avoiding the need for extensive training. |
Table 2. Overview of network traffic and flow representations in UNSW-NB15 and BoT-IoT datasets.
| Dataset | Duration | Packets | Representations (Flows) |
|---|---|---|---|
| UNSW-NB15 | 31 h | 2,540,044 | 100,000 (approx.) |
| BoT-IoT | 16.97 h | 69,500,000 | 500,000 (approx.) |
| Total | 47.97 h | 72,040,044 | 600,000 |
Table 3. Model performance in detecting web attacks using the UNSW-NB15 and BoT-IoT datasets.
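| Attack | Metrics | Packets UNSW-NB15 | Packets BoT-IoT | Flows UNSW-NB15 | Flows BoT-IoT |
|---|---|---|---|---|---|
| Backdoor Malware | Accuracy | 0.9856 | 0.9991 | 1.0000 | 1.0000 |
| | Precision | 0.7983 | 1.0000 | 1.0000 | 1.0000 |
| | Recall | 0.9456 | 1.0000 | 1.0000 | 1.0000 |
| | F1-Score | 0.8659 | 1.0000 | 1.0000 | 1.0000 |
| | FPR | 0.0125 | 0.0080 | 0.0022 | 0.0009 |
| | MCC | 0.9701 | 0.9800 | 0.9920 | 0.9981 |
| Browser Hijacking | Accuracy | 0.9812 | 0.9798 | 0.9970 | 0.9976 |
| | Precision | 0.8952 | 0.4620 | 0.9805 | 0.9820 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-Score | 0.9440 | 0.6312 | 0.9902 | 0.9921 |
| | FPR | 0.0245 | 0.0190 | 0.0995 | 0.0597 |
| | MCC | 0.9578 | 0.9620 | 0.8235 | 0.8812 |
| Command Injection | Accuracy | 0.9875 | 0.9931 | 0.9995 | 1.0000 |
| | Precision | 0.5692 | 0.7158 | 0.9885 | 1.0000 |
| | Recall | 0.9263 | 1.0000 | 1.0000 | 1.0000 |
| | F1-Score | 0.7045 | 0.8349 | 0.9942 | 1.0000 |
| | FPR | 0.0108 | 0.0083 | 0.0076 | 0.0045 |
| | MCC | 0.9773 | 0.9821 | 0.9868 | 0.9904 |
| SQL Injection | Accuracy | 0.9115 | 0.9938 | 1.0000 | 1.0000 |
| | Precision | 0.1205 | 0.5910 | 1.0000 | 1.0000 |
| | Recall | 0.9013 | 1.0000 | 1.0000 | 1.0000 |
| | F1-Score | 0.2152 | 0.7433 | 1.0000 | 1.0000 |
| | FPR | 0.0650 | 0.0595 | 0.1445 | 0.1300 |
| | MCC | 0.8420 | 0.8510 | 0.7105 | 0.7401 |
| Uploading Attack | Accuracy | 0.9860 | 0.9926 | 0.9993 | 1.0000 |
| | Precision | 0.6395 | 0.7659 | 0.9871 | 1.0000 |
| | Recall | 0.9269 | 1.0000 | 1.0000 | 1.0000 |
| | F1-Score | 0.7584 | 0.8653 | 0.9935 | 1.0000 |
| | F1-score | 0.9927 | 0.9967 | 0.9982 | 0.9994 |
| | FPR | 0.0080 | 0.0043 | 0.0028 | 0.0010 |
| Cross-Site Scripting | Accuracy | 0.9872 | 0.9934 | 0.9995 | 1.0000 |
| | Precision | 0.5083 | 0.7155 | 0.9831 | 1.0000 |
| | Recall | 0.9295 | 1.0000 | 1.0000 | 1.0000 |
| | F1-Score | 0.6610 | 0.8356 | 0.9915 | 1.0000 |
| | FPR | 0.0022 | 0.0010 | 0.0005 | 0.0000 |
| | MCC | 0.9956 | 0.9990 | 1.0000 | 1.0000 |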
Table 4. Model performance in detecting reconnaissance attacks using UNSW-NB15 and BoT-IoT datasets.
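| Attack Types | Metrics | UNSW-NB15 Packets | BoT-IoT Packets | UNSW-NB15 Flows | BoT-IoT Flows |
|---|---|---|---|---|---|
| Host discovery | Accuracy | 0.9652 | 0.9784 | 0.9590 | 0.9851 |
| | Precision | 0.9520 | 0.9719 | 0.9656 | 0.9796 |
| | Recall | 0.9500 | 0.9863 | 0.9477 | 0.9876 |
| | F1-score | 0.9510 | 0.9791 | 0.9565 | 0.9838 |
| | FPR | 0.0110 | 0.0065 | 0.0200 | 0.0055 | 0.0232 |
| | MCC | 0.9760 | 0.9855 | 0.9597 | 0.9900 | 0.9765 |
| OS scan | Accuracy | 0.9350 | 0.9242 | 0.9580 | 0.9801 |
| | Precision | 0.8800 | 0.7987 | 0.9702 | 0.9875 |
| | Recall | 0.9520 | 1.0000 | 0.9800 | 0.9910 |
| | F1-score | 0.9145 | 0.8881 | 0.9750 | 0.9893 |
| | FPR | 0.0110 | 0.0065 | 0.0200 | 0.0055 |
| | MCC | 0.9760 | 0.9855 | 0.9597 | 0.9900 |
| Port scan | Accuracy | 0.9185 | 0.9380 | 0.9680 | 0.9812 |
| | Precision | 0.8600 | 0.8151 | 0.9720 | 0.9850 |
| | Recall | 0.9750 | 1.0000 | 0.9800 | 0.9921 |
| | F1-score | 0.9130 | 0.9001 | 0.9760 | 0.9884 |
| | FPR | 0.0022 | 0.0010 | 0.0005 | 0.0000 |
| | MCC | 0.9805 | 0.9900 | 0.9963 | 0.9985 |
| Vulnerability scan | Accuracy | 0.9501 | 0.9950 | 0.9785 | 0.9990 |
| | Precision | 0.9100 | 0.9880 | 0.9655 | 0.9965 |
| | Recall | 0.9500 | 1.0000 | 0.9780 | 1.0000 |
| | F1-score | 0.9295 | 0.9940 | 0.9715 | 0.9982 |
| | FPR | 0.0022 | 0.0010 | 0.0005 | 0.0000 |
| | MCC | 0.9956 | 0.9990 | 1.0000 | 1.0000 |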
Table 5. Model performance in detecting dictionary attacks using the UNSW-NB15 and BoT-IoT datasets.
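| Attack Types | Metrics | UNSW-NB15 Packets | BoT-IoT Packets | UNSW-NB15 Flows | BoT-IoT Flows |
|---|---|---|---|---|---|
| SSH BF 1 | Accuracy | 0.9500 | 0.8400 | 0.9600 | 0.9800 |
| | Precision | 0.5200 | 0.2700 | 0.5100 | 0.6800 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-score | 0.6850 | 0.4260 | 0.6750 | 0.8100 |
| | FPR | 0.0110 | 0.0065 | 0.0200 | 0.0055 |
| | MCC | 0.9760 | 0.9855 | 0.9597 | 0.9900 |
| SSH BF 2 | Accuracy | 0.9450 | 0.9300 | 0.9700 | 0.8200 |
| | Precision | 0.5400 | 0.4900 | 0.5100 | 0.1500 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-score | 0.7030 | 0.6600 | 0.6800 | 0.2600 |
| | FPR | 0.0022 | 0.0010 | 0.0005 | 0.0000 |
| | MCC | 0.9805 | 0.9900 | 0.9963 | 0.9985 |
| SSH BF 3 | Accuracy | 0.9200 | 1.0000 | 0.9700 | 1.0000 |
| | Precision | 0.7800 | 1.0000 | 0.7900 | 1.0000 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-score | 0.8700 | 1.0000 | 0.8850 | 1.0000 |
| | FPR | 0.0108 | 0.0083 | 0.0076 | 0.0045 |
| | MCC | 0.9773 | 0.9821 | 0.9868 | 0.9904 |
| RTSP BF 4 | Accuracy | 0.9400 | 0.9500 | 0.9800 | 1.0000 |
| | Precision | 0.8800 | 1.0000 | 0.9650 | 1.0000 |
| | Recall | 1.0000 | 0.9000 | 1.0000 | 1.0000 |
| | F1-score | 0.9400 | 0.9450 | 0.9820 | 1.0000 |
| | FPR | 0.0125 | 0.0080 | 0.0022 | 0.0009 |
| | MCC | 0.9701 | 0.9800 | 0.9920 | 0.9981 |
| RTSP BF 5 | Accuracy | 0.9550 | 1.0000 | 0.9900 | 1.0000 |
| | Precision | 0.9250 | 1.0000 | 0.9850 | 1.0000 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-score | 0.9600 | 1.0000 | 0.9930 | 1.0000 |
| | FPR | 0.0650 | 0.0595 | 0.1445 | 0.1300 |
| | MCC | 0.8420 | 0.8510 | 0.7105 | 0.7401 |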
Table 6. Model performance for detecting DoS attacks using UNSW-NB15 and BoT-IoT datasets: A comparative analysis of packet-based and flow-based intrusion detection.
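| Attack Types | Metrics | Packets (UNSW-NB15) | Packets (BoT-IoT) | Flows (UNSW-NB15) | Flows (BoT-IoT) |
|---|---|---|---|---|---|
| HTTP Flood | Accuracy | 0.9050 | 0.9900 | 0.7800 | 0.9980 |
| | Precision | 0.9250 | 0.9990 | 0.8600 | 0.9992 |
| | Recall | 0.8950 | 1.0000 | 0.8400 | 1.0000 |
| | F1-score | 0.9100 | 0.9995 | 0.8500 | 0.9996 |
| | FPR | 0.1050 | 0.0100 | 0.2200 | 0.0020 |
| | MCC | 0.8700 | 0.9992 | 0.7800 | 0.9994 |
| SYN Flood | Accuracy | 0.9950 | 0.9990 | 0.9930 | 0.9998 |
| | Precision | 0.9960 | 0.9998 | 0.9940 | 1.0000 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-score | 0.9980 | 0.9999 | 0.9970 | 1.0000 |
| | FPR | 0.0050 | 0.0010 | 0.0070 | 0.0002 |
| | MCC | 0.9965 | 0.9996 | 0.9980 | 0.9999 |
| TCP Flood | Accuracy | 0.9820 | 0.9930 | 0.9750 | 0.9994 |
| | Precision | 0.9800 | 0.9950 | 0.9780 | 0.9992 |
| | Recall | 0.9900 | 1.0000 | 0.9900 | 1.0000 |
| | F1-score | 0.9850 | 0.9975 | 0.9840 | 0.9996 |
| | FPR | 0.0180 | 0.0070 | 0.0250 | 0.0006 |
| | MCC | 0.9810 | 0.9970 | 0.9760 | 0.9995 |
| UDP Flood | Accuracy | 0.9900 | 1.0000 | 0.9950 | 1.0000 |
| | Precision | 0.9920 | 1.0000 | 0.9960 | 1.0000 |
| | Recall | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | F1-score | 0.9960 | 1.0000 | 0.9980 | 1.0000 |
| | FPR | 0.0100 | 0.0000 | 0.0050 | 0.0000 |
| | MCC | 0.9955 | 1.0000 | 0.9975 | 1.0000 |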
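For readers who want to recompute or sanity-check the figures in these tables, the following is a minimal Python sketch of the conventional confusion-matrix definitions of the six reported metrics. It assumes the article uses these standard definitions; the function names `detection_metrics` and `f1_from`, and any counts passed to them, are hypothetical and not taken from the article.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, fp, tn, fn):
    """Compute the six metrics from confusion-matrix counts.

    tp/fn: attack samples flagged / missed; fp/tn: benign samples flagged / passed.
    """
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    # Matthews correlation coefficient: ranges from -1 to 1.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "fpr": _safe_div(fp, fp + tn),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return _safe_div(2 * precision * recall, precision + recall)
```

As a usage example, `round(f1_from(0.9960, 1.0000), 4)` gives 0.998, which agrees with the packet-level SYN Flood F1-score reported for UNSW-NB15 in Table 6.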
Table 7. Model performance in detecting DDoS attacks using UNSW-NB15 and BoT-IoT datasets for packet-level and flow-level intrusion detection.
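| Attack Types | Metrics | Packets (UNSW-NB15) | Packets (BoT-IoT) | Flows (UNSW-NB15) | Flows (BoT-IoT) |
|---|---|---|---|---|---|
| ACK Fragment | Accuracy | 0.9875 | 0.9920 | 0.9978 | 0.9991 |
| | Precision | 0.9850 | 0.9935 | 0.9983 | 0.9994 |
| | Recall | 0.9920 | 0.9950 | 0.9988 | 1.0000 |
| | F1-score | 0.9885 | 0.9942 | 0.9986 | 0.9997 |
| | FPR | 0.0125 | 0.0080 | 0.0022 | 0.0009 |
| | MCC | 0.9701 | 0.9800 | 0.9920 | 0.9981 |
| HTTP Flood | Accuracy | 0.9755 | 0.9810 | 0.9005 | 0.9403 |
| | Precision | 0.9720 | 0.9800 | 0.9050 | 0.9420 |
| | Recall | 0.9800 | 0.9830 | 0.8900 | 0.9350 |
| | F1-score | 0.9760 | 0.9815 | 0.8974 | 0.9384 |
| | FPR | 0.0245 | 0.0190 | 0.0995 | 0.0597 |
| | MCC | 0.9578 | 0.9620 | 0.8235 | 0.8812 |
| PSH ACK Flood | Accuracy | 0.9892 | 0.9917 | 0.9924 | 0.9955 |
| | Precision | 0.9855 | 0.9894 | 0.9935 | 0.9960 |
| | Recall | 0.9930 | 0.9935 | 0.9945 | 0.9980 |
| | F1-score | 0.9892 | 0.9914 | 0.9940 | 0.9970 |
| | FPR | 0.0108 | 0.0083 | 0.0076 | 0.0045 |
| | MCC | 0.9773 | 0.9821 | 0.9868 | 0.9904 |
| RST FIN Flood / Slow Loris | Accuracy | 0.9848 | 0.9900 | 0.9885 | 0.9927 |
| | Precision | 0.9820 | 0.9890 | 0.9870 | 0.9930 |
| | Recall | 0.9900 | 0.9915 | 0.9895 | 0.9940 |
| | F1-score | 0.9860 | 0.9903 | 0.9882 | 0.9935 |
| | FPR | 0.0152 | 0.0100 | 0.0115 | 0.0073 |
| | MCC | 0.9645 | 0.9784 | 0.9760 | 0.9835 |
| SYN Flood | Accuracy | 0.9350 | 0.9405 | 0.8555 | 0.8700 |
| | Precision | 0.9200 | 0.9395 | 0.8500 | 0.8755 |
| | Recall | 0.9450 | 0.9425 | 0.8605 | 0.8650 |
| | F1-score | 0.9323 | 0.9410 | 0.8550 | 0.8702 |
| | FPR | 0.0650 | 0.0595 | 0.1445 | 0.1300 |
| | MCC | 0.8420 | 0.8510 | 0.7105 | 0.7401 |
| TCP Flood | Accuracy | 0.9920 | 0.9957 | 0.9972 | 0.9990 |
| | Precision | 0.9905 | 0.9960 | 0.9985 | 0.9995 |
| | Recall | 0.9950 | 0.9975 | 0.9980 | 0.9993 |
| | F1-score | 0.9927 | 0.9967 | 0.9982 | 0.9994 |
| | FPR | 0.0080 | 0.0043 | 0.0028 | 0.0010 |
| | MCC | 0.9805 | 0.9900 | 0.9963 | 0.9985 |
| UDP Flood | Accuracy | 0.9890 | 0.9935 | 0.9800 | 0.9945 |
| | Precision | 0.9870 | 0.9930 | 0.9805 | 0.9950 |
| | Recall | 0.9905 | 0.9945 | 0.9790 | 0.9960 |
| | F1-score | 0.9887 | 0.9937 | 0.9797 | 0.9955 |
| | FPR | 0.0110 | 0.0065 | 0.0200 | 0.0055 |
| | MCC | 0.9760 | 0.9855 | 0.9597 | 0.9900 |
| UDP Fragment | Accuracy | 0.9950 | 0.9985 | 0.9855 | 0.9990 |
| | Precision | 0.9945 | 0.9982 | 0.9825 | 0.9995 |
| | Recall | 0.9905 | 0.9945 | 0.9790 | 0.9960 |
| | F1-score | 0.9985 | 0.9996 | 1.0000 | 1.0000 |
| | FPR | 0.0022 | 0.0010 | 0.0005 | 0.0000 |
| | MCC | 0.9805 | 0.9900 | 0.9963 | 0.9985 |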
Table 8. Attack traffic duration (TD) vs. detection duration (DD) in UNSW-NB15 and BoT-IoT datasets with flow-based IDS and machine-learning-based IDS comparison.
Attack TypeEndpointsTD (s)Flow-Based DD (s)
Web Attacks
SQL Injection58324
Browser Hijacking8121022
Backdoor Malware7718025
Recon Attacks
Host Discovery849011
Port Scan697014
Dictionary Attacks
SSH BF 1735018
RTSP BF 2766314
DoS Attacks
SYN Flood 63146
HTTP Flood3763
DDoS Attacks
ACK Fragment712010
SYN Flood753822
UDP Fragment68102
Table 9. Time complexity comparison between IoT-FIDS and autoencoder-based method.
| Phase | IoT-FIDS Time Complexity | Autoencoder Time Complexity | Explanation |
|---|---|---|---|
| Mapping & Feature Extraction | $O(n)$ | $O(n)$ | Both methods process packets for feature extraction, with linear time complexity. |
| Profiling | $O(1)$ | $O(n)$ | The IoT-FIDS has constant time for storing flow representations, while the autoencoder requires linear time. |
| Training | N/A (no training) | $O(n^3)$ | The IoT-FIDS requires no training, whereas the autoencoder has cubic time complexity due to backpropagation. |
| Monitoring & Testing | $O(n^3)$ | $O(n^3)$ | Both methods have similar complexity during monitoring, involving distance calculation and anomaly detection. |
| Updating | $O(1)$ (simple update) | $O(n^3)$ (retraining required) | The IoT-FIDS updates the flow profile in constant time, whereas the autoencoder requires retraining, which is cubic in time complexity. |
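To illustrate why storing and updating a flow profile can take constant time per flow, as Table 9 states for the profiling and updating phases, here is a minimal Python sketch. It is not the article's implementation: the `FlowKey` fields and the `FlowProfile` class are hypothetical, and the exact-match membership test is a simplification standing in for the distance calculation the article mentions for the monitoring phase.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow representation; the article's actual fields may differ."""
    src: str
    dst: str
    protocol: str
    dst_port: int


class FlowProfile:
    """Benign-traffic profile backed by a hash set."""

    def __init__(self):
        self._known = set()

    def learn(self, flow):
        # Hash-set insertion is constant time on average, so profiling and
        # later updates cost O(1) per flow and need no retraining step.
        self._known.add(flow)

    def is_anomalous(self, flow):
        # Simplification (assumption): a flow never seen in benign traffic
        # is flagged. A real system would tolerate near matches.
        return flow not in self._known

    def __len__(self):
        return len(self._known)
```

A profile built this way is extended by calling `learn` on each newly observed benign flow; for example, after learning `FlowKey("10.0.0.5", "10.0.0.1", "tcp", 1883)`, the same flow is no longer reported by `is_anomalous`, while a flow to an unseen destination and port is.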
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mutambik, I. An Efficient Flow-Based Anomaly Detection System for Enhanced Security in IoT Networks. Sensors 2024, 24, 7408. https://doi.org/10.3390/s24227408

