
Securing the Edge: CatBoost Classifier Optimized by the Lyrebird Algorithm to Detect Denial of Service Attacks in Internet of Things-Based Wireless Sensor Networks

Sennanur Srinivasan Abinayaa ¹, Prakash Arumugam ², Divya Bhavani Mohan ³, Anand Rajendran ⁴, Abderezak Lashab ⁵, Baoze Wei ⁵,* and Josep M. Guerrero ⁵

¹ Department of Electronics and Communication Engineering, Dr. NGP Institute of Technology, Coimbatore 641048, India

² Karnavati School of Research, Karnavati University, Gujarat 382422, India

³ United World School of Computational Intelligence, Karnavati University, Gujarat 382422, India

⁴ Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India

⁵ Department of Energy Technology, Aalborg University, 9220 Aalborg, Denmark

* Author to whom correspondence should be addressed.

Future Internet 2024, 16(10), 381; https://doi.org/10.3390/fi16100381

Submission received: 7 August 2024 / Revised: 16 October 2024 / Accepted: 17 October 2024 / Published: 19 October 2024

(This article belongs to the Special Issue Scalable and Distributed Cloud Continuum Orchestration for Next-Generation IoT Applications: Latest Advances and Prospects)


Abstract

The security of Wireless Sensor Networks (WSNs) is of the utmost importance because of their widespread use in various applications. Intrusion detection systems (IDSs) play a vital role in protecting WSNs from harmful activity. This work presents an innovative approach to WSN intrusion detection (ID) that combines the CatBoost classifier (Cb-C) with the Lyrebird Optimization Algorithm (LOA). Cb-C handles imbalanced datasets well, and such datasets are typical in ID settings. The LOA is a metaheuristic optimization algorithm inspired by the lyrebird's remarkable capacity to imitate the sounds of its surroundings. The suggested method is assessed on the WSN-DS dataset, acquired from Prince Sultan University in Saudi Arabia. Among the models presented, LOA-Cb-C achieves the highest accuracy, 99.66%, and the lowest error value, 0.34%, compared with the other methods discussed in this article. Experimental results show that the suggested strategy improves WSN-IoT security over existing methods in terms of detection accuracy and the false alarm rate.

Keywords:

wireless sensor networks (WSNs); intrusion detection (ID); CatBoost classifier (Cb-C); lyrebird optimization algorithm (LOA); WSN-DS dataset; machine learning (ML)

1. Introduction

Wireless Sensor Networks (WSNs) are emerging as a game-changing technology with the potential to transform numerous industries and applications. These networks consist of “sensor nodes”: small autonomous devices that can detect, process, and transmit data wirelessly. WSNs are used wherever conventional wired networks are too costly or impossible to deploy, including environmental monitoring, healthcare, industrial automation, military surveillance, and many more. One major advantage of WSNs is their ability to gather data from inaccessible or harsh locations. Dense networks of many sensor nodes can provide detailed, real-time data about the monitored area, and these data can be used for decision-making, monitoring, and control to improve efficiency and safety in a variety of applications. Despite these benefits, WSNs still have a long way to go before they reach their full potential. For example, individual sensor nodes have limited processing and data transfer capacity. Energy efficiency is a major concern because sensor nodes run on batteries of limited capacity that, in many deployments, cannot be recharged or replaced frequently. Another obstacle is the need for efficient and reliable communication protocols. The design of the protocols that let sensor nodes communicate with one another and with a central base station is a key factor in the network's overall success, and ensuring dependable and efficient communication requires solving problems in routing, data aggregation, and congestion control. Much recent work has aimed to improve the efficiency and dependability of WSNs, for example, machine learning algorithms for data processing, optimization methods for reducing energy consumption, and new communication protocols for better network efficiency.
If these problems are solved and these innovations are adopted, WSNs could change how data are collected, processed, and used in many different fields.

Figure 1 represents the basic structure of an IoT-based WSN. The integration of the Internet of Things (IoT) and WSNs has revolutionized sensing, communication, and data collection in various industries. Combining IoT with WSNs enhances the communication capabilities of these networks, making them more intelligent and efficient in applications such as smart cities, healthcare, agriculture, and industrial automation. IoT-based WSNs play a critical role in modern technological ecosystems, particularly in smart environments, Industry 4.0, and automation. These networks consist of spatially distributed autonomous sensors that monitor physical or environmental conditions such as temperature, humidity, pressure, and motion, and transmit the data to a central hub for processing. Integration with the IoT extends their functionality beyond simple data collection to remote monitoring, control, and decision-making, enabling real-time interaction between physical and digital systems across these domains. IoT-based WSNs use various communication protocols, including Wi-Fi, Bluetooth, Zigbee, and LoRa, to provide seamless connectivity and data transfer across devices, even in remote or challenging environments. The scalability and flexibility of these systems allow a wide range of sensors and actuators to be integrated, making them adaptable to diverse applications. However, energy efficiency, security, and data privacy remain critical challenges. The limited battery life of sensor nodes calls for energy-harvesting techniques or optimized communication protocols to extend network lifetime. In addition, security threats such as unauthorized access and data tampering must be addressed through robust encryption and authentication protocols.
Despite these challenges, IoT-based WSNs are pivotal in driving innovation across sectors, fostering greater efficiency, automation, and data-driven decision-making, and thereby shaping the future of interconnected systems.

The structure of this research article is as follows. A comprehensive literature review is presented in Section 2; the contributions of this research article are delineated in Section 3; the proposed methodology is elucidated in Section 4; the performance metrics and simulation results are provided in Section 5 and Section 6; the discussion is contained in Section 7; the complexity analysis is elaborated on in Section 8; and the conclusion and future research are presented in Section 9 and Section 10, respectively.

2. Literature Review

Many different ML algorithms for WSN ID have been studied recently. In this section, we summarize and analyze a large body of literature on optimization techniques, deep learning methods, machine learning models, and their combined applications. One study introduced MBOLT (MegaBAT optimized Long Short-Term Memory)-IDS for WSNs to detect different types of attacks. Using this method to fine-tune the hyperparameters of a deep LSTM improved performance with less computational overhead. The IDS was tested on the publicly available WSN-DS dataset. Its pre-processing, which normalizes the data, encodes text values as numbers, and adds multi-class labels to improve attack identification, boosts classification performance [2]. Another study proposed gathering data across the WSN and ID stages by grouping the SN with the DFFF algorithm. The data were organized in the pre-processing step, and the data matrix was reduced using ELDA (entropy-based linear discriminant analysis). After this reduction, a QN3 (Quasi-Newton Neural Network) classifier inspects the data for signs of attack or normal behavior and sorts attacks into DoS, R2L, and U2R categories [3]. To implement ID in IoT-WSNs, another study developed BCOA-MLID (Binary Chimp Optimization Algorithm with Machine Learning-based Intrusion Detection), which aims to differentiate between different kinds of attacks in order to protect the IoT-WSN. In experiments on the Kaggle intrusion dataset, BCOA-MLID reached an accuracy of 99.36 percent, outperforming XGBoost and KNN (K-Nearest Neighbor)-AOA [4]. In another study, FA with SVM was employed to classify attacks, with GWO (Grey Wolf Optimizer) using a parameter ‘a’ that decreases linearly from 2 to 0 during the iterations and is given by

$a = 2 - \frac{2}{T} \cdot t$

where ‘T’ is the total number of iterations and ‘t’ is the current iteration. In simulations on the NSL-KDD dataset, FA-SVM achieved 99.34% accuracy, whereas the kNN-PSO and XGBoost models reached lower accuracies of 96.42% and 95.36%, respectively [5]. Another study presented several hybrid IDSs, called SAT-IDSs, that combine feature selection with ML or DL models. The methods used were ANN, GRU, LSTM, and RF, with SFS used for feature selection. The proposed SAT-IDSs were evaluated on the STIN dataset for satellite networks and on UNSW-NB15 for terrestrial networks [6].
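To make the schedule of the GWO control parameter concrete, the short Python sketch below simply evaluates a = 2 − (2/T)·t for every iteration t from 0 to T. The helper name `gwo_a_schedule` is hypothetical; it is not taken from the cited study, and the snippet does not implement the full GWO or FA-SVM pipeline.

```python
def gwo_a_schedule(T: int) -> list[float]:
    """Return a(t) = 2 - (2 / T) * t for t = 0..T (hypothetical helper).

    The value starts at 2 (broad exploration) and falls linearly to 0
    (fine exploitation) at the final iteration.
    """
    if T <= 0:
        raise ValueError("T must be a positive number of iterations")
    return [2 - (2 / T) * t for t in range(T + 1)]


if __name__ == "__main__":
    print(gwo_a_schedule(4))  # [2.0, 1.5, 1.0, 0.5, 0.0]
```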

To detect the numerous attacks documented in the IoT-IDS dataset, one study employed GWDTO (Grey Wolf Dipper Throated Optimization) and compared it with other approaches from previous research. In statistics and signal processing, SVD is used for data partitioning, feature extraction, matrix approximation, and pattern recognition [7]. Another study proposed an ensemble model that uses RF-assisted recursive feature elimination to enhance the IDS's prediction capabilities. To extract meaningful information from the datasets, the authors employed data normalization and a wrapper-based feature elimination method. A hybrid stacking ensemble that merges RF-RFE, MLP, RF, and SVM for classification was constructed to generate attack class probabilities through majority voting [8]. In another study, the CMPRO (Chimp social incentive-based Mutated Poor Rich Optimization) algorithm was used for optimal cluster head selection, and a blockchain was deployed on the optimal cluster heads and a base station for storage and computational resources [9]. Another research article presented a CRNN (convolutional recurrent neural network) model combined with MSO for optimal hyperparameter selection, with the aim of improving the classifier's performance. The MSODL-ID (Moth Search Optimizer with Deep Learning Enabled Intrusion Detection) method was evaluated on WSN-DS with 374,661 samples and recognized the different attack types accurately [10]. In another study, Geometric SMOTE was used to generate diverse rare-class attack samples, while an enhanced kernel density estimation algorithm preserved the feature similarity of the samples. Soft-voting ensemble learning was then used to detect multi-class anomalies in the balanced, dimensionally reduced data [11].
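Several of the ensembles above combine base classifiers by voting. The sketch below contrasts hard (majority) voting with soft voting, which averages the class probabilities of all models before taking the argmax. The function names and the example probabilities are hypothetical; the code illustrates the two voting rules and is not the implementation used in [8] or [11].

```python
from collections import Counter


def hard_vote(prob_lists):
    """Majority vote over each model's predicted class (argmax of its probabilities)."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_lists]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(prob_lists):
    """Average the class-probability vectors of all models, then take the argmax."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


if __name__ == "__main__":
    # Three hypothetical models scoring one sample; class 0 = normal, class 1 = attack.
    probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    print(hard_vote(probs))     # 1: two of the three models vote "attack"
    print(soft_vote(probs)[0])  # 0: the confident first model dominates the average
```

The example shows why the choice matters: the two rules can disagree when one model is confident and the others are only marginally on the other side.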

Another article proposed a DL-based ID system for IoT networks that uses a four-layer deep fully connected (FC) network to detect malicious traffic. The system was implemented in Python with the TensorFlow library and was tested using the CNS and COS. The system is updated using the features discovered by the classifier. A feature extractor module intercepts data packets and extracts features such as Id, Tr, Rr, and TRR, where Id is the ‘identifier’, Tr the ‘Transmission Ratio’, Rr the ‘Reception Ratio’, and TRR the ‘Transmission Reception Ratio’ [12]. The authors of another article used OBPNN to monitor sensor data and assess health status, optimizing the SVM with the FSA. Their automated health status evaluation removes outliers using kernel density estimation, determines the appropriate bandwidth for normal data, models normality using OBPNN, and calculates the health score from the decision function and the distance to the separating hyperplane [13]. Another article proposed an effective and smart DL-based NIDS. The model was evaluated using various classification techniques and achieved training and testing accuracies of 100% and 99.64%, respectively [14]. Another study presented an AutoML model that uses Bayesian optimization to automatically choose an ML model and tune its hyperparameters for predicting the number of ‘k’ barriers needed for accurate intrusion detection and prevention. The ML models considered included various regression techniques along with bagging and boosting ensemble learning [15]. In another study, a decision tree was built using the CSA, and the selected features were represented by the terminal nodes of the tree. The proposed IDS was compared with other techniques on the CIC-IDS2018 benchmark dataset.
The suggested approach is a hybrid decision-making framework for classification and feature extraction using a decision tree with a maximum information gain ratio. The KDDCup 99 dataset was used for training and testing the proposed methodology [16].
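The split criterion named above, the information gain ratio, can be computed for a discrete feature as in the following sketch. It uses the standard C4.5-style definitions (entropy in bits, gain divided by split information); the function names and toy data are hypothetical and do not reproduce the framework of [16].

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_and_ratio(feature_values, labels):
    """Information gain and gain ratio of splitting `labels` on a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


if __name__ == "__main__":
    print(info_gain_and_ratio(["a", "a", "b", "b"], ["attack", "attack", "normal", "normal"]))
```

A decision tree that maximizes the gain ratio evaluates this quantity for every candidate feature at each node and splits on the best one.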

In another study, a hybrid method combining the RF model with the Pearson correlation coefficient was devised for efficient and effective feature selection. Categorical features were transformed into numerical values using an encoding approach. When tested in actual network settings, the decision tree algorithm proved to be the most effective and accurate method for ID [17]. Another paper presented a multi-model ID system for network intrusions based on feature selection and majority voting: six distinct ML methods were trained and assessed, and their outputs were combined into a multi-model classification system using majority voting [18]. Another article presented an effective ML-based approach for WSN routing and clustering, in which ML algorithms such as RF, MLPNN (Multi-Layer Perceptron Neural Network), and RBFNN support data transmission, node identification, fault detection, and analysis [19]. In another study, an ML-based IDS was implemented to monitor traffic in vehicular ad hoc networks (VANETs) and flag suspicious behavior using streaming engines for big data analytics. Using RF as the classifier during training, together with real-time data collection, management, and visualization, the IDS framework achieved its highest accuracy. The study also assessed the effect of the IDS on network throughput, showing that it is a resource-efficient solution [20]. To identify attacks on the MQTT protocol, the authors of another study introduced a DNN and compared its efficacy with other popular ML techniques. They evaluated the ID system using recall, accuracy, precision, and the F1 measure, and computed the information gain using entropy [21].
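The correlation part of a Pearson-based feature filter, such as the one used alongside RF in [17], can be sketched as follows: a feature is kept only if its absolute correlation with every already-kept feature is below a threshold. The threshold of 0.9, the greedy keep-first order, and the feature names are assumptions for illustration; the RF importance step of the cited method is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no linear information
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below `threshold`."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical columns: feat_a_scaled is an exact multiple of feat_a.
    feats = {"feat_a": [1, 2, 3, 4], "feat_a_scaled": [2, 4, 6, 8], "feat_b": [4, 1, 3, 2]}
    print(drop_correlated(feats))  # ['feat_a', 'feat_b']
```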

Another article suggested a multi-layer ID architecture for WSNs that uses ML techniques to detect previously unseen attacks. In the first detection layer, network edge sensors use an NB classifier for real-time packet inspection [22]. Another research study used a systematic literature review approach to select the papers it examined. It provided a taxonomy of ID algorithms based on individual DL methods and compared the surveyed ID systems in terms of simulation metrics, settings, languages, software, datasets, and feature extraction approaches [23]. Another article presented an ID model that combines TVP-IPSO (Time-Varying Parameter-Improved Particle Swarm Optimization), PCA (Principal Component Analysis), and SVM (Support Vector Machines). PCA was used to reduce the data dimensions and thereby energy consumption, while SVM provided high detection accuracy. To improve detection accuracy and convergence speed, TVP-IPSO was used to find the optimal parameters of the SVM [24]. Another research study presented a novel method for configuring a lightweight DL-HIDS model using parameters supplied by an IoT device. The authors considered many parameters, including memory consumption and inference time, when optimizing the model for the target devices, and explored deploying the DL-HIDS in a real-world Internet of Things environment [25]. Another study combined kNN and the SCA (Sine Cosine Algorithm) to build a lightweight ID model for WSNs that improves classification accuracy while decreasing false alarm rates. The model compensated for the optimization accuracy loss with PM, and a CSCA reduced the space and time required for computation [26].
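A plain k-nearest-neighbor classifier, the base learner of the kNN-SCA model in [26], fits in a few lines of Python. This is a generic kNN with Euclidean distance and a majority vote among the k nearest training points; the SCA-based tuning, PM, and CSCA from the cited study are not reproduced, and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label `x` by majority vote among its k nearest training points (Euclidean)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: normal traffic near (0, 0), attacks near (5, 5).
    X = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (5.0, 5.0), (4.6, 5.3), (5.2, 4.7)]
    y = ["normal", "normal", "normal", "attack", "attack", "attack"]
    print(knn_predict(X, y, (0.2, 0.1)))  # normal
```

In a kNN-SCA combination, an optimizer such as SCA would typically search over choices like k or feature weights; that search loop is omitted here.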

One research study examined several data mining techniques for building an effective IDS to detect DoS attacks. It used a specialized dataset, WSN-DS, and performed a thorough empirical analysis, assessing the detection accuracy and computational cost of these methods before and after applying a feature selection algorithm [27]. Another paper explored the use of DNNs to develop an effective IDS for detecting and classifying cyberattacks in real time; according to the study, DNNs outperformed other ML classifiers on several publicly available benchmark datasets [28]. The authors of another study used an RF model to detect attacks in the WSN-DS dataset. They divided WSN-DS into training and testing sets, trained an RF algorithm on the training data, and used the RF classifier to predict DoS attacks on the testing set [29]. Another paper employed various CNN techniques to detect DoS attacks in WSNs, training and assessing NN algorithms on the WSN-DS dataset with diverse DoS attacks [30]. Another paper proposed GSWO, which combines a GA and a modified WOA to enhance feature selection and hyperparameter tuning for IDSs in WSNs. GSWO is applied specifically to feature selection, where it identifies the most relevant features in a dataset. It uses a fitness function that balances classification error against the number of features retained, promoting efficient model performance. The methods were evaluated using metrics including accuracy, precision, recall, and F1-score, and ROC curves were used to visualize performance across different thresholds [31]. Another study presented Kitsune, whose core is an ensemble of autoencoders, which are NNs designed to learn efficient representations of data.
The ensemble collectively differentiates between normal and abnormal traffic patterns, making it effective for anomaly detection in real time. Kitsune includes a feature extraction component that efficiently tracks the behavior of all network channels; this component is crucial for obtaining relevant features from network traffic and thus for accurately identifying anomalies. Kitsune's detection accuracy and runtime efficiency were evaluated, and the results indicated that it performs comparably to traditional batch algorithms even on low-power devices such as the Raspberry Pi, showcasing its efficiency and practicality [32]. Another paper demonstrated that denoising autoencoders can significantly improve the accuracy of NIDSs: by reconstructing inputs from partial observations, they help distinguish benign from malicious traffic more effectively. The paper also discussed the importance of feature vectors in both packet-based and flow-based NIDSs, highlighting that the way features are extracted can significantly affect detection rates, especially for certain types of attacks [33]. Another paper adopted a deep learning architecture for unsupervised anomaly detection in IoT, evaluating multiple design choices on the IoT-23 and Kitsune datasets. The results showed improved performance and robustness compared with existing baselines, particularly under label-flipping poisoning attacks [34]. Table 1 summarizes recent work on ID in WSNs.
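A feature-selection fitness that balances classification error against the number of retained features, as described for GSWO [31], is commonly written in the wrapper feature-selection literature as α·(error rate) + (1 − α)·(selected features / total features), with α close to 1 so that accuracy dominates. The sketch below uses that form with α = 0.99; both the formula and the value of α are assumptions, since the exact GSWO fitness is not given here.

```python
def feature_selection_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Lower is better: weighted sum of classification error and feature fraction.

    The form alpha * error + (1 - alpha) * (n_selected / n_total) is an assumed,
    commonly used wrapper fitness, not necessarily the one used by GSWO.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    return alpha * error_rate + (1 - alpha) * n_selected / n_total


if __name__ == "__main__":
    # Same error, fewer features -> lower (better) fitness.
    print(feature_selection_fitness(0.02, 5, 20), feature_selection_fitness(0.02, 10, 20))
```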

The reviewed research articles demonstrate a growing interest in utilizing ML algorithms for ID in WSNs. However, the existing methods often face challenges such as high false alarm rates and limited adaptability to dynamic network environments. By leveraging advanced optimization techniques, such as the LOA, there is a clear opportunity to develop more effective classifiers for WSNs. These optimized classifiers have the capability to enhance the accuracy and efficacy of attack detection significantly, thus addressing the shortcomings of current approaches.

3. Contributions of This Research Article

The suggested LOA helps to optimize the hyperparameters of the Cb-C.
The proposed Cb-C incorporates an efficient classification technique for enhancing the prediction capabilities of various attacks.
This study mainly focuses on integrating the LOA and Cb-C, hence called LOA-Cb-C, which helps in the effective prediction of the maximum number of samples in each category of attack, thus enhancing the accuracy, true positive rate, and F1-score and reducing the error rate.

4. Proposed Methodology

4.1. General Structure of the Proposed Methodology

(i): Dataset collection

The dataset used in this study is the WSN-DS dataset obtained from Prince Sultan University, Saudi Arabia. This dataset contains instances of various types of attacks in WSNs, making it suitable for our classification task.

(ii): Data Splitting

Eighty percent of the dataset was reserved for training purposes, while the remaining twenty percent was used for testing. The holdout method was used to divide the data in this way so that the proposed model could be trained on a large enough portion of the data while retaining a separate set for evaluation. The robustness of the suggested optimization-based classifier (LOA-Cb-C) was evaluated on the mentioned dataset consisting of 374,661 samples. Table 2 displays the breakdown of the total sample size.
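The 80/20 holdout split can be sketched with the Python standard library as below. The text does not state whether the split was stratified; this sketch assumes a stratified split, which keeps the rare attack classes represented in both parts in roughly their original proportions. The function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_frac=0.2, seed=42):
    """Return (train_indices, test_indices) with each class split test_frac / 1 - test_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within each class before cutting
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = ["Normal"] * 80 + ["Flooding"] * 20
    train, test = stratified_holdout(labels)
    print(len(train), len(test))  # 80 20
```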

(iii): Model Selection and Optimization

We chose the Cb-C for its effectiveness in handling categorical features and its ability to handle imbalanced datasets. The hyperparameters of the Cb-C were optimized using the LOA, which searches for hyperparameter values that can considerably increase the classifier's performance.

(iv): Training and Prediction

The optimized Cb-C was trained on the training dataset to learn the patterns in the data related to diverse attacks. After training, the model was applied to the testing dataset for prediction purposes. Several metrics were utilized to evaluate the model’s performance, including accuracy, precision, recall, and F1-score.

4.2. The WSN-DS Dataset

The WSN-DS dataset used in this study is a comprehensive dataset specifically designed for evaluating IDSs in WSNs. It was obtained from Prince Sultan University, Saudi Arabia, and contains a large collection of sensor network traffic records. The dataset can be accessed at https://www.kaggle.com/datasets/bassamkasasbeh1/wsnds (accessed on 19 June 2024). The types of attacks and the total number of samples are detailed below.

WSN-DS comprises various types of attacks that can occur in WSNs, including the following:

Blackhole Attacks: In a blackhole attack, a malicious node drops all the data packets it receives instead of forwarding them to their intended destination. The attacker disrupts communication within the network by first making itself appear to be an attractive route, so that data traffic is drawn through it.
Grayhole Attacks: A grayhole attack is a type of security threat in which a malicious node selectively drops or modifies data packets, rather than dropping all packets as in a blackhole attack. The goal is to disrupt communication in the network while remaining stealthy and difficult to detect.
Flooding Attacks: A flooding attack is a type of Denial of Service (DoS) attack where an attacker deliberately sends a large volume of packets or messages to the network with the intention of overwhelming its resources. The goal of a flooding attack is to consume the network’s bandwidth, energy, or processing capabilities, leading to a disruption in communication and potentially causing legitimate messages to be dropped or delayed.
TDMA/Scheduling Attacks: TDMA (time division multiple access) is a channel access and scheduling method that divides the communication channel into time slots and assigns each node a specific slot in which it can transmit or receive data. In a scheduling attack, a malicious node that controls the schedule assigns the same time slot to multiple nodes, so their transmissions collide and data are lost.

There are a total of 374,661 samples in the WSN-DS dataset: 340,066 normal samples; 10,049 blackhole attacks; 14,596 grayhole attacks; 6638 TDMA/scheduling attacks; and 3312 flooding attacks. A description of the dataset is given in [30], and its feature importance is analyzed in [35,36,37].
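The class counts above make the imbalance concrete. The sketch below, using only the counts stated in this section, computes each class's share, the majority-to-minority ratio, and inverse-frequency class weights (majority count divided by class count). Such weights are one common way to counter imbalance; CatBoost itself offers an `auto_class_weights` option (for example, 'Balanced') for this purpose, but whether the authors used any class weighting is not stated, so treat this as an illustrative assumption.

```python
# Class counts of WSN-DS as reported in this section.
WSN_DS_COUNTS = {
    "Normal": 340_066,
    "Blackhole": 10_049,
    "Grayhole": 14_596,
    "TDMA/Scheduling": 6_638,
    "Flooding": 3_312,
}


def class_summary(counts):
    """Return total, per-class shares, imbalance ratio, and inverse-frequency weights."""
    total = sum(counts.values())
    shares = {c: n / total for c, n in counts.items()}
    n_max = max(counts.values())
    imbalance_ratio = n_max / min(counts.values())
    # Weight of each class = majority count / class count (majority class gets 1.0).
    weights = {c: n_max / n for c, n in counts.items()}
    return total, shares, imbalance_ratio, weights


if __name__ == "__main__":
    total, shares, ratio, weights = class_summary(WSN_DS_COUNTS)
    print(total)  # 374661
    for cls, share in shares.items():
        print(f"{cls:16s} {share:6.2%}  weight={weights[cls]:.1f}")
    print(f"imbalance ratio: {ratio:.1f}")
```

Normal traffic makes up roughly 90.8% of the samples, and the flooding class is about 100 times rarer than the normal class, which is why accuracy alone can be misleading on this dataset.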

The dataset was carefully inspected for null values, missing values, and value ranges. It was found to be clean, with no missing or null values, which indicated that it could be used directly for the classification task without specific data pre-processing. The range of values was also inspected to ensure that all values fall within the expected range for each feature, which further confirmed the suitability of the dataset for analysis.
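The inspection described above can be sketched as a small validation routine that reports missing values (None or NaN) and out-of-range values per row. The column names and expected ranges in the example are hypothetical placeholders, not the actual WSN-DS schema or the authors' checks.

```python
import math


def validate_rows(rows, expected_ranges):
    """Return (row_index, column, problem) tuples; an empty list means the data are clean."""
    problems = []
    for i, row in enumerate(rows):
        for col, (lo, hi) in expected_ranges.items():
            v = row.get(col)
            if v is None or (isinstance(v, float) and math.isnan(v)):
                problems.append((i, col, "missing"))
            elif not lo <= v <= hi:
                problems.append((i, col, "out of range"))
    return problems


if __name__ == "__main__":
    # Hypothetical columns and ranges, for illustration only.
    ranges = {"energy": (0.0, 10.0), "dist": (0.0, 200.0)}
    rows = [
        {"energy": 0.5, "dist": 12.0},
        {"energy": None, "dist": 3.0},
        {"energy": float("nan"), "dist": -1.0},
    ]
    print(validate_rows(rows, ranges))
```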

4.3. Lyrebird Optimization Algorithm (LOA)

This section discusses a new population-based metaheuristic algorithm and its mathematical model. Lyrebirds are native to Australia; the superb lyrebird and Albert's lyrebird are the two species. They are best known for the striking beauty of the male's enormous tail when it is spread out in a mating display, and for their exceptional ability to reproduce both artificial and natural sounds from their habitat. Lyrebirds are among the most well-known native birds in Australia and are distinguished by their distinctive plumes of neutrally colored feathers. Superb lyrebird females measure between 74 and 84 cm in length, and males between 80 and 98 cm. By comparison, Albert's lyrebird females grow to a maximum of 84 cm and males to a maximum of 90 cm. Although the lyrate tail feathers of Albert's lyrebird are smaller and less remarkable than those of the superb lyrebird, the two species are generally comparable. Superb lyrebirds weigh approximately 0.97 kg, which is somewhat more than the average weight of 0.93 kg. One of the lyrebird's behavioral features becomes obvious as soon as it detects potential danger: it pauses, surveys its immediate surroundings, and then either takes flight or hides in a safe place. The flowchart of LOA is shown in Figure 2.

(i): Different phases of the LOA.

The LOA is an iterative technique in which, at every iteration, each population member is updated by simulating the lyrebird's response to danger. Depending on the situation, a member is updated in one of two phases: (a) escaping or (b) hiding.

(ii): Escape strategy (exploration phase).

In the exploration phase, the simulation moves the birds from a dangerous spot to a safe place, and each population member's location in the search space is updated accordingly. Relocating a lyrebird to a secure spot drastically changes its position and makes it scan other regions of the search space, which reflects the algorithm's capability for global search. In the LOA design, the safe areas of each population member are the locations of other population members with higher objective function values.

(iii): Hiding strategy (exploitation phase).

During exploitation, each population member's position in the search space is updated by modeling the lyrebird's strategy of hiding in a nearby safe area. The lyrebird scans its environment carefully and moves cautiously in search of a suitable hiding place, making small but noticeable movements; this behavior represents the LOA's exploitation ability in local search. In the LOA design, each member of the flock is given a new position along the lyrebird's predicted path to a nearby hiding place.
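The two phases can be sketched as a small maximizer in Python. The concrete update rules are our assumptions, modeled on the generic LOA description above rather than taken from the authors' code: each member chooses escape or hiding with equal probability; escape moves it toward a randomly chosen member with higher fitness (a "safe area") by a random fraction, using a random factor of 1 or 2 on its own position; hiding makes a random local move whose size shrinks as (upper bound − lower bound)/t; and a move is accepted only if it improves fitness. All names are hypothetical.

```python
import random


def loa_maximize(fitness, lb, ub, pop_size=20, max_iter=50, seed=0):
    """Minimal LOA-style search that maximizes `fitness` inside box bounds [lb, ub].

    Returns (best_position, best_fitness, history); history[t] is the best
    fitness after iteration t + 1.
    """
    rng = random.Random(seed)
    dim = len(lb)

    def clip(x):
        # Keep every coordinate inside its bounds.
        return [min(max(v, lb[d]), ub[d]) for d, v in enumerate(x)]

    pop = [[rng.uniform(lb[d], ub[d]) for d in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    history = []

    for t in range(1, max_iter + 1):
        for i in range(pop_size):
            if rng.random() <= 0.5:
                # Escape (exploration): move toward a random "safe area", i.e. a
                # member whose fitness is higher than that of member i.
                safe = [pop[k] for k in range(pop_size) if fit[k] > fit[i]]
                if not safe:
                    continue  # member i is currently the best; no safer place exists
                s = rng.choice(safe)
                step_factor = rng.choice((1, 2))
                cand = [pop[i][d] + rng.random() * (s[d] - step_factor * pop[i][d])
                        for d in range(dim)]
            else:
                # Hiding (exploitation): random local move whose size shrinks as 1/t.
                cand = [pop[i][d] + (1 - 2 * rng.random()) * (ub[d] - lb[d]) / t
                        for d in range(dim)]
            cand = clip(cand)
            f_cand = fitness(cand)
            if f_cand > fit[i]:
                # Greedy acceptance: the member moves only if the new spot is better.
                pop[i], fit[i] = cand, f_cand
        history.append(max(fit))

    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], history


if __name__ == "__main__":
    def neg_sphere(x):
        # Toy objective with its maximum (0) at the origin.
        return -sum(v * v for v in x)

    best_x, best_f, hist = loa_maximize(neg_sphere, [-5.0, -5.0], [5.0, 5.0], seed=1)
    print(best_x, best_f)
```

Because of the greedy acceptance step, the best fitness found never decreases from one iteration to the next.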

4.4. Lyrebird Optimization Algorithm for Intrusion Detection in WSNs

This section describes the fitness function and the parameters that will adapt to reflect the specific optimization goals (e.g., minimizing attack detection errors and energy consumption or maximizing the detection rate).

(a): Fitness function (objective):
In the proposed work, the objective is to minimize false positives and maximize detection accuracy. The fitness function is therefore built from detection metrics: accuracy, precision, and recall are rewarded, and the false positive rate is penalized.
The fitness function is defined as

$\mathrm{Fitness}(P_{i}) = w_{1} \cdot \mathrm{Accuracy} + w_{2} \cdot \mathrm{Precision} + w_{3} \cdot \mathrm{Recall} - w_{4} \cdot \mathrm{FPR}$

(1)

where $P_{i}$ is the i-th candidate solution, $w_{1}, w_{2}, w_{3}, w_{4}$ are the weights that balance the metrics, and FPR is the false positive rate.
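For a binary normal-versus-attack evaluation, Eq. (1) can be computed directly from confusion-matrix counts, as sketched below. The equal default weights are a hypothetical choice (the weight values are not specified here), and for the multi-class WSN-DS labels the metrics would first have to be averaged, for example one-vs-rest, which this sketch does not do.

```python
def loa_fitness(tp, fp, tn, fn, w=(0.25, 0.25, 0.25, 0.25)):
    """Fitness = w1*Accuracy + w2*Precision + w3*Recall - w4*FPR (binary counts)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # false alarms among normal samples
    w1, w2, w3, w4 = w
    return w1 * accuracy + w2 * precision + w3 * recall - w4 * fpr


if __name__ == "__main__":
    # More false positives (and fewer true negatives) lower the fitness.
    print(loa_fitness(40, 10, 40, 10), loa_fitness(40, 20, 30, 10))
```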

(b): Exploration and exploitation phases:
The exploration and exploitation phases follow the general LOA scheme, but here the updated positions of the solutions represent the search for the network parameters that maximize detection accuracy or minimize energy consumption. The exploration update is as follows:

$P_{i}^{new} = P_{i} + \beta \times (r - 0.5)$

(2)

where β is a coefficient that scales the step size and r is a random number in [0, 1].
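Eq. (2) can be sketched as a single function. The sketch assumes that a fresh r is drawn uniformly from [0, 1) for every dimension and that the result is clipped to the search bounds; both are assumptions, as is the function name.

```python
import random


def explore_step(position, beta, lb, ub, rng=random):
    """One exploration move P_new = P + beta * (r - 0.5), clipped to [lb, ub].

    A fresh r ~ U[0, 1) is drawn per dimension (an assumption), so each
    coordinate moves by at most beta / 2 before clipping.
    """
    return [
        min(max(p + beta * (rng.random() - 0.5), lo), hi)
        for p, lo, hi in zip(position, lb, ub)
    ]


if __name__ == "__main__":
    print(explore_step([0.5, 0.5], 0.2, [0.0, 0.0], [1.0, 1.0], random.Random(0)))
```

Since r − 0.5 is centered on zero, the move is an unbiased random perturbation whose spread is controlled entirely by β.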

(c): Termination:
The termination criteria are based on reaching a desired detection threshold (e.g., a minimum F1-score) or completing a maximum number of iterations:

$F1(P_{i}) \geq \mathrm{Threshold} \quad \text{or} \quad \mathrm{Iterations} = \mathrm{MaxIter}$

(3)

where F1 is the F1-score for attack detection.
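The termination rule of Eq. (3) can be sketched as follows, with F1 computed from binary counts as 2TP/(2TP + FP + FN). The threshold of 0.99 and the cap of 100 iterations are hypothetical defaults, and the check uses ≥ on the iteration count so that the loop also stops if the cap is overshot.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def should_stop(f1, iteration, threshold=0.99, max_iter=100):
    """Stop once F1 reaches the threshold or the iteration cap is reached."""
    return f1 >= threshold or iteration >= max_iter


if __name__ == "__main__":
    f1 = f1_from_counts(40, 10, 10)
    print(f1, should_stop(f1, iteration=10))
```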

4.5. CatBoost Classifier (Cb-C) for WSN Attack Detection

The CatBoost Classifier (Cb-C) is an open-source, non-linear, tree-based ML approach built on gradient boosting. It has been reported to outperform other boosting techniques, including LightGBM and XGBoost, in many settings and often delivers good results even in initial runs with default settings. The Cb-C performs consistently across a wide variety of data types. In this pipeline, the Cb-C was used to classify the data, with the aim of achieving high precision and speed without additional processing time. The data were supplied to the Cb-C classifier during both the training and testing phases. All weights were initially set to be identical. Figure 3 depicts the flow diagram of the Cb-C classifier.

(a): Objective function (loss function):
The objective function is a binary classification (log) loss. The base equation for distinguishing normal traffic from attack traffic in the WSN is given by

$L(y, \hat{y}) = -\sum_{i=1}^{n} \left[ y_{i} \log(\hat{y}_{i}) + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right]$

(4)

where $y_{i} = 1$ for attack (DoS) traffic, $y_{i} = 0$ for normal traffic, and $\hat{y}_{i}$ is the predicted probability that sample $i$ is an attack.
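Eq. (4) can be written as a function, including the leading minus sign that turns the log-likelihood into a loss to be minimized. This is the usual binary cross-entropy, which CatBoost calls `Logloss`. Clipping the probabilities by a small `eps` is an implementation assumption that prevents log(0). Dividing the result by n gives the mean log loss that libraries usually report, and the mean has the same minimizer as the sum.

```python
import math


def log_loss_sum(y_true, y_prob, eps=1e-15):
    """Summed binary cross-entropy, i.e. Eq. (4) with its leading minus sign.

    y_true holds 0/1 labels (1 = attack); y_prob holds predicted attack
    probabilities, clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total
```

Confident, correct predictions drive the loss toward 0, whereas confident mistakes are penalized heavily.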

(b): Handling categorical data in WSNs:
CatBoost’s ordered target statistics handle the categorical data in the WSN-DS dataset while reducing target leakage and overfitting. The ordered target statistic is given by

$\mathrm{TS}(c) = \frac{\sum_{i=1}^{n-1} y_{i}}{n-1}$

(5)

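where c is a value of a categorical feature, the sum runs over the $n - 1$ samples with category c that precede the current sample in a random permutation of the training data, and $y_{i}$ are their attack labels.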
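CatBoost computes ordered target statistics internally for any column listed in `cat_features`. The plain-Python sketch below re-implements the idea to make Eq. (5) concrete. Each sample is encoded using only the labels of earlier samples that share its category, so its own label never leaks into its encoding. The sketch follows the CatBoost paper in adding a prior p with weight a, which Eq. (5) omits. The defaults p = 0.5 and a = 1 are assumptions, and the function name is hypothetical.

```python
import random


def ordered_target_statistics(categories, labels, prior=0.5, a=1.0, seed=None):
    """Ordered target statistics for one categorical feature.

    Walks the samples in a (optionally shuffled) order and encodes each one
    from the labels of *earlier* samples with the same category:
        TS = (sum of earlier labels + a * prior) / (count of earlier + a)
    `prior` and `a` (a > 0) are smoothing assumptions; Eq. (5) omits them.
    """
    order = list(range(len(categories)))
    if seed is not None:  # shuffle only when a seed is given
        random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for idx in order:
        c = categories[idx]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + a * prior) / (n + a)
        # Update the running statistics only after encoding this sample.
        sums[c] = s + labels[idx]
        counts[c] = n + 1
    return encoded
```

For example, the categories `["a", "a", "b", "a"]` with labels `[1, 0, 1, 1]` (in their original order) encode to `[0.5, 0.75, 0.5, 0.5]`. The first occurrence of each category falls back to the prior.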

(c): Prediction of attack likelihood:
After building the trees iteratively, the final prediction for a given instance is

$\hat{y}_{i} = \sum_{m=1}^{M} \eta \cdot h_{m}(X_{i})$

(6)

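where $\hat{y}_{i}$ is the predicted attack likelihood for the i-th instance of sensor data, M is the number of trees, η is the learning rate, $h_{m}$ is the m-th tree, and $X_{i}$ represents features such as packet size, transmission rate, and node energy level. For binary classification with the log loss of Eq. (4), this additive score is passed through the sigmoid function $\sigma(z) = 1/(1 + e^{-z})$ to obtain a probability.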
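Eq. (6) produces an additive raw score. The following minimal sketch assumes the usual logistic link for the log loss and maps that score to an attack probability. It ignores any initial baseline term. The two decision stumps and the feature names `packet_rate` and `energy` are hypothetical illustrations, not trees learned from WSN-DS.

```python
import math


def predict_proba(x, trees, eta):
    """Eq. (6) followed by a sigmoid.

    `trees` is a list of callables h_m(x) that return a leaf value. The raw
    score sum(eta * h_m(x)) is mapped to an attack probability with the
    logistic function.
    """
    raw = sum(eta * h(x) for h in trees)
    return 1.0 / (1.0 + math.exp(-raw))


# Two hypothetical decision stumps over a feature dict (illustrative only).
TOY_TREES = [
    lambda x: 2.0 if x["packet_rate"] > 100 else -2.0,
    lambda x: 1.0 if x["energy"] < 0.2 else -1.0,
]
```

With η = 0.5, a node that has a high packet rate and low remaining energy gets a raw score of 1.5, which corresponds to a probability of about 0.82. The opposite case scores −1.5.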

4.6. CatBoost Classifier (Cb-C) Optimized by the Lyrebird Optimization Algorithm (LOA)

The Cb-C optimized by the LOA has shown promising results in detecting DoS attacks in IoT-based WSNs. Figure 4 presents the flow diagram of the proposed LOA-Cb-C. The benefits of this approach include the following:

Improved Accuracy: Hyperparameter optimization can enhance the CatBoost classifier’s ability to differentiate between normal and malicious traffic.
Robustness: Tuning makes the CatBoost classifier more resilient to variations in IoT data, which improves the detection of DoS attacks.
Efficiency: Its improved computational efficiency makes the optimized classifier well-suited to identifying DoS attacks in WSNs in real time.

5. Performance Metrics

The following metrics were considered in the analysis.

True Positive Rate: The proportion of actual positive cases that the classifier correctly identifies, i.e., the number of true positives divided by the sum of true positives and false negatives.

$\mathrm{TPR} = \frac{TP}{TP + FN}$

(7)

False Positive Rate: The proportion of actual negative cases that the classifier incorrectly classifies as positive, i.e., the number of false positives divided by the sum of false positives and true negatives.

$\mathrm{FPR} = \frac{FP}{FP + TN}$

(8)

Precision: The fraction of instances predicted as positive that are actually positive.

$\mathrm{Precision} = \frac{TP}{TP + FP}$

(9)

Recall: The fraction of correctly predicted positive instances among all actual positive instances.

$\mathrm{Recall} = \frac{TP}{TP + FN}$

(10)

F1-score: The harmonic mean of precision and recall, providing a balance between the two metrics.

$F_{1}\text{-score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

(11)

Accuracy: The ratio of correctly classified instances (true positives and true negatives) among all instances.

$\mathrm{Accuracy} = \frac{TP + TN}{\text{overall samples}} \times 100$

(12)

Error: The ratio of incorrectly classified instances (both false positives and false negatives) among all instances.

$\mathrm{Error} = \frac{FP + FN}{\text{overall samples}} \times 100$

(13)
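The following sketch applies Eqs. (7)–(13) one-vs-rest to the Table 5 confusion matrix. It treats rows as actual classes and columns as predicted classes. That orientation is an assumption, chosen because it reproduces the Table 4 values (e.g., TDMA precision 1.00 and recall ≈ 0.93, Normal FPR ≈ 0.02). Accuracy and error are returned as percentages, consistent with the ×100 in Eq. (12).

```python
# Table 5 confusion matrix; rows are taken as actual classes and columns as
# predicted classes (an assumption consistent with the Table 4 values).
LABELS = ["Normal", "Flooding", "TDMA", "Grayhole", "Blackhole"]
TABLE5 = [
    [67914, 37, 0, 31, 3],
    [16, 663, 0, 0, 0],
    [93, 0, 1249, 1, 0],
    [33, 0, 0, 2798, 33],
    [0, 0, 0, 9, 2052],
]


def per_class_metrics(cm, labels):
    """One-vs-rest TPR, FPR, precision, recall and F1 (Eqs. 7-11)."""
    total = sum(map(sum, cm))
    out = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # actual k, predicted other
        fp = sum(row[k] for row in cm) - tp   # predicted k, actual other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        fpr = fp / (fp + tn) if fp + tn else 0.0
        out[name] = {"TPR": recall, "FPR": fpr, "Precision": precision,
                     "Recall": recall, "F1": f1}
    return out


def accuracy_and_error(cm):
    """Eqs. (12)-(13), both expressed as percentages."""
    total = sum(map(sum, cm))
    correct = sum(cm[k][k] for k in range(len(cm)))
    return 100 * correct / total, 100 * (total - correct) / total
```

On Table 5, this gives 74,676 correct predictions out of 74,932, i.e., an accuracy of 99.66% and an error of 0.34%. A few per-class values differ from Table 4 in the last digit (e.g., Blackhole recall 2052/2061 ≈ 0.996).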

6. Simulation Results

The suggested Cb-C was implemented in Python on an Intel(R) Core(TM) i7 processor (3.40 GHz) with 16 GB of RAM running Microsoft Windows 11, within the Google Colaboratory environment. Table 3 lists the hyperparameters and the candidate values searched for the proposed classifier.

The proposed classifier was run with learning rates of 0.01, 0.05, 0.1, and 0.5 for 100 iterations. Table 4 reports the results obtained by the proposed classifier on the test set (20% of the data).

Table 4 shows that the proposed classifier achieved weighted-average values of 0.99 for the TP rate, recall, precision, F1-measure, and ROC, with an FP rate of 0.01.

Table 5 shows that, of the 74,932 test instances, 74,676 were correctly classified and 256 were misclassified. This gives an accuracy of 99.66% and an error (misclassification rate) of 0.34%. Table 6 reports an MAE of 0.0025 and an RMSE of 0.034. The kappa statistic is 0.98, close to its maximum of 1, which indicates near-perfect agreement between the predicted and actual classes beyond what chance alone would produce. Together, these results indicate that the LOA-tuned Cb-C classifies the large majority of test samples correctly and achieves higher accuracy than the other reported techniques.
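The kappa value in Table 6 can be checked against the Table 5 confusion matrix. Cohen's kappa compares the observed agreement p_o (the diagonal share) with the agreement p_e expected from the row and column totals: κ = (p_o − p_e) / (1 − p_e). Kappa is unchanged if the matrix is transposed, so the result does not depend on which axis holds the actual classes. The function name is illustrative.

```python
# Table 5 confusion matrix (five classes: Normal, Flooding, TDMA, Grayhole, Blackhole).
TABLE5 = [
    [67914, 37, 0, 31, 3],
    [16, 663, 0, 0, 0],
    [93, 0, 1249, 1, 0],
    [33, 0, 0, 2798, 33],
    [0, 0, 0, 9, 2052],
]


def cohens_kappa(cm):
    """Cohen's kappa for a square confusion matrix."""
    total = sum(map(sum, cm))
    observed = sum(cm[k][k] for k in range(len(cm))) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Chance agreement: product of marginal shares, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)
```

On Table 5, p_o ≈ 0.9966 and p_e ≈ 0.8266, so κ ≈ 0.98, which matches Table 6. The high p_e reflects the dominance of the Normal class. This is why kappa is more informative than raw accuracy for this imbalanced dataset.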

Table 7 compares the performance metrics of the proposed classifier with those of the other models reported in this article. The accuracy of LOA-Cb-C is 99.66%, which exceeds the accuracy of Gboost by 0.06 percentage points, MVFS by 0.16, DLDM by 0.46, CNN by 0.87, SVM by 0.94, RF by 1.66, MLML by 2.01, DNN by 2.62, DT by 2.66, CNN+RNN by 2.80, RNN by 3.18, NB by 5.06, LR by 6.26, NNC-PSOR by 6.66, ANN by 7.70, and KNN by 15.86. Because the error is the complement of the accuracy (Error = 100 − Accuracy), the error of LOA-Cb-C (0.34%) is lower than that of each of these models by the same margins. The precision, recall, and F1-score of the proposed classifier are 0.99, whereas ANN reports 1.00 for all three despite its much lower accuracy of 91.96%.

7. Discussion

The combined use of the LOA and Cb-C for detecting DoS attacks in IoT-based WSNs demonstrates significant improvements over traditional unimodal and hybrid algorithms. The LOA’s ability to balance exploration and exploitation during the search enhances its adaptability to dynamic WSN environments. Unimodal algorithms tend to become trapped in local optima. In contrast, the LOA’s population-based approach promotes robust convergence toward optimal solutions, making it well-suited to detecting complex attack patterns. Additionally, the LOA’s iterative process allows diverse attack types to be detected more effectively, resulting in improved classification accuracy.

The Cb-C, known for its fast, high-performance gradient boosting, further strengthens the detection mechanism. Its symmetric (oblivious) tree structure, in which all nodes at the same depth share a single split condition, speeds up prediction and reduces overfitting. This is crucial for maintaining high accuracy in real-time environments such as IoT-based WSNs. By using ordered target statistics to handle categorical data, CatBoost mitigates target leakage and improves model generalization. This is particularly important in intrusion detection, where correctly classifying network traffic and attack types is critical for system security.
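A symmetric tree explains the fast prediction: a depth-D tree applies the same test at every node of a level, so reaching a leaf takes D comparisons whose outcomes form a D-bit leaf index. The following minimal sketch uses hypothetical feature indices and thresholds. Its bit ordering is an assumption and need not match CatBoost's internal layout.

```python
def oblivious_leaf_index(x, splits):
    """Leaf index in a symmetric (oblivious) tree.

    `splits` holds one (feature_index, threshold) pair per depth level.
    Every node on a level uses the same test, so the leaf is found by
    evaluating D comparisons and packing the outcomes into a D-bit integer.
    """
    index = 0
    for feature, threshold in splits:
        index = (index << 1) | (1 if x[feature] > threshold else 0)
    return index


def oblivious_predict(x, splits, leaf_values):
    """Return the leaf value; a depth-D tree has 2**D leaves."""
    assert len(leaf_values) == 2 ** len(splits)
    return leaf_values[oblivious_leaf_index(x, splits)]
```

With `splits = [(0, 0.5), (1, 10)]`, the input `[0.7, 5]` passes the first test but fails the second, giving leaf index 2. Because the evaluation is a fixed sequence of comparisons without data-dependent branching between node types, it also vectorizes well across many samples.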

In comparison to other hybrid algorithms, the LOA-Cb-C combination delivers superior results in terms of detection accuracy, computational efficiency, and adaptability to real-time network changes. While some challenges remain, such as managing computational complexity in resource-constrained WSN environments, the method’s scalability and flexibility in addressing diverse attack vectors present strong advantages. Future work can extend this approach to detect a broader range of attacks, while also exploring further optimization techniques to improve computational efficiency.

8. Complexity Analysis of the Proposed Approach

This section presents a detailed complexity analysis. The complexity of the LOA depends on the population size (N), the number of iterations (T), and the cost (F) of evaluating the fitness of one individual in the population.

a. Lyrebird Optimization Algorithm:

(i) Time Complexity:

Population Initialization: Randomly generating an initial population requires O(N), where N is the population size.
Fitness Calculation: In each iteration, the fitness of every individual is evaluated. If one fitness evaluation takes O(F), each iteration costs O(N × F).
Exploration and Exploitation (Search Process): In each iteration, the search mechanisms (exploration and exploitation) update every individual once, adding O(N) per iteration.
Overall Time Complexity of the LOA: O(T × N × F), where T is the number of iterations, N is the population size, and F is the cost of one fitness evaluation; the O(N) update cost per iteration is dominated by the O(N × F) fitness cost.

(ii) Space Complexity of the LOA:

Population storage: Each individual in the population is stored, so the space complexity is O(N).

b. CatBoost Classifier:

The complexity of CatBoost depends on the following:

Number of trees (T), maximum depth of trees (D), number of data points (n), and number of features (f).

(i) Time Complexity of CatBoost:

Tree Construction: The complexity of building one tree in CatBoost is O(n × f × D), where n is the number of data points, f is the number of features, and D is the maximum depth of the trees.
Total Time Complexity: With T trees, the overall time complexity for CatBoost is O(T × n × f × D).

(ii) Space Complexity of CatBoost:

Model Storage: CatBoost needs to store the trees, so the space complexity is O(T × n × D).

To assess the efficiency of the proposed approach, its computational complexity is compared with that of Gboost [31]. The following paragraph breaks down the time and space complexity of Gboost. It then considers how CatBoost improves on Gboost through reduced overfitting, ordered target statistics, and a balanced tree structure.

The time complexity of Gboost is O(T × n × f × log n), where T is the number of trees in the model, n is the number of data points (samples), f is the number of features (columns) in the dataset, and log n reflects the cost of sorting the data to evaluate node splits. XGBoost builds decision trees iteratively: at each split, the algorithm selects the feature and split point that minimize a loss function (such as the mean squared error for regression tasks). Finding the best feature and split point requires sorting the feature values. Sorting costs O(n log n) per feature, i.e., a factor of O(log n) per data point. Because XGBoost builds T trees and, for each tree, examines all n data points and f features to find the best splits, the total time complexity becomes O(T × n × f × log n). The space complexity of Gboost is given as O(T × n × D), where D is the maximum depth of the trees. This bound assumes that each tree stores information for all n samples; since XGBoost constructs T trees, storing the tree structures and the data associated with each node requires O(T × n × D) space.
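The log n factor comes from sorting. The following sketch shows an exact greedy split search for a single feature. It sorts once, which costs O(n log n), and then scans the candidate thresholds with running sums, which costs O(n). It uses squared error as the split criterion, matching the regression example above. XGBoost's actual gain formula uses first- and second-order gradient statistics with regularization, so this is a simplified stand-in, not XGBoost's implementation. The function name is hypothetical.

```python
def best_split(values, targets):
    """Exact greedy split search on one feature under squared-error loss.

    Sorting costs O(n log n); the scan over candidate thresholds is O(n)
    thanks to running sums. Returns (threshold, loss), or None if the
    feature has a single distinct value.
    """
    pairs = sorted(zip(values, targets))
    n = len(pairs)
    total_sum = sum(t for _, t in pairs)
    total_sq = sum(t * t for _, t in pairs)
    left_sum = left_sq = 0.0
    best = None
    for i in range(1, n):
        v_prev, t_prev = pairs[i - 1]
        left_sum += t_prev
        left_sq += t_prev * t_prev
        if pairs[i][0] == v_prev:
            continue  # cannot split between identical feature values
        right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
        # SSE of each side = sum(t^2) - (sum t)^2 / count
        loss = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or loss < best[1]:
            best = ((v_prev + pairs[i][0]) / 2, loss)
    return best
```

Repeating this for f features at each node of each of the T trees gives the O(T × n × f × log n) order stated above, if the sort is performed at every node.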

c.: Lyrebird Optimization Algorithm–CatBoost Classifier (LOA-Cb-C):

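LOA-Cb-C has a time complexity of O(T × N × F + T × n × f × D). The first term covers the LOA search, with T the number of LOA iterations. The second term covers training the final CatBoost model, with T the number of trees. Each fitness evaluation trains and scores a candidate CatBoost model, so F itself grows with the CatBoost training cost, and the hyperparameter search dominates the total. The approach nevertheless remains efficient because of the robust optimization provided by the LOA and the fast training and inference of CatBoost.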

Thus, the LOA-Cb-C combination strikes a favorable balance between computational efficiency and classification performance, making it a suitable choice for large-scale, real-time IoT-based WSN applications.

9. Conclusions

In this study, a novel approach based on LOA-Cb-C is developed and implemented for detecting and classifying attacks in WSNs. In the experiments, the classifier achieved 99.66% accuracy and a 0.34% error rate. These results exceed the accuracy of the existing models reported in the literature, highlighting the strength of the approach. The proposed classifier also achieves competitive precision, recall, and F1-score values (0.99 each). This indicates that it detects attacks accurately while keeping both false negatives and false positives low, which makes it reliable for practical WSN applications. Overall, the study demonstrates the effectiveness of LOA-Cb-C for ID in WSNs. Future work could further enhance the classifier’s performance by exploring additional features or refining the optimization process.

10. Future Research Directions

a.: Enhancing the LOA’s Adaptability for Complex IoT Environments: While the LOA demonstrates strong capabilities in balancing exploration and exploitation, future research could focus on improving its adaptability to more complex, multi-modal attack scenarios. Enhancing the LOA’s ability to adjust its exploration parameters dynamically based on the characteristics of the evolving network environment could further improve the accuracy of attack detection. Incorporating adaptive or self-tuning parameters within the LOA could be a significant improvement in achieving optimal performance in dynamic IoT environments.
b.: Integration of Real-Time Constraints and Resource Optimization: IoT-based WSNs often operate in resource-constrained environments where energy efficiency, memory, and computational power are limited. Future work could focus on optimizing the combination of the LOA and Cb-C to function effectively within these constraints. This may involve integrating energy-aware strategies, low-power communication protocols, or lightweight versions of the algorithm that minimize resource consumption while maintaining high detection accuracy.
c.: Expanding Detection to a Wider Range of Attack Types and Datasets: The current approach is primarily designed for detecting DoS attacks. Future research could extend its detection capabilities to a broader spectrum of IoT-based WSN attacks, such as Sybil attacks and routing attacks beyond the blackhole, grayhole, flooding, and TDMA attacks in WSN-DS. Testing the LOA-Cb-C framework on more diverse datasets and real-world environments could further validate its generalizability and effectiveness. Moreover, integrating multi-class classification approaches within CatBoost for more granular identification of attack subtypes could lead to deeper insights into network vulnerabilities.
d.: Hybrid Approaches and Deep Learning Integration: To further improve detection performance, combining LOA with deep learning techniques or other advanced optimization methods (e.g., particle swarm optimization, genetic algorithms) could be explored. Hybrid approaches could capitalize on the strengths of different algorithms and lead to more robust and accurate IDS. Additionally, real-time learning and adaptation, such as online learning techniques, could be integrated to continuously improve the classifier’s performance as new threats emerge.

By addressing these research directions, the combination of the LOA and Cb-C could evolve into an even more powerful tool for securing IoT-based WSNs against a wide array of cyber threats.

Author Contributions

Conceptualization, S.S.A. and P.A.; Methodology, P.A.; Software, D.B.M.; Validation, P.A., A.R. and A.L.; Formal analysis, S.S.A.; Investigation, D.B.M.; Resources, P.A.; Data curation, D.B.M.; Writing—original draft preparation, S.S.A.; Writing—review and editing, A.R.; Visualization, B.W.; Supervision, P.A.; Project administration, J.M.G.; Funding acquisition, J.M.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available on Kaggle at https://www.kaggle.com/datasets/bassamkasasbeh1/wsnds, accessed on 15 October 2024.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Acronym	Expansion
ANN	Artificial Neural Network
BCOA-MLID	Binary Chimp Optimization Algorithm with Machine Learning-based Intrusion Detection
Cb-C	CatBoost Classifier
CMPRO	Chimp social incentive-based Mutated Poor Rich Optimization
CNN	Convolutional Neural Network
CNS	Cooja Network Simulator
COS	Contiki Operating System
CRNN	convolutional recurrent neural network
CSA	Crow Search Algorithm
CSCA	Compact Sine Cosine Algorithm
DFFF	Distance-based Fruit Fly Fuzzy
DL	deep learning
DL-HIDS	deep learning-based host intrusion detection
DoS	Denial of Service
ELDA	entropy-based linear discriminant analysis
FA	Firefly Algorithm
FC	Fully Connected
FSA	fish swarm algorithm
GA	genetic algorithm
GRU	Gated Recurrent Unit
GSWO	Genetic Sacrificial Whale Optimization
GWDTO	Grey Wolf Dipper Throated Optimization
GWO	Grey Wolf Optimizer
ID	intrusion detection
IDS	intrusion detection system
IoT	Internet of Things
LOA	Lyrebird Optimization Algorithm
LSTM	long short-term memory
MBOLT-IDS	MegaBAT optimized Long Short-Term Memory
ML	machine learning
MLP	Multi-Layer Perceptron
MLPNN	Multi-Layer Perceptron Neural Network
MSO	Moth Search Optimizer
MSODL-ID	Moth Search Optimizer with Deep Learning Enabled Intrusion Detection
NIDS	Network ID System
NSL	National Security Laboratory
OBPNN	Optimized Back Propagation Neural Network
PCA	Principal Component Analysis
PM	Polymorphic Mutation
PSO	particle swarm optimization
QN3	Quasi-Newton Neural Network
RBFNN	Radial Basis Function Neural Network
RF	Random Forest
RF-RFE	Random Forest-Recursive Feature Elimination
SFS	Sequential Forward Selection
SMOTE	Synthetic Minority Oversampling Technique
SNs	sensor nodes
SVD	Singular Value Decomposition
SVM	Support Vector Machine
TVP-IPSO	Time-Varying Parameter-Improved Particle Swarm Optimization
WOA	Whale Optimization Algorithm
WSN	Wireless Sensor Network

References

1. Avinash, B.; Manmohan, S.; Ajay Shriaram, K.; Shilpa, S.; Hussien Sobahi, M. Nonlinear Energy Optimization in the Wireless Sensor Network through NN-LEACH. Math. Prob. Eng. 2023, 2023, 5143260.
2. Nagalalli, G.; Ravi, G. A Novel MegaBAT Optimized Intelligent Intrusion Detection System in Wireless Sensor Networks. Intell. Autom. Soft Comput. 2023, 35, 475–490.
3. Gautami, A.; Shanthini, J.; Karthik, S. A Quasi-Newton Neural Network Based Efficient Intrusion Detection System for Wireless Sensor Network. Comput. Syst. Sci. Eng. 2023, 45, 427–443.
4. Aljebreen, M.; Ahmed, M.; Ahmad, M.; Abbas, M.; Khan, A.; Alqahtani, S.; Hussien, M.A. Binary Chimp Optimization Algorithm with ML Based Intrusion Detection for Secure IoT-Assisted Wireless Sensor Networks. Sensors 2023, 23, 4073.
5. Karthikeyan, M.; Manimegalai, D.; RajaGopal, K. Firefly Algorithm Based WSN-IoT Security Enhancement with Machine Learning for Intrusion Detection. Sci. Rep. 2024, 14, 231.
6. Azar, A.T.; Shehab, E.; Mattar, A.M.; Hameed, I.A.; Elsaid, S.A. Deep Learning Based Hybrid Intrusion Detection Systems to Protect Satellite Networks. J. Netw. Syst. Manag. 2023, 31, 767–788.
7. Alhasan, R.A.; Hamza, E.K. A Novel CNN Model with Dimensionality Reduction for WSN Intrusion Detection. Rev. d’Intell. Artif. 2023, 37, 1121–1131.
8. Abbas, Q.; Hina, S.; Sajjad, H.; Zaidi, K.S.; Akbar, R. Optimization of Predictive Performance of Intrusion Detection System Using Hybrid Ensemble Model for Secure Systems. PeerJ Comput. Sci. 2023, 9, e1552.
9. Darla, S.; Naveena, C. An Optimized Deep Learning Based Malicious Nodes Detection in Intelligent Sensor-Based Systems Using Blockchain. J. Adv. Inf. Technol. 2023, 14, 1037–1045.
10. Murugesh, C.; Murugan, S. Moth Search Optimizer with Deep Learning Enabled Intrusion Detection System in Wireless Sensor Networks. SSRG Int. J. Electr. Electron. Eng. 2023, 10, 77–90.
11. Yang, Y.; Gu, Y.; Yan, Y. Machine Learning-Based Intrusion Detection for Rare-Class Network Attacks. Electronics 2023, 12, 3911.
12. Awajan, A. A Novel Deep Learning-Based Intrusion Detection System for IoT Networks. Computers 2023, 12, 34.
13. Mandala, V.; Senthilnathan, T.; Suganyadevi, S.; Gobhinath, S.; Selvaraj, D.S.; Dhanapal, R. An Optimized Back Propagation Neural Network for Automated Evaluation of Health Condition Using Sensor Data. Meas. Sens. 2023, 29, 100846.
14. Hnamte, V.; Hussain, J. DCNNBiLSTM: An Efficient Hybrid Deep Learning-Based Intrusion Detection System. Telemat. Inform. Rep. 2023, 10, 100053.
15. Singh, A.; Amutha, J.; Nagar, J.; Sharma, S.; Lee, C.C. AutoML-ID: Automated Machine Learning Model for Intrusion Detection Using Wireless Sensor Network. Sci. Rep. 2022, 12, 9074.
16. Karthika, J.; Loganathan, S.; Vanathi, M. A Hybrid Machine Learning Based Feature Selection Technique for Attack Detection in NIDS. J. Phys. Conf. Ser. 2022, 2335, 012033.
17. Hidayat, I.; Ali, M.Z.; Arshad, A. Machine Learning-Based Intrusion Detection System: An Experimental Comparison. J. Comput. Cogn. Eng. 2022, 2, 88–97.
18. Patil, D.R.; Pattewar, T.M. Majority Voting and Feature Selection Based Network Intrusion Detection System. EAI Endorsed Trans. Scalable Inf. Syst. 2022, 22, e173780.
19. Balobaid, A.S.; Ahamed, S.B.; Shamsudheen, S.; Balamurugan, S. Neural Network Clustering and Swarm Intelligence-Based Routing Protocol for Wireless Sensor Networks: A Machine Learning Perspective. Comput. Intell. Neurosci. 2023, 2023, 4758852.
20. Zang, M.; Yan, Y. Machine Learning-Based Intrusion Detection System for Big Data Analytics in VANET. In Proceedings of the IEEE 93rd Vehicular Technology Conference 2021, Helsinki, Finland, 25–28 April 2021.
21. Khan, M.A.; Jan, M.A.; Alam, M.M.; Khalid, A.; Ahmad, M.; Manzoor, S.; Rodrigues, J.J.P.C.; Rodrigues, O. A Deep Learning-Based Intrusion Detection System for MQTT Enabled IoT. Sensors 2021, 21, 7016.
22. Alruhaily, N.M.; Ibrahim, D.M. A Multi-Layer Machine Learning-Based Intrusion Detection System for Wireless Sensor Networks. Sensors 2021, 12, 281–288.
23. Lansky, J.; Dobias, P.; Sahula, V.; Kremen, P. Deep Learning-Based Intrusion Detection Systems: A Systematic Review. IEEE Access 2021, 9, 101574–101599.
24. Zhang, T.; Han, D.; Marino, M.D.; Wang, L.; Li, K.C. An Evolutionary-Based Approach for Low-Complexity Intrusion Detection in Wireless Sensor Networks. Wirel. Pers. Commun. 2021, 126, 2019–2042.
25. Idrissi, I.; Azizi, M.; Moussaoui, O. A Lightweight Optimized Deep Learning-Based Host-Intrusion Detection System Deployed on the Edge for IoT. Int. J. Comput. Digit. Syst. 2022, 11, 209–216.
26. Pan, J.S.; Fan, F.; Chu, S.C.; Zhao, H.Q.; Liu, G.Y. A Lightweight Intelligent Intrusion Detection Model for Wireless Sensor Networks. Secur. Commun. Netw. 2021, 2021, 5540895.
27. Almomani, I.; Alenezi, M. Efficient Denial of Service Attacks Detection in Wireless Sensor Networks. J. Inf. Sci. Eng. 2018, 34, 977–1000.
28. Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep Learning Approach for Intelligent Intrusion Detection System. IEEE Access 2019, 7, 41525–41550.
29. Le, T.T.H.; Park, T.; Cho, D.; Kim, H. An Effective Classification for DoS Attacks in Wireless Sensor Networks. In Proceedings of the 2018 Tenth International Conference on Ubiquitous and Future Networks (ICUFN), Prague, Czech Republic, 3–6 July 2018; pp. 689–692.
30. Salmi, S.; Oughdir, L. Performance Evaluation of Deep Learning Techniques for DoS Attacks Detection in Wireless Sensor Network. J. Big Data 2023, 10, 17.
31. Nguyen, T.M.; Hanh Hong-Phuv, V.; Yoo, M. Enhancing Intrusion Detection in Wireless Sensor Networks Using a GSWO-CatBoost Approach. Sensors 2024, 24, 3339.
32. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. In Proceedings of the Network and Distributed Systems Security (NDSS) Symposium 2018, San Diego, CA, USA, 18–21 February 2018.
33. Mohammad Hashemi, J.; Eric, K. Enhancing Robustness Against Adversarial Examples in Network Intrusion Detection Systems. In Proceedings of the 2020 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), Leganes, Spain, 10–12 November 2020; pp. 1–7.
34. Bovenzi, G.; Aceto, G.; Ciuonzo, D.; Montieri, A.; Persico, V.; Pescape, A. Network anomaly detection methods in IoT environments via deep learning: A Fair comparison of performance and robustness. Comput. Secur. 2023, 128, 103167.
35. Deshpande, S.; Gujarathi, J.; Chandre, P.; Nerkar, P. A Comparative Analysis of Machine Deep Learning Algorithms for Intrusion Detection in WSN. Stud. Syst. Decis. Control 2021, 341, 173–193.
36. Wazirali, R.; Ahmad, R. Machine Learning Approaches to Detect DoS and Their Effect on WSNs Lifetime. Comput. Mater. Contin. 2022, 70, 4921–4946.
37. Premkumar, M.; Sundararajan, T.V.P. DLDM: Deep Learning-Based Defense Mechanism for Denial of Service Attacks in Wireless Sensor Networks. Microprocess. Microsyst. 2020, 79, 103278.

Figure 1. IoT-based Wireless Sensor Network—the basic structure [1].

Figure 2. Flowchart of the LOA.

Figure 3. Flowchart of the CatBoost classifier.

Figure 4. Flow diagram of the proposed LOA-Cb-C.

Table 1. Some recent studies related to intrusion detection in WSNs.

Year	Author Names	Model Used	Dataset	Accuracy	Computational Efficiency	Energy Efficiency
2024	Karthikeyan, M.; Manimegalai, D.; Raja Gopal, K. [5]	FA with ML	Not mentioned	95%	High (Complex Optimization)	Moderate
2023	Gautami, A.; Shanthini, J.; Karthik, S. [3]	QNNN	Not mentioned	91%	Medium	High
2023	Aljebreen, M.; Ahmed, M.; Ahmad, M. [4]	BCO	IoT-WSN	93.5%	High	Low
2023	Azar, A.T.; Shehab, E.; Mattar, A.M. [6]	DL-HIDS	Satellite Networks	98%	High	Low
2023	Alhasan, R.A.; Hamza, E.K. [7]	CNN with Dimensionality Reduction	WSN Dataset	97%	High	Moderate
2023	Abbas, Q.; Hina, S. [8]	Hybrid Ensemble Model	Secure Systems	92%	Medium	High
2023	Darla, S.; Naveena, C. [9]	Optimized DL	Blockchain Sensor Systems	94%	High	Moderate
2023	Murugesh, C.; Murugan, S. [10]	MSO	Not mentioned	95%	High	Moderate
2023	Yang, Y.; Gu, Y.; Yan, Y. [11]	ML for Rare-Class Attacks	Not mentioned	90%	Moderate	High
2023	Awajan, A. [12]	Deep learning IDS for IoT	IoT Networks	92%	High	Moderate
2023	Mandala, V.; Senthilnathan, T. [13]	BPNN	Sensor Data	96%	High	Moderate
2023	Hnamte, V.; Hussain, J. [14]	DCNNBiLSTM	WSN	97%	High	Moderate
2023	Balobaid, A.S.; Ahamed, S.B. [19]	NN Clustering	WSN	94%	Moderate	High
2023	Salmi, S.; Oughdir, L. [30]	DL Techniques for DoS Detection	WSN	92%	High	Moderate
2022	Singh, A.; Amutha, J. [15]	AutoML-ID	WSN Dataset	94%	High	Moderate
2022	Karthika, J.; Loganathan, S. [16]	Hybrid ML Feature Selection	NIDS	91%	Medium	High
2022	Patil, D.R.; Pattewar, T.M. [18]	Majority Voting and Feature Selection	Not mentioned	89%	Medium	High
2022	Idrissi, I.; Azizi, M. [25]	Optimized DL-based IDS	Edge IoT	93%	Medium	High
2021	Zang, M.; Yan, Y. [20]	ML for Big Data Analytics	VANET	95%	High	Moderate
2021	Khan, M.A.; Jan, M.A. [21]	DL-IDS	MQTT-Enabled IoT	96%	High	Moderate
2021	Alruhaily, N.M.; Ibrahim, D.M. [22]	Multi-Layer ML IDS	WSN	93%	Moderate	High
2021	Pan, J.S.; Fan, F. [26]	Lightweight IDS	WSN	94%	Moderate	High
2024	Abinayaa, S.S. et al.	LOA-Cb-C	WSN-DS	99.66%	Low (Optimized)	High

Table 2. Breakdown of WSN-DS dataset.

	Normal	Blackhole	Grayhole	Flooding	TDMA	Total
No. of testing samples (20%)	67,979	2030	2943	618	1363	74,933
No. of training samples (80%)	272,087	8019	11653	2694	5275	299,728
Proportion (%)	90.77	2.68	3.9	0.88	1.77	100 (374,661 samples)

Table 3. Hyperparameters of the proposed classifier.

Parameter	Input Values
Search space for the hyperparameters
iterations	range(100, 1000, 100)
depth	range(4, 11)
learning_rate	[0.01, 0.05, 0.1, 0.5]
optimizer	lyrebirdOptimizer(param_space)
best_params	optimizer.optimize(X_train, y_train)
model	CatBoostClassifier(**best_params)
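Table 3 gives the search space in Python-like notation but does not say how the LOA's continuous positions become discrete hyperparameters. The sketch below shows one possible encoding. It assumes that each LOA dimension lies in [0, 1] and is scaled to an index into the candidate list. Under Python semantics, `range(100, 1000, 100)` yields 100 to 900 and `range(4, 11)` yields depths 4 to 10. The keys match CatBoostClassifier's `iterations`, `depth`, and `learning_rate` parameters, but `decode` itself is a hypothetical helper.

```python
# Search space from Table 3 (Python range() excludes the stop value).
ITERATIONS = list(range(100, 1000, 100))   # 100, 200, ..., 900
DEPTHS = list(range(4, 11))                # 4, 5, ..., 10
LEARNING_RATES = [0.01, 0.05, 0.1, 0.5]


def decode(position):
    """Map a continuous LOA position in [0, 1]^3 to CatBoost hyperparameters.

    The mapping (scale each coordinate to an index into the candidate list)
    is an illustrative assumption; the paper does not describe its encoding.
    """
    def pick(u, options):
        u = min(max(u, 0.0), 1.0)  # clip the coordinate to [0, 1]
        return options[min(int(u * len(options)), len(options) - 1)]

    return {
        "iterations": pick(position[0], ITERATIONS),
        "depth": pick(position[1], DEPTHS),
        "learning_rate": pick(position[2], LEARNING_RATES),
    }
```

For example, `decode([0.5, 0.5, 0.5])` yields 500 iterations, depth 7, and a learning rate of 0.1. The resulting dictionary could be passed as `CatBoostClassifier(**params)` when each candidate's fitness is evaluated.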

Table 4. Performance of proposed classifier on test data.

Class	TP Rate	FP Rate	Precision	Recall	F1_Measure	ROC
Normal	0.99	0.02	0.99	0.99	0.99	1.00
Flooding	0.97	0.00	0.94	0.97	0.96	0.99
TDMA	0.93	0.00	1.00	0.93	0.96	0.95
Grayhole	0.97	0.00	0.98	0.97	0.98	1.00
Blackhole	0.99	0.00	0.98	0.99	0.98	1.00
Weighted Average	0.99	0.01	0.99	0.99	0.99	0.99

Table 5. Confusion matrix of the proposed LOA-Cb-C on test data.

Actual \ Predicted	Normal (a)	Flooding (b)	TDMA (c)	Grayhole (d)	Blackhole (e)
Normal (a)	67,914	37	0	31	3
Flooding (b)	16	663	0	0	0
TDMA (c)	93	0	1249	1	0
Grayhole (d)	33	0	0	2798	33
Blackhole (e)	0	0	0	9	2052

Table 6. Output of the proposed LOA-Cb-C.

Metrics	Output
Accuracy	99.66
Error	0.34
Kappa Statistic	0.98
MAE (Mean Absolute Error)	0.0025
RMSE (Root Mean Squared Error)	0.034
Total number of instances	74,932

Table 7. LOA-Cb-C comparison with other reported techniques.

Algorithm/Model	Accuracy (%)	Error (%)	Precision	Recall	F1-Score
LOA-Cb-C (Proposed model)	99.66	0.34	0.99	0.99	0.99
Gboost [36]	99.60	0.40	-	-	0.98
MVFS [18]	99.50	0.50	0.97	0.96	0.96
DLDM [37]	99.20	0.80	-	-	-
CNN [30]	98.79	1.21	0.94	0.92	0.93
SVM [35]	98.72	1.28	-	-	-
RF [28]	98.00	2.00	0.99	0.96	0.97
MLML [22]	97.65	2.35	0.94	0.96	0.93
DNN [30]	97.04	2.96	0.82	0.82	0.82
DT [28]	97.00	3.00	0.94	0.95	0.95
CNN + RNN [30]	96.86	3.14	0.85	0.85	0.82
RNN [30]	96.48	3.52	0.85	0.69	0.75
NB [28]	94.60	5.40	0.32	0.76	0.45
LR [28]	93.40	6.60	0.88	0.77	0.82
NNC-PSOR [19]	93.00	7.00	0.93	0.92	0.92
ANN [29]	91.96	8.04	1.00	1.00	1.00
KNN [28]	83.80	16.20	0.69	0.66	0.68

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Abinayaa, S.S.; Arumugam, P.; Mohan, D.B.; Rajendran, A.; Lashab, A.; Wei, B.; Guerrero, J.M. Securing the Edge: CatBoost Classifier Optimized by the Lyrebird Algorithm to Detect Denial of Service Attacks in Internet of Things-Based Wireless Sensor Networks. Future Internet 2024, 16, 381. https://doi.org/10.3390/fi16100381

AMA Style

Abinayaa SS, Arumugam P, Mohan DB, Rajendran A, Lashab A, Wei B, Guerrero JM. Securing the Edge: CatBoost Classifier Optimized by the Lyrebird Algorithm to Detect Denial of Service Attacks in Internet of Things-Based Wireless Sensor Networks. Future Internet. 2024; 16(10):381. https://doi.org/10.3390/fi16100381

Chicago/Turabian Style

Abinayaa, Sennanur Srinivasan, Prakash Arumugam, Divya Bhavani Mohan, Anand Rajendran, Abderezak Lashab, Baoze Wei, and Josep M. Guerrero. 2024. "Securing the Edge: CatBoost Classifier Optimized by the Lyrebird Algorithm to Detect Denial of Service Attacks in Internet of Things-Based Wireless Sensor Networks" Future Internet 16, no. 10: 381. https://doi.org/10.3390/fi16100381

APA Style

Abinayaa, S. S., Arumugam, P., Mohan, D. B., Rajendran, A., Lashab, A., Wei, B., & Guerrero, J. M. (2024). Securing the Edge: CatBoost Classifier Optimized by the Lyrebird Algorithm to Detect Denial of Service Attacks in Internet of Things-Based Wireless Sensor Networks. Future Internet, 16(10), 381. https://doi.org/10.3390/fi16100381
