1. Introduction
Power optimization mechanisms have been widely adopted by today’s microcontroller designers to minimize a chip’s dynamic power consumption. Smart sensing technologies with limited energy resources (e.g., IoT platforms, health monitoring devices, energy monitoring systems, and radio communication modules) are widely integrated with power-efficient microcontroller chipsets based on an ARM Cortex M core to support low-energy processing capabilities.
Microcontrollers based on the ARM Cortex M chipset have become the ultimate choice for supporting low-cost and power-efficient processing on embedded systems. Low-energy processing on the ARM Cortex M chipset is supported via the deployment of clock-gating methodology [1,2,3,4,5]. Clock-gating is a hardware feature that enables dynamic activation/deactivation of various subsystems on the chip to minimize dynamic power consumption. Examples of subsystems that are clock-gated include interrupt vector modules, USB, RTC, UART, and RNG. Via chip configuration code running on the chipset during boot time, subsystems that are not used by the embedded system application can be deactivated to lower dynamic energy consumption.
Although hardware-based optimization techniques based on the clock-gating approach enable power-efficient processing, such techniques can be exploited by an attacker to target the subsystems’ availability during software execution. A recent study presented by A. Rasheed et al. (2021) [1] showed that embedded systems implemented with an ARM Cortex M microcontroller chipset are vulnerable to clock-gating-assisted malware attacks. The authors introduced new malware threats that compromise the chipset configuration file during boot time by injecting malicious code. The malware modifies the chip’s configuration parameters during the initialization process by disabling/enabling various subsystems on the chipset. For example, an attacker is able to disable the RNG module on the chipset, impacting the reliability of cryptographic computations. Another example includes disabling serial communication ports on the chipset, leaving sensor systems unable to transmit and receive data. Several variants of the malware were presented in [1], including Power Hungry, PIT-off, UART-killer, and killer.
Signature-based approaches to malware detection in embedded systems are limited by their reliance on known malware signatures. Zero-day attacks and advanced malware variants that can modify their signatures render this approach inherently ineffective. In signature-based systems, the malware signature database must be constantly updated to keep up with the evolving landscape of malware threats [6,7]. Such systems are also less effective at detecting sophisticated attacks that exploit hardware features, such as clock-gating, which are difficult to detect with traditional signatures [8].
In contrast, the proposed Intrusion Detection Systems (IDSs) in this research leverage machine learning techniques to overcome these limitations. Our IDSs analyze power consumption data and employ advanced classification algorithms to detect anomalies and patterns indicative of clock-gating-assisted malware, regardless of the attack signature. With this approach, embedded systems can be protected against evolving threats by detecting zero-day attacks and sophisticated malware variants that exploit hardware features.
This paper proposes the development and implementation of IDSs capable of detecting several variants of the clock-gating-assisted malware presented in [1]. Six IDSs leveraging machine learning approaches based on the K-Nearest Classifier, Random Forest, Logistic Regression, Decision Tree, Naive Bayes, and Stochastic Gradient Descent have been trained and validated using a dynamic power dataset collected from the chipset during malware execution and under normal operation. To the best of our knowledge, this research effort introduces the first solution for detecting hardware-based malware via classifying and identifying malware code injection attacks that exploit vulnerabilities in the clock-gating mechanism of the ARM Cortex M chipset. Our proposed IDSs consider unique characteristics of ARM-based embedded system power consumption data to detect and classify clock-gating-assisted malware effectively. During this effort, we developed reliable and efficient IDSs that can operate within the resource-constrained nature of embedded systems. Our proposed IDSs aim to overcome the limitations of signature-based approaches by utilizing machine learning techniques for hardware-based malware detection on embedded systems with the ARM Cortex M chipset. The proposed IDSs implement machine learning algorithms that have the capacity to accurately detect and categorize zero-day malware on hardware, combining anomaly-based, misuse-based, and specification-based recognition procedures [2].
Malware data samples, including nonanomalous data samples, obtained from previous research efforts conducted by Rasheed et al. (2021) [1] were utilized for model training, testing, and validation. Our main objective in this paper is the development of IDSs capable of classifying malware presence via analyzing power consumption data of the ARM Cortex M chipset during program execution. To test our proposed IDSs, multiple embedded systems were employed and deployed with various variants of clock-gating-assisted malware code. To truly simulate the behavior of a system under malware threat, each system incorporated multi-sensor modules capable of capturing sensor data from the infected embedded platforms, including light, temperature, humidity, accelerometer, and pressure readings. Between the sensing unit and the primary chip, sensor data were sent over serial communication channels such as UART. To manipulate the SIM module, the malware altered bit values in the system clock-gate management registers. For all variants of the malware, malicious code was injected into the “systemInit()” method; intruders could access these registers during the boot-up process by inserting malicious code within the “systemInit()” function. During this research, our IDSs were tested against four distinct malware strains: Power Hungry, PIT-off, UART-killer, and killer.
The proposed IDSs employ various models on a power consumption dataset, including four types of clock-gating-assisted malware and normal clock-gating operations. Our main objectives in this research are the following:
To highlight software threats and attacks against clock-gating techniques in embedded systems.
To propose IDSs using machine learning models to identify and classify clock-gating-assisted malware correctly.
To examine the effectiveness and efficiency of the proposed IDSs and compare them against various machine learning baseline models for identifying and categorizing malware that uses clock-gating.
The remaining part of this paper follows this structured outline: In Section 2, we review prior research pertaining to the topic at hand. Section 3 outlines the methodology applied in this study, while Section 4 presents the analysis and findings. Finally, in Section 5, we conclude and discuss potential future lines of exploration.
2. Related Work
This section aims to highlight the emergence of malware threats to clock-gating operation and to situate malware identification and classification within the embedded systems landscape. Previous research has explained and analyzed clock-gating and other ways of preventing power dissipation, and threats to the hardware of embedded systems have been analyzed. However, little research has been done on threats to clock-gating and the associated software in embedded systems. For these reasons, we seek to address these shortcomings by correctly identifying and classifying malware in embedded systems.
Several works have been proposed to secure embedded systems [9,10,11]. Zareen et al. [8] present an approach to embedded device security through the development of a Hardware Immune System (HWIS), leveraging Artificial Immune Systems for effective malware detection in IoT devices. In resource-constrained environments, the HWIS demonstrates high efficiency in detecting botnet activities, achieving 96.7% accuracy with minimal overhead in power and area and no impact on processor delay. In the context of IoT device security, this method represents a significant improvement over traditional software-based malware detection.
Tamil et al. [12] described the use of clock-gating, which decreases the dissipation of dynamic power in synchronous circuits. The paper explained how clock-gating works and outlined different clock-gating techniques, including latch-based, flip-flop-based, gate-based, synthesis-based, and look-ahead-based clock-gating. The paper also discussed other power reduction techniques, such as power gating and adiabatic logic, and concluded with a summary of the issues associated with clock-gating.
Shila et al. [13] proposed the design and implementation of Hardware Trojan Threats (HTTs) in Field-Programmable Gate Arrays (FPGAs). The paper also proposed a detectability metric, called the HTT detectability metric (HDM), to assess the efficiency of HTT detection techniques. A security analysis of the HTTs was conducted, and their detectability was evaluated using the proposed metric. Testbeds were deployed on MicroZed’s Xilinx Zynq-based FPGA development board. The paper showed that the proposed HTTs can be successfully implemented in the FPGA testbed and that the proposed metric effectively evaluates their detectability. The security analysis showed that HTTs can be used to leak secret information or cause denial-of-service attacks.
Subramanian et al. [14] proposed an Adaptive Counter-Clock (ACC) S-Box algorithm for the Advanced Encryption Standard (AES) [15] that corrects errors during encryption while ensuring the security of the data being encrypted. The paper also aimed to reduce area, power dissipation, and power consumption. The round keys were obtained by running a key expansion code for the three AES key lengths (128, 192, and 256 bits). Errors in data encryption were fixed using the ACC S-Box technique, with the encryption process implemented on Field-Programmable Gate Arrays (FPGAs). The results show that the ACC S-Box algorithm improves the security of AES by rectifying errors during data encryption.
Mehta et al. [16] proposed a method for detecting suspicious activity in Internet of Things (IoT)-embedded devices. The proposed method is based on a hierarchical design that distributes computational resources over IoT devices, making it scalable. The approach observes a device’s performance and its correlation to similar devices to detect anomalies. Experimental findings demonstrate that the proposed strategy effectively identifies suspicious activity. The approach is also resilient, meaning it can continue operating with minimal functionality even if an intrusion is detected.
Hunter et al. [17] investigated the viability of deep learning-based intrusion detection on the resource-constrained embedded devices frequently utilized in Internet of Things (IoT) systems. Four pre-trained deep learning models were tested on devices with different resource capacities. The models were trained on separate intrusion detection datasets, and their accuracy, precision, recall, F1-score, and prediction rate were evaluated. The paper also tested the models’ responses to new attack patterns using separate datasets and examined the use of lightweight neural network architectures for strong performance with little computation and low energy consumption. According to the paper’s findings, lightweight neural network topologies can deliver sufficient performance with few calculations and modest power requirements.
Emnett et al. [18] discuss a design methodology using RTL clock-gating in ASICs to significantly reduce power consumption, with a successful application in a 200K-gate ASIC reducing power by two-thirds. The method also integrates with full-scan techniques for low-power and testable designs.
Shinde et al. [19] investigate various clock-gating techniques for power optimization in VLSI circuits at the RTL level, used extensively in the Pentium 4 processor. The paper emphasizes the importance of considering power optimization early in the design process, at the RTL stage.
Wu et al. [20] propose two clock-gating techniques based on a quaternary variable model of the clock in sequential circuits. The method demonstrates power savings and the potential for synchronous operation with the master clock, while also addressing engineering challenges for practical application.
Li et al. [21] introduce deterministic clock-gating (DCG) for microprocessors, showing an average 19.9% reduction in processor power with no performance loss. DCG is contrasted with pipeline balancing (PLB), demonstrating greater power savings and simpler implementation.
Casillo et al. [22] present an embedded Intrusion Detection System (IDS) for automotive cybersecurity, using a Bayesian Network approach to quickly identify malicious messages on the vehicle’s Controller Area Network (CAN-Bus). Initial experiments with an automotive simulator show promising results for the system’s effectiveness.
Sayadi et al. [23] propose a lightweight, machine learning-based HMD framework for embedded devices, utilizing Hardware Performance Counter (HPC) features for runtime malware detection. The research highlights that while complex classifiers like MLP, BayesNet, and SMO show higher detection accuracy, lightweight classifiers like JRip and OneR offer high accuracy per unit area for different malware classes. The study demonstrates a significant improvement in malware detection accuracy using the customized HMD approach, providing insights into selecting suitable ML classifiers for embedded system malware detection.
Rahmatian et al. [24] present a hardware-assisted intrusion detection technique for secure embedded systems, focusing on real-time detection of malware execution. The method uses FPGA logic to detect behavioral differences between correct system operation and malware and is adaptable to new malware and changing system behaviors. The system extracts the Process ID (PID) from the OS, using it to monitor system call sequences on the FPGA. The technique is shown to be effective in handling real-world programs with minimal runtime performance overhead, making it a promising approach for application-specific embedded processors requiring fast and accurate attack detection.
Previous research underscores the evolving challenges in power optimization and malware detection in embedded systems, with a focus on clock-gating techniques and hardware-assisted solutions. While these studies lay a solid groundwork, our research distinguishes itself by specifically addressing the vulnerabilities in ARM Cortex-M-based microcontrollers. We propose an innovative Intrusion Detection System (IDS) tailored for these systems, utilizing advanced machine learning techniques for heightened accuracy in detecting and categorizing clock-gating malware, a crucial step forward in bolstering the security of modern embedded platforms.
4. Proposed Methodology
Figure 7 illustrates the methodological steps employed in this research to develop and evaluate an Intrusion Detection System (IDS) for detecting and classifying malware on embedded systems. According to Figure 7, our methodology consists of the following steps:
Data Loading and Preprocessing: This initial step involves loading the dataset used for experimentation and preprocessing the data. The dataset consists of two features, time and current, and is labeled with distinct classes representing different malware types.
Model Training and Evaluation: The preprocessed dataset is used to train the chosen machine learning models. This involves feeding the models with labeled data and allowing them to learn patterns and features associated with malware detection. Subsequently, the trained models are evaluated using appropriate metrics to assess their performance.
Machine Learning Models: In this phase, various traditional machine learning models are considered for the IDS. These models include the K-Nearest Classifier (KNN), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Naive Bayes (NB), and Stochastic Gradient Descent (SGD).
Result Analysis: The outcome of model training and evaluation is thoroughly analyzed. Evaluation metrics such as accuracy scores are used to measure the effectiveness of the IDS in detecting and classifying malware.
Experiments and Validation: We conduct experiments to compare the performance of different machine learning models.
The algorithm used to carry out the experiment is described in Algorithm 1.
Algorithm 1 Classification Algorithm
1: D ← LoadData() ▹ Load data into dataset
2: Check whether any entry in D is empty or null; impute if so
3: Initialize feature vector X from D
4: Normalize feature vector X
5: Split feature vector X into training (X_train) and testing (X_test) data
6: models ← {KNN, RF, LR, DT, NB, SGD}
7: model_prediction ← [ ] ▹ List to hold model predictions
8: for each model m in models do
9:   Initialize m
10:  Train m on the training data
11:  pred ← Test m on X_test
12:  Append pred to model_prediction
13:  Print classification report for m
14: end for
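The steps in Algorithm 1 can be sketched in Python with scikit-learn. This is a minimal sketch on synthetic placeholder data (the variable names and random features here stand in for the actual power consumption dataset), not the exact experimental code:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Placeholder data: two features (time, current) and five class labels,
# standing in for the real power-consumption dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = rng.integers(0, 5, size=1000)

# Steps 2-5: impute/clean, normalize the feature vector, and split it.
X = np.nan_to_num(X)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 6: the six classifiers named in the paper.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "RF": RandomForestClassifier(max_depth=3, random_state=0),
    "LR": LogisticRegression(),
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=3),
    "NB": GaussianNB(),
    "SGD": SGDClassifier(max_iter=100),
}

# Steps 8-14: train each model, test it, and print a classification report.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name)
    print(classification_report(y_test, predictions[name], zero_division=0))
```

On real data, the per-model reports produced by `classification_report` supply the precision, recall, accuracy, and F1-score values discussed in the performance analysis.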
4.1. Dataset
This study involved the collection of current/power consumption data under normal device operation and when the device was infected. Various malicious codes, including Power Hungry, PIT-off, killer, and UART-killer, were executed on separate IoT platforms over a total duration of 600 s. Current/power measurements were recorded at a sample rate of 1000 samples per second, with a current resolution of 1 µA. For each malware strain, 600,000 data points were gathered during the experiment. Additionally, a dataset comprising 700,000 current measurements was collected for an IoT testbed without infection.
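As a quick sanity check on these figures (the constant names below are for illustration only), a 600 s capture at 1000 samples per second yields the stated 600,000 points per strain:

```python
SAMPLE_RATE_HZ = 1000   # samples recorded per second
DURATION_S = 600        # seconds of capture per malware strain
N_STRAINS = 4           # malware variants recorded
CLEAN_SAMPLES = 700_000 # measurements from the uninfected testbed

samples_per_strain = SAMPLE_RATE_HZ * DURATION_S
total_samples = samples_per_strain * N_STRAINS + CLEAN_SAMPLES

print(samples_per_strain)  # 600000
print(total_samples)       # 3100000
```

The full dataset therefore contains 3.1 million labeled current measurements across the four malware classes and the benign class.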
4.2. Intrusion Detection System Based on Machine Learning Approaches
The proposed Intrusion Detection System (IDS) is designed to detect malware on embedded systems based on the “systemInit()” function. The “systemInit()” method is responsible for initializing the system after booting. We intend to check “systemInit()” during boot time to detect and correctly classify malware types. Alongside the machine learning classifiers, the IDS incorporates signature-based (misuse-based) recognition, which detects known malware based on its signature and pattern.
The design consists of two main components: preprocessing and detection modules. The preprocessing module is responsible for collecting and preprocessing data from the “systemInit()” method. This module collects the necessary data that are needed to detect malware. The collected data are then preprocessed to extract features. The detection module uses machine learning models to classify the extracted features as malicious or benign. It will also classify the subcategory of the malware where necessary.
Our proposed IDS is expected to be an effective tool for detecting malware on embedded systems. By monitoring the “systemInit()” method and using machine learning to classify the extracted features, the IDS can detect and respond to malware in real time, preventing potential damage to the system. We have employed various traditional machine learning models to detect and classify clock-gating-assisted malware. Six machine learning approaches were utilized for the proposed IDS, namely, K-Nearest Classifier, Random Forest, Logistic Regression, Decision Tree, Naive Bayes, and Stochastic Gradient Descent.
4.2.1. K-Nearest Classifier (KNN)-Based Detection Approach
This is a classification algorithm that is commonly used in machine learning. To implement the K-Nearest Classifier algorithm, we used Python and its built-in machine learning library, scikit-learn. To optimize the algorithm’s performance, we experimented with different values of the hyperparameter k, which determines the number of nearest neighbors to consider when classifying a new sample. Tuning this hyperparameter improved classification accuracy, especially for the clock-gating-assisted malware dataset [29].
The K-Nearest Neighbors classification formula is a fundamental concept in machine learning for classifying data points based on the majority class of their nearest neighbors:

ŷ(x) = argmax_j Σ_{i=1}^{k} I(y_i = j)

Here, ŷ(x) represents the data point x’s expected class label, argmax_j is used to find the class label j that maximizes the expression, Σ_{i=1}^{k} represents the summation over k terms, where k is the number of nearest neighbors considered, and I(y_i = j) is an indicator function that equals 1 when y_i = j, indicating that the ith neighbor belongs to class j.
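This majority-vote rule can be written directly in NumPy. The following is a toy sketch on made-up data, not the paper’s implementation:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=7):
    """Predict the class of x as the majority class of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to x
    nearest = np.argsort(dists)[:k]                  # indices of the k nearest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # argmax_j sum_i I(y_i = j)

# Toy data: class 0 clusters near the origin, class 1 near (5, 5).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(np.array([0.5, 0.5]), X_train, y_train, k=3))  # → 0
```

A query point near the origin is assigned class 0 because all three of its nearest neighbors belong to that class.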
4.2.2. Random Forest (RF)-Based Detection Approach
Multi-decision-tree ensemble learning improves classification accuracy by combining multiple decision trees. With Python and scikit-learn, we implemented the Random Forest algorithm and tuned its hyperparameters, including the number of trees and their maximum depth. Using this algorithm, we were able to achieve high accuracy in detecting and classifying clock-gating-assisted malware. Although Random Forests are robust and perform well on many datasets, they can be computationally expensive and may overfit on noisy datasets [30]. The ensemble prediction is

ŷ(x) = mode{h_1(x), h_2(x), …, h_n(x)}

where ŷ(x) represents the predicted class label for the data point x. It is determined by taking the mode (the most frequently occurring class label) of the predicted class labels from n individual decision trees, where h_i(x) is the prediction made by the i-th decision tree. Random Forest leverages the diversity of multiple trees to improve the overall accuracy and generalization of the classification, making it a powerful machine learning algorithm for various tasks.
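The mode-of-trees vote can be illustrated by querying scikit-learn’s per-tree estimators on synthetic data (a sketch; note that scikit-learn’s own `predict` averages class probabilities across trees rather than counting hard votes, which usually yields the same label):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data in place of the real power-consumption features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=25, max_depth=3, random_state=0)
forest.fit(X, y)

# h_i(x): each individual tree casts a vote for one sample...
votes = np.array([tree.predict(X[:1])[0] for tree in forest.estimators_])
# ...and the ensemble label is the most frequent vote (the mode).
labels, counts = np.unique(votes, return_counts=True)
print("majority vote:", labels[np.argmax(counts)])
```

The `estimators_` attribute exposes the fitted trees, making the h_i(x) terms in the formula above directly observable.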
4.2.3. Logistic Regression (LR)-Based Detection Approach
For binary classification applications, the logistic regression approach is frequently utilized. We implemented the Logistic Regression algorithm using Python and scikit-learn, and we tuned the hyperparameters, including the regularization strength and the solver used. By fine-tuning these hyperparameters, we were able to improve the accuracy of the algorithm in detecting and classifying clock-gating-assisted malware. Logistic Regression is simple and efficient but may not perform well when the decision boundary is nonlinear [31]. The model estimates

P(Y = 1 | X) = σ(z) = 1 / (1 + e^{−z}),  z = β_0 + β_1 x_1 + … + β_p x_p

where P(Y = 1 | X) represents the conditional probability that the target variable Y takes the value 1 given the input features X. The equation involves model parameters β_0, β_1, …, β_p that are learned during training, and x_1, …, x_p are the input feature values. The logistic function σ(z), where z is the linear combination of the features and parameters, is used to model the probability of the positive class. One-vs.-all (OvA) or softmax regression approaches can be used to extend logistic regression, which is particularly beneficial for binary classification applications, to multiclass problems.
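The logistic function and the linear combination z are straightforward to compute. The coefficient values below are purely illustrative, not fitted to the paper’s dataset:

```python
import math

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, beta):
    """P(Y=1|X=x) for logistic regression with intercept beta0 and weights beta."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))  # linear combination z
    return sigmoid(z)

# Illustrative parameters and a two-feature input.
p = predict_proba([0.2, 1.5], beta0=-1.0, beta=[0.8, 1.1])
print(round(p, 3))  # probability of the positive class
```

Because σ maps any real z into (0, 1), the output can be thresholded at 0.5 to obtain a hard class label.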
4.2.4. Decision Tree (DT)
This is a widely used algorithm in machine learning, commonly applied to classification tasks. Based on a tree-like model, decisions are modeled along with their possible consequences. Using a recursive process, the algorithm divides the data into smaller groups at every tree node by focusing on the most informative feature. The Decision Tree algorithm’s hyperparameters, such as the maximum tree depth and the minimum number of samples required to split a node, were tuned using grid search. The Decision Tree is easy to interpret and can handle categorical and numerical data, but it can easily overfit and perform poorly on complex datasets [32]. A Decision Tree, denoted as T, recursively partitions the feature space into regions by evaluating feature conditions at each node: T consists of nodes n_i, each associated with a feature condition c_i that routes a sample to the left child T_left(n_i) if the condition holds and to the right child T_right(n_i) otherwise, until a leaf node assigns a class label.
4.2.5. Naive Bayes (NB)
This is a probabilistic method frequently used for classification tasks, particularly in Natural Language Processing (NLP). We implemented the Naive Bayes algorithm using the scikit-learn library in Python. To improve the accuracy of the algorithm, we experimented with different values of the hyperparameter “alpha”, which controls the strength of the smoothing applied to the probabilities. Naive Bayes is fast and efficient for high-dimensional datasets, but it assumes independence between features and may perform poorly when this assumption is violated [33]. Naive Bayes is a probabilistic classifier that estimates the probability of a data point belonging to a particular class y based on the likelihood of the features X given that class and the prior probability of class y. It simplifies the computation by assuming that features are conditionally independent. The classifier applies Bayes’ rule:

P(y | X) = P(X | y) P(y) / P(X)

where P(y | X) signifies the probability that a particular class y is the correct one given the input features X, P(X | y) represents the likelihood of encountering the input features X when the class is y, P(y) indicates the prior probability of class y, and P(X) refers to the overall probability of observing the input features X.
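Bayes’ rule can be evaluated directly for a toy discrete case. The priors and likelihoods below are hypothetical numbers chosen purely for illustration:

```python
# Hypothetical prior probabilities P(y) for two classes.
prior = {"malware": 0.4, "benign": 0.6}

# Hypothetical likelihoods P(X = high_current | y) of observing a
# high-current feature under each class.
likelihood = {"malware": 0.9, "benign": 0.2}

# Evidence P(X) via the law of total probability.
p_x = sum(likelihood[y] * prior[y] for y in prior)

# Posterior P(y | X) = P(X | y) * P(y) / P(X).
posterior = {y: likelihood[y] * prior[y] / p_x for y in prior}
print(posterior)
```

With these numbers, observing high current consumption raises the malware posterior from the 0.4 prior to 0.75, and the two posteriors sum to 1 as required.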
4.2.6. Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent is an iterative optimization algorithm used to train large-scale machine learning models. At each iteration, the model’s parameters are updated by computing the gradient of the loss function on a subset of the training data. It can optimize logistic regression or linear SVM models for classification tasks. Hyperparameters such as the learning rate, regularization parameter, and batch size can be selected using grid search and cross-validation. Due to its efficiency and effectiveness in large-scale problems, SGD is widely used in machine learning libraries [34]. The update rule is

θ_{t+1} = θ_t − η ∇_θ L(θ_t)

where θ_{t+1} represents the updated model parameters at iteration t + 1, θ_t is the current model parameter vector at iteration t, η is the learning rate that controls the step size, and ∇_θ L(θ_t) is the gradient of the loss function L with respect to the model parameters. SGD is suitable for large datasets and online learning since it iteratively changes the model parameters in a direction that minimizes the loss. The learning rate η plays a crucial role in controlling the step size and convergence speed.
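The update rule can be sketched for a simple quadratic loss whose gradient is known in closed form. This is a toy example, not the paper’s training loop:

```python
import numpy as np

def sgd_step(theta, grad, eta):
    """One SGD update: theta_{t+1} = theta_t - eta * grad."""
    return theta - eta * grad

# Minimize L(theta) = ||theta||^2 / 2, whose gradient is theta itself,
# so each step shrinks the parameters toward the minimizer at the origin.
theta = np.array([4.0, -2.0])
eta = 0.1  # learning rate controlling the step size
for t in range(100):
    theta = sgd_step(theta, grad=theta, eta=eta)

print(theta)  # approaches the minimizer [0, 0]
```

Each step multiplies θ by (1 − η), so after 100 iterations the parameters have converged very close to zero; a larger η converges faster but can overshoot.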
5. Performance Analysis
5.1. Experimental Setup
Configuring the hardware and software environments is crucial for strong and repeatable experiments. This section details the carefully selected settings that supported our research, ensuring the reliability and scalability of our research.
5.1.1. Hardware Configuration
In this study, Google Colab, a cloud-based platform known for its versatility in facilitating machine learning experiments, was used to facilitate the computation. The study’s computational infrastructure was enhanced by utilizing Google Colab’s “Pro” subscription, which provided access to premium resources. To speed up model development and training, both A100 and V100 Tensor Core GPUs (NVIDIA, Santa Clara, USA) were utilized within this subscription.
The availability of premium GPUs played a pivotal role in the research process. For computing-intensive experiments, the V100 GPUs, distinguished by their exceptional computing capabilities, were strategically employed. The A100 GPUs provided robust performance for various machine learning tasks. As a result of this dynamic allocation of GPU resources, machine learning models were trained efficiently across 100 epochs of experiments.
5.1.2. Software Configuration
Using Google Colab, the software environment was meticulously configured to integrate hardware resources and essential software tools. The Google Colab environment accommodated a wide range of software components:
Operating System: The research was conducted within the Google Colab environment, eliminating the need to manage the operating system manually. By abstracting the underlying operating system complexity, Colab provided a consistent and reliable environment.
Python: For modeling, Python served as the foundational programming language. It is regarded as one of the most prominent languages in machine learning. The codebase was executed using Python 3.10, enabling access to various machine learning libraries and frameworks.
Machine Learning Libraries: The study leveraged an ensemble of machine learning libraries, including Scikit-learn, Keras, and TensorFlow. The Colab environment makes it easy to develop and evaluate machine learning models using these libraries.
Data Preprocessing Tools: With Scikit-learn’s robust preprocessing module, data preprocessing tasks such as data cleaning, feature scaling, and encoding were seamlessly performed.
Hyperparameter: For each model, we use a different set of parameters and hyperparameters. In the Decision Tree classifier, we use the Gini impurity as the splitting criterion at each node and set the maximum depth to three. Both choices were made for their computational simplicity; the Gini impurity is also suitable for multiclass classification, and a maximum depth of three gave the best result during testing. Similarly, in the Random Forest model, the random state is set to zero with a maximum depth of three. In KNN, the number of neighbors is set to seven. To determine the K-value, odd values were tested first, since this eliminates ties and yields a majority class; we tested K values of 3, 5, and 7, with 7 showing the highest performance. For SGD, the maximum iteration parameter is 100, chosen based on resource availability and faster convergence of the SGD model. In summary, we used different values and combinations of parameters and hyperparameters to achieve the best results.
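The stated choices map onto scikit-learn constructors as follows (a sketch; any parameter not mentioned above is left at the library default):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import SGDClassifier

models = {
    # Gini impurity as the splitting criterion, maximum depth of three.
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=3),
    # Random state of zero with a maximum depth of three.
    "RF": RandomForestClassifier(random_state=0, max_depth=3),
    # Seven neighbors, chosen after testing the odd values 3, 5, and 7.
    "KNN": KNeighborsClassifier(n_neighbors=7),
    # Maximum of 100 iterations for faster convergence.
    "SGD": SGDClassifier(max_iter=100),
}
for name, m in models.items():
    print(name, m)
```

Keeping the configuration in one dictionary makes it easy to iterate over all models during training and evaluation.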
5.2. Metrics
A set of key performance metrics was used to evaluate our models, including precision, recall, accuracy, and F1-score. Each of these metrics is crucial in assessing the model’s performance.
The accuracy of the model’s positive predictions is referred to as precision. It indicates the proportion of correct positive predictions out of all positive predictions. A model’s precision measures its ability to identify relevant instances while minimizing false positives. Precision is defined as: Precision = TP / (TP + FP).
A model’s recall measures how well it can identify all relevant instances. It reflects the percentage of actual positive events that were successfully detected. The majority of positive cases are accurately captured by a model with high recall. Recall is defined as: Recall = TP / (TP + FN).
A model’s accuracy measures how well it classifies overall. It represents the percentage of correctly classified instances (both true positives and true negatives) out of all instances in the dataset. Across all classes, accuracy provides an overview of the model’s performance. It is defined as: Accuracy = (TP + TN) / (TP + TN + FP + FN).
The F1-score is the harmonic mean of precision and recall. This metric measures the model’s ability to achieve high precision and recall simultaneously. In imbalanced datasets, where precision and recall may trade off, the F1-score is especially useful. It is defined as: F1 = 2 × (Precision × Recall) / (Precision + Recall).
Ultimately, these performance metrics provide a nuanced evaluation of machine learning models, encompassing precision, recall, accuracy, and F1-score. By considering these metrics, we can determine how well the models perform in various aspects of classification and prediction.
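Given confusion-matrix counts (the numbers below are hypothetical, for illustration only), the four metrics follow directly from their definitions:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts for a binary detector.
p, r, a, f1 = metrics(tp=90, fp=10, fn=5, tn=95)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean, it is pulled toward the smaller of precision and recall, penalizing models that trade one off sharply against the other.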
5.3. Results and Discussion
In this section, we delve into the findings of our study, starting with the visualization of our data and extending through the performance of various machine learning models in classifying clock-gating-assisted malware. Our results show the effectiveness of our approach as well as the complexity of the task.
5.3.1. Initial Data Characterization
Before applying machine learning algorithms, it is critical to understand the data's inherent structure and any observable patterns. To this end, we employed a scatterplot, as depicted in Figure 8, to visualize the distribution of different malware types alongside normal operation data. Classes 1, 2, 3, 4, and 5 represent Power Hungry, killer, normal operations, PIT-off, and uart killer, respectively. This preliminary analysis helps to set expectations regarding the complexity of the classification task and to underscore the necessity for sophisticated analytical techniques such as machine learning. It becomes evident from this visualization that while certain malware classes, specifically classes 1 and 2, appear similar in the context of current consumption, others are more distinctly separable, suggesting varied levels of difficulty that one might encounter during the classification process.
5.3.2. Machine Learning Model Efficacy
Table 1 shows the comparison results of our proposed approaches. The K-Nearest Neighbors and Logistic Regression classifiers achieved the highest accuracy, precision, recall, and F1-score values, all above 0.97. The Decision Tree model achieved an accuracy of 0.80 and an F1-score of 0.73, lower than the other models.
5.3.3. Training Performance Analysis
The machine learning models defined above were trained and evaluated over 100 epochs with our dataset, utilizing a data split of 70% for training, 15% for validation, and 15% for testing.
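The 70%/15%/15% split described above can be sketched in plain Python as follows. This is an illustrative utility only, assuming a simple shuffled list of samples; an equivalent library helper such as scikit-learn's train_test_split would serve the same purpose.

```python
import random

def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle and split samples into train/validation/test partitions.
    Integer percentages avoid floating-point rounding of the sizes;
    the test partition is whatever remains after train and validation."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(list(range(1000)))
print(len(train_set), len(val_set), len(test_set))  # -> 700 150 150
```

Fixing the shuffle seed keeps the partitions reproducible across runs, so the per-model results remain comparable.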
Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 display accuracy and loss plots for both training and validation of the Decision Tree, K-Nearest Neighbors, Logistic Regression, Naive Bayes, Random Forest, and Stochastic Gradient Descent models. These figures provide an overview of each model's training performance. The accuracy plots illustrate the models' ability to learn from the training data, while the loss plots demonstrate the convergence of the training processes, indicating the models' ability to minimize errors and improve their predictions. Together, these plots provide valuable insight into the models' training performance and highlight their strengths and effectiveness.
5.3.4. Post-Training Classification Insights
After the training phase, we evaluated the performance of each model through confusion matrices, presented in Figure 15: Decision Tree (a), K-Nearest Neighbors (b), Logistic Regression (c), Naive Bayes (d), Random Forest (e), and Stochastic Gradient Descent (f). These matrices provide a stark contrast to the initial scatterplot in Figure 8 by revealing the effectiveness of each algorithm in classifying the data after learning. In these matrices, classes 0, 1, 2, 3, and 4 represent Power Hungry, killer, normal operations, PIT-off, and uart killer, respectively.
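A confusion matrix of this form can be built directly from the true and predicted class indices. The following is a minimal pure-Python sketch; the labels below are arbitrary example data, not our experimental results.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes;
    cell [i][j] counts samples of class i predicted as class j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Five classes (0..4), seven example samples.
y_true = [0, 0, 1, 2, 3, 4, 4]
y_pred = [0, 1, 1, 2, 3, 4, 0]
for row in confusion_matrix(y_true, y_pred, 5):
    print(row)
```

Correct classifications accumulate on the diagonal, so off-diagonal cells immediately show which malware classes a model confuses with one another.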
The K-Nearest Neighbors (KNN) and Logistic Regression (LR) models notably outperformed other models with remarkably high accuracy rates of 99%. This impressive performance indicates not just their ability to learn from the training data but also their robustness in distinguishing between classes that appeared similar in the raw data. Their success in accurately classifying closely clustered data points, as seen in the scatter plot, validates their capability to handle real-world scenarios where malware types may not be distinctly separable.
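The core of KNN's majority vote, which underlies this robustness on closely clustered points, can be sketched as follows. This is a simplified pure-Python illustration with hypothetical one-dimensional current-consumption features, not our actual feature set or implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=7):
    """Classify `query` by majority vote among its k nearest
    training points (Euclidean distance). An odd k avoids vote
    ties between two classes."""
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D current readings and their labels.
points = [(0.1,), (0.2,), (0.3,), (0.9,), (1.0,), (1.1,), (1.2,)]
labels = ["normal"] * 3 + ["malware"] * 4
print(knn_predict(points, labels, (0.25,), k=3))  # -> normal
```

Because the vote depends only on local neighborhoods, overlapping clusters in the global scatterplot need not prevent accurate classification, which is consistent with KNN's strong results here.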
The confusion matrices serve as a detailed record of each model’s classification strengths and potential areas for improvement. For instance, while the Decision Tree model demonstrated lower accuracy, this was mitigated by the Random Forest model, which leverages the power of multiple decision trees to improve the overall classification results. In light of this observation, careful model selection must be tailored to the specific characteristics of the dataset and the details of the classification task.
The efficacy of the KNN and LR models, in particular, suggests their strong potential for application in embedded systems for malware detection and prevention. These models have proven to be highly reliable in distinguishing clock-gating-assisted malware from legitimate operations. In contrast, the lower reliability of the Decision Tree model suggests that while it may contribute to an ensemble method like Random Forest, it might not be the best independent choice for this specific task.
In conclusion, the demonstrated ability of machine learning models, especially KNN and LR, to detect and prevent clock-gated malware holds great promise for enhancing the security framework of embedded systems. Their high performance in our evaluations underscores the valuable role that machine learning can play in improving system security and reliability against increasingly sophisticated cyber threats.
6. Conclusions
This study has presented a comprehensive approach to enhancing the security of embedded systems through the development of an Intrusion Detection System (IDS) that leverages machine learning techniques. By focusing on the classification of clock-gating-assisted malware, the research aimed to address the software threats that exploit clock-gating techniques in embedded systems. We identified and deployed four distinct types of malware on a testbed, namely, Power Hungry, PIT-off, uart killer, and killer, to test the efficacy of the proposed IDS. The integration of machine learning models with the “systemInit()” method provided a real-time response capability, crucial for reducing potential damages to the system.
Based on the evaluation, we assessed the performance of several machine learning models: K-Nearest Neighbors (KNN), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Naive Bayes (NB), and Stochastic Gradient Descent (SGD). The models demonstrated a significant ability to detect and classify clock-gating-assisted malware, with accuracy scores ranging from 0.80 to 0.99. The KNN and LR models, in particular, showed exceptional performance and robustness, indicating their potential for real-world application in embedded systems security.
The findings suggest that machine learning models are not only capable of providing a reliable and efficient defense against clock-gating-assisted attacks but also show promise for the continued advancement of security measures in the face of sophisticated and evolving threats. Future research is encouraged to focus on refining these models to further enhance the detection and classification capabilities for a broader spectrum of malicious activities. Furthermore, the IDS could be extended to other types of embedded systems, adapting to different hardware configurations and operating environments. Integrating real-time adaptive learning mechanisms could also allow the IDS to evolve continually in response to emerging malware threats.
In conclusion, this paper underscores the critical need for implementing advanced security mechanisms in embedded systems. Embedding machine learning models into an IDS framework offers an effective way to protect embedded systems against the increasingly complex landscape of cyber threats.