1. Introduction
The Internet of Things (IoT) is revolutionizing various sectors by enabling massive data sharing and robust decision-making through connected devices. As a result, IoT scenarios are anticipated to generate large quantities of data in the near future [1]. To harness the potential of these data, researchers are investigating the feasibility of storing and processing them in cloud-based environments using advanced artificial intelligence techniques, such as deep learning and machine learning algorithms [2]. However, cloud-based data processing raises concerns about privacy and communication costs, which could potentially limit the benefits of such solutions [3]. Furthermore, IoT systems demand distributed intelligent services that can adapt to dynamic environments in real time, posing challenges due to the inherent complexity and heterogeneity of IoT devices. Researchers have proposed a multi-objective optimization technique using the Glowworm Swarm Optimization (GSO) algorithm and a hybrid optimization approach combining the artificial bee colony method, genetic operators, and density correlation degree to enhance the performance of blockchain-based Industrial Internet of Things (IIoT) systems [4,5]. Federated learning (FL), a distributed machine learning framework that ensures privacy, reduces data transfer, and accommodates device heterogeneity while allowing real-time adaptation to various contexts, has thus emerged as a solution to these problems [6].
FL is a distributed machine learning technique that allows for the training of algorithms across several edge devices, each of which holds a local dataset that is never shared or exchanged [7]. By locally training models on IoT devices and aggregating the trained parameters on a central server, FL addresses data privacy concerns and adapts to the heterogeneity of IoT environments. FL has been successfully applied in various applications, such as mobile keyboard prediction [8], autonomous industry [9], remote health monitoring [10], and medical image processing [11], demonstrating its significant potential. By leveraging local computing capabilities and ensuring data security through local model training, FL not only improves training efficiency but also enhances privacy protection and model accuracy.
Recently, FL methods have been used in IoT-based systems to enable privacy-enhanced IoT systems by allowing multiple devices to coordinate with a central server for data training without sharing actual datasets [12]. In IoT networks, devices act as workers communicating with an aggregator, performing neural network training to enhance training quality and minimize user privacy leakage. The robustness and potential of FL frameworks make them fundamental building blocks for sophisticated IoT applications.
Figure 1 shows the general design of FL in an IoT context, where IoT devices train their local models and send weights to the FL server. The FL server gathers all local model weights from IoT devices and updates its global model to acquire global model weights.
Although the potential of FL in IoT scenarios is substantial, several shortcomings of FL must be addressed before it can be deployed in heterogeneous industrial IoT networks.
Figure 2 illustrates a scenario of heterogeneous FL in an IoT environment. In heterogeneous scenarios, clients do not share the same computational and energy resources, and the dataset is dispersed non-uniformly, with each partition exhibiting distinct attributes and properties. Aggregating updates from these IoT devices degrades the performance of the global model when clients are inadequately trained. Therefore, performance optimization in such scenarios requires approaches that can handle the variations in computational resources and data distributions.
Despite these variations in scenarios, a majority of existing methods continue to employ identical training configurations (epochs) and model aggregation methods, irrespective of the scenario at hand. This approach can significantly affect the performance of the model. We discuss these challenges in the following subsections.
1.1. Model Accuracy in Heterogeneous Industrial IoT with Non-IID Dataset
In heterogeneous industrial IoT environments with non-IID datasets, FL encounters the challenge of maintaining model accuracy. The non-IID data characterize situations in which the data distribution across IoT devices is uneven, unbalanced, or displays distinct features, a common occurrence in real-world IoT deployments. Additionally, IoT devices within extensive networks may possess varying resources for data processing, further contributing to heterogeneity. Consequently, these scenarios result in heterogeneity challenges stemming from diverse data collection rates, a variety of device types, and unique data patterns across devices, which can impact the performance of federated learning systems.
Specifically, the issue of model accuracy in heterogeneous industrial IoT environments arises from federated learning’s reliance on aggregating locally trained models to create a global model. In the presence of non-IID data, local models trained on individual devices may exhibit satisfactory performance on their respective local data but perform inadequately on data from other devices. As a result, when aggregating these local models, the ensuing global model may acquire lower accuracy or slow convergence due to discrepancies in data distributions and available computational resources.
1.2. Computation and Communication Efficiency
FL in heterogeneous industrial IoT scenarios also presents challenges in terms of computation and communication efficiency. Heterogeneous industrial IoT devices often possess varying computational resources, processing capabilities, and communication bandwidth, which can affect the overall performance and efficiency of a federated learning system.
In heterogeneous industrial IoT environments, devices with limited processing power may struggle to keep pace with more capable devices during the local training process. This discrepancy can lead to longer training times and slow convergence of the global model. Additionally, the energy constraints of battery-powered IoT devices may further exacerbate the problem, as these devices need to balance local computation with energy consumption.
Moreover, communication between IoT devices and the central server is critical for FL, as local model updates need to be transmitted and aggregated to form the global model. Heterogeneous industrial IoT devices often have varying communication capabilities and network conditions, which can lead to communication bottlenecks and increased latency.
1.3. Weight Aggregation Challenges in Heterogeneous Industrial IoT Scenarios
The problem of weight aggregation in heterogeneous industrial IoT devices for federated learning stems from inherent complexities and variations in device capabilities, data distribution, and available resources. Heterogeneous industrial IoT deployments encompass resource-constrained edge servers and devices with limited computing power and battery endurance, which can affect the weight aggregation process during federated learning. In heterogeneous industrial IoT environments, local model updates from individual devices may differ significantly due to varying data distributions, device resources, and training characteristics. Consequently, the central server must carefully aggregate these diverse local model updates to create a robust and accurate global model. Moreover, non-IID data are common in heterogeneous industrial IoT environments, leading to an imbalanced data distribution across devices. The central server must account for this imbalance during weight aggregation to avoid biases and ensure an accurate representation of the underlying data. Finally, IoT devices with limited processing power may not perform as many training iterations on their local data as other IoT devices. Aggregating updates from these straggler IoT devices affects the accuracy and convergence of the global model.
FL with a reinforcement learning approach has evolved in recent years as a potential way to train machine learning models in diverse and decentralized environments. However, the implementation of FL provides considerable hurdles due to the variability in data distribution, computational capacity, and resource availability among IoT devices. There is still potential for advancement in tackling the difficulties of implementing federated learning in dynamic IoT contexts, even though numerous studies on the topics of “federated learning” and “heterogeneous networks” have been published.
This paper proposes an Uncertainty-Aware Federated Reinforcement Learning (UA-FedRL) method, which dynamically selects local epochs to manage heterogeneous industrial IoT devices in federated learning networks more effectively. In combination with the Predictive Weighted Average Aggregation (PWA) method, UA-FedRL provides a comprehensive solution to address the complexities and challenges of implementing federated learning in heterogeneous industrial IoT environments and acquiring high accuracy with communication and computation efficiency. Our proposed methods demonstrate outstanding results in harsh-IoT-environment simulations, outperforming other federated learning approaches in terms of accuracy, communication, and computational efficiency. Moreover, Software-Defined Networking (SDN) technology is implemented in the core network of the IoT environment with a focus on achieving a lower communication latency.
The contributions of this study can be summarized as follows:
We propose a UA-FedRL method that addresses the challenges associated with the heterogeneity of IoT devices in non-IID data distribution scenarios. The UA-FedRL method dynamically selects local epochs, improving accuracy, computation, and communication efficiency while effectively managing heterogeneous industrial IoT devices in federated learning networks.
We introduce the PWA method to address weight aggregation issues commonly found in heterogeneous industrial IoT scenarios. This method calculates the weight quality of every local IoT device using the predictive log-likelihood of the validation accuracy of the local model. The weights of individual models are adjusted based on their quality to mitigate the impact of heterogeneity and non-IID issues during aggregation.
The rest of this paper is structured as follows. Section 2 presents a comprehensive literature overview that highlights key breakthroughs in FL for the IoT and identifies gaps in existing research techniques. Section 3 discusses the problem formulation of the study. Section 4 discusses in detail our proposed UA-FedRL technique and PWA approach, describing their conceptual basis. Section 5 discusses the parameters of our simulation environments and presents the findings of our comparative studies, comparing the performance of UA-FedRL with that of existing FL systems in terms of accuracy, communication efficiency, and computation efficiency. Finally, Section 6 concludes our paper by summarizing our results, discussing their significance for the discipline, and providing an outline of prospective future work in this area.
3. Problem Formulation
FL is utilized to accommodate numerous devices that accumulate data and a central server that manages the global learning objective throughout the network of devices. Important notation used in this paper is indicated in Table 1.
We have $N$ IoT devices, each with a local dataset $D_i$ and a computational cost represented by $c_i$. The goal is to minimize a federated objective function $F(w)$ defined as
$$F(w) = \sum_{i=1}^{N} \frac{|D_i|}{\sum_{j=1}^{N} |D_j|} \, F_i(w), \qquad (1)$$
where $F_i(w)$ is the local objective function of device $i$.

At each round $t$ of the FL process, device $i$ selects a random subset $B_i^t$ of its local data $D_i$ and performs local training on $B_i^t$ using the current model parameters $w^t$ to obtain a new set of parameters $w_i^{t+1}$. Then, device $i$ sends $w_i^{t+1}$ to a central server. The central server aggregates the received model updates to obtain a new global model parameter $w^{t+1}$:
$$w^{t+1} = \frac{1}{N} \sum_{i=1}^{N} w_i^{t+1}, \qquad (2)$$
where $w^{t+1}$ is the new global model parameter, $w_i^{t+1}$ is the local model parameter from device $i$, $N$ is the number of clients, and $t$ is the training round. The new global model parameter $w^{t+1}$ is sent back to all devices, and the process repeats until convergence.
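To illustrate the round structure just described, the following minimal Python sketch performs the server-side averaging of (2) with stubbed local training; the client count, model size, and update rule are assumptions for illustration only.

```python
import numpy as np

def local_training_stub(global_w, rng):
    """Stand-in for local SGD on one client's private data (hypothetical)."""
    return global_w - 0.01 * rng.normal(size=global_w.shape)

def aggregate(client_weights):
    """Equation (2): simple average of the N local parameter vectors."""
    return np.mean(np.stack(client_weights), axis=0)

rng = np.random.default_rng(0)
w_global = np.zeros(10)                  # initial global model parameters w^0
for t in range(5):                       # FL rounds
    updates = [local_training_stub(w_global, rng) for _ in range(4)]  # N = 4 clients
    w_global = aggregate(updates)        # server-side aggregation, eq. (2)
```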
However, a significant challenge emerges when dealing with heterogeneous industrial IoT devices. These devices are characterized by their diverse computational capacities, available energy resources, and data sizes, which can vary widely from one device to another. This heterogeneity presents a formidable task when it comes to selecting the optimal number of epochs for each IoT device in the context of FL.
Currently, FL methods use the same number of epochs for every participating device without considering the heterogeneity present in IoT devices. However, if the epoch count is set too low to reduce the usage of energy and computation resources, the model might underfit. Thus, the model fails to learn the underlying patterns in the data and performs poorly on real-world data because it has not adequately captured the complexity of the data.
On the other hand, if the epoch count is set too high, the model might overfit the local data on the device. Overfitting occurs when the model learns the training data too well, to the point that it captures not only the underlying patterns but also the noise or random fluctuations in the data. Moreover, a high epoch count also increases the usage of energy and computational resources, which reduces the long-term usability of IoT devices.
Therefore, selecting an optimal number of epochs for each device is a complex but crucial problem in FL on heterogeneous industrial IoT devices. An optimal number of epochs for each device should be selected to ensure that the model neither underfits nor overfits the data and also reduces the consumption of energy and computation resources.
The objective is to minimize the expected loss over all devices while accounting for the communication and computation costs:
$$\min_{E_1,\dots,E_N} \; \sum_{i=1}^{N} \mathbb{E}\!\left[\mathcal{L}_i(x, E_i)\right] + \lambda_1 C_{\text{total}} - \lambda_2 R_{\text{comm}} + \lambda_3 \sigma_{\text{non-IID}}, \qquad (3)$$
where $E_i$ is the number of epochs selected for device $i$, $\mathcal{L}_i(x, E_i)$ is the loss function of the model on dataset $x$ after $E_i$ epochs (with $A_i$ the corresponding accuracy), $C_{\text{total}}$ is the total cost, which includes the sum of the computation and communication costs, $R_{\text{comm}}$ quantifies the reduction in communication rounds, and $\sigma_{\text{non-IID}}$ represents the variability due to the distribution of non-IID data between devices. The hyperparameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ are introduced to control the trade-offs between model quality, total cost, communication rounds, and data distribution, respectively. The relevant total cost $C_{\text{total}}$ can be defined as
$$C_{\text{total}} = C_{\text{comm}} + C_{\text{comp}}. \qquad (4)$$
The communication cost can be defined as
$$C_{\text{comm}} = \sum_{i=1}^{N} \left( b_i^{w} + b_i^{\text{ID}} \right), \qquad (5)$$
where $b_i^{w}$ is the number of bits required to transmit the model parameters $w_i$ from device $i$ to the central server, and $b_i^{\text{ID}}$ is the number of bits required to transmit a unique ID from device $i$ to the central server, which is specific to our proposed method. The computational energy consumption can be defined as
$$C_{\text{comp}} = \sum_{i=1}^{N} \kappa \, \frac{|D_i| \, c_i \, E_i \, M}{f_i}, \qquad (6)$$
where $|D_i|$ is the client's number of data samples, $c_i$ is the number of CPU cycles required for handling each data sample, $E_i$ is the number of epochs, $M$ is the size of the deep learning model implemented in each client, $f_i$ is the CPU frequency, and $\kappa$ is the complexity of the learning task.
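The per-device cost bookkeeping of (4)–(6) can be sketched as follows in Python; all constants are placeholders rather than measured values.

```python
def communication_cost(n_param_bits, n_id_bits):
    # Equation (5): bits for the model update plus the device's unique ID
    return n_param_bits + n_id_bits

def computation_cost(n_samples, cycles_per_sample, epochs, model_size, cpu_freq_hz, kappa=1.0):
    # Equation (6): energy proxy grows with data, epochs, and model size,
    # and shrinks with CPU frequency; kappa encodes task complexity.
    return kappa * n_samples * cycles_per_sample * epochs * model_size / cpu_freq_hz

# Total cost (4) for one device: C_total = C_comm + C_comp
c_total = (communication_cost(n_param_bits=32 * 100_000, n_id_bits=64)
           + computation_cost(n_samples=600, cycles_per_sample=1e4,
                              epochs=5, model_size=1.0, cpu_freq_hz=1.5e9))
```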
The challenge is to design an adaptive and efficient algorithm that can learn the optimal number of epochs for each device, based on its local characteristics and the current state of the model, while minimizing communication and computation costs. This can be formulated as UA-FedRL, where the agent learns a policy for each participating IoT device that maps the current state (e.g., the model’s performance on its local data) to the action (the number of epochs to select). The objective is to achieve the highest predicted cumulative reward, which is a function of the learned model’s quality while reducing communication and computing costs.
Weight Aggregation in Heterogeneous Industrial IoT Devices
Weight parameter aggregation also poses a great challenge to obtaining the optimal performance of the global model in FL. A device with a larger local dataset and higher computational resources may take more epochs to converge than a device with a smaller dataset and lower computational resources. The optimal number of epochs for each device to train a model depends on numerous factors, such as the quantity of data, the quality of the data, computational resources, the size of the model, etc. Therefore, it is challenging to aggregate a global model from heterogeneous industrial IoT devices and achieve accurate global model performance.

Let $w_i^{E_i}$ be the model parameters of device $i$ after $E_i$ training epochs, and let $r_i$ be the computational resources available on device $i$. The objective function for our system can be defined using (1). To minimize the local objective function of device $i$, the optimal number of epochs can be defined as
$$E_i^{*} = \arg\min_{E_i} F_i\!\left(w_i^{E_i}\right). \qquad (7)$$
However, the system may choose a different number of epochs, $E_i \neq E_i^{*}$, which may degrade the performance of the global model. Moreover, local updates trained with different numbers of epochs may differ in magnitude and direction, making it difficult to aggregate them directly. Therefore, aggregation methods that differ from the traditional ones must be proposed to accommodate variable epochs. This study proposes the PWA method to solve the heterogeneous industrial IoT device aggregation problem, which is discussed in Section 4.
4. Construction of the UA-FedRL Method
This section is dedicated to the detailed structure of the UA-FedRL method to solve the issues found in heterogeneous industrial IoT device networks in FL, such as (1) optimizing FL to accommodate the participation of heterogeneous industrial IoT devices to obtain optimal global model accuracy and (2) introducing a novel aggregation method to mitigate heterogeneity in the IoT network.
4.1. Uncertainty-Aware Reinforcement Learning-Based Optimal Epoch Selection
4.1.1. Preliminaries
This section is dedicated to describing the preliminaries of the uncertainty-aware reinforcement learning (UA-RL) method, which is proposed in this paper.
We assume that the agent's policy is represented by a neural network that takes the state and action as input and outputs a distribution over the number of epochs. Specifically, let $a_t^i$ be the action (number of epochs) selected by agent $i$ at time step $t$, and let $s_t^i$ be the state observed by agent $i$ at time step $t$. The agent's policy is given by:
$$\pi\!\left(a_t^i \mid s_t^i\right) = f\!\left(s_t^i, a_t^i; \theta\right), \qquad (8)$$
where $f(\cdot\,; \theta)$ is the output of a neural network with weights $\theta$, which takes $s_t^i$ as input and outputs a distribution over the number of epochs. We assume that the neural network has one hidden layer with $H$ hidden units and a Gaussian prior over the weights:
$$p(\theta) = \mathcal{N}\!\left(\theta \mid \mu_0, \sigma_0^2 I\right), \qquad (9)$$
where $\mu_0$ and $\sigma_0$ are learnable parameters that are updated during training. Bayes by backprop with variational inference can be used to approximate the posterior distribution over the weights given the data. We assume a Gaussian distribution over the weights, with a mean and variance parameterized by the neural network, as follows:
$$q(\theta \mid \phi) = \mathcal{N}\!\left(\theta \mid \mu(\phi), \sigma^2(\phi)\right). \qquad (10)$$
We aim to learn the parameters $\phi$ and the weights $\theta$ of the neural network with variational inference using the following loss equation:
$$\mathcal{L}(\phi) = -\,\mathbb{E}_{q(\theta \mid \phi)}\!\left[\log \pi\!\left(a_t^i \mid s_t^i\right)\right] + \beta \, \mathrm{KL}\!\left(q(\theta \mid \phi) \,\|\, p(\theta)\right). \qquad (11)$$
The first term corresponds to the expected negative log-likelihood of the actions under the policy and the assumed Gaussian distribution over the number of epochs. The second term is the Kullback–Leibler (KL) divergence between the approximate posterior $q(\theta \mid \phi)$ and the prior $p(\theta)$. The hyperparameter $\beta$ controls the strength of the regularization and, following the Bayes by backprop minibatch weighting scheme with $K$ minibatches and minibatch index $k$, can be defined as follows:
$$\beta = \frac{2^{K-k}}{2^{K} - 1}. \qquad (12)$$
The upper confidence bound (UCB) algorithm is used to select actions that maximize a combination of the expected reward and the uncertainty of the estimate of the action value. Uncertainty is captured by the variance of the approximate posterior distribution over the weights.

The uncertainty can be computed by sampling $\theta$ from the approximate posterior distribution $q(\theta \mid \phi)$ and computing the estimate of the action value $\hat{Q}(s, a; \theta)$ for each action $a$ using the sampled network weights. The variance of the action-value estimate is defined as
$$\sigma^2(s, a) = \operatorname{Var}_{\theta \sim q(\theta \mid \phi)}\!\left[\hat{Q}(s, a; \theta)\right] = x(s, a)^{\top} \Sigma \, x(s, a), \qquad (13)$$
where $x(s, a)$ is the feature vector for the state–action pair $(s, a)$ and $\Sigma$ is the covariance of the approximate posterior. The UCB value $U(s, a)$ for each action is then calculated as
$$U(s, a) = \hat{Q}(s, a; \mu) + c \, \sigma(s, a), \qquad (14)$$
where $\mu$ is the mean of the approximate posterior distribution $q(\theta \mid \phi)$, and $c$ is a hyperparameter that controls the balance between exploration and exploitation. The action with the highest upper confidence bound is selected as the next action to take.
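The sketch below shows one plausible way to realize (13) and (14) in Python: posterior weight samples produce a spread of action-value estimates, and the action with the largest mean-plus-uncertainty score is chosen. The sampling routine and feature map are hypothetical stand-ins.

```python
import numpy as np

def ucb_select(state_features, theta_samples, c=1.0):
    """Pick the epoch-count action with the highest U(s,a) = Q_mean + c * sigma.

    state_features: (n_actions, d) feature vectors x(s, a), one row per action
    theta_samples:  (m, d) weight vectors drawn from q(theta | phi)
    """
    q_values = theta_samples @ state_features.T      # (m, n_actions) value estimates
    q_mean = q_values.mean(axis=0)                   # Q-hat near the posterior mean
    q_std = q_values.std(axis=0)                     # sigma(s, a) from (13)
    return int(np.argmax(q_mean + c * q_std))        # UCB rule (14)

rng = np.random.default_rng(1)
features = rng.normal(size=(5, 8))                   # 5 candidate epoch counts
samples = rng.normal(size=(32, 8))                   # 32 posterior weight draws
best_action = ucb_select(features, samples, c=0.5)
```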
4.1.2. Proposed Method
The optimal epoch selection can be represented as a Markov decision process (MDP) defined by the tuple $\langle S, A, P, R, \gamma \rangle$. UA-RL is then used to explore the action space to select the optimal epoch for each IoT device and acquire the best accuracy. To achieve lower latency, SDN and Multi-access Edge Computing (MEC) are employed in the core of heterogeneous industrial IoT networks. SDN facilitates the separation of the control plane from the data plane within the communication network. The control plane handles decision-making related to network traffic, while the data plane is responsible for the actual forwarding of this traffic. This study utilized a multi-agent reinforcement learning approach, in which each client is assigned a UA-RL agent that selects epochs for that device.
Figure 3 shows the detailed workflow of the UA-RL method in selecting local epochs for each client while considering the status of each client. After monitoring the current state $s_t$ of the assigned device at time step $t$, each agent performs an action $a_t$. The central server calculates a team reward $r_t$ considering the accuracy, available resources, and total cost of the training model. The aim of the UA-FedRL method is to acquire the maximum reward by exploring an optimal policy. The state space, action space, and reward can be defined as follows; a small reward-computation sketch is given after the list.
- 1. State space: The state space of the UA-FedRL method at time step $t$ consists of the available computation resources $c_t$, the model state $m_t$, and the epoch selection state $e_t$. The state space of UA-FedRL can be defined as
$$s_t = \left\{ c_t, m_t, e_t \right\}. \qquad (15)$$
- 2. Action space: The action of the UA-FedRL method is to select epochs for each IoT device at each time step to find the optimal epoch for each device. The action space of the UA-FedRL model can be defined as
$$a_t = \left\{ e_t^1, e_t^2, \dots, e_t^N \right\}, \quad e_t^i \in E, \qquad (16)$$
where $E$ is the set of possible epochs for each IoT device, and $e_t^i$ is the next epoch number for the $i$th device.
- 3. Reward space: The reward function is used by the agent to evaluate the action taken to find the optimal action, i.e., the optimal epoch number. The UA-FedRL model's reward is defined as
$$r_t^i = k - \mathcal{L}_i - C_i, \qquad (17)$$
where $k$ is the model accuracy, $\mathcal{L}_i$ is the loss value of the training model from the $i$th IoT device, and $C_i$ is the total cost of training the model on the $i$th IoT device. The global reward can be defined as
$$R_t = \frac{1}{N} \sum_{i=1}^{N} r_t^i. \qquad (18)$$
The detailed procedure of the proposed UA-FedRL to select local training epochs can be seen in Algorithm 1. The algorithm starts by initializing the weights $\theta$ of the neural network with a Gaussian prior $p(\theta)$. Additionally, we initialize the hyperparameters $\beta$, $c$, $\gamma$, and $N$, where $N$ represents the number of agents. During the training phase, the algorithm iterates over multiple episodes. Within each episode, the algorithm loops through several time steps $t$. At each time step, it observes the current state $s_t$ and iterates over each agent $i$. For each agent, we sample the weights $\theta$ from the approximate posterior distribution $q(\theta \mid \phi)$ and compute the action value estimates $\hat{Q}(s_t, a; \theta)$ for all possible actions $a$. We then calculate the upper confidence bounds $U(s_t, a)$ for all actions and select the action $a_t^i$ that has the highest upper confidence bound. After determining the actions for all agents, we perform the composite action $a_t = (a_t^1, \dots, a_t^N)$ and observe the reward $r_t$. The weights of the neural network are then updated using backpropagation with a loss function that incorporates the log-likelihood of the action, the difference between the observed reward and the estimated action value, and the KL divergence between the approximate posterior distribution and the prior distribution. The hyperparameter update of $\beta$ is performed using the calculation method given in (12).
Algorithm 1 UA-FedRL with UCB-based epoch selection to maximize performance in heterogeneous IoT.
1: Initialize the neural network weights $\theta$ with Gaussian prior $p(\theta)$
2: Initialize the hyperparameters $\beta$, $c$, $\gamma$, and $N$
3: for each episode do
4:  for each time step $t$ do
5:   Observe state $s_t$
6:   for each agent $i$ do
7:    Sample $\theta$ from the approximate posterior distribution $q(\theta \mid \phi)$
8:    Compute the action value estimates $\hat{Q}(s_t, a; \theta)$ for all actions $a$
9:    Compute the upper confidence bounds $U(s_t, a)$ for all actions $a$
10:   Select action $a_t^i = \arg\max_a U(s_t, a)$
11:  end for
12:  Perform action $a_t = (a_t^1, \dots, a_t^N)$
13:  Observe reward $r_t$
14:  Update the neural network weights using backpropagation with the loss equation (11)
15:  Update the hyperparameter $\beta$ using (12)
16: end for
17: end for
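A compact, self-contained Python rendering of Algorithm 1's control flow is sketched below. The environment, posterior sampler, and reward are hypothetical stubs; the point is the episode/time-step/agent loop and the UCB-based action choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS, DIM = 3, 5, 8
features = rng.normal(size=(N_ACTIONS, DIM))        # x(s, a) per candidate epoch count

def sample_posterior(m=32):
    """Stub for theta ~ q(theta | phi); a learned sampler in the full method."""
    return rng.normal(size=(m, DIM))

def env_reward(actions):
    """Stub reward: favors mid-range epoch counts (stands in for accuracy - cost)."""
    return -np.mean((np.array(actions) - 2) ** 2)

for episode in range(10):                            # Algorithm 1, line 3
    for t in range(5):                               # line 4
        actions = []
        for i in range(N_AGENTS):                    # lines 6-11
            q = sample_posterior() @ features.T      # value estimates per action
            ucb = q.mean(axis=0) + 0.5 * q.std(axis=0)
            actions.append(int(np.argmax(ucb)))      # line 10
        r = env_reward(actions)                      # lines 12-13
        # line 14 (omitted): backprop through the variational loss (11)
```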
4.2. Predictive Weighted Average Aggregation
PWA is designed to aggregate model updates from heterogeneous industrial IoT devices with different local epoch counts. The PWA method is responsible for assigning a weight to each IoT device based on the quality of that device's local model updates and combining the weighted updates to update the global model. The PWA method first computes a weight quality $q_i$ for each device, which represents the degree of confidence in the accuracy of the model updates from device $i$. We assume that the weight quality $q_i$ can be calculated using the predictive log-likelihood of the local model on the validation set. We assume $V_i$ is the validation set from device $i$, and $\ell(\hat{y}_j, y_j)$ is the loss function that measures the discrepancy between the predicted output $\hat{y}_j$ and the true output $y_j$ for the $j$th data point in $V_i$. The total validation loss across all data points in the validation set can be expressed as follows:
$$L_i = \prod_{j=1}^{|V_i|} \ell\!\left(\hat{y}_j, y_j\right). \qquad (19)$$
The above equation represents the cumulative error of the model over all validation data points. It is transformed into logarithmic form to simplify the product of the losses into a sum, which is computationally more stable and interpretable:
$$\log L_i = \sum_{j=1}^{|V_i|} \log \ell\!\left(\hat{y}_j, y_j\right). \qquad (20)$$
In order to translate the validation performance of each model into a weight, we use the negative exponential function. The negative exponential of the total validation loss ensures that models with lower losses get higher weights. Additionally, we introduce a hyperparameter $\lambda$ to control the sensitivity of the weight to the validation performance. We can calculate the weight quality of device $i$ as
$$q_i = \exp\!\left(-\lambda \log L_i\right), \qquad (21)$$
where $\lambda$ is a hyperparameter that controls the strength of the weighting function. A higher weight quality indicates that the local model updates are more accurate and reliable, and they will be given more weight during global model aggregation. The PWA method then combines the model updates from all devices by weighting them according to their quality measures. The updated global model parameter $w^{t+1}$ is given by:
$$w^{t+1} = \frac{\sum_{i=1}^{N} q_i \, w_i^{t+1}}{\sum_{i=1}^{N} q_i}. \qquad (22)$$
Therefore, the model updates from devices with higher-quality measures contribute more to the global model than those with lower-quality measures, resulting in a more accurate and robust global model.
The weight adjustment in the PWA method is implemented to ensure a quality-based contribution with regularization through the hyperparameter $\lambda$. The weight $q_i$ reflects the predictive quality of the local model on device $i$, based on its validation performance, ensuring that models with a lower validation loss contribute more to the global update. This is especially important in non-IID environments, where data distributions vary significantly across devices. The parameter $\lambda$ serves as a regularizer, controlling the sensitivity of the weight adjustment. A larger $\lambda$ increases the impact of high-performing devices by accentuating differences in their predictive performance, while a smaller $\lambda$ smooths out the contributions from all devices, resulting in a more uniform aggregation. This balance helps prevent the global model from overly relying on any single device, improving generalization and robustness.

The PWA method enhances overall model performance by improving convergence, reducing the heterogeneity gap, and increasing robustness. It improves convergence by prioritizing high-quality model updates and reducing the variance introduced by poorly performing devices, leading to faster convergence than traditional FedAvg aggregation. The heterogeneity gap, which measures the difference between local and global model updates, is also reduced, as devices with large gaps contribute less to the global model. This results in faster convergence and more accurate global models. Additionally, by assigning lower weights to outlier devices with skewed data distributions, PWA improves the robustness of the global model, making it more resilient to non-IID data. The regularization parameter $\lambda$ helps balance high- and low-quality contributions, preventing overfitting to any one device's data.
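Under the reconstruction above, the quality weighting of (21) and the weighted average of (22) can be realized with a few lines of Python; the per-device losses and the value of $\lambda$ below are illustrative assumptions.

```python
import numpy as np

def pwa_aggregate(local_weights, val_losses, lam=0.5):
    """Predictive Weighted Average sketch: eqs. (21)-(22)."""
    log_losses = np.log(np.asarray(val_losses))   # log L_i as in (20)
    q = np.exp(-lam * log_losses)                 # (21): lower loss -> higher quality
    q = q / q.sum()                               # normalized weights, as in (22)
    return sum(qi * wi for qi, wi in zip(q, local_weights))

updates = [np.ones(4) * v for v in (1.0, 2.0, 10.0)]  # w_i^{t+1} from three devices
losses = [0.8, 1.1, 9.0]                              # the straggler's loss is largest
w_global = pwa_aggregate(updates, losses)             # outlier update is down-weighted
```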
The overall architecture of the PWA and the related pseudocode can be seen in Figure 4 and Algorithm 2, respectively. For further details, refer to the supplementary materials.
Algorithm 2 PWA for heterogeneous industrial IoT mitigation.
Require: $N$ devices with local datasets $D_i$ and local models $w_i$, quality measures $q_i$
Ensure: Global model parameter $w^{t+1}$
1: Set learning rate $\eta$
2: for each round $t$ do
3:  for each device $i$ do
4:   Select a random subset $B_i^t$ of $D_i$
5:   Train local model on $B_i^t$ for $E_i$ epochs
6:   Compute weight quality $q_i$ using (21)
7:  end for
8:  Compute weighted average: $\bar{w} = \sum_{i=1}^{N} q_i w_i^{t+1} / \sum_{i=1}^{N} q_i$
9:  Update global model parameter: $w^{t+1} = \bar{w}$
10: end for
5. Experimental Evaluation
5.1. Experimental Setup
The relevant experiments of the UA-FedRL model were carried out on an Ubuntu 18.04 desktop with an AMD Ryzen 5 3500X CPU @ 3.6 GHz, an Nvidia RTX 3080 GPU, and 32 GB of RAM. The relevant code was written in Python 3.8. To design a communication network, the Mininet emulator was utilized. The Mininet emulator is programmed in Python and is publicly accessible to researchers at the Mininet official website (http://mininet.org/, accessed on 1 August 2024). It uses the Linux kernel to create a real network with virtualized end-hosts, switches, routers, and links, and it provides built-in support for the SDN architecture. Two commonly used datasets were selected to train local IoT devices. The local datasets were distributed with a non-IID distribution across IoT devices to create heterogeneous scenarios. A Convolutional Neural Network (CNN) model was designed and distributed to all participating IoT devices for the purpose of training their local models. The architecture of this CNN comprised two convolutional layers, each with a kernel size of five. The optimization of this network was carried out using the SGD method, with a learning rate set at 0.01. For the UA-FedRL algorithm, multiple parameters were meticulously chosen to thoroughly investigate its performance under various conditions. The learning rate $\alpha$ for UA-FedRL was set at values of 0.1, 0.5, and 0.9. These values were selected to study the algorithm's behavior under gradual, moderate, and rapid learning processes. Similarly, the discount factor $\gamma$ was set at 0.1, 0.5, and 0.9 to analyze the agent's preference for immediate rewards as opposed to long-term gains. The episode counts were chosen as 20 and 100, in order to discern the relationship between the number of learning iterations and the overall performance of the algorithm. The UA-FedRL model was benchmarked against several state-of-the-art FL models, including FedAVG, FedProx, FedShare, and FedSGD. The default parameter values, simulation environment, and models' hyperparameter settings used for the experiments are presented in Table 2.
The performance evaluation of our UA-FedRL method was conducted by simulating 100 client devices, each with varying resource capabilities. Five Raspberry Pi models were selected for this test due to their common use as IoT devices in real-world environments. These Raspberry Pi variants offered a range of CPU frequencies, from 700 MHz to 1.5 GHz, and varied battery capacities, from 60% to 100%. The Raspberry Pi 4 Model B was identified as the primary IoT device, while the remaining devices were categorized as stragglers. Detailed hardware specifications for each chosen device can be seen in Figure 5.
A communication simulator was also introduced to both client- and server-side training to simulate communication between server and client. The Mininet 2.3.0 Python simulator was used to configure every client as a host, which is similar to IoT devices in a real-world environment. The model parameters from each host were then converted to a byte stream prior to being sent to the server. The byte stream was then divided into packets according to the packet size. These packets were then sent to the server side to further process the model parameters. The overall data flow of the implemented communication simulator between the client and server can be seen in Figure 6.
5.2. Non-IID Data Distribution Method
The non-IID data distribution method was designed to partition a dataset into subsets across multiple clients. This was achieved by first sorting the dataset based on labels, then dividing it into shards, and finally distributing these shards across clients. The method ensured that each client received a specific number of shards, but the data within those shards were not uniformly distributed across classes. The steps are given below, followed by a short sketch of the procedure.
- 1. Sorting by labels: Given a dataset $D$ with $n$ samples, each associated with a label from a set $L$, the dataset is sorted on the basis of these labels. This results in a sequence where samples with the same labels are grouped together.
- 2. Shard creation: The sorted dataset $D$ is divided into $N$ shards, each containing $S$ samples. Thus, each shard $\mathcal{S}_k$ is defined as:
$$\mathcal{S}_k = \left\{ x_{(k-1)S + 1}, \dots, x_{kS} \right\}, \quad k = 1, \dots, N. \qquad (23)$$
- 3. Data allocation to clients: For each client $c$, a specific number $X$ of shards are randomly selected without replacement from the set of all shards. The union of samples from these selected shards forms the non-IID dataset for client $c$. Mathematically, the dataset $D_c$ for client $c$ is:
$$D_c = \bigcup_{k \in I_c} \mathcal{S}_k, \qquad (24)$$
where $I_c$ are the indices of the $X$ shards selected for client $c$.
This study considered three non-IID data distribution scenarios, namely low (20%), medium (50%), and high (80%), to evaluate the performance of the UA-FedRL model. The relevant parameters used to create these scenarios can be seen in Table 3.
5.3. Hyperparameter Optimization
Figure 7 shows the reward values obtained by UA-FedRL over the course of 20 episodes with the learning rate fixed at 0.1 and three different discount factors used to acquire rewards for epoch selection. The x-axis represents the episodes, while the y-axis represents the reward values.

It can be seen that the rewards fluctuated during the initial episodes until they converged between 0.9 and 0.98. The different discount factors produced similar results at this learning rate, suggesting that the agent was able to learn effectively regardless of the specific discount factor chosen.
Figure 8 shows the reward values obtained by UA-FedRL over the course of 20 episodes for a 0.5 learning rate and three different discount factors: 0.1, 0.5, and 0.9. The x-axis represents the episodes, while the y-axis represents the reward values. Each discount factor is represented by a different line style in the plot.

For each discount factor, the agent's reward value started relatively low and increased over the course of the episodes, eventually stabilizing at a high value. The three discount factors produced significantly different final rewards.

This result demonstrates that, at this learning rate, the choice of discount factor has a significant impact on the rate and magnitude of the agent's reward improvement over time, resulting in significant differences in the final reward values obtained by the agent.
Figure 9 shows the rewards acquired by UA-FedRL with a 0.9 learning rate and different discount factors. UA-FedRL was run for 20 episodes, and it can be seen that the rewards increased over the episodes. However, the rewards with a learning rate of 0.9 were slightly lower than those obtained with the other learning rates.

This result indicates that the selection of the learning rate considerably influences the speed and extent of the agent's reward enhancement over time, causing notable differences in the reward values garnered by the agent.
The optimal parameters were selected after evaluating the accumulated rewards of the different combinations. The UA-FedRL model acquired the highest rewards when both the learning rate and the discount factor $\gamma$ were set to 0.1. These optimal parameters were therefore used in all further evaluations of the UA-FedRL model.
5.4. Performance Comparison on MNIST Dataset
Figure 10 shows a comparison of the performance of the UA-FedRL method with a number of existing FL methods. This comparison assessed the accuracy achieved after 100 rounds of training, with 100 clients selected for all FL methods. All models were trained on non-IID data, which are commonly encountered in real-world scenarios.
As shown in Figure 10, the Fed_AVG, Fed_Prox, and UA-FedRL methods provided superior results compared to the Fed_SGD and Fed_Share approaches. The Fed_Prox method, specifically designed to address heterogeneity and non-IID data in IoT contexts, demonstrated its effectiveness in the graph. The UA-FedRL method, in turn, employs Bayesian techniques to tackle heterogeneity in IoT devices and leverages predictive weighted aggregation to enhance robustness when learning from non-IID data.
By achieving an accuracy of 96.45%, the UA-FedRL method outperformed existing FL approaches for IoT devices. This result suggests that the UA-FedRL model is well suited for heterogeneous industrial IoT device scenarios, ensuring more robust learning capabilities in real-world applications. The detailed accuracy of all FL methods used in this experiment is shown in
Table 4.
The three best FL methods from
Figure 10 were further evaluated to demonstrate their effectiveness in handling straggling devices with different non-IID data distributions. In real-world scenarios, IoT devices in an FL network may possess varying computational capacities. Consequently, it is essential to conduct experiments simulating these conditions to identify suitable methods that accommodate heterogeneous industrial IoT devices.
Figure 11 describes the performance metrics of the three methodologies tested across 100 clients, 90% of whom were identified as stragglers. These methods were trained for 100 rounds and assessed under various non-IID data distributions.
Figure 11a provides an assessment of the Fed_Avg methodology under three distinct non-IID conditions: low, medium, and high. While Fed_Avg achieved strong accuracy, surpassing 90% under low non-IID data conditions, its efficiency notably decreased at higher levels of non-IID data distribution. This pattern underlines Fed_Avg's limited adaptability to greater heterogeneity in data and training environments.
In contrast, both the Fed_Prox and UA-FedRL methodologies presented robust performance metrics under varying non-IID conditions, as depicted in
Figure 11b,c, respectively. Both methodologies surpassed the 90% accuracy benchmark on various non-IID data spectrums. The architectural premise of the Fed_Prox method is calibrated to accommodate the inherent heterogeneity of IoT devices, incorporating a proximal component to counterbalance disparities during global model updates.
The UA-FedRL algorithm adopts an innovative approach by integrating Bayes by backprop coupled with variational inference reinforcement learning. This fusion facilitates dynamic adjustment of local epochs depending on individual device computational capabilities. The introduction of the predictive weighted average aggregation method further enhances the cumulative accuracy of the UA-FedRL framework. UA-FedRL achieved an accuracy of 96.45%, surpassing its FL counterparts. The results acquired for all participating algorithms can be seen in Table 5. This result demonstrates the potential of the proposed method for the effective management of heterogeneous industrial IoT devices in FL networks.
5.5. Performance Comparison on CIFAR-10 Dataset
Figure 12 compares the performance of various FL methods based on their accuracy. The purpose of this comparison was to highlight the effectiveness of the UA-FedRL method relative to existing FL techniques. The Fed_Avg method, which is based on the standard federated averaging algorithm, obtained an accuracy of 50.95%. This approach involves a weighted averaging of local model updates from participating clients to update the global model. Despite its popularity, the relatively low accuracy of the Fed_Avg method under a non-IID data distribution on the CIFAR-10 dataset reveals potential limitations in its effectiveness for non-IID datasets. The Fed_Prox method, an extension of the Fed_Avg algorithm, introduces a proximity term to penalize local updates that deviate significantly from the global model. This method achieved an accuracy of 60.37%, indicating a significant improvement over the Fed_Avg method. The Fed_SGD method, which employs SGD in an FL environment, yielded an accuracy of 45.98%. Although this approach is simple to implement and has been extensively studied, the results demonstrate that it may not always be the most suitable choice for achieving the highest accuracy rates in FL. The Fed_Share method, a communication-efficient approach that reduces the quantity of data exchanged between clients and the central server by sharing selected model parameters, achieved an accuracy of 43.63%. Its significantly lower accuracy compared to the other methods indicates that gains in communication efficiency may come at the cost of reduced model performance.

However, the UA-FedRL method outperformed all other techniques by achieving an accuracy of 62.75%. This result demonstrates the prospective benefits of the UA-FedRL method for improving the overall performance of FL models. The increased accuracy shows that the new approach can offer significant benefits over existing methods and pave the way for more effective applications of FL in heterogeneous industrial IoT scenarios. The acquired accuracy of the implemented models can be seen in Table 6.
We further evaluated the effectiveness of the top three FL methods from Figure 12 in handling straggling devices, a common challenge in real-world scenarios where IoT devices in an FL network exhibit varying computational capacities.
Figure 13 presents a comparison of three FL strategies, evaluating their accuracy on the CIFAR-10 dataset amidst challenges such as 90% stragglers and different non-IID levels. The traditional Fed_Avg method, employing weighted averaging of client models for global updates, reached a maximum accuracy of 50% under low non-IID data, but that accuracy decreased with higher data heterogeneity. Fed_Prox, an advanced version of federated averaging, achieved an accuracy of around 55%; however, its efficiency was also affected by high non-IID data distributions, dropping to an accuracy of 44%.
In contrast, the UA-FedRL mechanism introduces Bayes by backprop and variational inference reinforcement learning, allowing adaptive local epoch adjustments based on individual device capabilities. Furthermore, the integration of the predictive weighted average aggregation technique enables UA-FedRL to consistently demonstrate superior accuracy, surpassing 60%. Unlike its counterparts, this method exhibited good stability, maintaining consistent performance at varying levels of non-IID data, as detailed in
Table 7.
These findings demonstrate the potential of the UA-FedRL method to effectively manage heterogeneous industrial IoT devices in FL networks. The superior performance of the UA-FedRL method highlights its applicability in scenarios where IoT devices with varying computational capacities are prevalent, ensuring efficient and accurate FL outcomes.
5.6. Ablation Study
An ablation study was conducted to observe the individual and combined effects of the UA-RL and PWA modules in our UA-FedRL approach; the results can be seen in Table 8. When the UA-RL and PWA modules were implemented independently, the accuracy of the model increased to 93.34% and 92.54% on the MNIST dataset and 60.95% and 60.34% on the CIFAR-10 dataset, respectively. This highlights the effectiveness of UA-RL in selecting epochs in heterogeneous industrial IoT settings and the PWA's ability to boost model accuracy by aggregating model weights in line with their quality.

However, the UA-FedRL approach achieved the best accuracy when both submodules were combined, reaching 96.87% and 62.73% on the MNIST and CIFAR-10 datasets, respectively. This outcome validates the dynamic combination of UA-RL and PWA within the UA-FedRL method.
Table 9 shows the accuracy of UA-FedRL as the number of IoT devices increased on the MNIST and CIFAR-10 datasets. For MNIST, the accuracy slightly decreased as the number of devices increased, from 96.87% with 100 devices to 96.04% with 250 devices. Similarly, on CIFAR-10, the accuracy dropped from 62.73% with 100 devices to 61.98% with 250 devices. This suggests that as more devices participate, the challenge of handling diverse data distributions may slightly affect the overall accuracy.
5.7. Communication Efficiency
Figure 14 presents a comparative study of the normalized communication costs associated with different FL methods, Fed_Share, Fed_SGD, Fed_AVG, Fed_Prox, and UA-FedRL, utilizing two datasets, MNIST and CIFAR-10. Each method is represented on the x-axis, while the y-axis quantifies the associated normalized communication cost. Two distinct bars represent each FL method, corresponding to the MNIST (light blue) and CIFAR-10 (light yellow) datasets. The experiment was designed to achieve a target accuracy of 90% for the MNIST dataset and 45% for the CIFAR-10 dataset. Different FL methods with 100 clients were implemented to calculate the normalized communication cost for each dataset. The detailed calculation of the normalized communication cost for a single FL model is given below; a short numerical sketch follows. The communication cost for each device during a communication round was calculated using the following equation:
$$C_i^t = b_i^{w} + b_i^{\text{ID}}. \qquad (25)$$
The total communication cost for all devices at round $t$ was then calculated as follows:
$$C^t = \sum_{i=1}^{N} C_i^t. \qquad (26)$$
The cumulative communication cost across all rounds to achieve the target accuracy was defined as:
$$C_{\text{cum}} = \sum_{t=1}^{T} C^t. \qquad (27)$$
Lastly, the normalized communication cost for MNIST and CIFAR-10, given the target accuracies of 90% and 45%, respectively, was calculated as:
$$C_{\text{norm}} = \frac{C_{\text{cum}} - C_{\min}}{C_{\max} - C_{\min}}, \qquad (28)$$
where $C_{\min}$ is the minimum communication cost achieved to reach the target accuracy, and $C_{\max}$ is the maximum possible communication cost given the respective accuracy target (90% for MNIST and 45% for CIFAR-10).
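As referenced above, the min-max normalization of (25)–(28) reduces to a few lines of Python; the byte counts below are made-up placeholders.

```python
def normalized_cost(per_round_costs, c_min, c_max):
    """Equations (26)-(28): sum per-round totals, then min-max normalize."""
    c_cum = sum(per_round_costs)
    return (c_cum - c_min) / (c_max - c_min)

# Eq. (25) per device: parameter bits + ID bits; here 3 devices over 4 rounds.
round_totals = [3 * (400_000 + 64)] * 4          # C^t for t = 1..4
score = normalized_cost(round_totals, c_min=1e6, c_max=1e7)  # 0 = best, 1 = worst
```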
The results reveal the variation in efficiency among FL methods in reducing communication costs. UA-FedRL emerged as the most communication-efficient method for both datasets, with costs of 0.19 and 0.24 for MNIST and CIFAR-10, respectively. On the other hand, Fed_Share exhibited the highest communication costs for both datasets, peaking for CIFAR-10 with a cost of 0.9. The chart thus highlights the critical role of selecting an appropriate FL method for optimizing communication costs in heterogeneous industrial IoT scenarios.
5.8. Computation Efficiency
Figure 15 shows a comparative study of the energy consumption of a range of FL methods, namely, Fed_Share, Fed_SGD, Fed_AVG, Fed_Prox, and UA-FedRL. These methods were tested on two distinct datasets, MNIST and CIFAR-10. The normalized energy consumption, presented on the y-axis, effectively illustrates the efficiency of each method in terms of energy utilization. Each method's impact is separately depicted for both the MNIST and CIFAR-10 datasets using sky-blue and salmon-colored bars, respectively. The experiment was designed to achieve a target accuracy of 90% for the MNIST dataset and 45% for the CIFAR-10 dataset. The energy consumption for each device during a communication round was calculated using the following equation, consistent with the computational cost model in (6):
$$E_i^t = \kappa \, \frac{|D_i| \, c_i \, E_i \, M}{f_i}. \qquad (29)$$
The total energy consumption for all devices at round $t$ was then calculated as follows:
$$E^t = \sum_{i=1}^{N} E_i^t. \qquad (30)$$
The cumulative energy consumption across all rounds to achieve the target accuracy was defined as:
$$E_{\text{cum}} = \sum_{t=1}^{T} E^t. \qquad (31)$$
Lastly, the normalized energy consumption for MNIST and CIFAR-10, given the target accuracies of 90% and 45%, respectively, was calculated as:
$$E_{\text{norm}} = \frac{E_{\text{cum}} - E_{\min}}{E_{\max} - E_{\min}}, \qquad (32)$$
where $E_{\min}$ is the minimum energy consumption achieved to reach the target accuracy, and $E_{\max}$ is the maximum possible energy consumption given the respective accuracy target (90% for MNIST and 45% for CIFAR-10).
The UA-FedRL method emerged as the most energy-efficient model across both datasets, with the lowest normalized energy consumption values of 0.25 and 0.19 for MNIST and CIFAR-10, respectively. In contrast, Fed_SGD consumed the most energy on the CIFAR-10 dataset with a value of 0.78, while Fed_AVG showed the highest energy use on the MNIST dataset at 0.86. These results highlight the differing energy efficiency of FL methods when deployed on different datasets.
Table 10 presents a comprehensive comparison of the different federated learning algorithms, including Fed_Share, Fed_SGD, Fed_AVG, Fed_Prox, and UA-FedRL, across three metrics on both the MNIST and CIFAR-10 datasets: accuracy (with 95% confidence intervals), communication cost, and energy consumption. UA-FedRL demonstrated the highest accuracy on both datasets (96.87% on MNIST and 62.73% on CIFAR-10), which reflects its effectiveness in handling non-IID data distributions. Additionally, UA-FedRL had the lowest communication cost (0.19 for MNIST and 0.24 for CIFAR-10) and energy consumption (0.25 for MNIST and 0.19 for CIFAR-10), which demonstrates its efficiency in resource-constrained environments. Other methods like Fed_AVG and Fed_Prox achieved competitive accuracy but with higher communication costs and energy consumption, indicating a trade-off between model performance and resource utilization.
5.9. Uncertainty Estimation
Figure 16 illustrates the performance of an agent using UCB and uncertainty estimation in a reinforcement learning setting. The x-axis represents the number of episodes, while the y-axis represents the expected reward obtained by the agent. The blue dots represent the rewards obtained by the agent during each episode, while the solid red line represents the expected reward predicted by the UCB algorithm. The shaded pink area around the red line represents the uncertainty estimate of the algorithm.
Overall, the UCB-based agent performed well, achieving a high expected reward within relatively few episodes. Moreover, the uncertainty estimate of the UCB algorithm remained small, indicating that the algorithm was confident in its predictions. Therefore, the UA-FedRL method can help guide the development of more effective reinforcement learning algorithms for future FL applications.
6. Conclusions
This study presented the UA-FedRL method, an innovative approach designed to tackle the challenges of implementing FL in heterogeneous industrial IoT environments. By dynamically selecting local epochs, UA-FedRL effectively manages the complexities of non-IID datasets and straggler IoT devices, improving accuracy, computation, and communication efficiency. The proposed PWA method further enhances performance by addressing weight aggregation issues and adjusting the weights of individual models based on their quality. Two commonly used datasets, MNIST and CIFAR-10, were employed to perform extensive experiments on the proposed UA-FedRL method. UA-FedRL obtained an accuracy of 96.45% on the MNIST dataset and 62.75% on the CIFAR-10 dataset when 90% of the devices were stragglers. Furthermore, the uncertainty estimation demonstrated the effectiveness of the UCB algorithm in achieving good performance in decision-making tasks. These results show that UA-FedRL outperformed the benchmarks in terms of faster convergence and higher training accuracy on both datasets, indicating its potential for enhancing the performance of FL in heterogeneous industrial IoT environments.
While our proposed UA-FedRL method provided good performance, it has some limitations, including a reliance on stable network conditions: communication between devices and the central server may be affected by unstable connectivity, which can degrade overall performance. Additionally, the method's performance may vary depending on the degree of data heterogeneity, as highly variable data distributions might reduce the effectiveness of the quality-based weighting mechanism. To address these limitations, future research could explore adaptive communication strategies to optimize network performance and data normalization techniques to better handle extreme data variability. Furthermore, the PWA method could be optimized by integrating additional factors such as temporal stability into the aggregation. Clustered aggregation could also be investigated to improve robustness in highly heterogeneous environments.