Application-Level Data Rate Adaptation in Wi-Fi Networks Using Deep Reinforcement Learning

Ibrahim Sammour, Gérard Chalhoub

To cite this version:
Ibrahim Sammour, Gérard Chalhoub. Application-Level Data Rate Adaptation in Wi-Fi Networks Using Deep Reinforcement Learning. 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), Sep 2022, London, United Kingdom. pp. 1-7, doi: 10.1109/VTC2022-Fall57202.2022.10013037. hal-04071129.

HAL Id: hal-04071129
https://hal.science/hal-04071129
Submitted on 10 May 2023
Application-Level Data Rate Adaptation in Wi-Fi Networks Using Deep Reinforcement Learning

Ibrahim Sammour
LIMOS-CNRS, University of Clermont-Auvergne
Clermont-Ferrand, France
ORCID: 0000-0002-7673-310X

Gérard Chalhoub
LIMOS-CNRS, University of Clermont-Auvergne
Clermont-Ferrand, France
ORCID: 0000-0003-1687-598X

Abstract—Wireless technologies are used in almost every application domain. They are easy to deploy and some of them offer high data rates. Modern applications require more bandwidth to cope with the growing quality of network content (i.e., multimedia). The increasing demand for network capacity in terms of bandwidth and the growing number of users are causing densification of the deployed networks. Most wireless technologies, such as Wi-Fi, suffer from a deterioration of Quality of Experience (QoE) in dense deployments. To overcome this problem, one of the techniques that can be used is data rate adaptation depending on the state of the network. In this paper, we propose a Deep Reinforcement Learning (DRL) approach for decentralized application-level data rate adaptation in dense Wi-Fi networks. We present the training procedure of the DRL model using the NS-3 simulator and TensorFlow. The model is then evaluated in dense scenarios and compared to an existing approach from the literature. Results show that using DRL can help to better cope with the current capacity of the wireless network.

This research was funded by the French government IDEX-ISITE initiative 16-IDEX-0001 (CAP 20-25).

I. INTRODUCTION

Over the past few decades, wireless technologies have enabled a wide variety of applications that have a significant impact on the way we live. Many modern applications rely on these technologies. They eased the way humans and objects interact and removed the limitations of cable-based networks. These revolutionary technologies are being deployed almost everywhere, from airplanes and space rockets [1] to laptops and mobile phones, going all the way down to tiny implantable medical devices [2].

Wireless Fidelity (Wi-Fi) is one of the most popular wireless technologies. Wi-Fi devices are included in almost every smartphone and laptop. The availability of Wi-Fi access points has become a necessity in any public, business, or commercial place. Hence, Wi-Fi faces an increasing number of challenges in terms of Quality of Service (QoS). Over the years, Wi-Fi standards have introduced advanced features to improve data rates, coverage, and multi-user functionality, for example Multi-User Multiple Input Multiple Output (MU-MIMO) [3], Orthogonal Frequency Division Multiple Access (OFDMA) [4], and higher-order modulation and coding schemes (MCS) such as 1024-QAM in Wi-Fi 6. These features made it possible to deploy the technology with many simultaneously connected devices in dense networks such as stadiums [7], airports [8], or public transportation.

The increase in the number of Wi-Fi clients causes an increase in the offered load per access point. Indeed, Wi-Fi adopts the CSMA/CA random access protocol to manage medium access. However, contention increases in dense networks, resulting in longer backoff durations and a degradation of the overall performance of the network.

One way to deal with this overload is to adapt the offered load to the capacity of the network. For example, network content can be delivered in different forms, such as 360p up to 4K video. Usually, this adaptation is based on the Quality of Service (QoS) metrics visible to the application layer, such as throughput, packet loss, delay, or jitter.

Machine learning has been playing an increasing role in recent years in improving Wi-Fi performance. Reinforcement Learning (RL), a branch of machine learning capable of solving optimization problems in complex environments, is frequently used in Wi-Fi [10]. In RL, we train an agent to learn about the environment (through measurements/sensors), take actions (optimize control parameters), and receive rewards to evaluate the actions. DRL offers better generalization by introducing deep learning into RL. DRL brings significant improvements over traditional optimization methods for wireless networks such as threshold-based techniques or supervised learning [11]. DRL has the ability to handle more complex scenarios and generalize its knowledge to situations it has never encountered before [12].

In this paper, we tackle the problem of application-level data rate adaptation in dense Wi-Fi networks. We propose a realistic decentralized DRL mechanism for dense networks that is able to cope with high-congestion scenarios by adapting the data rate at the application level based on goodput and packet loss metrics. The rest of the paper is organized as follows: In Section II, we briefly review previous DRL approaches in Wi-Fi networks. In Section III, we formulate our problem and present the structure of our proposed approach. In Section IV, we present the training scenario and discuss the validation results of our DRL model compared to an existing approach from the literature. We conclude the paper and discuss future work in Section V.

The main novelty of this paper is the use of a DRL approach on the application layer for a generic application, adapting the offered load depending on the current conditions of the wireless network.
II. DEEP REINFORCEMENT LEARNING IN WI-FI NETWORKS

In this section, we provide a quick overview of recent applications of DRL in Wi-Fi networks.

Recent surveys, such as [13], describe the various Wi-Fi areas where ML is applied, mainly RL and Deep Learning (DL), while DRL has not been extensively studied. Moreover, the papers working on a problem similar to ours are dedicated to specific applications such as video streaming, and only a few of them use RL, as in [14].

Authors in [15] used Deep Q-Learning (DQL) to enhance the performance of CSMA/CA in dense Wireless Local Area Networks (WLANs) by observing the backoff values, performing actions to increase or decrease them, and learning from a reward calculated based on the probability of collisions. Results showed enhancements in terms of throughput, channel access delay, and fairness compared to other mechanisms in the literature. To overcome the performance degradation due to the random selection of Resource Units (RUs) in Wi-Fi 6 OFDMA, authors in [16] proposed a DRL mechanism based on energy detection and acknowledgments for fair resource distribution. This mechanism led to higher average throughput and lower average latency. Double Deep Q-Learning was used for rate adaptation at the physical layer of Wi-Fi networks in [17]. The state representation included metrics such as the Modulation and Coding Scheme (MCS) and the Received Signal Strength Indicator (RSSI). The agent relied on the goodput to calculate the reward and selected a new triplet (MCS, channel width, number of spatial streams) among predefined profiles. The agent was deployed on Intel 802.11ac Network Interface Cards (NICs). It outperformed the Intel and Linux default rate adaptation algorithms by more than 200%. Authors in [18] used a DQL-based algorithm to check whether clients actually benefit from participating in Multi-User MIMO (MU-MIMO). The metrics used in the study were Channel State Information (CSI) and SNR. The experimental results show that using DRL to solve the pre-screening problem improves the system throughput by almost 40%. In [19], authors improve the data rate obtained by Wi-Fi clients using a client-access point association scheme based on DQL. The proposed method takes into account the application demands of the user and the link capacity. Results showed improvements over standard signal-strength-based association in terms of throughput and meeting application requirements. Authors in [20] proposed a video quality adaptation algorithm called End-to-End MAC (E2E-MAC). The algorithm combines throughput measurements with the number of re-transmissions on the MAC layer to improve the QoE of users.

III. PROPOSED APPROACH

In this section, we address the network performance degradation in dense Wi-Fi networks. The problem is framed as a Markov Decision Process (MDP). A DRL approach is proposed for adapting the application profiles based on network conditions.

A. Problem Formulation

Modern applications are more and more demanding in terms of network needs. Thus, more bandwidth is required by each Wi-Fi node in order to transmit/receive data while maintaining the satisfaction of the users. Studies on Wi-Fi performance have shown a significant decrease in the overall network throughput as the network size increases [21].

In an attempt to reproduce this behavior, we evaluated the performance of Wi-Fi using fixed application-level data rates for all nodes. We performed simulations using NS-3, varying the number of nodes from 1 to 100. Simulation parameters are presented in Table I. Each node transmits using the same application-level data rate toward the same access point. Figure 1 depicts the aggregated goodput obtained in the network, averaged over 50 runs and shown with the standard deviation. During the simulation, the network becomes saturated at a certain number of nodes depending on the application-level data rate. When more nodes are introduced to the network, the aggregated goodput drops from the saturation value and keeps decreasing. This is mainly due to the probabilistic behavior of the CSMA/CA algorithm used by Wi-Fi, which does not guarantee access to the medium, especially in high offered load scenarios [33]. Optimizations have been studied to enhance the performance of CSMA/CA [23], but this can only be done by updating the specifications of the 802.11 standard. Hence, our proposal, which operates at the application level, can benefit any device without requiring modifications to the standard.

Figure 1 also shows that the maximum goodput increases when the application-level data rate is increased. Higher application-level data rates reach their saturation goodput earlier. For example, when using a data rate of 3.5 Mb/s, we reach the saturation goodput with 25 nodes and it decreases afterwards, whereas when using a data rate of 0.7 Mb/s, we reach the saturation goodput with 75 nodes. Also note that for the same number of nodes, using a higher application profile gives a higher goodput, except when the number of nodes is high, where the goodput values obtained by the different application profiles are close.

The deterioration of the network performance has two main causes. First, (i) the increase in the offered load: when each node increases the amount of data it sends, the overall offered load becomes greater than the reception capacity of the access point. This reduces the goodput of each node and decreases the QoE. Second, (ii) the use of a random access protocol (CSMA/CA) in the Medium Access Control (MAC) layer: after transmitting a data packet, a node waits for an acknowledgment from the receiver; if no acknowledgement is received, the node backs off and waits for a random number of time slots before accessing the channel again. At each new attempt to retransmit the packet, the probability of choosing a longer backoff duration is increased. This mechanism causes significant access delays and collisions, especially in dense networks [22].
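To make this retransmission behavior concrete, the short sketch below simulates the binary exponential backoff that CSMA/CA applies after failed transmissions. It is only an illustration of the mechanism described above: the contention-window bounds (CW_MIN, CW_MAX) and the retry limit are typical 802.11 DCF values assumed here, not parameters reported in this paper.

import random

# Illustrative sketch of CSMA/CA binary exponential backoff.
# CW_MIN, CW_MAX and MAX_RETRIES are assumed, typical 802.11 DCF values.
CW_MIN = 15
CW_MAX = 1023
MAX_RETRIES = 7

def backoff_slots(retry_count: int) -> int:
    """Number of idle slots to wait before the next transmission attempt.

    The contention window doubles after every failed attempt (capped at
    CW_MAX), so later retries are more likely to draw a long backoff.
    """
    cw = min(CW_MAX, (CW_MIN + 1) * (2 ** retry_count) - 1)
    return random.randint(0, cw)

# Example: the contention window grows with each retransmission attempt.
for retry in range(MAX_RETRIES):
    cw = min(CW_MAX, (CW_MIN + 1) * (2 ** retry) - 1)
    print(f"retry {retry}: backoff drawn uniformly from [0, {cw}] slots")

Under high contention, many nodes draw overlapping backoff values, which is what produces the access delays and collisions discussed above.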
Fig. 1: The goodput of nodes transmitting using a single application profile (aggregated goodput in Mb/s versus the number of nodes, for fixed application-level data rates of 0.7, 1.5, 2.5, and 3.5 Mb/s).

In what follows, we propose a mechanism that reduces the offered load per node in an attempt to maintain the goodput level for a higher number of nodes. For example, under high contention conditions, a node preserves the satisfaction of the user by selecting a lower application-level data rate.

B. Deep Reinforcement Learning Model

The idea behind RL is to employ a self-learning agent that can interact with the environment through a series of actions. The agent is rewarded after each action it makes, moving it closer to its goal. The agent improves its decision-making through training, which helps it learn the best possible action for each state it may encounter in a real environment [24]. However, the number of different network conditions an agent may encounter is large. This large space of values can of course be discretized, but in order to provide more accuracy and better coverage of a real environment, we decided to use DRL. In DRL, the decision-making process of an agent is represented by a neural network. The neural network takes the network conditions as input and outputs the corresponding suitable application-level data rate.

Adapting from one application-level data rate to another is a challenging task due to the varying nature of wireless networks and the large space of network conditions that a Wi-Fi node can encounter. To tackle this issue, multiple techniques have been introduced in the literature based on knowledge of physical-layer metrics such as the signal-to-noise ratio (SNR) [25]. These metrics give an immediate estimate of the channel conditions [26]. However, applications may be deployed on various types of devices that do not provide access to physical-layer metrics. For that reason, we only use metrics that are available at the application level on any device, namely the goodput, which is the amount of correctly received data, and the packet loss, which is the ratio of packets lost due to collisions and buffer overflow. Note that other metrics can also be used, such as the Age of Information (AoI) as done in [27], which provides feedback about the time spent to access the channel and send the data.

1) Model Architecture: The optimization problem can be framed as an MDP consisting of (i) an agent exploring the environment, (ii) a set of states S, (iii) a set of actions A, (iv) a reward function R, (v) a transition probability T, and (vi) a reward discount factor γ ∈ [0, 1].

The agent is installed on the application layer of the Wi-Fi node. The agent sends data toward an access point over an interval of time and observes the state s ∈ S of the environment. The state is expressed as the goodput (g) and the packet loss (pl) of the transmissions in the previous interval, [g, pl]. The agent then takes an action a ∈ A by selecting a data rate based on a policy (π). The policy is a neural network which provides a set of probabilities, each of which corresponds to an action. Based on the policy, the agent samples an action and decides whether to keep using the current data rate or to select a new one. The reward r is then calculated as the difference between the overall goodput and the packet loss, aiming at enhancing the overall performance of the network. Note that our reward function expresses the QoE. As we chose a generic application, we reduced the QoE to these two metrics; for other, more specific applications such as video streaming, additional metrics can be included in the reward function, such as buffer size, access delay, or current image quality. Also note that fairness and energy consumption are out of the scope of our approach. As a first step, we only focus on two quantities, namely the goodput and the packet loss. The reward equation is shown in (1).

r = α ∗ g − β ∗ pl    (1)

The goodput is normalized to the range [0, 1] by dividing it by the maximum expected goodput (based on the maximum possible data rate). The α and β parameters help fine-tune the reward to comply with the requirements of the application. For instance, increasing β will drive the agent to favor learning how to reduce the packet loss rather than increasing the goodput. In this case, selecting a lower data rate would be the most probable decision.

The immediate reward may not be sufficient to determine the proper decision in the current state. The decision at any state has an impact on the future series of events. The discount factor γ ∈ [0, 1] is used to determine the importance of future rewards in comparison to the immediate reward. This is represented by the cumulative reward equation (2). It predicts how much reward is expected in the future after taking an action in a certain state. For instance, choosing a low value of γ means favoring short-term rewards; for example, when streaming a live video, we care more about short-term rewards.

r = r_t + γ ∗ r_{t+1} + γ² ∗ r_{t+2} + · · · + γ^T ∗ r_T    (2)

where T is the window size of the future rewards that we care about.
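As an illustration of equations (1) and (2), the following sketch computes the per-interval reward and the discounted return. The α and β values follow Table I; the normalization constant (the 3.5 Mb/s maximum application-level data rate) and the discount factor γ are illustrative assumptions, since the paper normalizes by the maximum expected goodput but does not report a value for γ.

# Sketch of the reward in (1) and the discounted return in (2).
ALPHA = 0.8                 # from Table I
BETA = 0.2                  # from Table I
MAX_GOODPUT_MBPS = 3.5      # assumed: highest application-level data rate

def reward(goodput_mbps: float, packet_loss_ratio: float) -> float:
    """Equation (1): r = alpha * g - beta * pl, with g normalized to [0, 1]."""
    g = min(goodput_mbps / MAX_GOODPUT_MBPS, 1.0)
    return ALPHA * g - BETA * packet_loss_ratio

def discounted_return(rewards: list, gamma: float = 0.99) -> float:
    """Equation (2): r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... over a window."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: a node observing 2.5 Mb/s of goodput and 10% packet loss.
print(reward(2.5, 0.10))                   # immediate reward for this interval
print(discounted_return([0.5, 0.4, 0.3]))  # return over a 3-step window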
Our aim is to reach a policy that, for any observed network conditions, selects the data rate that maximizes the return (cumulative reward) of the agent. Our approach is split into two phases: offline training and exploitation. During the offline-training phase, the agent explores the environment through different simulation scenarios until converging to a policy based on our reward design. At the end of the training, we save the model that contains our final policy to use it in the next phase. During the exploitation phase, we deploy the generated model in scenarios where the agent is able to pick, with a certain degree of confidence, the most suitable data rate given a state. Our training method is presented in the next sub-section.

2) Finding the most suitable data rate: We employ an on-policy approach based on the actor-critic framework, namely Proximal Policy Optimisation (PPO) with the clipped surrogate technique [29]. We chose PPO for its stability, as it constrains policy updates so that the learning does not diverge or fall into a local optimum. The actor-critic framework consists of two neural networks: the actor network and the critic network. The actor network is responsible for the action selection. It takes the state as input and outputs a probability vector over the possible actions. The critic network is the value function. It takes a state as input and outputs the expected return. The critic network decides whether the policy (actor) is improving or deteriorating. Figure 2 shows the complete structure of the proposed approach.

Fig. 2: The structure of the proposed approach using the PPO algorithm (the simulation exchanges the state (goodput, packet loss) and the selected application profile (action) with the actor; batches of (state, action, reward) and (state, action, return) tuples feed the actor and critic losses, and the advantage computed from the critic's expected return drives the neural network updates at the end of each episode).

In the first step of training, we initialize the hyperparameters and the network weights of the actor and the critic networks, θ and ϕ respectively. Then, we iterate through multiple episodes of training. At the beginning of each episode, we initialize an empty batch B that will hold the (s, a, r) tuples. The tuples are used to update the actor and the critic networks at the end of each episode, after which the simulation environment is restarted. When the simulation starts, the density of nodes is increased gradually over time. The agent starts observing the environment by collecting the state of each node. A reward is then calculated based on the collected states. Next, the previous state, the action taken in the previous state, and the newly calculated reward are added to B. The actor network is then used to predict an action for the current state.

To check whether our model is improving or deteriorating, we calculate the advantage function, shown in (3), at the end of each episode. It indicates how beneficial each action was when using the current policy. It is a comparison between the return obtained when taking an action in a state and the expected return of the state under the previous policy. The advantage function provides an insight into the impact of the action of the agent on the return of the state.

A^π_t(s, a) = r_t + γ ∗ V^π(s_{t+1}) − V^π(s_t)    (3)

where V^π(s_t) is the critic network that gives the expected return of a state. The advantage function is then used to update the ϕ parameters of the critic network by performing gradient descent with respect to the loss function (4). We update the critic network parameters so that its predictions match the return of the policy.

L(ϕ) = E[A_t²]    (4)

The θ parameters of the actor network are updated by performing gradient ascent with respect to the loss function shown in (5), where r_θ is the probability ratio shown in (6). r_θ is greater than 1 when the action is more probable under the current policy than it was under the old policy, and between 0 and 1 otherwise. Clipping is based on ϵ, a hyperparameter of the loss function used to avoid cases where the action probabilities of the two policies differ too much. Thus, it prevents taking big gradient steps when updating the policy. The update of the actor parameters makes the actions that resulted in a better return more probable under the new policy.

L(θ) = E_t[min(r_θ ∗ A_t, clip(r_θ, 1 − ϵ, 1 + ϵ) ∗ A_t)]    (5)

r_t(θ) = π_θ(a_t | s_t) / π_{θold}(a_t | s_t)    (6)

The actor and critic networks are updated until their losses become negligible. The critic loss becomes negligible when the new policies have no more advantage over the old ones. The actor loss becomes negligible when the policies produce almost no difference in the predicted action probabilities for the different states. The training is then finished, and we save the model, which is now ready to be deployed in real scenarios.
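The update rules in (3) to (6) map directly onto a few TensorFlow operations. The sketch below is a minimal rendition of the advantage, the critic loss, and the clipped surrogate actor loss; the values of γ and ϵ are illustrative assumptions, since the paper only reports the learning rate (0.001).

import tensorflow as tf

GAMMA = 0.99      # assumed discount factor
EPSILON = 0.2     # assumed clipping hyperparameter

def advantage(r_t, v_next, v_curr):
    """Equation (3): A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return r_t + GAMMA * v_next - v_curr

def critic_loss(advantages):
    """Equation (4): mean squared advantage, minimized by gradient descent."""
    return tf.reduce_mean(tf.square(advantages))

def actor_loss(new_log_probs, old_log_probs, advantages):
    """Equations (5)-(6): clipped surrogate objective (negated for a minimizer)."""
    ratio = tf.exp(new_log_probs - old_log_probs)            # r_theta in (6)
    clipped = tf.clip_by_value(ratio, 1.0 - EPSILON, 1.0 + EPSILON)
    surrogate = tf.minimum(ratio * advantages, clipped * advantages)
    return -tf.reduce_mean(surrogate)

In a training step, minimizing critic_loss and the negated surrogate in actor_loss (i.e., performing gradient ascent on (5)) with an optimizer such as Adam and the 0.001 learning rate from Table I corresponds to the update described above.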
IV. SIMULATION RESULTS

In this section, we conduct a performance evaluation of our DRL model. We train our model offline in a simulated environment and we evaluate it in complex scenarios that were not encountered during training.

A. Simulation Environment

Our simulation procedure consists of two key parts: the Wi-Fi simulation and the DRL training. Each part is implemented in its own environment. The Wi-Fi simulation is conducted in NS-3, an open source simulator offering a detailed Wi-Fi module [30]. The training and validation of the DRL model take place in a Python environment using TensorFlow, an open source platform for machine learning [31]. NS-3 and the DRL module exchange information using ZeroMQ (ZMQ), an asynchronous messaging library that allows independent applications to exchange information [32]. The simulation parameters are presented in Table I.

TABLE I: Simulation parameters.

Parameter | Value
Simulation time (Training) | 180 seconds
Simulation time (Validation) | 60 seconds
Runs | 50
WLAN standard | IEEE 802.11ac
Path loss model | Log-distance
Fading factor | random(0, 2 dB)
Traffic | UDP
Channel width | 20 MHz
Packet size | 1500 bytes
Mobility model | Random Walk 2D Mobility Model
Mobility speed | 4 m/s
Topology size | Square with boundaries (-30, 30, -30, 30)
α | 0.8
β | 0.2
Learning rate | 0.001

B. Model Training

To start the training process, we prepare a simulation scenario with a varying number of nodes. The total duration of each simulation is set to 180 seconds of NS-3 simulated time. We start the simulation with 10 nodes and split each simulation scenario into 3 phases of 60 seconds; after each phase, we add 10 nodes to the scenario. The number of nodes is chosen so that the model explores a variety of states under different network conditions.

Our training reward design of equation (1) is set to favor improving the goodput over reducing the packet loss (α > β). Note that the packet loss seen by the application can be caused not only by collisions but also by buffer overflow.

During training, each NS-3 simulation sends the unique identifier and the state of each node to the DRL module. The DRL module then predicts an application profile for each node based on the states and calculates the corresponding rewards. The list of available application-level data rates is [0.7 Mb/s, 1.5 Mb/s, 2.5 Mb/s, 3.5 Mb/s], corresponding to the data rates used in [20]. Each node transmits with the selected data rate over an interval of time which is set to 250 milliseconds. This duration allows the nodes to interact with the environment multiple times, providing a better vision of the environment than a single transmission. Depending on the fault tolerance of the application, a shorter or a longer monitoring duration can be used. At the end of the simulation (episode), the losses of the actor and the critic networks are calculated, and the actor and critic networks are then updated accordingly. The training process is marked as done when the losses become stable and the return is no longer increasing.

C. Model Validation

The offline training produces an optimized model that maximizes the return. The goal of our model is to adapt the application-level data rate to enhance the performance of the network. We validate the model, which was trained in a few static scenarios limited to 30 nodes, in a larger space of scenarios, including mobile scenarios with more nodes (up to 100 nodes). The validation aims at testing the ability of the model to generalize and adapt to scenarios it has not previously encountered.

In what follows, we compare our DRL model with E2E-MAC [20] and with the highest application-level data rate. Papers dealing with the same problem as ours often lack details about the proposed model, such as hyperparameters, simulation time, and the environment, which makes reproducing their models for comparison almost impossible. Thus, we decided to compare our approach with [20], for which the needed details were available in the paper.

First, we perform scenarios with mobility to make our simulation more realistic and to confront our model with dynamic situations. During the scenarios, the density of nodes is increased from 1 to 100, which covers the decrease in goodput for most application profiles. Figures 3 and 4 show the goodput and the ratio of collisions, respectively, averaged over 50 runs and including the standard deviations.

Fig. 3: Overall goodput obtained in the network using the trained DRL model, E2E-MAC, and a single application profile (goodput in Mb/s versus the number of nodes, for the DRL mechanism, E2E-MAC, and the 3.5 Mb/s profile).
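To make the training-loop exchange concrete, the sketch below shows how the DRL module could serve per-node decisions over ZeroMQ at each 250 ms interval. The REP/REQ socket pattern, the endpoint, and the JSON message layout are assumptions made for illustration; the paper only states that ZeroMQ carries each node's identifier and state to the DRL module, which returns the selected application profile.

import zmq

# Application-level data rates (profiles) used in the paper.
DATA_RATES_MBPS = [0.7, 1.5, 2.5, 3.5]

def serve(policy, endpoint: str = "tcp://*:5555"):
    """Answer per-node state reports from NS-3 with an application profile.

    `policy` is assumed to be a callable that maps the [goodput, loss] state
    to the index of an action sampled from the actor network.
    """
    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind(endpoint)
    while True:
        msg = socket.recv_json()               # e.g. {"node": 3, "goodput": 2.1, "loss": 0.08}
        state = [msg["goodput"], msg["loss"]]  # [g, pl] observation for this node
        action = policy(state)
        socket.send_json({"node": msg["node"],
                          "rate_mbps": DATA_RATES_MBPS[action]})

On the NS-3 side, a matching REQ socket would send the per-node measurements at the end of each 250 ms interval and apply the returned rate to the application for the next interval.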
Fig. 4: Ratio of collisions obtained in the network using the trained DRL model, E2E-MAC, and a single application profile (collision ratio versus the number of nodes, for the DRL mechanism, E2E-MAC, and the 3.5 Mb/s profile).

Results in terms of goodput show that, using our DRL model, we are able to maintain the maximum value of goodput for almost double the number of nodes. Note that E2E-MAC does not reach the maximum goodput value reached by the maximum data rate and by our DRL model, but it is able to maintain higher goodput values for a higher number of nodes compared to the maximum data rate case. When there are only a few nodes in the network, the results of the three approaches are close. This is an expected result, because the enhancement we are aiming for is achievable under saturation conditions.

Results in terms of the number of collisions show that taking the network state into consideration reduces the overall number of losses. The E2E-MAC approach gives better results than our DRL model when the number of nodes is less than 80. This can be explained by the fact that E2E-MAC takes into account the number of MAC-level retransmissions, which is a closer estimation of the losses than the application-level packet loss. For more than 80 nodes, our DRL model results in fewer collisions because, in these scenarios, most of the nodes switch to the lowest data rate, which helps reduce the overall number of collisions.

In certain situations, the data rate requirement of an application profile can be higher than the physical data rate. In these cases, the DRL model selects application-level data rates lower than the current physical data rate. This is done without knowledge of the underlying physical data rate; it is based on the local observations of each node. Nodes suffering from bad network performance are more likely to choose lower application-level data rates. This kind of decision has a double impact: first, such nodes generate less traffic and thus suffer from less data loss; second, they occupy the channel less often, meaning that other nodes can profit from a lower overall contention and increase their application profile if their performance feedback suggests so.

Overall, the simulation results show that our DRL approach is able to maintain its peak throughput levels for more than double the number of nodes in the network. This is the main aim of the approach, and Figure 3 shows this improvement. As for the number of collisions, which is not the main aim of the approach (as opposed to the E2E-MAC method), our DRL method was able to reduce them compared to the baseline method, and under very high contention, the DRL method resulted in fewer collisions than E2E-MAC.

D. Model Complexity

In our approach, the DRL model is trained offline inside the simulator. The exported model, which is used for validation, can be deployed on end devices, where the computational overhead is critical. Thus, the computational overhead matters during validation and is directly related to the dimensions of the designed neural network. The overhead is the number of operations inside the neural network from the input layer to the output layer. The time and memory overhead of the model predictions depends on the computational power and the memory specifications of the hardware. Relatively small neural networks, such as in our case, have an insignificant overhead on end devices.

V. CONCLUSION AND FUTURE WORK

In this paper, we explored a DRL mechanism for enhancing goodput in dense Wi-Fi networks. Wi-Fi networks are present in our everyday life through many applications. Nevertheless, they are known to suffer from performance degradation under high offered load situations. We explored different saturation situations, showing how Wi-Fi achieves low goodput under high offered loads.

In an attempt to increase the goodput in the network, we investigated application-level data rate adaptation using Deep Reinforcement Learning, which allows adapting the offered load depending on the locally observed network state. We proposed a DRL model that takes into account the current performance of the network in order to dynamically choose the most suitable data rate. We used network simulation to train our model offline. This approach is cost-free and can be done prior to real deployments. The DRL model was then validated in dense and more complex scenarios. The aim was to show how the model was able to cope with scenarios that it did not encounter during training. The DRL model learned to adapt the application-level data rates according to the observed state of the network. We showed how the training achieved better results in terms of goodput under network congestion compared to a similar learning approach and to baseline cases.

In our future work, we will test our model on real devices in an attempt to validate the ability of the offline simulated training to serve real deployments. We will further explore offline training through simulation by offering all available network metrics to the nodes. The aim is to use a reward that includes the current overall network performance instead of per-node local observations. Once the model is trained, it can be exploited without global knowledge of the network performance. Also, we will extend our approach by giving the nodes the ability to adapt parts of the Wi-Fi standard, namely the CSMA/CA mechanism, in order to achieve its full potential performance.
In addition to that, we will work on more specific applications, such as video streaming, and construct reward functions that include metrics related to the application, such as buffer size.

REFERENCES

[1] J. F. Schmidt, D. Neuhold, C. Bettstetter, J. Klaue and D. Schupke, "Wireless Connectivity in Airplanes: Challenges and the Case for UWB," in IEEE Access, vol. 9, pp. 52913-52925, 2021.
[2] C. Shi, V. Andino-Pavlovsky, et al., "Application of a sub-0.1-mm³ implantable mote for in vivo real-time wireless temperature sensing," in Science Advances, vol. 7, 2021.
[3] N. S. Ravindranath, I. Singh, A. Prasad and V. Rao, "Study of performance of transmit beamforming and MU-MIMO mechanisms in IEEE 802.11ac WLANs," IEEE ICICCT 2017.
[4] A. S. George et al., "A Review of Wi-Fi 6: The Revolution of 6th Generation Wi-Fi Technology," pp. 56-65, 2020.
[5] L. Garcia, J. Jimenez, M. Taha and J. Lloret, "Wireless Technologies for IoT in Smart Cities," Network Protocols and Algorithms, 2018.
[6] H. M. Jawad, R. Nordin, S. K. Gharghan, A. M. Jawad and M. Ismail, "Energy-Efficient Wireless Sensor Networks for Precision Agriculture: A Review."
[7] N. Levallet, N. O'Reilly, E. Wanless, M. Naraine, E. Alkon and W. Longmire, "Enhancing the Fan Experience at Live Sporting Events: The Case of Stadium Wi-Fi," Case Studies in Sport Management, vol. 8, 2019.
[8] Z. Hays, G. Richter, S. Berger, C. Baylis and R. J. Marks, "Alleviating airport WiFi congestion: A comparison of 2.4 GHz and 5 GHz WiFi usage and capabilities," Texas Symposium on Wireless and Microwave Circuits and Systems, 2014, pp. 1-4.
[9] "Cisco Annual Internet Report (2018-2023) White Paper," Cisco, San Jose, CA, USA, White Paper, 2018.
[10] S. Szott, K. Kosek-Szott et al., "Wi-Fi Meets ML: A Survey on Improving IEEE 802.11 Performance with Machine Learning," 2022.
[11] C. Zhang, P. Patras and H. Haddadi, "Deep Learning in Mobile and Wireless Networking: A Survey," in IEEE Communications Surveys and Tutorials, vol. 21, no. 3, pp. 2224-2287, third quarter 2019.
[12] K. Arulkumaran, M. Deisenroth, M. Brundage and A. Bharath, "A Brief Survey of Deep Reinforcement Learning," in IEEE Signal Processing Magazine, vol. 34, 2017.
[13] S. Szott et al., "Wi-Fi Meets ML: A Survey on Improving IEEE 802.11 Performance With Machine Learning," in IEEE Communications Surveys & Tutorials, vol. 24, no. 3, pp. 1843-1893, third quarter 2022.
[14] M. Morshedi and J. Noll, "A Survey on Prediction of PQoS Using Machine Learning on Wi-Fi Networks," 2020 International Conference on Advanced Technologies for Communications (ATC), 2020, pp. 5-11.
[15] R. Ali, N. Shahin, Y. B. Zikria, B. Kim and S. W. Kim, "Deep Reinforcement Learning Paradigm for Performance Optimization of Channel Observation-Based MAC Protocols in Dense WLANs," in IEEE Access, vol. 7, pp. 3500-3511, 2019.
[16] D. Kotagiri, K. Nihei and T. Li, "Distributed Convolutional Deep Reinforcement Learning based OFDMA MAC for 802.11ax," IEEE ICC 2021, pp. 1-6.
[17] S.-C. Chen, C.-Y. Li and C.-H. Chiu, "An Experience Driven Design for IEEE 802.11ac Rate Adaptation based on Reinforcement Learning," IEEE INFOCOM 2021, pp. 1-10.
[18] S. Su, W. Tan, X. Zhu and R. Liston, "Client Pre-Screening for MU-MIMO in Commodity 802.11ac Networks via Online Learning," IEEE INFOCOM 2019, pp. 649-657.
[19] M. A. Kafi, A. Mouradian and V. Vèque, "On-line Client Association Scheme Based on Reinforcement Learning for WLAN Networks," 2019 IEEE WCNC, 2019, pp. 1-7.
[20] A. S. Abdallah and A. B. MacKenzie, "A cross-layer controller for adaptive video streaming over IEEE 802.11 networks," 2015 IEEE ICC, 2015, pp. 6797-6802.
[21] A. Ganji, G. Page and M. Shahzad, "Characterizing the Performance of WiFi in Dense IoT Deployments," 2019 ICCCN, pp. 1-9.
[22] E. Ziouva and T. Antonakopoulos, "CSMA/CA performance under high traffic conditions: throughput and delay analysis," Computer Communications, vol. 25, no. 3.
[23] Y. Edalat and K. Obraczka, "Dynamically Tuning IEEE 802.11's Contention Window Using Machine Learning," Association for Computing Machinery, 2019.
[24] N. C. Luong et al., "Applications of Deep Reinforcement Learning in Communications and Networking: A Survey," in IEEE Communications Surveys and Tutorials, vol. 21, no. 4, pp. 3133-3174, fourth quarter 2019.
[25] F. Chiariotti, C. Pielli, A. Zanella and M. Zorzi, "QoE-aware Video Rate Adaptation algorithms in multi-user IEEE 802.11 wireless networks," 2015 IEEE ICC, 2015, pp. 6116-6121.
[26] I. Sammour and G. Chalhoub, "Evaluation of Rate Adaptation Algorithms in IEEE 802.11 Networks," Electronics, vol. 9, 1436, 2020.
[27] A. Gong, T. Zhang, H. Chen and Y. Zhang, "Age-of-Information-based Scheduling in Multiuser Uplinks with Stochastic Arrivals: A POMDP Approach," GLOBECOM 2020 - 2020 IEEE Global Communications Conference, 2020, pp. 1-6.
[28] R. Sutton and A. Barto, "Reinforcement Learning: An Introduction," The MIT Press, 2nd edition, 2018.
[29] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, "Proximal policy optimization algorithms," 2017. https://arxiv.org/abs/1707.06347
[30] G. Riley and T. Henderson, "The ns-3 network simulator," in Modeling and Tools for Network Simulation, Springer.
[31] M. Abadi, A. Agarwal et al., "Large-Scale Machine Learning on Heterogeneous Systems," White Paper, 2015.
[32] ZeroMQ, http://zeromq.org, accessed on 14/02/2022.
[33] K. Medepalli and F. A. Tobagi, "Towards Performance Modeling of IEEE 802.11 Based Wireless Networks: A Unified Framework and Its Applications," Proceedings IEEE INFOCOM 2006.
