A Transfer Reinforcement Learning Approach for Capacity Sharing in Beyond 5G Networks
Figure 1. TRL approach for relearning a new policy after increasing the number of cells.
Figure 2. Illustrative example of the transfer of the weights θ^(S) from a source task with N = 2 cells to a target task with N′ = 3 cells.
Figure 3. Learning process of the target policy with inter-task mapping.
Figure 4. Average aggregated reward (R) during training.
Figure 5. Standard deviation of R.
Figure 6. Aggregated offered load per tenant in one cell for N = 4 and N′ = 5.
Figure 7. Offered load vs. assigned capacity to tenants 1 and 2 for both re-training modes.
Figure 8. O-RAN-based system model for capacity sharing and deployment at the edge.
Abstract
1. Introduction
2. System Model and Problem Definition
3. Transfer Reinforcement Learning Approach
3.1. States, Actions, and Reward in Source and Target Tasks
- State: The state in the source task is defined as s^(S) = {s_N, s_SLA}, where s_N = {s_n | n = 1…N} includes the state components of the tenant in the N cells, denoted as vector s_n for the n-th cell, and s_SLA includes the state components that reflect the SLA of the tenant. In turn, the state in the target task is defined as s^(T) = {s_N, s_ΔN, s_SLA}, where s_ΔN = {s_n | n = N + 1…N′} includes the state of the newly deployed cells.
- Action: One action in the source task includes N components a_n, n = 1…N, each one associated with one cell. a_n tunes the capacity share σ_k,n(t) to be applied in the following time step in the n-th cell and can take three different values, a_n ∈ {Δ, 0, −Δ}, corresponding to increasing the capacity share by a step of Δ, maintaining it, or decreasing it by a step of Δ, respectively. As a result, the action space of the source task contains 3^N possible actions, and its i-th action is denoted as a^(S)(i) = {a_n(i) | n = 1…N}, for i = 1,…, 3^N. Correspondingly, the action space of the target task contains 3^N′ possible actions, and its d-th action is denoted as a^(T)(d) = {a_n(d) | n = 1…N′}, for d = 1,…, 3^N′. An action in the target task can be decomposed into two parts, a^(T)(d) = {a_N(d), a_ΔN(d)}, where a_N(d) includes the components of the initial N cells (so it is one of the actions of the source task) and a_ΔN(d) = {a_n(d) | n = N + 1…N′} includes the components for the newly deployed cells (see the sketch after this list).
- Reward: The reward in both the source and target tasks assesses, at the system level, how good or bad the action applied to the different cells of the RAN infrastructure was, thereby promoting the learning of optimal actions during the training process. Optimal actions are those that satisfy the tenants' SLAs with the least assigned capacity, whereas actions that lead to not fulfilling an SLA or to assigning more capacity than a tenant needs in a cell (i.e., overprovisioning) are penalized. Therefore, a common definition of the reward, denoted as r, is considered for both the source and target tasks; however, in the source task the reward is based on the actions taken over N cells, while in the target task it is based on N′ cells.
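To make the relation between the source and target action spaces concrete, the following minimal Python sketch (an illustration, not the paper's implementation; the function names and the use of the step Δ = 0.03 from the evaluation setup are assumptions) enumerates the 3^N joint actions and decomposes a target-task action into a_N(d) and a_ΔN(d):

```python
import itertools

DELTA = 0.03  # capacity-share step Δ (value taken from the evaluation setup)

def build_action_space(num_cells):
    """Enumerate the 3^N joint actions: in each cell the capacity share
    can be increased by Δ, maintained, or decreased by Δ."""
    return list(itertools.product((DELTA, 0.0, -DELTA), repeat=num_cells))

def decompose_target_action(action, n_source):
    """Split a target-task action a(T)(d) into a_N(d), the components of the
    initial N cells, and a_ΔN(d), the components of the newly deployed cells."""
    return action[:n_source], action[n_source:]

# Example: source task with N = 2 cells, target task with N' = 3 cells.
source_actions = build_action_space(2)   # 3^2 = 9 joint actions
target_actions = build_action_space(3)   # 3^3 = 27 joint actions
a_N, a_delta_N = decompose_target_action(target_actions[5], n_source=2)
print(len(source_actions), len(target_actions), a_N, a_delta_N)
```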
3.2. Inter-Task Mapping Transfer Approach
4. Performance Evaluation
4.1. Considered Scenario
4.2. Training Assessment Methodology and KPIs
- Average aggregated reward per evaluation (R): It is computed as the average of the aggregated reward of the two tenants throughout one evaluation.
- Standard deviation (std(m, W)): The standard deviation of R at the m-th training step measured over the window of the last W training steps.
- Training duration: Number of training steps until the convergence criterion is achieved. This criterion considers that convergence is reached at the first training step m that fulfills std(m, W) < std_th, where std_th is a threshold (a minimal sketch of these KPI computations is given after this list).
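The sketch below illustrates how these KPIs can be computed; the helper names and the NumPy-based representation of the reward history are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def average_aggregated_reward(rewards_tenant1, rewards_tenant2):
    """R: average over one evaluation of the aggregated (summed) reward
    of the two tenants."""
    return float(np.mean(np.asarray(rewards_tenant1) + np.asarray(rewards_tenant2)))

def rolling_std(R_history, m, W):
    """std(m, W): standard deviation of R over the window of the last W
    training steps up to (and including) training step m (1-indexed)."""
    window = np.asarray(R_history[max(0, m - W):m])
    return float(np.std(window))

def converged(R_history, m, W, std_th):
    """Convergence criterion: training step m fulfills std(m, W) < std_th."""
    return rolling_std(R_history, m, W) < std_th
```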
4.3. Training Performance
4.4. Performance of Trained Policies
5. Implementation Considerations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Parameter | | Value |
|---|---|---|
| Cell configuration | | |
| PRB bandwidth (B_n) | | 360 kHz |
| Number of available PRBs in a cell (W_n) | | 65 PRBs |
| Average spectral efficiency (S_n) | | 5 b/s/Hz |
| Total cell capacity (c_n) | | 117 Mb/s |
| SLA configuration | | |
| SAGBR_k | Tenant k = 1 | 60% of system capacity C |
| | Tenant k = 2 | 40% of system capacity C |
| MCBR_k,n | Tenant k = 1 | 80% of cell capacity c_n |
| | Tenant k = 2 | |

| Parameter | | Values | |
|---|---|---|---|
| Number of cells | | N = 4 | N′ = 5 |
| Initial training steps | | 1000 | 3000 |
| ANN config. | Input layer (nodes) | 22 | 27 |
| | Fully connected layer | 1 layer (100 nodes) | |
| | Output layer (nodes) | 81 | 243 |
| Experience replay buffer maximum length (l) | | 10⁷ | |
| Mini-batch size (J) | | 256 | |
| Learning rate (τ) | | 10⁻⁴ | |
| Discount factor (γ) | | 0.9 | |
| ε value (ε-greedy) | | 0.1 | |
| Reward weights | | φ_1 = 0.5, φ_2 = 0.4 | |
| Time step duration (Δt) | | 3 min | |
| Action step (Δ) | | 0.03 | |
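For reference, the following sketch instantiates a Q-network and optimizer with the hyperparameters listed above. PyTorch and the ReLU activation are assumptions for illustration; the paper does not specify the implementation framework or the hidden-layer activation.

```python
import torch
import torch.nn as nn

# Hyperparameters from the table (N = 4 and N' = 5 cases).
CONFIG = {
    "input_nodes": {4: 22, 5: 27},      # state-vector size per task
    "hidden_nodes": 100,                # one fully connected layer
    "output_nodes": {4: 81, 5: 243},    # 3^N joint actions
    "replay_buffer_len": 10**7,
    "mini_batch_size": 256,
    "learning_rate": 1e-4,
    "discount_factor": 0.9,
    "epsilon": 0.1,
}

def build_q_network(num_cells: int) -> nn.Module:
    """Q-network with a single hidden fully connected layer of 100 nodes."""
    return nn.Sequential(
        nn.Linear(CONFIG["input_nodes"][num_cells], CONFIG["hidden_nodes"]),
        nn.ReLU(),
        nn.Linear(CONFIG["hidden_nodes"], CONFIG["output_nodes"][num_cells]),
    )

q_network = build_q_network(4)
optimizer = torch.optim.Adam(q_network.parameters(), lr=CONFIG["learning_rate"])
```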
| Parameter | | Values | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Number of cells | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Initial training steps | | 50 | 100 | 300 | 1 × 10³ | 3 × 10³ | 8 × 10³ | 2 × 10⁴ | 6.5 × 10³ |
| ANN config. | Input layer (nodes) | 7 | 12 | 17 | 22 | 27 | 32 | 37 | 42 |
| | Fully connected layer | 1 layer (100 nodes) | | | | | | | |
| | Output layer (nodes) | 3 | 9 | 27 | 81 | 243 | 729 | 2187 | 6561 |
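The layer sizes in this table scale with the number of cells: the output layer has 3^N nodes (one Q-value per joint action), and the listed input-layer sizes follow 5N + 2, which suggests a fixed number of state components per cell plus a constant number of SLA-related components. The small sketch below reproduces the table rows; the 5N + 2 reading is an inference from the values, not a formula stated here.

```python
def ann_layer_sizes(num_cells: int) -> dict:
    """Input/output layer sizes as a function of the number of cells,
    inferred from the table above: 5*N + 2 input nodes and 3^N output nodes."""
    return {
        "input_nodes": 5 * num_cells + 2,
        "hidden_nodes": 100,          # single fully connected layer
        "output_nodes": 3 ** num_cells,
    }

# Reproduces the table, e.g., N = 5 -> 27 input nodes and 243 output nodes.
for n in range(1, 9):
    print(n, ann_layer_sizes(n))
```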
| Number of Cells After the New Cell Deployment (N′) | Training Duration, Non-TRL Mode (Training Steps) | Training Duration, TRL Mode (Training Steps) | Training Duration Reduction |
|---|---|---|---|
| 2 | 178 × 10³ | 88 × 10³ | 51% |
| 3 | 169 × 10³ | 92 × 10³ | 46% |
| 4 | 251 × 10³ | 124 × 10³ | 51% |
| 5 | 265 × 10³ | 123 × 10³ | 54% |
| 6 | 950 × 10³ | 195 × 10³ | 79% |
| 7 | 2377 × 10³ | 788 × 10³ | 67% |
| 8 | 9700 × 10³ | 2058 × 10³ | 79% |
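The reduction column follows directly from the two duration columns as 1 − (TRL duration / non-TRL duration); the short check below, with the durations copied from the table, reproduces the reported percentages.

```python
# Training durations in units of 10^3 training steps, for N' = 2..8.
non_trl = [178, 169, 251, 265, 950, 2377, 9700]
trl = [88, 92, 124, 123, 195, 788, 2058]

for n_prime, (without_trl, with_trl) in zip(range(2, 9), zip(non_trl, trl)):
    reduction = 100 * (1 - with_trl / without_trl)
    print(f"N' = {n_prime}: {reduction:.0f}% reduction")
```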