
Safe Exploration in Wireless Security: A Safe Reinforcement Learning Algorithm With Hierarchical Structure

Published: 01 January 2022

Abstract

Most safe reinforcement learning (RL) algorithms depend on accurate rewards, which are rarely available in wireless security applications, and suffer severe performance degradation when the learning agent must choose a policy from a large action set. In this paper, we propose a safe RL algorithm that uses a policy priority-based hierarchical structure to divide each policy into sub-policies with different selection priorities, thereby compressing the action set. By applying inter-agent transfer learning to initialize the learning parameters, the algorithm accelerates the initial exploration of the optimal policy. Based on a security criterion that evaluates the risk value, the sub-policy distribution formulation avoids the dangerous sub-policies that cause learning failures, such as severe network security problems in wireless security applications, e.g., interruption of Internet services. We also propose a deep safe RL algorithm that uses four deep neural networks in each sub-policy selection to further improve learning efficiency for agents that can support four convolutional neural networks (CNNs): the Q-network evaluates the long-term expected reward of each sub-policy under the current state, the E-network evaluates the long-term risk value, and the target Q- and E-networks update the learning parameters of the corresponding CNNs to improve the stability of policy exploration. As a case study, the proposed safe RL algorithms are implemented in the anti-jamming communication of unmanned aerial vehicles (UAVs) to select the frequency channel and the transmit power for transmission to the ground node. Experimental results show that, compared with the benchmark, the proposed schemes significantly improve UAV communication performance against jamming, save UAV energy, and increase the reward.
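To make the safe-exploration idea concrete, the sketch below shows how an agent can restrict its choice to sub-policies whose estimated long-term risk stays below a safety threshold while maximizing the estimated long-term reward. It is a tabular simplification written for illustration, not the authors' implementation; names such as risk_threshold, select_sub_policy, and update are assumptions introduced here.

    # Illustrative sketch (not the paper's code): tabular Q (expected reward)
    # and E (expected risk) estimates with risk-constrained sub-policy selection.
    import numpy as np

    n_states, n_sub_policies = 16, 8
    Q = np.zeros((n_states, n_sub_policies))   # long-term expected reward estimate
    E = np.zeros((n_states, n_sub_policies))   # long-term expected risk estimate
    alpha, gamma, eps, risk_threshold = 0.1, 0.9, 0.1, 0.5

    def select_sub_policy(state, rng):
        """Epsilon-greedy choice restricted to sub-policies deemed safe by E."""
        safe = np.flatnonzero(E[state] <= risk_threshold)
        if safe.size == 0:                     # no safe option: fall back to the least risky one
            return int(np.argmin(E[state]))
        if rng.random() < eps:                 # explore only within the safe set
            return int(rng.choice(safe))
        return int(safe[np.argmax(Q[state, safe])])

    def update(state, action, reward, risk, next_state):
        """Q-learning style updates for both the reward and the risk estimates."""
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        # Risk is propagated assuming the agent keeps acting to minimize risk (a design choice).
        E[state, action] += alpha * (risk + gamma * E[next_state].min() - E[state, action])

    rng = np.random.default_rng(0)
    a = select_sub_policy(state=0, rng=rng)    # pick a safe sub-policy for state 0
    update(0, a, reward=1.0, risk=0.2, next_state=1)

In the deep variant described in the abstract, the two tables would be replaced by the Q- and E-CNNs, each paired with a target network whose parameters are updated periodically to stabilize policy exploration.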




Published In

IEEE Transactions on Information Forensics and Security, Volume 17, 2022, 1497 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


