
Safe Exploration in Wireless Security: A Safe Reinforcement Learning Algorithm With Hierarchical Structure

Published: 01 January 2022

Abstract

Most safe reinforcement learning (RL) algorithms depend on accurate rewards, which are rarely available in wireless security applications, and suffer severe performance degradation when the learning agent must choose a policy from a large action set. In this paper, we propose a safe RL algorithm that uses a policy priority-based hierarchical structure to divide each policy into sub-policies with different selection priorities, thereby compressing the action set. By applying inter-agent transfer learning to initialize the learning parameters, the algorithm accelerates the initial exploration of the optimal policy. Based on a security criterion that evaluates the risk value, the sub-policy distribution formulation avoids the dangerous sub-policies that cause learning failures, such as severe network security problems in wireless security applications, e.g., interruption of Internet services. We also propose a deep safe RL algorithm that uses four deep neural networks in each sub-policy selection to further improve learning efficiency for agents that can support four convolutional neural networks (CNNs): the Q-network evaluates the long-term expected reward of each sub-policy under the current state, the E-network evaluates the long-term risk value, and the target Q- and E-networks update the learning parameters of the corresponding CNNs to improve the stability of policy exploration. As a case study, the proposed safe RL algorithms are implemented in the anti-jamming communication of unmanned aerial vehicles (UAVs) to select the frequency channel and the transmit power for transmission to the ground node. Experimental results show that, compared with the benchmark, the proposed schemes significantly improve UAV communication performance against jamming, save UAV energy, and increase the reward.
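To make the safe-exploration idea concrete, the sketch below shows how an agent can restrict its choice to sub-policies whose estimated long-term risk stays below a safety threshold while maximizing the estimated long-term reward. It is a tabular simplification written for illustration, not the authors' implementation; names such as risk_threshold, select_sub_policy, and update are assumptions introduced here.

    # Illustrative sketch (not the paper's code): tabular Q (expected reward)
    # and E (expected risk) estimates with risk-constrained sub-policy selection.
    import numpy as np

    n_states, n_sub_policies = 16, 8
    Q = np.zeros((n_states, n_sub_policies))   # long-term expected reward estimate
    E = np.zeros((n_states, n_sub_policies))   # long-term expected risk estimate
    alpha, gamma, eps, risk_threshold = 0.1, 0.9, 0.1, 0.5

    def select_sub_policy(state, rng):
        """Epsilon-greedy choice restricted to sub-policies deemed safe by E."""
        safe = np.flatnonzero(E[state] <= risk_threshold)
        if safe.size == 0:                     # no safe option: fall back to the least risky one
            return int(np.argmin(E[state]))
        if rng.random() < eps:                 # explore only within the safe set
            return int(rng.choice(safe))
        return int(safe[np.argmax(Q[state, safe])])

    def update(state, action, reward, risk, next_state):
        """Q-learning style updates for both the reward and the risk estimates."""
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        # Risk is propagated assuming the agent keeps acting to minimize risk (a design choice).
        E[state, action] += alpha * (risk + gamma * E[next_state].min() - E[state, action])

    rng = np.random.default_rng(0)
    a = select_sub_policy(state=0, rng=rng)    # pick a safe sub-policy for state 0
    update(0, a, reward=1.0, risk=0.2, next_state=1)

In the deep variant described in the abstract, the two tables would be replaced by the Q- and E-CNNs, each paired with a target network whose parameters are updated periodically to stabilize policy exploration.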




Published In

IEEE Transactions on Information Forensics and Security, Volume 17, 2022, 1497 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


