
research-article

Q-learning-based non-zero sum games for Markov jump multiplayer systems under actor-critic NNs structure

Published: 18 October 2024

Abstract

This article addresses the problem of non-zero-sum games for Markov jump multiplayer systems (MJMSs) using a reinforcement Q-learning method. First, a Q-function for each player is derived from the system states and control inputs. On this basis, by incorporating an integral reinforcement learning scheme into an actor-critic neural network structure, we design a novel reinforcement learning approach for MJMSs. Notably, the designed algorithm requires no knowledge of the system dynamics or transition probabilities. Furthermore, the algorithm guarantees the stochastic stability and Nash equilibrium of the MJMSs. Finally, a simulation example illustrates the effectiveness of the designed approach.
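The model-free, actor-critic-style Q-learning idea summarized above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it applies least-squares Q-learning to a hypothetical scalar two-player linear system with no Markov jumps, and every name and number in it (`a`, `b1`, `b2`, the `cost` weights, the probing-noise level) is an assumption chosen for illustration. Each player fits a quadratic Q-function from simulated data, without ever reading the dynamics, and then improves its linear feedback gain by a best-response step.

```python
# Illustrative sketch only: model-free least-squares Q-learning for a
# hypothetical scalar two-player non-zero-sum game (no Markov jumps).
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dynamics x' = a*x + b1*u1 + b2*u2; used only to generate
# data -- the learner itself never touches a, b1, b2.
a, b1, b2 = 0.9, 1.0, 1.0
gamma = 0.95

def cost(i, x, u1, u2):
    # Assumed coupled quadratic stage costs: each player also pays
    # (with a smaller weight) for the other player's effort.
    if i == 1:
        return x**2 + u1**2 + 0.5 * u2**2
    return x**2 + u2**2 + 0.5 * u1**2

def feats(x, u1, u2):
    # Quadratic basis so that Q_i(x, u1, u2) = theta_i . phi(x, u1, u2).
    return np.array([x*x, u1*u1, u2*u2, x*u1, x*u2, u1*u2])

k1, k2 = 0.0, 0.0                        # linear policies u_i = k_i * x
for _ in range(15):                      # policy-iteration loop
    X  = rng.uniform(-2, 2, 400)         # exploratory states
    U1 = k1 * X + rng.normal(0, 0.5, 400)   # behavior inputs + probing noise
    U2 = k2 * X + rng.normal(0, 0.5, 400)
    Xn = a * X + b1 * U1 + b2 * U2       # environment step (data only)
    theta = []
    for i in (1, 2):
        # Critic: solve the Bellman least squares
        #   theta . (phi(z) - gamma * phi(z')) = r_i, with z' on-policy.
        Phi  = np.stack([feats(x, u1, u2) for x, u1, u2 in zip(X, U1, U2)])
        Phin = np.stack([feats(xn, k1 * xn, k2 * xn) for xn in Xn])
        R = cost(i, X, U1, U2)
        theta.append(np.linalg.lstsq(Phi - gamma * Phin, R, rcond=None)[0])
    t1, t2 = theta
    # Actor: best-response improvement read off the learned Q coefficients,
    # e.g. dQ1/du1 = 0  =>  k1 = -(H_xu1 + H_u1u2 * k2) / H_u1u1.
    k1 = -(t1[3] / 2 + (t1[5] / 2) * k2) / t1[1]
    k2 = -(t2[4] / 2 + (t2[5] / 2) * k1) / t2[2]

print(k1, k2, a + b1 * k1 + b2 * k2)     # gains and closed-loop pole
```

Because the toy dynamics are deterministic and the true Q-functions of linear policies are exactly quadratic here, the least-squares step is exact policy evaluation; the probing noise only serves to make the regression full rank. The paper's setting additionally handles jumping modes and proves stochastic stability and Nash equilibrium, which this sketch does not attempt.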



    Published In

Information Sciences: an International Journal, Volume 681, Issue C, Oct 2024, 1022 pages

    Publisher

    Elsevier Science Inc.

    United States


    Author Tags

    1. Markov jump systems
    2. Q-learning
    3. Integral reinforcement learning
    4. Non-zero sum games

