
research-article

Q-learning-based non-zero sum games for Markov jump multiplayer systems under actor-critic NNs structure

Published: 18 October 2024

Abstract

This article addresses the problem of non-zero-sum games for Markov jump multiplayer systems (MJMSs) using a reinforcement Q-learning method. First, a Q-function for each player is derived from the system states and control inputs. On this basis, by incorporating an integral reinforcement learning scheme into an actor-critic neural network structure, we design a novel reinforcement learning approach for MJMSs. Notably, the designed algorithm requires no knowledge of the system dynamics or transition probabilities. Furthermore, the algorithm guarantees the stochastic stability and Nash equilibrium of the MJMSs. Finally, a simulation example illustrates the effectiveness of the designed approach.
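The model-free, actor-critic-style Q-learning idea summarized above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it applies least-squares Q-learning to a hypothetical scalar two-player linear system with no Markov jumps, and every name and number in it (`a`, `b1`, `b2`, the `cost` weights, the probing-noise level) is an assumption chosen for illustration. Each player fits a quadratic Q-function from simulated data, without ever reading the dynamics, and then improves its linear feedback gain by a best-response step.

```python
# Illustrative sketch only: model-free least-squares Q-learning for a
# hypothetical scalar two-player non-zero-sum game (no Markov jumps).
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dynamics x' = a*x + b1*u1 + b2*u2; used only to generate
# data -- the learner itself never touches a, b1, b2.
a, b1, b2 = 0.9, 1.0, 1.0
gamma = 0.95

def cost(i, x, u1, u2):
    # Assumed coupled quadratic stage costs: each player also pays
    # (with a smaller weight) for the other player's effort.
    if i == 1:
        return x**2 + u1**2 + 0.5 * u2**2
    return x**2 + u2**2 + 0.5 * u1**2

def feats(x, u1, u2):
    # Quadratic basis so that Q_i(x, u1, u2) = theta_i . phi(x, u1, u2).
    return np.array([x*x, u1*u1, u2*u2, x*u1, x*u2, u1*u2])

k1, k2 = 0.0, 0.0                        # linear policies u_i = k_i * x
for _ in range(15):                      # policy-iteration loop
    X  = rng.uniform(-2, 2, 400)         # exploratory states
    U1 = k1 * X + rng.normal(0, 0.5, 400)   # behavior inputs + probing noise
    U2 = k2 * X + rng.normal(0, 0.5, 400)
    Xn = a * X + b1 * U1 + b2 * U2       # environment step (data only)
    theta = []
    for i in (1, 2):
        # Critic: solve the Bellman least squares
        #   theta . (phi(z) - gamma * phi(z')) = r_i, with z' on-policy.
        Phi  = np.stack([feats(x, u1, u2) for x, u1, u2 in zip(X, U1, U2)])
        Phin = np.stack([feats(xn, k1 * xn, k2 * xn) for xn in Xn])
        R = cost(i, X, U1, U2)
        theta.append(np.linalg.lstsq(Phi - gamma * Phin, R, rcond=None)[0])
    t1, t2 = theta
    # Actor: best-response improvement read off the learned Q coefficients,
    # e.g. dQ1/du1 = 0  =>  k1 = -(H_xu1 + H_u1u2 * k2) / H_u1u1.
    k1 = -(t1[3] / 2 + (t1[5] / 2) * k2) / t1[1]
    k2 = -(t2[4] / 2 + (t2[5] / 2) * k1) / t2[2]

print(k1, k2, a + b1 * k1 + b2 * k2)     # gains and closed-loop pole
```

Because the toy dynamics are deterministic and the true Q-functions of linear policies are exactly quadratic here, the least-squares step is exact policy evaluation; the probing noise only serves to make the regression full rank. The paper's setting additionally handles jumping modes and proves stochastic stability and Nash equilibrium, which this sketch does not attempt.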



    Published In

Information Sciences: an International Journal, Volume 681, Issue C, Oct 2024, 1022 pages

    Publisher

    Elsevier Science Inc.

    United States


    Author Tags

    1. Markov jump systems
    2. Q-learning
    3. Integral reinforcement learning
    4. Non-zero sum games

