DOI: 10.1145/544741.544831
Article

A multiagent reinforcement learning algorithm using extended optimal response

Published: 15 July 2002

Abstract

Stochastic games provide a theoretical framework for multiagent reinforcement learning. Within this framework, Littman proposed a multiagent reinforcement learning algorithm for zero-sum stochastic games, which Hu and Wellman extended to general-sum games. Given a stochastic game, if all agents learn with their algorithm, we can expect the agents' policies to converge to a Nash equilibrium. However, agents running their algorithm always try to converge to a Nash equilibrium regardless of the policies used by the other agents. In addition, when there are multiple Nash equilibria, the agents must agree in advance on which equilibrium to reach. Thus, their algorithm lacks adaptability. In this paper, we propose a multiagent reinforcement learning algorithm based on the extended optimal response, which we introduce here. The algorithm converges to a Nash equilibrium when the other agents are adaptable; otherwise, it makes an optimal response to their behavior. We also provide empirical results in three simple stochastic games, which show that the algorithm behaves as intended.
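The optimal-response behavior the abstract describes builds on ordinary best-response learning. As a minimal illustration (not the paper's algorithm), the sketch below shows a stateless Q-learner converging to a best response against a fixed, non-adaptive opponent in a 2x2 matrix game; the payoff values and opponent policy are assumptions chosen for the example.

```python
import random

# Hypothetical 2x2 matrix game (prisoner's-dilemma-style payoffs for the
# row player): entry (my_action, opp_action) -> my reward.
payoff = {
    (0, 0): 3, (0, 1): 0,   # action 0 = "cooperate"
    (1, 0): 5, (1, 1): 1,   # action 1 = "defect"
}
opp_policy = [0.5, 0.5]      # fixed opponent: plays each action with prob 0.5

Q = [0.0, 0.0]               # one Q-value per own action (no state, no next-state term)
alpha, eps = 0.1, 0.1        # learning rate and epsilon-greedy exploration rate
random.seed(0)

for t in range(20000):
    # epsilon-greedy action selection over the current Q estimates
    if random.random() < eps:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda i: Q[i])
    o = 0 if random.random() < opp_policy[0] else 1   # opponent samples its fixed policy
    r = payoff[(a, o)]
    Q[a] += alpha * (r - Q[a])                        # stateless Q-learning update

# Expected rewards against this opponent: action 0 -> 1.5, action 1 -> 3.0,
# so the learned greedy policy should be action 1 (the best response).
best = max((0, 1), key=lambda i: Q[i])
print(best)  # expected: 1
```

When the opponent is not fixed but also learning, this simple best response need not converge; the paper's extended optimal response is aimed at exactly that gap, converging to equilibrium against adaptable opponents.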

References

[1]
M. Bowling. Convergence problems of general-sum multiagent reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 89--94. Morgan Kaufmann, San Francisco, CA, 2000.
[2]
M. Bowling and M. Veloso. Rational and convergent learning in stochastic games. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, August 2001.
[3]
C. Claus and C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and of the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98), pages 746--752, Menlo Park, July 26--30 1998. AAAI Press.
[4]
J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.
[5]
J. Hu and M. P. Wellman. Multiagent reinforcement learning: theoretical framework and an algorithm. In Proc. 15th International Conference on Machine Learning, pages 242--250. Morgan Kaufmann, San Francisco, CA, 1998.
[6]
J. Hu and M. P. Wellman. Experimental results on Q-learning for general-sum stochastic games. In Proc. 17th International Conference on Machine Learning, 2000.
[7]
M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proc. 11th International Conference on Machine Learning, pages 157--163, 1994.
[8]
M. L. Littman. Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research, 2:55--66, 2001.
[9]
J. F. Nash. Non-cooperative games. Annals of Mathematics, 54:286--295, 1951.
[10]
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, 1994.
[11]
H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400--407, 1951.
[12]
P. Stone and M. M. Veloso. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3):345--383, 2000.
[13]
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[14]
C. Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, England, 1989.



Published In

AAMAS '02: Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1
July 2002
540 pages
ISBN: 1581134800
DOI: 10.1145/544741
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Markov games
  2. Q-learning
  3. reinforcement learning
  4. stochastic games

Qualifiers

  • Article

Conference

AAMAS '02

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%

Cited By

  • (2024) Reinforcement Learning in a Prisoner's Dilemma. Games and Economic Behavior. DOI: 10.1016/j.geb.2024.01.004. Online publication date: Jan 2024.
  • (2023) A Review of Current Perspective and Propensity in Reinforcement Learning (RL) in an Orderly Manner. International Journal of Scientific Research in Computer Science, Engineering and Information Technology. DOI: 10.32628/CSEIT2390147, pages 206--227. Online publication date: 10 Feb 2023.
  • (2023) Network-Scale Traffic Signal Control via Multiagent Reinforcement Learning With Deep Spatiotemporal Attentive Network. IEEE Transactions on Cybernetics, 53(1):262--274. DOI: 10.1109/TCYB.2021.3087228. Online publication date: Jan 2023.
  • (2020) WRFMR: A Multi-Agent Reinforcement Learning Method for Cooperative Tasks. IEEE Access, 8:216320--216331. DOI: 10.1109/ACCESS.2020.3040985. Online publication date: 2020.
  • (2020) Improve Convergence Speed of Multi-Agent Q-Learning for Cooperative Task Planning. Multi-Agent Coordination, pages 111--166. DOI: 10.1002/9781119699057.ch2. Online publication date: 4 Dec 2020.
  • (2020) Introduction. Multi-Agent Coordination, pages 1--110. DOI: 10.1002/9781119699057.ch1. Online publication date: 4 Dec 2020.
  • (2019) A New Multi-Agent Reinforcement Learning Method based on Evolving Dynamic Correlation Matrix. IEEE Access. DOI: 10.1109/ACCESS.2019.2946848. Online publication date: 2019.
  • (2017) An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 31(5):971--1002. DOI: 10.1007/s10458-016-9347-3. Online publication date: 1 Sep 2017.
  • (2016) Q-learning for continuous-time graphical games on large networks with completely unknown linear system dynamics. International Journal of Robust and Nonlinear Control, 27(16):2900--2920. DOI: 10.1002/rnc.3719. Online publication date: 30 Nov 2016.
  • (2015) Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems. Automatica (Journal of IFAC), 61(C):274--281. DOI: 10.1016/j.automatica.2015.08.017. Online publication date: 1 Nov 2015.
