DOI: 10.1145/544741.544831
Article

A multiagent reinforcement learning algorithm using extended optimal response

Published: 15 July 2002

Abstract

Stochastic games provide a theoretical framework for multiagent reinforcement learning. Within this framework, Littman proposed a multiagent reinforcement learning algorithm for zero-sum stochastic games, which Hu and Wellman extended to general-sum games. Given a stochastic game, if all agents learn with their algorithm, we can expect the agents' policies to converge to a Nash equilibrium. However, agents running their algorithm always try to converge to a Nash equilibrium regardless of the policies used by the other agents. In addition, when there are multiple Nash equilibria, the agents must agree in advance on which equilibrium to reach. Thus, their algorithm lacks adaptability. In this paper, we propose a multiagent reinforcement learning algorithm based on the extended optimal response, which we introduce here. The algorithm converges to a Nash equilibrium when the other agents are adaptable; otherwise, it makes an optimal response to their behavior. We also provide empirical results in three simple stochastic games, which show that the algorithm behaves as intended.
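The optimal-response behavior the abstract describes builds on ordinary best-response learning. As a minimal illustration (not the paper's algorithm), the sketch below shows a stateless Q-learner converging to a best response against a fixed, non-adaptive opponent in a 2x2 matrix game; the payoff values and opponent policy are assumptions chosen for the example.

```python
import random

# Hypothetical 2x2 matrix game (prisoner's-dilemma-style payoffs for the
# row player): entry (my_action, opp_action) -> my reward.
payoff = {
    (0, 0): 3, (0, 1): 0,   # action 0 = "cooperate"
    (1, 0): 5, (1, 1): 1,   # action 1 = "defect"
}
opp_policy = [0.5, 0.5]      # fixed opponent: plays each action with prob 0.5

Q = [0.0, 0.0]               # one Q-value per own action (no state, no next-state term)
alpha, eps = 0.1, 0.1        # learning rate and epsilon-greedy exploration rate
random.seed(0)

for t in range(20000):
    # epsilon-greedy action selection over the current Q estimates
    if random.random() < eps:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda i: Q[i])
    o = 0 if random.random() < opp_policy[0] else 1   # opponent samples its fixed policy
    r = payoff[(a, o)]
    Q[a] += alpha * (r - Q[a])                        # stateless Q-learning update

# Expected rewards against this opponent: action 0 -> 1.5, action 1 -> 3.0,
# so the learned greedy policy should be action 1 (the best response).
best = max((0, 1), key=lambda i: Q[i])
print(best)  # expected: 1
```

When the opponent is not fixed but also learning, this simple best response need not converge; the paper's extended optimal response is aimed at exactly that gap, converging to equilibrium against adaptable opponents.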

References

[1]
M. Bowling. Convergence problems of general-sum multiagent reinforcement learning. In Proc. 17th International Conf. on Machine Learning, pages 89--94. Morgan Kaufmann, San Francisco, CA, 2000.
[2]
M. Bowling and M. Veloso. Rational and convergent learning in stochastic games. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, August 2001.
[3]
C. Claus and C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and of the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98), pages 746--752, Menlo Park, July 26--30 1998. AAAI Press.
[4]
J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.
[5]
J. Hu and M. P. Wellman. Multiagent reinforcement learning: theoretical framework and an algorithm. In Proc. 15th International Conference on Machine Learning, pages 242--250. Morgan Kaufmann, San Francisco, CA, 1998.
[6]
J. Hu and M. P. Wellman. Experimental results on Q-learning for general-sum stochastic games. In Proc. 17th International Conference on Machine Learning, 2000.
[7]
M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proc. 11th International Conference on Machine Learning, pages 157--163, 1994.
[8]
M. L. Littman. Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research, 2:55--66, 2001.
[9]
J. F. Nash. Non-cooperative games. Annals of Mathematics, 54:286--295, 1951.
[10]
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, 1994.
[11]
H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400--407, 1951.
[12]
P. Stone and M. M. Veloso. Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3):345--383, 2000.
[13]
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[14]
C. Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, England, 1989.



Published In

AAMAS '02: Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1
July 2002
540 pages
ISBN: 1581134800
DOI: 10.1145/544741
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Markov games
  2. Q-learning
  3. reinforcement learning
  4. stochastic games

Qualifiers

  • Article

Conference

AAMAS '02

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%

Cited By

  • (2024) Reinforcement Learning in a Prisoner's Dilemma. Games and Economic Behavior. DOI: 10.1016/j.geb.2024.01.004. Online publication date: Jan 2024.
  • (2023) A Review of Current Perspective and Propensity in Reinforcement Learning (RL) in an Orderly Manner. International Journal of Scientific Research in Computer Science, Engineering and Information Technology. DOI: 10.32628/CSEIT2390147, pages 206--227. Online publication date: 10 Feb 2023.
  • (2023) Network-Scale Traffic Signal Control via Multiagent Reinforcement Learning With Deep Spatiotemporal Attentive Network. IEEE Transactions on Cybernetics, 53(1):262--274. DOI: 10.1109/TCYB.2021.3087228. Online publication date: Jan 2023.
  • (2020) WRFMR: A Multi-Agent Reinforcement Learning Method for Cooperative Tasks. IEEE Access, 8:216320--216331. DOI: 10.1109/ACCESS.2020.3040985. Online publication date: 2020.
  • (2020) Improve Convergence Speed of Multi-Agent Q-Learning for Cooperative Task Planning. Multi-Agent Coordination, pages 111--166. DOI: 10.1002/9781119699057.ch2. Online publication date: 4 Dec 2020.
  • (2020) Introduction. Multi-Agent Coordination, pages 1--110. DOI: 10.1002/9781119699057.ch1. Online publication date: 4 Dec 2020.
  • (2019) A New Multi-Agent Reinforcement Learning Method based on Evolving Dynamic Correlation Matrix. IEEE Access. DOI: 10.1109/ACCESS.2019.2946848. Online publication date: 2019.
  • (2017) An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 31(5):971--1002. DOI: 10.1007/s10458-016-9347-3. Online publication date: 1 Sep 2017.
  • (2016) Q-learning for continuous-time graphical games on large networks with completely unknown linear system dynamics. International Journal of Robust and Nonlinear Control, 27(16):2900--2920. DOI: 10.1002/rnc.3719. Online publication date: 30 Nov 2016.
  • (2015) Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems. Automatica (Journal of IFAC), 61(C):274--281. DOI: 10.1016/j.automatica.2015.08.017. Online publication date: 1 Nov 2015.
