DOI: 10.5555/2976248.2976454
Article

Cyclic equilibria in Markov games

Published: 05 December 2005

Abstract

Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call "cyclic equilibria." We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
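To make the setting concrete, here is a minimal sketch of the kind of scheme the abstract describes: value iteration that backs up each state's value through a solution of a one-shot "stage game," plus a check for whether the resulting joint policies repeat rather than converge. Everything here is an illustrative assumption, not the paper's construction: the toy game is randomly generated, and the `solve_stage_game` placeholder simply picks a pure Nash equilibrium when one exists (the paper's variants instead compute Nash or correlated equilibria of each stage game).

```python
import numpy as np
from itertools import product

# Hypothetical toy setup (NOT from the paper): a 2-player general-sum Markov
# game with S states and A actions per player. R[i, s, a1, a2] is player i's
# reward; T[s, a1, a2] is a deterministic next state.
S, A, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)
R = rng.uniform(-1.0, 1.0, size=(2, S, A, A))
T = rng.integers(0, S, size=(S, A, A))

def solve_stage_game(Q1, Q2):
    """Placeholder stage-game solver: return a pure Nash equilibrium of the
    bimatrix game (Q1, Q2) as a joint distribution if one exists, else fall
    back to uniform joint play. (The paper's value-iteration variants instead
    solve for Nash or correlated equilibria.)"""
    for a1, a2 in product(range(A), repeat=2):
        if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
            pi = np.zeros((A, A))
            pi[a1, a2] = 1.0
            return pi
    return np.full((A, A), 1.0 / (A * A))

def value_iteration(n_iters=200):
    """Equilibrium-based value iteration: back up each state's value through
    a stage-game solution, recording the joint policy at every iteration."""
    V = np.zeros((2, S))                      # V[i, s]: player i's value
    history = []
    for _ in range(n_iters):
        Q = R + gamma * V[:, T]               # Q[i, s, a1, a2]
        pi = np.stack([solve_stage_game(Q[0, s], Q[1, s]) for s in range(S)])
        V = np.einsum('sij,xsij->xs', pi, Q)  # value under the joint policy
        history.append(pi.copy())
    return V, history

def cycle_period(history, tol=1e-9):
    """Smallest k with pi_t == pi_{t-k} at the end of the run (1 means the
    policies converged; k > 1 suggests a cycle), or None if no repeat."""
    for k in range(1, len(history)):
        if np.allclose(history[-1 - k], history[-1], atol=tol):
            return k
    return None

V, hist = value_iteration()
print("detected policy period:", cycle_period(hist))
```

On this toy game the detected period may well be 1 (the policies converge); the paper's point is that for some general-sum Markov games no stationary fixed point is reached, and reading off the repeating tail of the policy sequence yields a cyclic equilibrium instead.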

Cited By

  • Actor-critic policy optimization in partially observable multiagent environments. Proceedings of the 32nd International Conference on Neural Information Processing Systems (2018), 3426-3439. DOI: 10.5555/3327144.3327261
  • Learning with Opponent-Learning Awareness. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (2018), 122-130. DOI: 10.5555/3237383.3237408
  • Designing Non-greedy Reinforcement Learning Agents with Diminishing Reward Shaping. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (2018), 297-302. DOI: 10.1145/3278721.3278759
  • Multi-agent Reinforcement Learning in Sequential Social Dilemmas. Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (2017), 464-473. DOI: 10.5555/3091125.3091194

Published In

NIPS'05: Proceedings of the 18th International Conference on Neural Information Processing Systems
December 2005
1656 pages

Publisher

MIT Press

Cambridge, MA, United States
