DOI: 10.5555/3635637.3662982

Neural Population Learning beyond Symmetric Zero-Sum Games

Published: 06 May 2024

Abstract

We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically games that afford complex visuomotor skills. We show that existing methods struggle in this setting, either computationally or in their theoretical guarantees. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium-convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
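For concreteness, the CCE condition the abstract refers to can be sketched directly: a distribution over joint actions is a Coarse Correlated Equilibrium if no player can gain by committing to a fixed deviation before the joint action is sampled. The snippet below is a minimal, hypothetical two-player illustration (the payoff table and function names are ours, not from the paper or its implementation):

```python
# Hypothetical 2-player game: payoffs[(a0, a1)] = (u0, u1).
# A simple coordination game, chosen only for illustration.
payoffs = {
    (0, 0): (2.0, 2.0),
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
    (1, 1): (1.0, 1.0),
}
actions = [0, 1]

def is_cce(mu, eps=1e-9):
    """Check the CCE condition: for each player, committing to any
    fixed action before the joint action is drawn from mu must not
    beat following mu's recommendation in expectation."""
    for player in (0, 1):
        expected = sum(p * payoffs[a][player] for a, p in mu.items())
        for dev in actions:
            # Expected payoff if `player` unilaterally plays `dev`
            # while the opponent's action is still drawn from mu.
            dev_value = sum(
                p * payoffs[(dev, a[1]) if player == 0 else (a[0], dev)][player]
                for a, p in mu.items()
            )
            if dev_value > expected + eps:
                return False
    return True

# A correlated coin flip over the two coordination outcomes is a CCE:
mu = {(0, 0): 0.5, (1, 1): 0.5}
```

Note that `mu` here correlates the players' actions without being a product of independent strategies, which is exactly what distinguishes (coarse) correlated equilibria from Nash equilibria.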



Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024
2898 pages
ISBN: 9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

  1. coarse correlated equilibrium
  2. deep learning
  3. game theory
  4. multiagent reinforcement learning

Qualifiers

  • Research-article

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
