Multiagent Reinforcement Learning

Living reference work entry
First Online: 28 January 2020

pp 1–9
Encyclopedia of Systems and Control

Jonathan P. How, Dong-Ki Kim & Samir Wadhwania

Abstract

Multiagent reinforcement learning (MARL) provides a promising approach to learning collaborative policies for multiagent systems. However, MARL is inherently more difficult than single-agent learning because agents interact with both the environment and other agents. Specifically, learning in multiagent settings raises significant issues of nonstationarity, equilibrium selection, credit assignment, and the curse of dimensionality. Despite these difficulties, there have been important recent developments in MARL. This entry provides background on multiagent reinforcement learning as well as an overview of recent work on game theory, communication, coordination, knowledge sharing, and agent modeling. It also summarizes several useful multiagent simulation platforms.
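
The nonstationarity issue named in the abstract can be illustrated with a minimal sketch. The 2x2 payoff matrix, the function name, and the probabilities below are hypothetical and are not taken from the entry; the point is only that the value of a fixed action for one agent shifts when the other agent's policy changes, even though the environment itself is unchanged.

```python
# Hypothetical 2x2 cooperative matrix game (an assumption for illustration):
# both agents receive reward 1 when their actions match, 0 otherwise.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]  # PAYOFF[a1][a2]


def expected_reward(a1, p2_action0):
    """Expected reward of agent 1's action a1 when agent 2
    plays action 0 with probability p2_action0."""
    return p2_action0 * PAYOFF[a1][0] + (1.0 - p2_action0) * PAYOFF[a1][1]


# Early in training, agent 2 mostly plays action 0.
early = expected_reward(0, 0.9)
# Later, agent 2 has shifted toward action 1; the environment is the same,
# but the value of agent 1's action 0 has changed.
late = expected_reward(0, 0.1)

print(early, late)
```

From agent 1's perspective the reward distribution for the same action has moved, which is why value estimates learned against an earlier version of the other agent can become stale.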

Notes

1. Leduc poker is a toy poker game often used as a benchmark in poker AI research. See Southey et al. (2012) for more information.
2. Long short-term memory (LSTM) networks (Hochreiter and Schmidhuber 1997) are one type of recurrent neural network (RNN). LSTMs are well-suited for identifying long-term temporal dependencies by augmenting information in the cell state.
3. RNNs are a type of network that incorporates feedback, allowing the network to process sequences of data.
4. Meta-learning trains a model on a variety of tasks so that it can learn new skills or adapt to new environments quickly using only a small number of training samples (e.g., by learning initial model parameters (Finn et al. 2017)).
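
The feedback described in note 3 can be sketched with a single-unit recurrent cell. This is a simplified illustration under assumed scalar weights (`w_x`, `w_h`) and a hypothetical function name; it is not an LSTM and not an implementation from the entry.

```python
import math


def rnn_forward(xs, w_x=0.5, w_h=0.8, h0=0.0):
    """Run a one-unit recurrent cell over the sequence xs.

    The hidden state h is fed back at every step:
        h_t = tanh(w_x * x_t + w_h * h_{t-1})
    The weights are arbitrary illustrative values, not learned.
    """
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


# A single nonzero input followed by zeros.
states = rnn_forward([1.0, 0.0, 0.0])
# Same sequence with the feedback connection removed.
no_feedback = rnn_forward([1.0, 0.0, 0.0], w_h=0.0)

print(states)
print(no_feedback)
```

With feedback, the hidden state stays nonzero after the input returns to zero, so later outputs still depend on the earlier input. With `w_h=0.0`, the cell forgets the first input immediately.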

Bibliography

Amir O, Kamar E, Kolobov A, Grosz BJ (2016) Interactive teaching strategies for agent training. In: International joint conferences on artificial intelligence (IJCAI)
Avis D, Rosenberg GD, Savani R, von Stengel B (2010) Enumeration of Nash equilibria for two-player games. Econ Theory 42(1):9–37. [Online]. Available: https://doi.org/10.1007/s00199-009-0449-x
Bowling M (2005) Convergence and no-regret in multiagent learning. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems 17. MIT Press, pp 209–216. [Online]. Available: http://papers.nips.cc/paper/2673-convergence-and-no-regret-in-multiagent-learning.pdf
Buşoniu L, Babuška R, De Schutter B (2010) Multi-agent reinforcement learning: an overview. Springer, Berlin/Heidelberg, pp 183–221. [Online]. Available: https://doi.org/10.1007/978-3-642-14435-6_7
Clouse J (1997) On integrating apprentice learning and reinforcement learning
da Silva FL, Glatt R, Costa AHR (2017) Simultaneously learning and advising in multiagent reinforcement learning. In: International conference on autonomous agents and multiagent systems (AAMAS), pp 1100–1108
Dayan P (1993) Improving generalization for temporal difference learning: the successor representation. Neural Comput 5(4):613–624
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning (ICML), ser. Proceedings of machine learning research, vol 70. PMLR, 06–11 Aug 2017, pp 1126–1135
Foerster J, Assael IA, de Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems. Curran Associates Inc., pp 2137–2145
Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S (2017) Counterfactual multi-agent policy gradients, CoRR, vol abs/1705.08926. [Online]. Available: http://arxiv.org/abs/1705.08926
Goldberg PW, Papadimitriou CH, Savani R (2010) The complexity of the homotopy method, equilibrium selection, and Lemke-Howson solutions, CoRR, vol abs/1006.5352. [Online]. Available: http://arxiv.org/abs/1006.5352
Grover A, Al-Shedivat M, Gupta JK, Burda Y, Edwards H (2018) Learning policy representations in multiagent systems, CoRR, vol abs/1806.06464. [Online]. Available: http://arxiv.org/abs/1806.06464
Chang Y-H, Ho T, Kaelbling LP (2004) All learning is local: multi-agent learning in global reward games. In: Thrun S, Saul LK, Schölkopf B (eds) Advances in neural information processing systems 16. MIT Press, pp 807–814. [Online]. Available: http://papers.nips.cc/paper/2476-all-learning-is-local-multi-agent-learning-in-global-reward-games.pdf
He H, Boyd-Graber JL, Kwok K, Daumé H III (2016) Opponent modeling in deep reinforcement learning, CoRR, vol abs/1609.05559. [Online]. Available: http://arxiv.org/abs/1609.05559
Hernandez-Leal P, Kartal B, Taylor ME (2018) Is multiagent deep reinforcement learning the answer or the question? A brief survey, CoRR, vol abs/1810.05587. [Online]. Available: http://arxiv.org/abs/1810.05587
Hernandez-Leal P, Kaisers M, Baarslag T, de Cote EM (2017) A survey of learning in multiagent environments: dealing with non-stationarity, CoRR, vol abs/1707.09183. [Online]. Available: http://arxiv.org/abs/1707.09183
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. [Online]. Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735
id Software (1999) https://www.idsoftware.com/
Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, Vinyals O, Green T, Dunning I, Simonyan K, Fernando C, Kavukcuoglu K (2017) Population based training of neural networks, CoRR, vol abs/1711.09846. [Online]. Available: http://arxiv.org/abs/1711.09846
Jaderberg M, Czarnecki WM, Dunning I, Marris L, Lever G, Castañeda AG, Beattie C, Rabinowitz NC, Morcos AS, Ruderman A, Sonnerat N, Green T, Deason L, Leibo JZ, Silver D, Hassabis D, Kavukcuoglu K, Graepel T (2018) Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, CoRR, vol abs/1807.01281. [Online]. Available: http://arxiv.org/abs/1807.01281
Kim D, Liu M, Omidshafiei S, Lopez-Cot S, Riemer M, Habibi G, Tesauro G, Mourad S, Campbell M, How JP (2019) Learning hierarchical teaching in cooperative multiagent reinforcement learning, CoRR, vol abs/1903.03216. [Online]. Available: http://arxiv.org/abs/1903.03216
Lanctot M, Zambaldi VF, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, Silver D, Graepel T (2017) A unified game-theoretic approach to multiagent reinforcement learning. CoRR, vol abs/1711.00832. [Online]. Available: http://arxiv.org/abs/1711.00832
Leyton-Brown K, Shoham Y (2008) Essentials of game theory: a concise multidisciplinary introduction. Morgan & Claypool. [Online]. Available: https://ieeexplore.ieee.org/document/6812710
Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the eleventh international conference on machine learning, ser. ICML’94. Morgan Kaufmann Publishers, San Francisco, pp 157–163. [Online]. Available: http://dl.acm.org/citation.cfm?id=3091574.3091594
Liu S, Lever G, Heess N, Merel J, Tunyasuvunakool S, Graepel T (2019) Emergent coordination through competition. In: International conference on learning representations. [Online]. Available: https://openreview.net/forum?id=BkG8sjR5Km
Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems. Curran Associates, Red Hook, pp 6382–6393
Nowe A, Vrancx P, De Hauwere Y-M (2012) Game theory and multi-agent reinforcement learning. Adapt Learn Optim 12:441–470
Oliehoek FA, Amato C (2016) A concise introduction to decentralized POMDPs, ser. SpringerBriefs in intelligent systems. Springer, May 2016. [Online]. Available: http://www.fransoliehoek.net/docs/OliehoekAmato16book.pdf
Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning-volume 70. JMLR org, pp 2681–2690
Omidshafiei S, Kim D, Liu M, Tesauro G, Riemer M, Amato C, Campbell M, How JP (2018) Learning to teach in cooperative multiagent reinforcement learning, CoRR, vol abs/1805.07830. [Online]. Available: http://arxiv.org/abs/1805.07830
Omidshafiei S, Papadimitriou CH, Piliouras G, Tuyls K, Rowland M, Lespiau J, Czarnecki WM, Lanctot M, Pérolat J, Munos R (2019) α-rank: multi-agent evaluation by evolution, CoRR, vol abs/1903.01373. [Online]. Available: http://arxiv.org/abs/1903.01373
OpenAI (2018) OpenAI Five. https://blog.openai.com/openai-five/
Panait L, Luke S (2005) Cooperative multi-agent learning: the state of the art. Auton Agent Multi-Agent Syst 11(3):387–434
Ponsen M, Tuyls K, Kaisers M, Ramon J (2009) An evolutionary game-theoretic analysis of poker strategies, Entertainment Computing, vol 1, no 1, pp 39–45. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1875952109000056
Rabinowitz NC, Perbet F, Song HF, Zhang C, Eslami SMA, Botvinick M (2018) Machine theory of mind, CoRR, vol abs/1802.07740. [Online]. Available: http://arxiv.org/abs/1802.07740
Southey F, Bowling MP, Larson B, Piccione C, Burch N, Billings D, Rayner C (2012) Bayes’ bluff: opponent modelling in poker. arXiv preprint arXiv:1207.1411
Sukhbaatar S, Fergus R et al (2016) Learning multiagent communication with backpropagation. In: Advances in neural information processing systems. Curran Associates Inc., pp 2244–2252
Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10:1633–1685. [Online]. Available: http://dl.acm.org/citation.cfm?id=1577069.1755839
Tesauro G (2004) Extending Q-learning to general adaptive multi-agent systems. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 871–878
Todorov E, Erez T, Tassa Y (2012) MuJoCo: a physics engine for model-based control. In: IEEE/RSJ international conference on intelligent robots and systems, pp 5026–5033
Torrey L, Taylor M (2013) Teaching on a budget: agents advising agents in reinforcement learning. In: Proceedings of the 2013 international conference on autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagent Systems, pp 1053–1060
Tuyls K, Weiss G (2012) Multiagent learning: basics, challenges, and prospects. AI Mag 33(3):41–52
Tuyls K, Pérolat J, Lanctot M, Leibo JZ, Graepel T (2018) A generalised method for empirical game theoretic analysis, CoRR, vol abs/1803.06376. [Online]. Available: http://arxiv.org/abs/1803.06376
Vinyals O, Babuschkin I, Chung J, Mathieu M, Jaderberg M, Czarnecki WM, Dudzik A, Huang A, Georgiev P, Powell R, Ewalds T, Horgan D, Kroiss M, Danihelka I, Agapiou J, Oh J, Dalibard V, Choi D, Sifre L, Sulsky Y, Vezhnevets S, Molloy J, Cai T, Budden D, Paine T, Gulcehre C, Wang Z, Pfaff T, Pohlen T, Wu Y, Yogatama D, Cohen J, McKinney K, Smith O, Schaul T, Lillicrap T, Apps C, Kavukcuoglu K, Hassabis D, Silver D (2019) AlphaStar: mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
Wadhwania S, Kim D-K, Omidshafiei S, How JP (2019) Policy distillation and value matching in multiagent reinforcement learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China. [Online]. Available: https://arxiv.org/abs/1903.06592
Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning, arXiv preprint arXiv:1802.05438
Zinkevich M, Balch T (2001) Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In: Proceedings of the 18th international conference on machine learning, Citeseer

Acknowledgments

This work was supported by IBM (as part of the MIT-IBM Watson AI Lab initiative), Boeing, AWS Machine Learning Research Awards program, and by ARL DCIST under Cooperative Agreement Number W911NF-17-2-0181.

Author information

Authors and Affiliations

Department of Aeronautics and Astronautics, Aerospace Controls Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Jonathan P. How, Dong-Ki Kim & Samir Wadhwania

Corresponding author

Correspondence to Jonathan P. How.

Editor information

Editors and Affiliations

Electrical and Computer Engineering, Boston University, Boston, MA, USA
John Baillieul
Automation and Control Solutions, Honeywell, Golden Valley, MN, USA
Tariq Samad

Section Editor information

Department of Aeronautics and Astronautics, Aerospace Controls Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Jonathan P. How

Copyright information

© 2020 Springer-Verlag London Ltd., part of Springer Nature

About this entry

Cite this entry

How, J.P., Kim, DK., Wadhwania, S. (2020). Multiagent Reinforcement Learning. In: Baillieul, J., Samad, T. (eds) Encyclopedia of Systems and Control. Springer, London. https://doi.org/10.1007/978-1-4471-5102-9_100066-1

DOI: https://doi.org/10.1007/978-1-4471-5102-9_100066-1
Published: 28 January 2020
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5102-9
Online ISBN: 978-1-4471-5102-9
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering