Multiagent Reinforcement Learning

  • Living reference work entry
  • First Online:
Encyclopedia of Systems and Control

Abstract

The area of multiagent reinforcement learning (MARL) provides a promising approach to learning collaborative policies for multiagent systems. However, MARL is inherently more difficult than single-agent learning because agents interact with both the environment and one another. Specifically, learning in multiagent settings raises significant issues of nonstationarity, equilibrium selection, credit assignment, and the curse of dimensionality. Despite these difficulties, there have been important recent developments in MARL. This entry provides background on multiagent reinforcement learning as well as an overview of recent work on game theory, communication, coordination, knowledge sharing, and agent modeling. It also summarizes several useful multiagent simulation platforms.
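
To make the nonstationarity issue above concrete, consider two independent Q-learners in a repeated two-action coordination game: each agent treats the other as part of the environment, so the reward statistics it observes drift as the other agent's policy changes. The following minimal Python sketch illustrates the setup; the game, learning rate, and exploration scheme are illustrative assumptions rather than anything prescribed in this entry.

    import numpy as np

    # Repeated 2x2 coordination game: both agents receive +1 when their
    # actions match and -1 otherwise (a fully cooperative reward).
    rng = np.random.default_rng(0)
    payoff = np.array([[1.0, -1.0],
                       [-1.0, 1.0]])   # reward indexed by (a1, a2)

    q1 = np.zeros(2)       # agent 1's action-value estimates
    q2 = np.zeros(2)       # agent 2's action-value estimates
    alpha, eps = 0.1, 0.1  # learning rate and exploration probability

    for _ in range(5000):
        # Independent epsilon-greedy action selection.
        a1 = rng.integers(2) if rng.random() < eps else int(np.argmax(q1))
        a2 = rng.integers(2) if rng.random() < eps else int(np.argmax(q2))
        r = payoff[a1, a2]

        # Each agent updates as if it faced a stationary problem, but the
        # reward it observes for a fixed action drifts as the other agent
        # learns: this is the nonstationarity issue.
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (r - q2[a2])

    print("agent 1:", q1, "agent 2:", q2)

In this game the two learners typically lock into one of the two symmetric outcomes (both playing action 0 or both playing action 1), which also gives a small-scale view of the equilibrium-selection issue.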

Notes

  1.

    Leduc poker is a toy poker game often used as a benchmark in poker AI research. See Southey et al. (2012) for more information.

  2.

    Long short-term memory (LSTM) networks (Hochreiter and Schmidhuber 1997) are one type of recurrent neural network (RNN). LSTMs are well suited for capturing long-term temporal dependencies by maintaining information in a cell state (see the cell equations following these notes).

  3.

    RNNs are a type of neural network that incorporates feedback, allowing the network to process sequences of data.

  4.

    Meta-learning trains a model on a variety of tasks so that it can learn new skills or adapt to new environments quickly from only a small number of training samples, e.g., by learning the initial model parameters (Finn et al. 2017); the corresponding two-level update is sketched after these notes.
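
For reference, the LSTM cell mentioned in notes 2 and 3 can be written, in its common modern form with a forget gate (the forget gate was introduced after the original 1997 paper), as follows, where $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the elementwise product, and $W$, $U$, $b$ learned parameters:

    \begin{aligned}
    f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
    i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
    o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
    \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
    c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
    h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
    \end{aligned}

The additive update of the cell state $c_t$ is what lets information persist over long time spans, which is why LSTMs handle the long-term dependencies mentioned in note 2 better than plain RNNs.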
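
Similarly, the meta-learning approach cited in note 4 (model-agnostic meta-learning, MAML; Finn et al. 2017) is a two-level gradient procedure: each task $\mathcal{T}_i$ adapts the shared initial parameters $\theta$ with an inner gradient step, and $\theta$ is then updated through the adapted parameters:

    \theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta), \qquad
    \theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\big(f_{\theta_i'}\big),

where $\alpha$ and $\beta$ are the inner and outer (meta) learning rates. Because the outer gradient is taken with respect to $\theta$, the initialization is driven toward regions of parameter space from which a few gradient steps suffice to adapt to a new task.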

Bibliography

  • Amir O, Kamar E, Kolobov A, Grosz BJ (2016) Interactive teaching strategies for agent training. In: International joint conferences on artificial intelligence (IJCAI)

  • Avis D, Rosenberg GD, Savani R, von Stengel B (2010) Enumeration of Nash equilibria for two-player games. Econ Theory 42(1):9–37. [Online]. Available: https://doi.org/10.1007/s00199-009-0449-x

  • Bowling M (2005) Convergence and no-regret in multiagent learning. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems 17. MIT Press, pp 209–216. [Online]. Available: http://papers.nips.cc/paper/2673-convergence-and-no-regret-in-multiagent-learning.pdf

  • Buşoniu L, Babuška R, De Schutter B (2010) Multi-agent reinforcement learning: an overview. Springer, Berlin/Heidelberg, pp 183–221. [Online]. Available: https://doi.org/10.1007/978-3-642-14435-6_7

  • Clouse J (1997) On integrating apprentice learning and reinforcement learning

  • da Silva FL, Glatt R, Costa AHR (2017) Simultaneously learning and advising in multiagent reinforcement learning. In: International conference on autonomous agents and multiagent systems (AAMAS), pp 1100–1108

  • Dayan P (1993) Improving generalization for temporal difference learning: the successor representation. Neural Comput 5(4):613–624

  • Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning (ICML), ser. Proceedings of machine learning research, vol 70. PMLR, 06–11 Aug 2017, pp 1126–1135

  • Foerster J, Assael IA, de Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems. Curran Associates Inc., pp 2137–2145

  • Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S (2017) Counterfactual multi-agent policy gradients, CoRR, vol abs/1705.08926. [Online]. Available: http://arxiv.org/abs/1705.08926

  • Goldberg PW, Papadimitriou CH, Savani R (2010) The complexity of the homotopy method, equilibrium selection, and Lemke-Howson solutions, CoRR, vol abs/1006.5352. [Online]. Available: http://arxiv.org/abs/1006.5352

  • Grover A, Al-Shedivat M, Gupta JK, Burda Y, Edwards H (2018) Learning policy representations in multiagent systems, CoRR, vol abs/1806.06464. [Online]. Available: http://arxiv.org/abs/1806.06464

  • Chang Y-H, Ho T, Kaelbling LP (2004) All learning is local: multi-agent learning in global reward games. In: Thrun S, Saul LK, Schölkopf B (eds) Advances in neural information processing systems 16. MIT Press, pp 807–814. [Online]. Available: http://papers.nips.cc/paper/2476-all-learning-is-local-multi-agent-learning-in-global-reward-games.pdf

  • He H, Boyd-Graber JL, Kwok K, Daumé III H (2016) Opponent modeling in deep reinforcement learning, CoRR, vol abs/1609.05559. [Online]. Available: http://arxiv.org/abs/1609.05559

  • Hernandez-Leal P, Kartal B, Taylor ME (2018) Is multiagent deep reinforcement learning the answer or the question? A brief survey, CoRR, vol abs/1810.05587. [Online]. Available: http://arxiv.org/abs/1810.05587

  • Hernandez-Leal P, Kaisers M, Baarslag T, de Cote EM (2017) A survey of learning in multiagent environments: dealing with non-stationarity, CoRR, vol abs/1707.09183. [Online]. Available: http://arxiv.org/abs/1707.09183

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. [Online]. Available: http://dx.doi.org/10.1162/neco.1997.9.8.1735

  • id Software (1999) https://www.idsoftware.com/

  • Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, Vinyals O, Green T, Dunning I, Simonyan K, Fernando C, Kavukcuoglu K (2017) Population based training of neural networks, CoRR, vol abs/1711.09846. [Online]. Available: http://arxiv.org/abs/1711.09846

  • Jaderberg M, Czarnecki WM, Dunning I, Marris L, Lever G, Castañeda AG, Beattie C, Rabinowitz NC, Morcos AS, Ruderman A, Sonnerat N, Green T, Deason L, Leibo JZ, Silver D, Hassabis D, Kavukcuoglu K, Graepel T (2018) Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, CoRR, vol abs/1807.01281. [Online]. Available: http://arxiv.org/abs/1807.01281

  • Kim D, Liu M, Omidshafiei S, Lopez-Cot S, Riemer M, Habibi G, Tesauro G, Mourad S, Campbell M, How JP (2019) Learning hierarchical teaching in cooperative multiagent reinforcement learning, CoRR, vol abs/1903.03216. [Online]. Available: http://arxiv.org/abs/1903.03216

  • Lanctot M, Zambaldi VF, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, Silver D, Graepel T (2017) A unified game-theoretic approach to multiagent reinforcement learning. CoRR, vol abs/1711.00832. [Online]. Available: http://arxiv.org/abs/1711.00832

  • Leyton-Brown K, Shoham Y (2008) Essentials of game theory: a concise multidisciplinary introduction. Morgan & Claypool. [Online]. Available: https://ieeexplore.ieee.org/document/6812710

  • Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the eleventh international conference on international conference on machine learning, ser. ICML’94. Morgan Kaufmann Publishers, San Francisco, pp 157–163. [Online]. Available: http://dl.acm.org/citation.cfm?id=3091574.3091594

  • Liu S, Lever G, Heess N, Merel J, Tunyasuvunakool S, Graepel T (2019) Emergent coordination through competition. In: International conference on learning representations. [Online]. Available: https://openreview.net/forum?id=BkG8sjR5Km

  • Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems. Curran Associates, Red Hook, pp 6382–6393

  • Nowé A, Vrancx P, De Hauwere Y-M (2012) Game theory and multi-agent reinforcement learning. Adapt Learn Optim 12:441–470

  • Oliehoek FA, Amato C (2016) A concise introduction to decentralized POMDPs, ser. SpringerBriefs in intelligent systems. Springer. [Online]. Available: http://www.fransoliehoek.net/docs/OliehoekAmato16book.pdf

  • Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning, vol 70. JMLR.org, pp 2681–2690

  • Omidshafiei S, Kim D, Liu M, Tesauro G, Riemer M, Amato C, Campbell M, How JP (2018) Learning to teach in cooperative multiagent reinforcement learning, CoRR, vol abs/1805.07830. [Online]. Available: http://arxiv.org/abs/1805.07830

  • Omidshafiei S, Papadimitriou CH, Piliouras G, Tuyls K, Rowland M, Lespiau J, Czarnecki WM, Lanctot M, Pérolat J, Munos R (2019) α-rank: multi-agent evaluation by evolution, CoRR, vol abs/1903.01373. [Online]. Available: http://arxiv.org/abs/1903.01373

  • OpenAI (2018) OpenAI Five. https://blog.openai.com/openai-five/

  • Panait L, Luke S (2005) Cooperative multi-agent learning: the state of the art. Auton Agent Multi-Agent Syst 11(3):387–434

  • Ponsen M, Tuyls K, Kaisers M, Ramon J (2009) An evolutionary game-theoretic analysis of poker strategies. Entertain Comput 1(1):39–45. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1875952109000056

  • Rabinowitz NC, Perbet F, Song HF, Zhang C, Eslami SMA, Botvinick M (2018) Machine theory of mind, CoRR, vol abs/1802.07740. [Online]. Available: http://arxiv.org/abs/1802.07740

  • Southey F, Bowling MP, Larson B, Piccione C, Burch N, Billings D, Rayner C (2012) Bayes’ bluff: opponent modelling in poker. arXiv preprint arXiv:1207.1411

  • Sukhbaatar S, Fergus R et al (2016) Learning multiagent communication with backpropagation. In: Advances in neural information processing systems. Curran Associates Inc., pp 2244–2252

  • Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10:1633–1685. [Online]. Available: http://dl.acm.org/citation.cfm?id=1577069.1755839

  • Tesauro G (2004) Extending Q-learning to general adaptive multi-agent systems. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 871–878

  • Todorov E, Erez T, Tassa Y (2012) MuJoCo: a physics engine for model-based control. In: IEEE/RSJ international conference on intelligent robots and systems, pp 5026–5033

  • Torrey L, Taylor M (2013) Teaching on a budget: agents advising agents in reinforcement learning. In: Proceedings of the 2013 international conference on autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagent Systems, pp 1053–1060

  • Tuyls K, Weiss G (2012) Multiagent learning: basics, challenges, and prospects. AI Mag 33(3):41–52

  • Tuyls K, Pérolat J, Lanctot M, Leibo JZ, Graepel T (2018) A generalised method for empirical game theoretic analysis, CoRR, vol abs/1803.06376. [Online]. Available: http://arxiv.org/abs/1803.06376

  • Vinyals O, Babuschkin I, Chung J, Mathieu M, Jaderberg M, Czarnecki WM, Dudzik A, Huang A, Georgiev P, Powell R, Ewalds T, Horgan D, Kroiss M, Danihelka I, Agapiou J, Oh J, Dalibard V, Choi D, Sifre L, Sulsky Y, Vezhnevets S, Molloy J, Cai T, Budden D, Paine T, Gulcehre C, Wang Z, Pfaff T, Pohlen T, Wu Y, Yogatama D, Cohen J, McKinney K, Smith O, Schaul T, Lillicrap T, Apps C, Kavukcuoglu K, Hassabis D, Silver D (2019) AlphaStar: mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/

  • Wadhwania S, Kim D-K, Omidshafiei S, How JP (2019) Policy distillation and value matching in multiagent reinforcement learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China. [Online]. Available: https://arxiv.org/abs/1903.06592

  • Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. arXiv preprint arXiv:1802.05438

  • Zinkevich M, Balch T (2001) Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In: Proceedings of the 18th international conference on machine learning

Acknowledgments

This work was supported by IBM (as part of the MIT-IBM Watson AI Lab initiative), Boeing, AWS Machine Learning Research Awards program, and by ARL DCIST under Cooperative Agreement Number W911NF-17-2-0181.

Author information

Corresponding author

Correspondence to Jonathan P. How.

Copyright information

© 2020 Springer-Verlag London Ltd., part of Springer Nature

About this entry

Cite this entry

How, J.P., Kim, DK., Wadhwania, S. (2020). Multiagent Reinforcement Learning. In: Baillieul, J., Samad, T. (eds) Encyclopedia of Systems and Control. Springer, London. https://doi.org/10.1007/978-1-4471-5102-9_100066-1

  • DOI: https://doi.org/10.1007/978-1-4471-5102-9_100066-1

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5102-9

  • Online ISBN: 978-1-4471-5102-9

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
