DOI: 10.5555/3305381.3305500
Article
Free access

Stabilising experience replay for deep multi-agent reinforcement learning

Published: 06 August 2017

Abstract

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly with problem size. A key challenge, therefore, is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
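The second method described in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: the class name `FingerprintReplayBuffer` is a made-up name, and the particular choice of (training iteration, exploration rate) as the fingerprint is an assumption for the sketch; any low-dimensional signal correlated with the age of the data would serve the same role.

```python
import random
from collections import deque

class FingerprintReplayBuffer:
    """Replay memory whose stored observations are augmented with a
    fingerprint -- here the training iteration and exploration rate --
    so each sampled transition carries a proxy for the age of the
    policies that generated it."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration, epsilon):
        # Append the fingerprint to both observations before storing,
        # so a value network trained on these inputs can condition on
        # when in training the experience was collected.
        fp = [float(iteration), float(epsilon)]
        self.buffer.append((obs + fp, action, reward, next_obs + fp))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

buf = FingerprintReplayBuffer()
buf.add(obs=[0.1, 0.2], action=1, reward=0.5,
        next_obs=[0.3, 0.4], iteration=42, epsilon=0.9)
aug_obs, action, reward, aug_next_obs = buf.sample(1)[0]
```

Because old transitions were generated while the other agents' policies were different, a Q-network fed these augmented observations can learn to discount stale experience rather than treating all replayed data as if it came from a stationary environment.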




    Published In

    ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
    August 2017
    4208 pages

    Publisher

    JMLR.org



    Article Metrics

    • Downloads (last 12 months): 85
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 17 Dec 2024


    Cited By

    • (2022) Optimizing Multi-Agent Coordination via Hierarchical Graph Probabilistic Recursive Reasoning. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 290-299. DOI: 10.5555/3535850.3535884. Online publication date: 9 May 2022.
    • (2022) Feudal Multiagent Reinforcement Learning for Interdomain Collaborative Routing Optimization. Wireless Communications & Mobile Computing, vol. 2022. DOI: 10.1155/2022/1231979. Online publication date: 1 Jan 2022.
    • (2021) A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70:789-870. DOI: 10.1613/jair.1.12136. Online publication date: 1 May 2021.
    • (2020) "Other-Play" for Zero-Shot Coordination. Proceedings of the 37th International Conference on Machine Learning, pp. 4399-4410. DOI: 10.5555/3524938.3525347. Online publication date: 13 Jul 2020.
    • (2020) ColdGANs. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 18978-18989. DOI: 10.5555/3495724.3497317. Online publication date: 6 Dec 2020.
    • (2020) Weighted QMIX. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 10199-10210. DOI: 10.5555/3495724.3496579. Online publication date: 6 Dec 2020.
    • (2020) Intention-Aware Multiagent Scheduling. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 285-293. DOI: 10.5555/3398761.3398799. Online publication date: 5 May 2020.
    • (2020) Monitoring IaaS Cloud for Healthcare Systems. International Journal of E-Health and Medical Communications, 11(3):54-70. DOI: 10.4018/IJEHMC.2020070104. Online publication date: 1 Jul 2020.
    • (2019) Multi-Agent Common Knowledge Reinforcement Learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 9927-9939. DOI: 10.5555/3454287.3455177. Online publication date: 8 Dec 2019.
    • (2019) The StarCraft Multi-Agent Challenge. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2186-2188. DOI: 10.5555/3306127.3332052. Online publication date: 8 May 2019.
