DOI: 10.5555/3305381.3305500
Article
Free access

Stabilising experience replay for deep multi-agent reinforcement learning

Published: 06 August 2017

Abstract

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly with problem size. A key challenge, therefore, is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
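The second method described in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: the class name `FingerprintReplayBuffer` is a made-up name, and the particular choice of (training iteration, exploration rate) as the fingerprint is an assumption for the sketch; any low-dimensional signal correlated with the age of the data would serve the same role.

```python
import random
from collections import deque

class FingerprintReplayBuffer:
    """Replay memory whose stored observations are augmented with a
    fingerprint -- here the training iteration and exploration rate --
    so each sampled transition carries a proxy for the age of the
    policies that generated it."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration, epsilon):
        # Append the fingerprint to both observations before storing,
        # so a value network trained on these inputs can condition on
        # when in training the experience was collected.
        fp = [float(iteration), float(epsilon)]
        self.buffer.append((obs + fp, action, reward, next_obs + fp))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

buf = FingerprintReplayBuffer()
buf.add(obs=[0.1, 0.2], action=1, reward=0.5,
        next_obs=[0.3, 0.4], iteration=42, epsilon=0.9)
aug_obs, action, reward, aug_next_obs = buf.sample(1)[0]
```

Because old transitions were generated while the other agents' policies were different, a Q-network fed these augmented observations can learn to discount stale experience rather than treating all replayed data as if it came from a stationary environment.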




    Published In

    ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
    August 2017
    4208 pages

    Publisher

    JMLR.org



    Article Metrics

    • Downloads (last 12 months): 85
    • Downloads (last 6 weeks): 8
    Reflects downloads up to 17 Dec 2024


    Cited By

    • (2022) Optimizing Multi-Agent Coordination via Hierarchical Graph Probabilistic Recursive Reasoning. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 290-299. DOI: 10.5555/3535850.3535884. Online publication date: 9 May 2022.
    • (2022) Feudal Multiagent Reinforcement Learning for Interdomain Collaborative Routing Optimization. Wireless Communications & Mobile Computing, vol. 2022. DOI: 10.1155/2022/1231979. Online publication date: 1 Jan 2022.
    • (2021) A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70:789-870. DOI: 10.1613/jair.1.12136. Online publication date: 1 May 2021.
    • (2020) "Other-Play" for Zero-Shot Coordination. Proceedings of the 37th International Conference on Machine Learning, pp. 4399-4410. DOI: 10.5555/3524938.3525347. Online publication date: 13 Jul 2020.
    • (2020) ColdGANs. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 18978-18989. DOI: 10.5555/3495724.3497317. Online publication date: 6 Dec 2020.
    • (2020) Weighted QMIX. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 10199-10210. DOI: 10.5555/3495724.3496579. Online publication date: 6 Dec 2020.
    • (2020) Intention-Aware Multiagent Scheduling. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 285-293. DOI: 10.5555/3398761.3398799. Online publication date: 5 May 2020.
    • (2020) Monitoring IaaS Cloud for Healthcare Systems. International Journal of E-Health and Medical Communications, 11(3):54-70. DOI: 10.4018/IJEHMC.2020070104. Online publication date: 1 Jul 2020.
    • (2019) Multi-Agent Common Knowledge Reinforcement Learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 9927-9939. DOI: 10.5555/3454287.3455177. Online publication date: 8 Dec 2019.
    • (2019) The StarCraft Multi-Agent Challenge. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2186-2188. DOI: 10.5555/3306127.3332052. Online publication date: 8 May 2019.
