DOI: 10.5555/3618408.3619565

Attention-based recurrence for multi-agent reinforcement learning under stochastic partial observability

Published: 23 July 2023

Abstract

Stochastic partial observability poses a major challenge for decentralized coordination in multi-agent reinforcement learning but is largely neglected in state-of-the-art research due to a strong focus on state-based centralized training for decentralized execution (CTDE) and on benchmarks, such as the StarCraft Multi-Agent Challenge (SMAC), that lack sufficient stochasticity. In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. AERIAL replaces the true state with a learned representation of multi-agent recurrence, which captures more accurate information about decentralized agent decisions than state-based CTDE. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark regarding stochastic partial observability. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in MessySMAC.
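
The central architectural idea in the abstract, a value function that conditions on an attention-based embedding of the agents' recurrent states instead of the true state, can be sketched in a few lines. The PyTorch code below is a minimal sketch under our own assumptions: RecurrenceEmbedding and its parameters are illustrative names, not the authors' implementation, which additionally involves the full CTDE training pipeline.

```python
import torch
import torch.nn as nn

class RecurrenceEmbedding(nn.Module):
    """Attention over the agents' recurrent states, pooled into a single
    embedding that can replace the true state in a centralized critic.
    (Hypothetical sketch; not the authors' code.)"""

    def __init__(self, obs_dim: int, hidden_dim: int, n_heads: int = 4):
        super().__init__()
        # Each agent maintains its own recurrent state over local observations.
        self.gru = nn.GRUCell(obs_dim, hidden_dim)
        # Self-attention across agents relates their decentralized histories.
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)

    def forward(self, obs, h):
        # obs: (batch, n_agents, obs_dim); h: (batch, n_agents, hidden_dim)
        b, n, _ = obs.shape
        h_next = self.gru(obs.reshape(b * n, -1), h.reshape(b * n, -1))
        h_next = h_next.reshape(b, n, -1)
        # Attend over all agents' recurrent states and pool them.
        z, _ = self.attn(h_next, h_next, h_next)
        return z.mean(dim=1), h_next  # embedding (batch, hidden_dim), new h

# Example: batch of 4, 3 agents, 8-dimensional local observations.
emb = RecurrenceEmbedding(obs_dim=8, hidden_dim=32)
z, h = emb(torch.randn(4, 3, 8), torch.zeros(4, 3, 32))
print(z.shape)  # torch.Size([4, 32]) -- the state replacement for the critic
```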
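
MessySMAC is characterized above by exactly two modifications: stochastic observations and higher variance in initial states. The wrapper below is one hedged reading of those modifications, not the benchmark's actual API: it zeroes out each agent's observation with a configurable failure probability and executes a random number of random joint actions before each episode starts. The environment interface (n_agents, n_actions, reset, step) and the knob names are assumptions for illustration.

```python
import numpy as np

class MessyWrapper:
    """Hypothetical wrapper adding observation stochasticity and
    initial-state variance to a generic multi-agent env. Assumes
    reset() -> per-agent observations and step(actions) ->
    (obs, reward, done, info); all names here are made up."""

    def __init__(self, env, obs_failure_prob=0.15, n_random_init_steps=10,
                 seed=0):
        self.env = env
        self.p = obs_failure_prob
        self.k = n_random_init_steps
        self.rng = np.random.default_rng(seed)

    def _noisy(self, obs):
        # With probability p, an agent's observation is zeroed out, one
        # simple way to model a failed or noisy measurement.
        keep = self.rng.random(len(obs)) >= self.p
        return [o if k else np.zeros_like(o) for o, k in zip(obs, keep)]

    def reset(self):
        obs = self.env.reset()
        # A random number of random joint actions before control is handed
        # to the learner raises the variance of the effective initial state.
        for _ in range(self.rng.integers(0, self.k + 1)):
            actions = self.rng.integers(self.env.n_actions,
                                        size=self.env.n_agents)
            obs, _, done, _ = self.env.step(actions)
            if done:  # restart if the warm-up accidentally ends the episode
                obs = self.env.reset()
        return self._noisy(obs)

    def step(self, actions):
        obs, reward, done, info = self.env.step(actions)
        return self._noisy(obs), reward, done, info
```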

References

[1]
Amato, C., Bernstein, D. S., and Zilberstein, S. Optimizing Memory-Bounded Controllers for Decentralized POMDPs. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, pp. 1-8, 2007.
[2]
Bernstein, D. S., Hansen, E. A., and Zilberstein, S. Bounded Policy Iteration for Decentralized POMDPs. In IJCAI, pp. 52-57, 2005.
[3]
Boutilier, C. Planning, Learning and Coordination in Multiagent Decision Processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, pp. 195-210. Morgan Kaufmann Publishers Inc., 1996.
[4]
Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., Abbeel, P., Srinivas, A., and Mordatch, I. Decision Transformer: Reinforcement Learning via Sequence Modeling. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 15084-15097. Curran Associates, Inc., 2021.
[5]
Cho, K., van Merriënboer, B., Bahdanau, D., and Bengio, Y. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103-111, 2014.
[6]
Ellis, B., Moalla, S., Samvelyan, M., Sun, M., Mahajan, A., Foerster, J. N., and Whiteson, S. SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2212.07489, 2022.
[7]
Emery-Montemerlo, R., Gordon, G., Schneider, J., and Thrun, S. Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '04, pp. 136-143, USA, 2004. IEEE Computer Society. ISBN 1581138644.
[8]
Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. Counterfactual Multi-Agent Policy Gradients. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), Apr. 2018.
[9]
Gupta, J. K., Egorov, M., and Kochenderfer, M. Cooperative Multi-Agent Control using Deep Reinforcement Learning. In Autonomous Agents and Multiagent Systems, pp. 66-83. Springer, 2017.
[10]
Hausknecht, M. and Stone, P. Deep Recurrent Q-Learning for Partially Observable MDPs. In 2015 AAAI Fall Symposium Series, 2015.
[11]
Hochreiter, S. and Schmidhuber, J. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997.
[12]
Hu, H. and Foerster, J. N. Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning. In International Conference on Learning Representations, 2019.
[13]
Iqbal, S. and Sha, F. Actor-Attention-Critic for Multi-Agent Reinforcement Learning. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2961-2970, Long Beach, California, USA, 09-15 Jun 2019. PMLR.
[14]
Iqbal, S., De Witt, C. A. S., Peng, B., Boehmer, W., Whiteson, S., and Sha, F. Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 4596-4606. PMLR, 18-24 Jul 2021.
[15]
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence, 101(1-2):99-134, 1998.
[16]
Khan, M. J., Ahmed, S. H., and Sukthankar, G. Transformer-Based Value Function Decomposition for Cooperative Multi-Agent Reinforcement Learning in StarCraft. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 18(1):113-119, Oct. 2022.
[17]
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[18]
Lyu, X., Xiao, Y., Daley, B., and Amato, C. Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, pp. 844-852, 2021.
[19]
Lyu, X., Baisero, A., Xiao, Y., and Amato, C. A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):9396-9404, Jun. 2022.
[20]
Nair, R., Tambe, M., Yokoo, M., Pynadath, D., and Marsella, S. Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI'03, pp. 705-711, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc.
[21]
Oliehoek, F. A. and Amato, C. A Concise Introduction to Decentralized POMDPs, volume 1. Springer, 2016.
[22]
Oliehoek, F. A., Spaan, M. T., and Vlassis, N. Optimal and Approximate Q-Value Functions for Decentralized POMDPs. Journal of Artificial Intelligence Research, 32:289-353, 2008.
[23]
Phan, T., Gabor, T., Sedlmeier, A., Ritz, F., Kempter, B., Klein, C., Sauer, H., Schmid, R., Wieghardt, J., Zeller, M., et al. Learning and Testing Resilience in Cooperative Multi-Agent Systems. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS '20, pp. 1055-1063. International Foundation for Autonomous Agents and Multiagent Systems, 2020.
[24]
Phan, T., Ritz, F., Belzner, L., Altmann, P., Gabor, T., and Linnhoff-Popien, C. VAST: Value Function Factorization with Variable Agent Sub-Teams. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 24018-24032. Curran Associates, Inc., 2021.
[25]
Phan, T., Ritz, F., Nüßlein, J., Kölle, M., Gabor, T., and Linnhoff-Popien, C. Attention-Based Recurrency for Multi-Agent Reinforcement Learning under State Uncertainty. In Extended Abstracts of the 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS '23, pp. 2839-2841. International Foundation for Autonomous Agents and Multiagent Systems, 2023. ISBN 9781450394321.
[26]
Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., and Whiteson, S. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 4295-4304. PMLR, 10-15 Jul 2018.
[27]
Rashid, T., Farquhar, G., Peng, B., and Whiteson, S. Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 10199-10210. Curran Associates, Inc., 2020.
[28]
Samvelyan, M., Rashid, T., Schroeder de Witt, C., Farquhar, G., Nardelli, N., Rudner, T. G., Hung, C.-M., Torr, P. H., Foerster, J., and Whiteson, S. The StarCraft Multi-Agent Challenge. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, AAMAS '19, pp. 2186-2188, Richland, SC, 2019. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9781450363099.
[29]
Son, K., Kim, D., Kang, W. J., Hostallero, D. E., and Yi, Y. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 5887-5896. PMLR, 09-15 Jun 2019.
[30]
Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., and Graepel, T. Value-Decomposition Networks for Cooperative Multi-Agent Learning based on Team Reward. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, AAMAS '18, pp. 2085-2087, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.
[31]
Szer, D., Charpillet, F., and Zilberstein, S. MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, UAI'05, pp. 576-583, Arlington, Virginia, USA, 2005. AUAI Press. ISBN 0974903914.
[32]
Tan, M. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents. In Proceedings of the Tenth International Conference on International Conference on Machine Learning, ICML'93, pp. 330-337, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc. ISBN 1558603077.
[33]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is All You Need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[34]
Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., et al. Grandmaster Level in StarCraft II using Multi-Agent Reinforcement Learning. Nature, 575:350-354, 2019.
[35]
Wang, J., Ren, Z., Liu, T., Yu, Y., and Zhang, C. QPLEX: Duplex Dueling Multi-Agent Q-Learning. In International Conference on Learning Representations, 2021.
[36]
Watkins, C. J. and Dayan, P. Q-Learning. Machine Learning, 8(3-4):279-292, 1992.
[37]
Wen, M., Kuba, J. G., Lin, R., Zhang, W., Wen, Y., Wang, J., and Yang, Y. Multi-Agent Reinforcement Learning is a Sequence Modeling Problem. arXiv preprint arXiv:2205.14953, 2022.
[38]
Whittlestone, J., Arulkumaran, K., and Crosby, M. The Societal Implications of Deep Reinforcement Learning. Journal of Artificial Intelligence Research, 70:1003-1030, May 2021. ISSN 1076-9757.
[39]
Yu, C., Velu, A., Vinitsky, E., Gao, J., Wang, Y., Bayen, A., and Wu, Y. The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. In 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.

Cited By

  • (2024) Confidence-Based Curriculum Learning for Multi-Agent Path Finding. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1558-1566. DOI: 10.5555/3635637.3663016. Online publication date: 6 May 2024.


Information

Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited
