Open and real-world human-AI coordination by heterogeneous training with communication

Published: 22 November 2024


Human-AI coordination aims to develop AI agents that can coordinate effectively with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance from such agents remains a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have shown promising advances in open-world settings, which require agents to coordinate efficiently with a range of unseen human partners. However, these methods usually assume that the agent and its partner are homogeneous, an idealization that deviates from real-world conditions. To facilitate the practical deployment of human-AI coordination in open and real-world environments, we propose ORCBench, the first benchmark for open and real-world human-AI coordination (ORC). ORCBench includes widely used human-AI coordination environments. Notably, to reflect real-world scenarios, ORCBench considers heterogeneity between AI agents and partners, covering differences in both capabilities and observations. Furthermore, we introduce Heterogeneous training with Communication (HeteC), a framework for ORC. HeteC builds upon a heterogeneous training framework and enhances partner population diversity by using mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partially observable environments. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
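The abstract does not give implementation details, so the following is only a minimal sketch of one way a partner population could mix still-training partners with frozen historical snapshots. The names `Partner`, `PartnerPool`, `snapshot`, `sample`, and the mixing probability `p_history` are all hypothetical and are not taken from the paper; this should not be read as the authors' HeteC implementation.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Partner:
    """Hypothetical stand-in for a partner policy; `params` mimics its weights."""
    name: str
    params: list
    frozen: bool = False


@dataclass
class PartnerPool:
    """Mixes live (still-training) partners with frozen historical snapshots."""
    live: list = field(default_factory=list)
    history: list = field(default_factory=list)
    p_history: float = 0.5  # assumed mixing probability, not a value from the paper

    def snapshot(self, partner: Partner, step: int) -> Partner:
        # Deep-copy the parameters so later updates to the live partner
        # cannot change the stored snapshot.
        frozen = Partner(f"{partner.name}@{step}", copy.deepcopy(partner.params), frozen=True)
        self.history.append(frozen)
        return frozen

    def sample(self, rng: random.Random) -> Partner:
        if not self.live and not self.history:
            raise ValueError("partner pool is empty")
        # Draw a historical partner with probability p_history,
        # or always when no live partner exists.
        if self.history and (not self.live or rng.random() < self.p_history):
            return rng.choice(self.history)
        return rng.choice(self.live)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = PartnerPool(live=[Partner("partner_a", [0.0, 0.0])])
    pool.snapshot(pool.live[0], step=100)
    pool.live[0].params[0] = 1.0  # the live partner keeps training
    print([pool.sample(rng).name for _ in range(5)])
```

In a training loop built on this sketch, the agent would call `sample` at the start of each episode, so that it keeps meeting both current partners and earlier versions of them.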




Published In

Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 19, Issue 4
Apr 2025
122 pages

Berlin, Heidelberg

Publication History

Published: 22 November 2024
Accepted: 13 March 2024
Received: 07 October 2023

Author Tags

1. human-AI coordination
2. multi-agent reinforcement learning
3. communication
4. open-environment coordination
5. real-world coordination

Research-article

