research-article

Open and real-world human-AI coordination by heterogeneous training with communication

Published: 22 November 2024

Abstract

Human-AI coordination aims to develop AI agents that coordinate effectively with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory coordination performance remains a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have shown promising advances in open-world settings, which require agents to coordinate efficiently with a range of unseen human partners. However, these methods usually assume homogeneity between the agent and the partner, an overly idealized condition that deviates from real-world settings. To facilitate the practical deployment of human-AI coordination in open and real-world environments, we propose ORCBench, the first benchmark for open and real-world human-AI coordination (ORC). ORCBench includes widely used human-AI coordination environments. Notably, to reflect real-world scenarios, ORCBench considers heterogeneity between AI agents and partners, encompassing differences in capabilities and observations. Furthermore, we introduce a framework for ORC called Heterogeneous training with Communication (HeteC). HeteC builds upon a heterogeneous training framework and enhances partner population diversity through mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partial observability. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
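
As a rough illustration of the partner-population idea mentioned in the abstract (mixing partners that are still being trained with frozen historical copies), the following is a minimal sketch and not the HeteC implementation. The class name `PartnerPool`, the `frozen_prob` mixing ratio, and the dictionary-based partners are all hypothetical names chosen for this example; the paper's actual sampling scheme may differ.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class PartnerPool:
    """Hypothetical pool mixing live partners with frozen historical snapshots."""

    live: list = field(default_factory=list)    # partners still being trained
    frozen: list = field(default_factory=list)  # read-only past snapshots
    frozen_prob: float = 0.5                    # assumed mixing ratio, not from the paper

    def snapshot(self, partner):
        # Deep-copy so that later training of `partner` cannot alter the snapshot.
        self.frozen.append(copy.deepcopy(partner))

    def sample(self, rng):
        # Return (partner, is_frozen). A frozen snapshot is drawn with
        # probability `frozen_prob`, or always when no live partner exists.
        if not self.live and not self.frozen:
            raise ValueError("partner pool is empty")
        if self.frozen and (not self.live or rng.random() < self.frozen_prob):
            return rng.choice(self.frozen), True
        return rng.choice(self.live), False


# Partners are plain dicts here purely for illustration.
pool = PartnerPool(live=[{"name": "p0", "version": 0}])
pool.snapshot(pool.live[0])
pool.live[0]["version"] = 1  # continued training changes only the live partner

rng = random.Random(0)
partner, is_frozen = pool.sample(rng)
```

Keeping snapshots as deep copies means the ego agent keeps meeting older partner behaviours even after the live partners have moved on, which is one plausible way to widen the set of partners seen during training.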



Published In

Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 19, Issue 4
Apr 2025
122 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 22 November 2024
Accepted: 13 March 2024
Received: 07 October 2023

Author Tags

1. human-AI coordination
2. multi-agent reinforcement learning
3. communication
4. open-environment coordination
5. real-world coordination
