
DOI: 10.5555/3463952.3464045 · AAMAS Conference Proceedings · Research article

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

Published: 03 May 2021

Abstract

When solving a complex task, humans spontaneously form teams, with each team completing a different part of the whole task; cooperation between teammates then improves efficiency. In current cooperative MARL methods, however, the cooperating teams are constructed either through heuristics or through end-to-end black-box optimization. To improve the efficiency of cooperation and exploration, we propose Rochico, a structured diversification emergence MARL framework based on reinforced organization control and hierarchical consensus learning. Rochico first learns an adaptive grouping policy through an organization control module built on independent multi-agent reinforcement learning. After team formation, a hierarchical consensus module imposes a consensus constraint on the agents' hierarchical intentions. Combining this consensus module with a decision module enhanced by a self-supervised intrinsic reward, Rochico outputs the final diversified multi-agent cooperative policy.
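The three-stage pipeline the abstract describes (grouping, then consensus, then reward-enhanced decision) can be sketched as toy code. This is an illustrative outline only, not the paper's implementation: all names (`organization_control`, `hierarchical_consensus`, `decide`) and the round-robin grouping rule are hypothetical stand-ins for the learned modules.

```python
# Hypothetical sketch of the Rochico pipeline from the abstract.
# Each learned module is replaced by a deterministic toy stand-in.

def organization_control(agents, n_teams):
    """Stand-in for the reinforced grouping policy: assigns each
    agent to a team (round-robin here, for determinism)."""
    return {agent: i % n_teams for i, agent in enumerate(agents)}

def hierarchical_consensus(teams):
    """Stand-in for the consensus module: agents on the same team
    share one high-level intention, enforcing the consensus constraint."""
    return {agent: f"intent-{team}" for agent, team in teams.items()}

def decide(agent, intention, intrinsic_bonus=0.1):
    """Stand-in for the decision module: environment reward plus a
    self-supervised-style intrinsic bonus encouraging exploration."""
    base_reward = 1.0  # placeholder environment reward
    return base_reward + intrinsic_bonus

agents = ["a0", "a1", "a2", "a3"]
teams = organization_control(agents, n_teams=2)
intents = hierarchical_consensus(teams)
rewards = {a: decide(a, intents[a]) for a in agents}
```

The point of the sketch is the data flow: team assignments feed the consensus step, and the shared intentions condition each agent's decision, which mixes extrinsic and intrinsic reward.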



Published In

AAMAS '21: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, May 2021, 1899 pages. ISBN: 9781450383073.

Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.


Author Tags

1. cooperative MARL
2. diversification
3. organization control

Conference

AAMAS '21. Overall acceptance rate: 1,155 of 5,036 submissions, 23%.
