Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration
Figure 1. A six-robot exploration scenario; the signals indicate the desired communication pattern: Robot 1 should communicate with Robot 0 to avoid repeated exploration; Robot 1 should also communicate with Robot 3 to avoid a target-area conflict; Robot 2, Robot 4, and Robot 5 should communicate with the others less to avoid interference.
Figure 2. The architecture of the attention-based communication neural network (CommAttn): each agent’s input consists of two parts: the local observation (the environment knowledge within the agent’s vision range) and the trajectory (the set of the agent’s past positions).
Figure 3. The experimental environment, which is dynamic in the number of blocks.
Figure 4. The exploration rate within 30 s for different numbers of robots.
Figure 5. As the frequency at which blocks are added increases, CommAttn shows more stable exploration efficiency than the baseline “pre-designed” methods.
Figure 6. An illustrative scenario (the initial environment and the corresponding actions of all agents) showing how CommAttn successfully handles dynamic environments (newly introduced blocks).
Figure 7. The optimal actions of CommAttn and the sub-optimal actions of the coordinated frontier-based approach after unexpected blocks are introduced into the environment.
Figure 8. The variation of the agents’ summed scores during training for CommAttn and the baseline “learning” methods.
Figure 9. The relationship among the success rate, the vision range, and the communication range of CommNet.
Figure 10. The values of the hidden state $s_j$ of each agent from the decoder part in the static exploration environment.
Figure 11. The average norm of the communication vectors in the static environment.
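The captions above sketch how CommAttn decides whom to listen to: each agent scores every potential partner and pools their messages by those scores. As a rough illustration only, here is a minimal NumPy sketch of attention-weighted message pooling; the scaled dot-product scoring, the function names, and the dimensions are assumptions made for this sketch, not the paper’s actual formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden, j):
    """Build agent j's communication vector c_j as an attention-weighted
    sum of the other agents' hidden states.

    `hidden` is an (n_agents, d) array of per-agent hidden states. The
    scaled dot-product scoring below is an illustrative assumption, not
    CommAttn's published formulation.
    """
    d = hidden.shape[1]
    query = hidden[j]                      # agent j attends from its own state
    others = np.delete(hidden, j, axis=0)  # candidate communication partners
    scores = others @ query / np.sqrt(d)   # relevance of each partner to agent j
    weights = softmax(scores)              # large weight = "worth talking to"
    return weights @ others                # aggregated message for agent j

# Toy usage: six agents with 8-dimensional hidden states, echoing the
# six-robot scenario of Figure 1.
rng = np.random.default_rng(0)
h = rng.standard_normal((6, 8))
print(attention_pool(h, j=1).shape)  # (8,)
```

Under this kind of pooling, an agent in the position of Robot 1 in Figure 1 would ideally concentrate its weights on Robots 0 and 3, while Robots 2, 4, and 5 would receive near-zero weight; the average communication-vector norm tracked in Figure 11 is one way to check how much such messaging actually occurs.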
Abstract
1. Introduction
2. Related Work
2.1. Multi-Robot Exploration
2.2. Learning to Cooperate Based on Explicit Communication
3. Methodology
4. Implementation Details
4.1. Modeling of the Exploration Environment
4.2. Entropy-Oriented Reward Function
4.3. Exploration Ratio-Based Training Approach
5. Experimental Results
5.1. Results Compared with the “Pre-Designed” Strategies
5.2. Results Compared with the Existing “Learning-Based” Methods
5.3. Analysis of Communication
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Juliá, M.; Gil, A.; Reinoso, O. A comparison of path planning strategies for autonomous exploration and mapping of unknown environments. Auton. Robot. 2012, 33, 427–444.
- Arai, T.; Pagello, E.; Parker, L.E. Advances in multi-robot systems. IEEE Trans. Robot. Autom. 2002, 18, 655–661.
- Peng, Z.; Zhang, L.; Luo, T. Learning to Communicate via Supervised Attentional Message Processing. In Proceedings of the 31st International Conference on Computer Animation and Social Agents, Beijing, China, 21–23 May 2018; ACM: New York, NY, USA, 2018; pp. 11–16.
- Khamis, A.; Hussein, A.; Elmogy, A. Multi-robot task allocation: A review of the state-of-the-art. In Cooperative Robots and Sensor Networks 2015; Springer: Cham, Switzerland, 2015; pp. 31–51.
- Geng, M.; Li, Y.; Ding, B.; Wang, H. Deep Learning-based Cooperative Trail Following for Multi-Robot System. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Tai, L.; Zhang, J.; Liu, M.; Boedecker, J.; Burgard, W. A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation. arXiv 2016, arXiv:1612.07139.
- Geng, M.; Zhou, X.; Ding, B.; Wang, H.; Zhang, L. Learning to Cooperate in Decentralized Multi-robot Exploration of Dynamic Environments. In Proceedings of the International Conference on Neural Information Processing, Siem Reap, Cambodia, 13–16 December 2018; Springer: Cham, Switzerland; pp. 40–51.
- Geng, M.; Liu, S.; Wu, Z. Sensor Fusion-Based Cooperative Trail Following for Autonomous Multi-Robot System. Sensors 2019, 19, 823.
- Qin, J.; Ma, Q.; Shi, Y.; Wang, L. Recent advances in consensus of multi-agent systems: A brief survey. IEEE Trans. Ind. Electron. 2017, 64, 4972–4983.
- Foerster, J.N.; Assael, Y.M.; de Freitas, N.; Whiteson, S. Learning to communicate to solve riddles with deep distributed recurrent Q-networks. arXiv 2016, arXiv:1602.02672.
- Peng, P.; Yuan, Q.; Wen, Y.; Yang, Y.; Tang, Z.; Long, H.; Wang, J. Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. arXiv 2017, arXiv:1703.10069.
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 6379–6390.
- Sukhbaatar, S.; Szlam, A.; Fergus, R. Learning multiagent communication with backpropagation. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 2244–2252.
- Hoshen, Y. VAIN: Attentional multi-agent predictive modeling. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 2701–2711.
- Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
- Yamauchi, B. Frontier-based exploration using multiple robots. In Proceedings of the Second International Conference on Autonomous Agents, Minneapolis, MN, USA, 10–13 May 1998; ACM: New York, NY, USA, 1998; pp. 47–53.
- Nieto-Granda, C.; Rogers, J.G., III; Christensen, H.I. Coordination strategies for multi-robot exploration and mapping. Int. J. Robot. Res. 2014, 33, 519–533.
- Visser, A.; Slamet, B.A. Balancing the information gain against the movement cost for multi-robot frontier exploration. In European Robotics Symposium 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 43–52.
- Nwachukwu, B.U.; Bozic, K.J.; Schairer, W.W.; Bernstein, J.L.; Jevsevar, D.S.; Marx, R.G.; Padgett, D.E. Current status of cost utility analyses in total joint arthroplasty: A systematic review. Clin. Orthop. Relat. Res. 2015, 473, 1815–1827.
- Charrow, B.; Liu, S.; Kumar, V.; Michael, N. Information-theoretic mapping using Cauchy-Schwarz quadratic mutual information. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 4791–4798.
- Jadidi, M.G.; Miro, J.V.; Dissanayake, G. Mutual information-based exploration on continuous occupancy maps. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 6086–6092.
- Best, G.; Cliff, O.; Patten, T.; Mettu, R.; Fitch, R. Decentralised Monte Carlo tree search for active perception. In Proceedings of the International Workshop on the Algorithmic Foundations of Robotics, San Francisco, CA, USA, 18–20 December 2016.
- Lauri, M.; Ritala, R. Planning for robotic exploration based on forward simulation. Robot. Auton. Syst. 2016, 83, 15–31.
- Tabib, W.; Corah, M.; Michael, N.; Whittaker, R. Computationally efficient information-theoretic exploration of pits and caves. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 3722–3727.
- Sariel, S.; Balch, T. Real time auction based allocation of tasks for multi-robot exploration problem in dynamic environments. In Proceedings of the Workshop on Integrating Planning into Scheduling (AAAI-05), Palo Alto, CA, USA, 9–10 July 2005; pp. 27–33.
- Lagoudakis, M.G.; Markakis, E.; Kempe, D.; Keskinocak, P.; Kleywegt, A.J.; Koenig, S.; Tovey, C.A.; Meyerson, A.; Jain, S. Auction-Based Multi-Robot Routing. In Proceedings of the Robotics: Science and Systems, Cambridge, MA, USA, 8–11 June 2005; Volume 5, pp. 343–350.
- Tovey, C.; Lagoudakis, M.G.; Jain, S.; Koenig, S. The generation of bidding rules for auction-based robot coordination. In Multi-Robot Systems: From Swarms to Intelligent Automata, Volume III; Springer: Dordrecht, The Netherlands, 2005; pp. 3–14.
- Koenig, S.; Keskinocak, P.; Tovey, C.A. Progress on Agent Coordination with Cooperative Auctions. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), Atlanta, GA, USA, 11–15 July 2010; Volume 10, pp. 1713–1717.
- Koenig, S.; Tovey, C.; Lagoudakis, M.; Markakis, V.; Kempe, D.; Keskinocak, P.; Kleywegt, A.; Meyerson, A.; Jain, S. The power of sequential single-item auctions for agent coordination. In Proceedings of the National Conference on Artificial Intelligence, Boston, MA, USA, 16–20 July 2006; Volume 21, p. 1625.
- Colby, M.; Chung, J.J.; Tumer, K. Implicit adaptive multi-robot coordination in dynamic environments. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 5168–5173.
- Omidshafiei, S.; Pazis, J.; Amato, C.; How, J.P.; Vian, J. Deep decentralized multi-task multi-agent RL under partial observability. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
- Dilokthanakul, N.; Kaplanis, C.; Pawlowski, N.; Shanahan, M. Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning. arXiv 2017, arXiv:1705.06769.
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
- Barratt, S. Active Robotic Mapping through Deep Reinforcement Learning. arXiv 2017, arXiv:1712.10069.
- Sukhbaatar, S.; Szlam, A.; Synnaeve, G.; Chintala, S.; Fergus, R. MazeBase: A sandbox for learning from games. arXiv 2015, arXiv:1511.07401.
- González-Banos, H.H.; Latombe, J.C. Navigation strategies for exploring indoor environments. Int. J. Robot. Res. 2002, 21, 829–848.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Maaten, L.V.D.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Approach | Planning Time (s)
---|---
CommAttn | 37 ± 5
Coordinated Frontier | 230 ± 20
Nearest Frontier | 35 ± 5
Cost-utility | 395 ± 15
Approach | Average Mean Reward | Collisions | Exploration Ratio (%)
---|---|---|---
CommAttn | 37 | 9 ± 6 | 95.2 ± 3.1
VAIN | 21 | 15 ± 4 | 90.2 ± 3.5
BicNet | 25 | 22 ± 10 | 89.8 ± 2.8
CommNet | 24 | 13 ± 5 | 89.2 ± 4.1
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style
Geng, M.; Xu, K.; Zhou, X.; Ding, B.; Wang, H.; Zhang, L. Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration. Entropy 2019, 21, 294. https://doi.org/10.3390/e21030294

AMA Style
Geng M, Xu K, Zhou X, Ding B, Wang H, Zhang L. Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration. Entropy. 2019; 21(3):294. https://doi.org/10.3390/e21030294

Chicago/Turabian Style
Geng, Mingyang, Kele Xu, Xing Zhou, Bo Ding, Huaimin Wang, and Lei Zhang. 2019. "Learning to Cooperate via an Attention-Based Communication Neural Network in Decentralized Multi-Robot Exploration" Entropy 21, no. 3: 294. https://doi.org/10.3390/e21030294