Transformer-Based Reinforcement Learning for Multi-Robot Autonomous Exploration
Figures
- Figure 1: An example of the initial state of an exploration process.
- Figure 2: Structural diagram of the methodology of centralized training.
- Figure 3: Policy net structure.
- Figure 4: Critic net structure.
- Figure 5: Randomly generated training environments.
- Figure 6: Policy performance varied with the number of training episodes.
- Figure 7: The average time spent and the average total distance traveled to complete the exploration by different methods.
- Figure 8: Exploration progress–time curves; the three panels correspond to robot teams of sizes 2, 3, and 4, respectively.
- Figure 9: Test environment and the robot. (a) The environment; (b) the robot.
- Figure 10: Paths of different-size robot teams to complete the exploration.
Abstract
1. Introduction
- A novel DRL framework is introduced to address the challenge of multi-robot autonomous exploration.
- We propose a distributed decision network architecture based on the Transformer model.
2. Related Work
2.1. Multi-Robot Exploration
2.2. Deep Reinforcement Learning
3. Problem Statement
4. Methodology
4.1. Model for Decentralized Multi-Robot Autonomous Exploration
4.2. Deep Reinforcement Learning
- The local map was represented by a grayscale image with a depth of 8 bits, where the pixel values of free space, obstacle space, and unknown space were 200, 100, and 0, respectively. Edge extraction was carried out with the Canny operator using high and low thresholds of 650 and 600, respectively, and the extracted edges made up the set of all frontier pixels.
- All connected domains were found for the pixels in this set, and the center of each connected domain was calculated, thus clustering the connected frontier pixels.
- For each center position, the nearest pixel belonging to free space was taken as a frontier point in the action space (a code sketch of this three-step procedure follows this list).
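To make the three-step procedure concrete, the following is a minimal sketch using OpenCV and NumPy. It assumes the 8-bit map encoding and Canny thresholds stated above; the function name `extract_frontier_points` and its details are illustrative, not the authors' implementation.

```python
# Minimal sketch of the frontier-point extraction described above, assuming an
# 8-bit grayscale occupancy map (free = 200, obstacle = 100, unknown = 0) and the
# Canny thresholds from the text (high 650, low 600).
import cv2
import numpy as np

FREE = 200  # pixel value of free space

def extract_frontier_points(local_map: np.ndarray) -> list[tuple[int, int]]:
    """Return one frontier point (row, col) per cluster of connected frontier pixels."""
    # 1. Canny edge extraction; the extracted edge pixels form the frontier set.
    edges = cv2.Canny(local_map, 600, 650)

    # 2. Cluster connected frontier pixels; each connected component is one cluster,
    #    and its centroid is the cluster center.
    _, _, _, centroids = cv2.connectedComponentsWithStats(edges, connectivity=8)

    # 3. For each cluster center, pick the nearest free-space pixel as the frontier
    #    point that enters the action space.
    free_rc = np.argwhere(local_map == FREE)          # (row, col) of free cells
    if len(free_rc) == 0:
        return []
    points = []
    for cx, cy in centroids[1:]:                      # skip label 0 (background)
        d2 = np.sum((free_rc - np.array([cy, cx])) ** 2, axis=1)
        points.append(tuple(free_rc[np.argmin(d2)]))
    return points
```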
4.3. Exploration Process
5. Training
5.1. Training Environment Generation
5.2. Training Process
6. Experiment
6.1. Comparative Experiment
- Nearest frontier [9]: each robot repeatedly selects the closest frontier location as its target point until the environment is fully explored.
- Next-Best-View [23]: each robot selects a target point by weighing both its distance to each candidate point and the information gain available at that point (a target-selection sketch for these two baselines follows this list).
- Planning-based [8]: each robot uses Dec-MDP for online planning and reduces repetitive exploration of the environment by predicting the state of other robots from locally observed information.
- DME-DRL [25]: the distributed multi-robot exploration algorithm based on deep reinforcement learning (DME-DRL) explores by selecting the nearest frontier position in a particular direction as a target point.
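For reference, below is a hedged sketch of how the two simplest baselines pick a target. It assumes straight-line (Euclidean) distance and a linear utility trade-off with weight `lam`; the cited methods may instead use planned path length and a different utility form, so the function names and weighting here are illustrative only.

```python
# Hedged sketch of target selection for the Nearest-frontier and Next-Best-View
# baselines. Euclidean distance and the linear weight `lam` are assumptions for
# illustration, not taken from the cited papers.
import numpy as np

def nearest_frontier(robot_xy: np.ndarray, frontiers_xy: np.ndarray) -> np.ndarray:
    """Pick the frontier closest to the robot."""
    d = np.linalg.norm(frontiers_xy - robot_xy, axis=1)
    return frontiers_xy[np.argmin(d)]

def next_best_view(robot_xy: np.ndarray, frontiers_xy: np.ndarray,
                   info_gain: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Pick the frontier that best trades off information gain against travel cost."""
    d = np.linalg.norm(frontiers_xy - robot_xy, axis=1)
    return frontiers_xy[np.argmax(info_gain - lam * d)]
```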
6.2. Generalization Capability Test
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
DRL | Deep reinforcement learning
MADRL | Multi-agent deep reinforcement learning
RL | Reinforcement learning
DDPG | Deep Deterministic Policy Gradient
SAC | Soft Actor–Critic
CTDE | Centralized Training and Decentralized Execution
VDN | Value-Decomposition Network
MADDPG | Multi-Agent Deep Deterministic Policy Gradient
MAPPO | Multi-Agent Proximal Policy Optimization
Dec-POMDP | Decentralized Partially Observable Markov Decision Process
TBDE-Net | Transformer-Based Decentralized Exploration Network
TEB | Timed-Elastic-Band
LSTM | Long Short-Term Memory
Dec-MDP | Decentralized Markov Decision Process
DME-DRL | Distributed multi-robot exploration algorithm based on deep reinforcement learning
ROS | Robot Operating System
ROS | Robot Operating System |
References
- Quattrini Li, A. Exploration and mapping with groups of robots: Recent trends. Curr. Robot. Rep. 2020, 1, 227–237. [Google Scholar] [CrossRef]
- Haroon, A.; Lakshman, A.; Mundy, M.; Li, B. Autonomous robotic 3D scanning for smart factory planning. In Dimensional Optical Metrology and Inspection for Practical Applications XIII; SPIE: Bellingham, WA, USA, 2024; Volume 13038, pp. 104–112. [Google Scholar]
- Yi, L.; Le, A.V.; Hayat, A.A.; Borusu, C.S.C.S.; Mohan, R.E.; Nhan, N.H.K.; Kandasamy, P. Reconfiguration during locomotion by pavement sweeping robot with feedback control from vision system. IEEE Access 2020, 8, 113355–113370. [Google Scholar] [CrossRef]
- Pan, D.; Zhao, D.; Pu, Y.; Wang, L.; Zhang, Y. Use of cross-training in human–robot collaborative rescue. Hum. Factor. Ergon. Man. 2024, 34, 261–276. [Google Scholar] [CrossRef]
- Arm, P.; Waibel, G.; Preisig, J.; Tuna, T.; Zhou, R.; Bickel, V.; Ligeza, G.; Miki, T.; Kehl, F.; Kolvenbach, H.; et al. Scientific exploration of challenging planetary analog environments with a team of legged robots. Sci. Robot. 2023, 8, eade9548. [Google Scholar] [CrossRef]
- Zhu, L.; Cheng, J.; Liu, Y. Multi-Robot Autonomous Exploration in Unknown Environment: A Review. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 7639–7644. [Google Scholar]
- Gul, F.; Mir, A.; Mir, I.; Mir, S.; Islaam, T.U.; Abualigah, L.; Forestiero, A. A centralized strategy for multi-agent exploration. IEEE Access 2022, 10, 126871–126884. [Google Scholar] [CrossRef]
- Matignon, L.; Jeanpierre, L.; Mouaddib, A.I. Coordinated multi-robot exploration under communication constraints using decentralized markov decision processes. In Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; Volume 26, pp. 2017–2023. [Google Scholar]
- Yamauchi, B. Frontier-based exploration using multiple robots. In Proceedings of the Second International Conference on Autonomous Agents, St. Paul, MN, USA, 9–13 May 1998; pp. 47–53. [Google Scholar]
- Butzke, J.; Likhachev, M. Planning for multi-robot exploration with multiple objective utility functions. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 3254–3259. [Google Scholar]
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Proc. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef] [PubMed]
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef] [PubMed]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. 2021, 23, 4909–4926. [Google Scholar] [CrossRef]
- Chen, J.; Li, S.E.; Tomizuka, M. Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning. IEEE Trans. Intell. Transp. 2021, 23, 5068–5078. [Google Scholar] [CrossRef]
- Gronauer, S.; Diepold, K. Multi-agent deep reinforcement learning: A survey. Artif. Intell. Rev. 2022, 55, 895–943. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; Volume 30. [Google Scholar]
- Burgard, W.; Moors, M.; Stachniss, C.; Schneider, F.E. Coordinated multi-robot exploration. IEEE Trans. Robot. 2005, 21, 376–386. [Google Scholar] [CrossRef]
- Matignon, L.; Jeanpierre, L.; Mouaddib, A.I. Distributed value functions for multi-robot exploration. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1544–1550. [Google Scholar]
- Stachniss, C.; Mozos, O.M.; Burgard, W. Speeding-up multi-robot exploration by considering semantic place information. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation, Orlando, FL, USA, 15–19 May 2006; pp. 1692–1697. [Google Scholar]
- Wang, C.; Zhu, D.; Li, T.; Meng, M.Q.H.; De Silva, C.W. Efficient autonomous robotic exploration with semantic road map in indoor environments. IEEE Robot. Autom. Lett. 2019, 4, 2989–2996. [Google Scholar] [CrossRef]
- Colares, R.G.; Chaimowicz, L. The next frontier: Combining information gain and distance cost for decentralized multi-robot exploration. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, 4–8 April 2016; pp. 268–274. [Google Scholar]
- Li, H.; Zhang, Q.; Zhao, D. Deep reinforcement learning-based automatic exploration for navigation in unknown environment. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2064–2076. [Google Scholar] [CrossRef]
- He, D.; Feng, D.; Jia, H.; Liu, H. Decentralized exploration of a structured environment based on multi-agent deep reinforcement learning. In Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China, 2–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 172–179. [Google Scholar]
- Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285. [Google Scholar] [CrossRef]
- Fan, J.; Wang, Z.; Xie, Y.; Yang, Z. A theoretical analysis of deep Q-learning. In Proceedings of the Learning for Dynamics and Control, Online, 10–11 June 2020; pp. 486–489. [Google Scholar]
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870. [Google Scholar]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; Volume 30. [Google Scholar]
- Lohse, O.; Pütz, N.; Hörmann, K. Implementing an online scheduling approach for production with multi agent proximal policy optimization (MAPPO). In Proceedings of the Advances in Production Management Systems. Artificial Intelligence for Sustainable and Resilient Production Systems: IFIP WG 5.7 International Conference, APMS 2021, Nantes, France, 5–9 September 2021; Proceedings, Part V. Springer: Berlin/Heidelberg, Germany, 2021; pp. 586–595. [Google Scholar]
- Shani, G.; Pineau, J.; Kaplow, R. A survey of point-based POMDP solvers. Auton. Agents Multi-Agent Syst. 2013, 27, 1–51. [Google Scholar] [CrossRef]
- Cheng, C.; Sha, Q.; He, B.; Li, G. Path planning and obstacle avoidance for AUV: A review. Ocean Eng. 2021, 235, 109355. [Google Scholar] [CrossRef]
- Rösmann, C.; Hoffmann, F.; Bertram, T. Integrated online trajectory planning and optimization in distinctive topologies. Robot. Auton. Syst. 2017, 88, 142–153. [Google Scholar] [CrossRef]
Parameter Name | Value
---|---
Optimizer | Adam
Initial learning rate | 0.001
Batch size | 64
Replay buffer size | 10,000
Communication range | 5 m
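Read as a standard off-policy training configuration, these hyperparameters can be wired up as sketched below. This is an assumption-laden illustration: the placeholder network, transition layout, and loss stand in for the paper's Transformer policy and its actual RL objective.

```python
# Illustrative wiring of the training hyperparameters listed above (Adam, learning
# rate 0.001, batch size 64, replay buffer of 10,000 transitions). The tiny
# placeholder network and MSE loss are not the authors' training code.
import random
from collections import deque

import torch

policy_net = torch.nn.Linear(128, 8)                             # placeholder policy network
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)   # Adam, initial lr 0.001
replay_buffer = deque(maxlen=10_000)                             # replay buffer size 10,000
BATCH_SIZE = 64

def train_step() -> None:
    """One gradient update from a uniformly sampled minibatch of (obs, target) tensors."""
    if len(replay_buffer) < BATCH_SIZE:
        return
    obs, targets = zip(*random.sample(replay_buffer, BATCH_SIZE))
    obs = torch.stack(obs)                                        # (64, 128) observation features
    targets = torch.stack(targets)                                # (64, 8) placeholder targets
    loss = torch.nn.functional.mse_loss(policy_net(obs), targets) # stand-in objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```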
Robot Team Size | Method | Time (s) | Distance (m)
---|---|---|---
2 | TBDE-Net | 31.2 | 78.7
2 | Nearest frontier | 34.6 | 88.4
2 | Next-Best-View | 38.2 | 101.8
2 | Planning-based | 35.9 | 92.2
2 | DME-DRL | 36.0 | 93.7
3 | TBDE-Net | 23.0 | 85.3
3 | Nearest frontier | 25.6 | 91.5
3 | Next-Best-View | 28.8 | 104.7
3 | Planning-based | 25.8 | 94.9
3 | DME-DRL | 26.0 | 94.4
4 | TBDE-Net | 21.3 | 91.9
4 | Nearest frontier | 23.6 | 111.9
4 | Next-Best-View | 27.4 | 108.1
4 | Planning-based | 24.6 | 96.4
4 | DME-DRL | 24.4 | 95.2
Robot Team Size | Method | Time (s) | Distance (m)
---|---|---|---
2 | TBDE-Net | 31.2 | 78.7
2 | Nearest frontier | 39.1 | 89.9
3 | TBDE-Net | 25.6 | 99.3
3 | Nearest frontier | 26.3 | 103.7
4 | TBDE-Net | 23.5 | 107.1
4 | Nearest frontier | 27.2 | 127.3
Citation: Chen, Q.; Wang, R.; Lyu, M.; Zhang, J. Transformer-Based Reinforcement Learning for Multi-Robot Autonomous Exploration. Sensors 2024, 24, 5083. https://doi.org/10.3390/s24165083