Abstract
Training reinforcement learning (RL) policies for a robot requires an extensive amount of data recorded while interacting with the environment. Acquiring such a policy on a real robot is tedious and time-consuming, and even more so in a multi-agent system, where data may be required from each individual agent. Training in simulation is the common alternative due to its efficiency and low cost, but simulations rarely capture the real world accurately. Consequently, policies trained in simulation and transferred to a real robot usually perform poorly. In this paper, we present a novel real-to-sim-to-real framework that bridges the reality gap for an agent in the collective motion of a homogeneous multi-agent system. First, we propose a deep neural-network architecture, termed Convolutional-Recurrent Network (CR-Net), that captures the complex state transitions of an agent and simulates its motion. Once trained with data from a single agent, the CR-Net is shown to accurately predict the motion of all agents in the group. Second, we propose to invest a limited amount of real data from the agent in training a generative model; training the CR-Net with synthetic data sampled from this generative model is shown to perform at least as well as training with real data. Hence, the proposed approach provides a sufficiently accurate model from significantly less real data. The generative model can also be disseminated along with open-source hardware for easier adoption. We present experiments on ground and underwater vehicles in which multi-agent RL policies for collective motion are trained in the simulation and successfully transferred to the real world.
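To make the architecture concrete, the following is a minimal NumPy sketch of how a convolutional-recurrent forward model of this general kind might look: a 1-D convolution extracts features from a short history of state-action pairs, a GRU cell aggregates them over time, and a linear head predicts the next-state change. All layer sizes, the window length, and the five-dimensional state-action input are illustrative assumptions, not the paper's actual CR-Net specification.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def conv1d(x, w, b):
    """Valid 1-D convolution over the time axis, followed by ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    K, _, C_out = w.shape
    T = x.shape[0] - K + 1
    out = np.zeros((T, C_out))
    for t in range(T):
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def gru_cell(x, h, p):
    """One GRU step. x: (C,), h: (H,), p: parameter dict."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])   # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])   # reset gate
    n = np.tanh(p["Wn"] @ x + p["Un"] @ (r * h) + p["bn"])
    return (1 - z) * n + z * h

def crnet_forward(history, params):
    """Map a (T, C_in) history of state-action pairs to a predicted
    next-state delta: conv features -> recurrent rollout -> linear head."""
    feats = conv1d(history, params["conv_w"], params["conv_b"])
    h = np.zeros(params["Uz"].shape[0])
    for t in range(feats.shape[0]):
        h = gru_cell(feats[t], h, params)
    return params["out_W"] @ h + params["out_b"]

def init_params(c_in=5, c_conv=16, hidden=32, out=3, k=3, seed=0):
    """Random small-scale weights; dimensions here are illustrative."""
    rng = np.random.default_rng(seed)
    g = lambda *s: 0.1 * rng.standard_normal(s)
    return {
        "conv_w": g(k, c_in, c_conv), "conv_b": np.zeros(c_conv),
        "Wz": g(hidden, c_conv), "Uz": g(hidden, hidden), "bz": np.zeros(hidden),
        "Wr": g(hidden, c_conv), "Ur": g(hidden, hidden), "br": np.zeros(hidden),
        "Wn": g(hidden, c_conv), "Un": g(hidden, hidden), "bn": np.zeros(hidden),
        "out_W": g(out, hidden), "out_b": np.zeros(out),
    }

# Example: predict a next-state delta from 8 recent state-action pairs
history = np.random.default_rng(1).standard_normal((8, 5))
delta = crnet_forward(history, init_params())
```

In a real-to-sim pipeline such a learned transition model, trained on one agent's recorded (or GAN-generated) data, would replace the analytic dynamics inside the simulator used for RL training.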
Data availability
The datasets and models generated during the current study are available in the Git repository, https://github.com/eranbTAU/Closing-the-Reality.
Acknowledgements
This research was supported by the Zimin Institute for Engineering Solutions Advancing Better Lives.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Information
Supplementary file1 (MP4 90,695 KB)
About this article
Cite this article
Gurevich, A., Bamani, E. & Sintov, A. Learning a data-efficient model for a single agent in homogeneous multi-agent systems. Neural Comput & Applic 35, 20069–20085 (2023). https://doi.org/10.1007/s00521-023-08838-w