Abstract
Training reinforcement learning (RL) policies for a robot requires an extensive amount of data recorded while interacting with the environment. Acquiring such a policy on a real robot is tedious and time-consuming, and even more so in a multi-agent system, where data may be required from each individual agent. Training in simulation is the common alternative due to its efficiency and low cost, but simulations rarely capture the real world accurately. Consequently, policies trained in simulation and transferred to a real robot usually perform poorly. In this paper, we present a novel real-to-sim-to-real framework that bridges the reality gap for an agent in the collective motion of a homogeneous multi-agent system. First, we propose a deep neural-network architecture, termed Convolutional-Recurrent Network (CR-Net), that captures the complex state transitions of an agent and simulates its motion. Once trained with data from a single agent, the CR-Net is shown to accurately predict the motion of all agents in the group. Second, we propose to invest a limited amount of real data from the agent in training a generative model; training the CR-Net with synthetic data sampled from this generative model is shown to perform at least as well as training with real data. Hence, the proposed approach provides a sufficiently accurate model from significantly less real data. The generative model can also be disseminated along with open-source hardware for easier adoption. We present experiments on ground and underwater vehicles in which multi-agent RL policies for collective motion are trained in the simulation and successfully transferred to the real world.
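To make the architecture concrete, the following is a minimal NumPy sketch of how a convolutional-recurrent forward model of this general kind might look: a 1-D convolution extracts features from a short history of state-action pairs, a GRU cell aggregates them over time, and a linear head predicts the next-state change. All layer sizes, the window length, and the five-dimensional state-action input are illustrative assumptions, not the paper's actual CR-Net specification.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def conv1d(x, w, b):
    """Valid 1-D convolution over the time axis, followed by ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    K, _, C_out = w.shape
    T = x.shape[0] - K + 1
    out = np.zeros((T, C_out))
    for t in range(T):
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def gru_cell(x, h, p):
    """One GRU step. x: (C,), h: (H,), p: parameter dict."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])   # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])   # reset gate
    n = np.tanh(p["Wn"] @ x + p["Un"] @ (r * h) + p["bn"])
    return (1 - z) * n + z * h

def crnet_forward(history, params):
    """Map a (T, C_in) history of state-action pairs to a predicted
    next-state delta: conv features -> recurrent rollout -> linear head."""
    feats = conv1d(history, params["conv_w"], params["conv_b"])
    h = np.zeros(params["Uz"].shape[0])
    for t in range(feats.shape[0]):
        h = gru_cell(feats[t], h, params)
    return params["out_W"] @ h + params["out_b"]

def init_params(c_in=5, c_conv=16, hidden=32, out=3, k=3, seed=0):
    """Random small-scale weights; dimensions here are illustrative."""
    rng = np.random.default_rng(seed)
    g = lambda *s: 0.1 * rng.standard_normal(s)
    return {
        "conv_w": g(k, c_in, c_conv), "conv_b": np.zeros(c_conv),
        "Wz": g(hidden, c_conv), "Uz": g(hidden, hidden), "bz": np.zeros(hidden),
        "Wr": g(hidden, c_conv), "Ur": g(hidden, hidden), "br": np.zeros(hidden),
        "Wn": g(hidden, c_conv), "Un": g(hidden, hidden), "bn": np.zeros(hidden),
        "out_W": g(out, hidden), "out_b": np.zeros(out),
    }

# Example: predict a next-state delta from 8 recent state-action pairs
history = np.random.default_rng(1).standard_normal((8, 5))
delta = crnet_forward(history, init_params())
```

In a real-to-sim pipeline such a learned transition model, trained on one agent's recorded (or GAN-generated) data, would replace the analytic dynamics inside the simulator used for RL training.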
Data availability
The datasets and models generated during the current study are available in the Git repository, https://github.com/eranbTAU/Closing-the-Reality.
Acknowledgements
This research was supported by the Zimin Institute for Engineering Solutions Advancing Better Lives.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Information
Supplementary file1 (MP4 90,695 KB)
About this article
Cite this article
Gurevich, A., Bamani, E. & Sintov, A. Learning a data-efficient model for a single agent in homogeneous multi-agent systems. Neural Comput & Applic 35, 20069–20085 (2023). https://doi.org/10.1007/s00521-023-08838-w