Learning a data-efficient model for a single agent in homogeneous multi-agent systems

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Training Reinforcement Learning (RL) policies for a robot requires an extensive amount of data recorded while interacting with the environment. Acquiring such a policy on a real robot is a tedious and time-consuming task. It is even more challenging in a multi-agent system, where individual data may be required from each agent. While training in simulation is the common approach due to its efficiency and low cost, simulations rarely describe the real world accurately. Consequently, policies trained in simulation and transferred to the real robot usually perform poorly. In this paper, we present a novel real-to-sim-to-real framework to bridge the reality gap for an agent in the collective motion of a homogeneous multi-agent system. First, we propose a novel deep neural-network architecture, termed Convolutional-Recurrent Network (CR-Net), to capture the complex state transitions of an agent and simulate its motion. We show that, once trained with data from one agent, the CR-Net can accurately predict the motion of all agents in the group. Second, we propose to invest a limited amount of real data from the agent in training a generative model. We then show that training the CR-Net on synthetic data sampled from the generative model is at least equivalent to training on real data. Hence, the proposed approach provides a sufficiently accurate model with significantly less real data. The generative model can also be disseminated along with open-source hardware for easier adoption. We present experiments on ground and underwater vehicles in which multi-agent RL policies for collective motion are trained in simulation and successfully transferred to the real world.
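
The authors' CR-Net implementation is available in the repository linked under Data availability. Purely as an illustrative sketch (the layer sizes, state/action dimensions, and history length below are assumptions, not the published configuration), a convolutional-recurrent forward model of an agent's state transitions could look like this in PyTorch:

```python
# Illustrative sketch only: dimensions, layers, and the pose/command
# encoding are assumptions, not the authors' published CR-Net.
import torch
import torch.nn as nn

class CRNet(nn.Module):
    """Convolutional-recurrent forward model: map a short history of
    (state, action) pairs to the agent's predicted next state change."""

    def __init__(self, state_dim=3, action_dim=2, hidden_dim=64):
        super().__init__()
        # 1-D convolutions extract local temporal features from the history.
        self.conv = nn.Sequential(
            nn.Conv1d(state_dim + action_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # A GRU aggregates the convolutional features over time.
        self.gru = nn.GRU(32, hidden_dim, batch_first=True)
        # A linear head maps the final hidden state to the state change.
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, x):
        # x: (batch, history_len, state_dim + action_dim)
        z = self.conv(x.transpose(1, 2))       # (batch, 32, history_len)
        out, _ = self.gru(z.transpose(1, 2))   # (batch, history_len, hidden)
        return self.head(out[:, -1])           # (batch, state_dim)

# Example: predict a planar pose change (dx, dy, dtheta) from an 8-step
# history of 3-D poses and 2-D motor commands for one vehicle.
model = CRNet()
history = torch.randn(16, 8, 5)   # batch of 16 state-action histories
delta_state = model(history)      # shape: (16, 3)
```

In the paper's second stage, the real state-action histories used to train such a network would be replaced by sequences sampled from a generative model (e.g., a GAN) fitted to a limited amount of real data from a single agent.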

Data availability

The datasets and models generated during the current study are available in the GitHub repository: https://github.com/eranbTAU/Closing-the-Reality.

Acknowledgements

This research was supported by the Zimin Institute for Engineering Solutions Advancing Better Lives.

Author information

Corresponding author

Correspondence to Avishai Sintov.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary file 1 (MP4, 90,695 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gurevich, A., Bamani, E. & Sintov, A. Learning a data-efficient model for a single agent in homogeneous multi-agent systems. Neural Comput & Applic 35, 20069–20085 (2023). https://doi.org/10.1007/s00521-023-08838-w
