DOI: 10.5555/3398761.3398998

Decomposed Deep Reinforcement Learning for Robotic Control

Published: 13 May 2020

Abstract

We study how structural decomposition and interactive learning among multiple agents can be exploited by deep reinforcement learning to address high-dimensional robotic control problems. We decompose a robot's full control space into multiple independent agents according to the robot's physical structure. We then introduce the concept of Degree of Interaction (DoI) to describe the level of dependency (i.e., the necessity of coordination) among the learning agents, and propose three methods to compute the DoI dynamically during learning. The experimental evaluation demonstrates that the decomposed learning method is substantially more sample-efficient than state-of-the-art algorithms, and that more explicit interpretations can be generated for the final learned policy as well as for the underlying dependencies among the learning agents.
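The decomposition idea in the abstract can be illustrated with a minimal sketch. The joint-to-agent mapping, the DoI-thresholded observation sharing, and all names below are illustrative assumptions, not the authors' exact method: the paper computes the DoI dynamically during learning, whereas here it is simply a given scalar.

```python
import numpy as np

# Assumed layout: a 6-DoF walker whose action dimensions are split into
# two per-leg agents according to the robot's physical structure.
AGENT_JOINTS = {
    "left_leg": [0, 1, 2],
    "right_leg": [3, 4, 5],
}

def compose_action(per_agent_actions, agent_joints, action_dim):
    """Scatter each agent's sub-action into the full control vector."""
    action = np.zeros(action_dim)
    for name, joints in agent_joints.items():
        action[joints] = per_agent_actions[name]
    return action

def gated_observation(own_obs, other_obs, doi, threshold=0.5):
    """Share the other agent's observation only when the Degree of
    Interaction (DoI) exceeds a threshold; otherwise zero-pad so the
    policy input size stays fixed. The threshold rule is a hypothetical
    stand-in for the paper's dynamic DoI computation."""
    if doi >= threshold:
        return np.concatenate([own_obs, other_obs])
    return np.concatenate([own_obs, np.zeros_like(other_obs)])

# Example: two 3-dim sub-actions composed into one 6-dim control vector.
sub = {"left_leg": np.array([0.1, 0.2, 0.3]),
       "right_leg": np.array([-0.1, -0.2, -0.3])}
full = compose_action(sub, AGENT_JOINTS, action_dim=6)
```

Each agent then learns its own policy over its sub-action space, and a high DoI widens its effective observation to include the coupled agent's state.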



Published In
AAMAS '20: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems
May 2020
2289 pages
ISBN:9781450375184

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Author Tags

  1. deep reinforcement learning
  2. robotic control

Qualifiers

  • Extended-abstract

Funding Sources

  • Dalian Science and Technology Innovation Fund

Conference

AAMAS '19
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 116
  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0

Reflects downloads up to 16 Nov 2024