
Hybrid control for combining model-based and model-free reinforcement learning

Published: 01 May 2023

Abstract

We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains.
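As a rough, hypothetical sketch of the switching idea described in the abstract (not the algorithm derived in the paper), the snippet below chooses between a model-based planned action and a model-free policy action based on an estimated advantage. All names (planned_action, policy_action, policy_advantage, hybrid_action) are illustrative placeholders; in practice the planner would come from a learned predictive model and the policy and advantage estimate from an off-policy reinforcement learning method.

```python
import numpy as np

# Illustrative placeholders only: in a real system the planner would roll out a
# learned dynamics model (model-based), and the policy and advantage estimate
# would come from an off-policy, experience-based learner (model-free).
def planned_action(state):
    # stand-in for a model-predictive plan computed from a learned model
    return -0.5 * state

def policy_action(state):
    # stand-in for an experience-based (model-free) policy
    return np.tanh(-1.0 * state)

def policy_advantage(state):
    # stand-in estimate of how much the policy action is expected to improve
    # on the planned action (e.g., from a learned value function)
    return float(np.cos(state.sum()))

def hybrid_action(state, threshold=0.0):
    """Use the model-free action only when it is expected to improve on the plan."""
    if policy_advantage(state) > threshold:
        return policy_action(state)  # experience-based action overrides the plan
    return planned_action(state)     # otherwise fall back to the model-based plan

if __name__ == "__main__":
    state = np.array([0.3, -0.2])
    print(hybrid_action(state))
```

The fixed threshold here is only for illustration; the paper's deterministic variant makes this switching decision optimally, and the stochastic variant relaxes the assumptions behind that derivation.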


Cited By

  • (2024) Obstacles and opportunities for learning from demonstration in practical industrial assembly. Robotics and Computer-Integrated Manufacturing 86(C). https://doi.org/10.1016/j.rcim.2023.102658. Online publication date: 1-Apr-2024.

Published In

International Journal of Robotics Research, Volume 42, Issue 6, May 2023, 160 pages
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Reinforcement learning
  2. Learning theory
  3. Optimal control
  4. Hybrid control

Qualifiers

  • Research-article
