
Hybrid control for combining model-based and model-free reinforcement learning

Published: 01 May 2023

Abstract

We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains.
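As a rough, hypothetical sketch of the switching idea described in the abstract (not the algorithm derived in the paper), the snippet below chooses between a model-based planned action and a model-free policy action based on an estimated advantage. All names (planned_action, policy_action, policy_advantage, hybrid_action) are illustrative placeholders; in practice the planner would come from a learned predictive model and the policy and advantage estimate from an off-policy reinforcement learning method.

```python
import numpy as np

# Illustrative placeholders only: in a real system the planner would roll out a
# learned dynamics model (model-based), and the policy and advantage estimate
# would come from an off-policy, experience-based learner (model-free).
def planned_action(state):
    # stand-in for a model-predictive plan computed from a learned model
    return -0.5 * state

def policy_action(state):
    # stand-in for an experience-based (model-free) policy
    return np.tanh(-1.0 * state)

def policy_advantage(state):
    # stand-in estimate of how much the policy action is expected to improve
    # on the planned action (e.g., from a learned value function)
    return float(np.cos(state.sum()))

def hybrid_action(state, threshold=0.0):
    """Use the model-free action only when it is expected to improve on the plan."""
    if policy_advantage(state) > threshold:
        return policy_action(state)  # experience-based action overrides the plan
    return planned_action(state)     # otherwise fall back to the model-based plan

if __name__ == "__main__":
    state = np.array([0.3, -0.2])
    print(hybrid_action(state))
```

The fixed threshold here is only for illustration; the paper's deterministic variant makes this switching decision optimally, and the stochastic variant relaxes the assumptions behind that derivation.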


Cited By

  • (2024) Obstacles and opportunities for learning from demonstration in practical industrial assembly. Robotics and Computer-Integrated Manufacturing 86(C). https://doi.org/10.1016/j.rcim.2023.102658. Online publication date: 1-Apr-2024.

Published In

International Journal of Robotics Research, Volume 42, Issue 6, May 2023, 160 pages
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Reinforcement learning
  2. Learning theory
  3. Optimal control
  4. Hybrid control

Qualifiers

  • Research-article
