DOI: 10.1145/1143844.1143845
Article

Using inaccurate models in reinforcement learning

Published: 25 June 2006

Abstract

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
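The grounding idea can be read as an iterative loop: run the current policy on the real system, correct the approximate model so that it reproduces what was actually observed, and then let the corrected model propose a local policy improvement. The Python sketch below illustrates one such loop under stated assumptions; the function names (run_policy, hybrid_policy_search, update_policy) and the time-indexed additive correction are illustrative choices based on the abstract, not the paper's exact algorithm.

import numpy as np

def run_policy(dynamics, policy, x0, horizon):
    # Roll out `policy` under `dynamics`; return the visited states and actions.
    xs, us = [np.asarray(x0)], []
    for t in range(horizon):
        u = policy(xs[-1], t)
        us.append(u)
        xs.append(dynamics(xs[-1], u, t))
    return xs, us

def hybrid_policy_search(real_system, approx_model, policy, update_policy,
                         x0, horizon, n_iters=10):
    # Hypothetical hybrid loop: ground policy evaluation with real-life trials,
    # but let the (corrected) approximate model suggest local policy changes.
    for _ in range(n_iters):
        # 1. One real-life trial with the current policy (the grounding step).
        xs_real, us_real = run_policy(real_system, policy, x0, horizon)

        # 2. Time-indexed correction so the approximate model reproduces the
        #    observed real-life trajectory under the current policy
        #    (assumption: an additive bias term is one simple way to do this).
        bias = [xs_real[t + 1] - approx_model(xs_real[t], us_real[t], t)
                for t in range(horizon)]

        def corrected_model(x, u, t, _bias=bias):
            return approx_model(x, u, t) + _bias[t]

        # 3. Local improvement in the corrected model, e.g. one step of a
        #    gradient-based or trajectory-optimization update.
        policy = update_policy(policy, corrected_model, x0, horizon)
    return policy

Because the improvement step runs in the corrected model rather than on the real system, each iteration needs only a small number of real-life trials, which is the trade-off the abstract describes.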



Information

Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2006

Qualifiers

  • Article

Acceptance Rates

ICML '06 Paper Acceptance Rate: 140 of 548 submissions, 26%
Overall Acceptance Rate: 140 of 548 submissions, 26%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)124
  • Downloads (Last 6 weeks)10
Reflects downloads up to 17 Nov 2024

Citations

Cited By

  • (2024) Dielectric Elastomer-Based Actuators: A Modeling and Control Review for Non-Experts. Actuators, 13(4), 151. DOI: 10.3390/act13040151. Online publication date: 17-Apr-2024.
  • (2024) Iterative Optimal Feedback Control for Time-based Switched Systems. 2024 43rd Chinese Control Conference (CCC), 1728-1735. DOI: 10.23919/CCC63176.2024.10662132. Online publication date: 28-Jul-2024.
  • (2024) Model-Based Reinforcement Learning Inspired by Augmented PD for Robotic Control. 2024 43rd Chinese Control Conference (CCC), 4457-4462. DOI: 10.23919/CCC63176.2024.10661623. Online publication date: 28-Jul-2024.
  • (2024) Towards biologically plausible model-based reinforcement learning in recurrent spiking networks by dreaming new experiences. Scientific Reports, 14(1). DOI: 10.1038/s41598-024-65631-y. Online publication date: 25-Jun-2024.
  • (2024) Multi-fidelity reinforcement learning with control variates. Neurocomputing, 127963. DOI: 10.1016/j.neucom.2024.127963. Online publication date: Jun-2024.
  • (2024) Autonomous driving system: A comprehensive survey. Expert Systems with Applications, 242, 122836. DOI: 10.1016/j.eswa.2023.122836. Online publication date: May-2024.
  • (2024) Physics-Informed Neural Networks via Stochastic Hamiltonian Dynamics Learning. Intelligent Systems and Applications, 182-197. DOI: 10.1007/978-3-031-66428-1_11. Online publication date: 31-Jul-2024.
  • (2023) Model-based reparameterization policy gradient methods. Proceedings of the 37th International Conference on Neural Information Processing Systems, 68391-68419. DOI: 10.5555/3666122.3669112. Online publication date: 10-Dec-2023.
  • (2023) Counterexample-Guided Repair for Symbolic-Geometric Action Abstractions. IEEE Transactions on Robotics, 39(5), 4152-4165. DOI: 10.1109/TRO.2023.3294918. Online publication date: Oct-2023.
  • (2023) Learning Policies for Automated Racing Using Vehicle Model Gradients. IEEE Open Journal of Intelligent Transportation Systems, 4, 130-142. DOI: 10.1109/OJITS.2023.3237977. Online publication date: 2023.
