DOI: 10.1145/1143844.1143845
Article

Using inaccurate models in reinforcement learning

Published: 25 June 2006

Abstract

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
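The grounding idea can be read as an iterative loop: run the current policy on the real system, correct the approximate model so that it reproduces what was actually observed, and then let the corrected model propose a local policy improvement. The Python sketch below illustrates one such loop under stated assumptions; the function names (run_policy, hybrid_policy_search, update_policy) and the time-indexed additive correction are illustrative choices based on the abstract, not the paper's exact algorithm.

import numpy as np

def run_policy(dynamics, policy, x0, horizon):
    # Roll out `policy` under `dynamics`; return the visited states and actions.
    xs, us = [np.asarray(x0)], []
    for t in range(horizon):
        u = policy(xs[-1], t)
        us.append(u)
        xs.append(dynamics(xs[-1], u, t))
    return xs, us

def hybrid_policy_search(real_system, approx_model, policy, update_policy,
                         x0, horizon, n_iters=10):
    # Hypothetical hybrid loop: ground policy evaluation with real-life trials,
    # but let the (corrected) approximate model suggest local policy changes.
    for _ in range(n_iters):
        # 1. One real-life trial with the current policy (the grounding step).
        xs_real, us_real = run_policy(real_system, policy, x0, horizon)

        # 2. Time-indexed correction so the approximate model reproduces the
        #    observed real-life trajectory under the current policy
        #    (assumption: an additive bias term is one simple way to do this).
        bias = [xs_real[t + 1] - approx_model(xs_real[t], us_real[t], t)
                for t in range(horizon)]

        def corrected_model(x, u, t, _bias=bias):
            return approx_model(x, u, t) + _bias[t]

        # 3. Local improvement in the corrected model, e.g. one step of a
        #    gradient-based or trajectory-optimization update.
        policy = update_policy(policy, corrected_model, x0, horizon)
    return policy

Because the improvement step runs in the corrected model rather than on the real system, each iteration needs only a small number of real-life trials, which is the trade-off the abstract describes.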



Information

Published In

ICML '06: Proceedings of the 23rd international conference on Machine learning
June 2006
1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2006

Qualifiers

  • Article

Acceptance Rates

ICML '06 Paper Acceptance Rate: 140 of 548 submissions, 26%
Overall Acceptance Rate: 140 of 548 submissions, 26%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)124
  • Downloads (Last 6 weeks)10
Reflects downloads up to 17 Nov 2024

Citations

Cited By

  • (2024) Dielectric Elastomer-Based Actuators: A Modeling and Control Review for Non-Experts. Actuators, 13(4), 151. DOI: 10.3390/act13040151. Online publication date: 17-Apr-2024.
  • (2024) Iterative Optimal Feedback Control for Time-based Switched Systems. 2024 43rd Chinese Control Conference (CCC), 1728-1735. DOI: 10.23919/CCC63176.2024.10662132. Online publication date: 28-Jul-2024.
  • (2024) Model-Based Reinforcement Learning Inspired by Augmented PD for Robotic Control. 2024 43rd Chinese Control Conference (CCC), 4457-4462. DOI: 10.23919/CCC63176.2024.10661623. Online publication date: 28-Jul-2024.
  • (2024) Towards biologically plausible model-based reinforcement learning in recurrent spiking networks by dreaming new experiences. Scientific Reports, 14(1). DOI: 10.1038/s41598-024-65631-y. Online publication date: 25-Jun-2024.
  • (2024) Multi-fidelity reinforcement learning with control variates. Neurocomputing, 127963. DOI: 10.1016/j.neucom.2024.127963. Online publication date: Jun-2024.
  • (2024) Autonomous driving system: A comprehensive survey. Expert Systems with Applications, 242, 122836. DOI: 10.1016/j.eswa.2023.122836. Online publication date: May-2024.
  • (2024) Physics-Informed Neural Networks via Stochastic Hamiltonian Dynamics Learning. Intelligent Systems and Applications, 182-197. DOI: 10.1007/978-3-031-66428-1_11. Online publication date: 31-Jul-2024.
  • (2023) Model-based reparameterization policy gradient methods. Proceedings of the 37th International Conference on Neural Information Processing Systems, 68391-68419. DOI: 10.5555/3666122.3669112. Online publication date: 10-Dec-2023.
  • (2023) Counterexample-Guided Repair for Symbolic-Geometric Action Abstractions. IEEE Transactions on Robotics, 39(5), 4152-4165. DOI: 10.1109/TRO.2023.3294918. Online publication date: Oct-2023.
  • (2023) Learning Policies for Automated Racing Using Vehicle Model Gradients. IEEE Open Journal of Intelligent Transportation Systems, 4, 130-142. DOI: 10.1109/OJITS.2023.3237977. Online publication date: 2023.
