A hierarchical approach to efficient reinforcement learning in deterministic domains

DOI: 10.1145/1160633.1160686

Published: 08 May 2006

Abstract

Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large-scale state spaces. We bring these three ideas together in a new algorithm that tackles two open problems from the reinforcement-learning literature and solves both in deterministic domains. First, we show how models can improve learning speed in the hierarchy-based MaxQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity for learning but provably efficient planning as well. We illustrate the resulting performance gains in example domains and prove polynomial bounds on the computational effort needed to attain near-optimal performance within the hierarchy.
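
For readers coming to the abstract cold, the exploration idea it builds on is easy to state in code. Below is a minimal sketch, assuming nothing from the paper itself: optimistic model-based learning in a deterministic domain, in the spirit of R-MAX, with the hierarchy and factored structure omitted. The class name, method names, and parameters are all invented for illustration.

    from collections import defaultdict

    class DeterministicRmax:
        """Optimistic model-based learning in a deterministic MDP.
        Because transitions are deterministic, a single observation of a
        (state, action) pair fixes that entry of the model, so each pair
        needs to be tried at most once."""

        def __init__(self, actions, r_max=1.0, gamma=0.95, sweeps=100):
            self.actions = list(actions)
            self.r_max = r_max    # upper bound on one-step reward
            self.gamma = gamma    # discount factor
            self.sweeps = sweeps  # value-iteration sweeps per decision
            self.model = {}       # (s, a) -> (next_state, reward)

        def update(self, s, a, s_next, r):
            # One sample pins down a deterministic transition exactly.
            self.model[(s, a)] = (s_next, r)

        def _q(self, s, a, v):
            if (s, a) not in self.model:
                # Untried pair: assume the best achievable return, which
                # makes unexplored regions look attractive to the planner.
                return self.r_max / (1.0 - self.gamma)
            s_next, r = self.model[(s, a)]
            return r + self.gamma * v[s_next]

        def choose(self, s):
            # Value-iterate over the states seen so far, then act greedily;
            # unknown pairs carry the optimistic value computed in _q.
            states = {s}
            states |= {sa[0] for sa in self.model}
            states |= {outcome[0] for outcome in self.model.values()}
            v = defaultdict(float)
            for _ in range(self.sweeps):
                for x in states:
                    v[x] = max(self._q(x, a, v) for a in self.actions)
            return max(self.actions, key=lambda a: self._q(s, a, v))

Driven in the usual agent-environment loop (act, observe s_next and r, call update), this learner attempts each untried state-action pair at most once before settling on an optimal policy. Roughly speaking, the paper's contribution is to keep this kind of sample efficiency while adding a MaxQ-style task hierarchy and factored state representations, so that the planning step also stays polynomial rather than scaling with the flat state space.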

Information

Published In

AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
May 2006
1631 pages
ISBN: 1595933034
DOI: 10.1145/1160633
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 May 2006


Author Tags

  1. factored representations
  2. hierarchical reinforcement learning
  3. reinforcement learning
  4. sample complexity

Qualifiers

  • Article

Conference

AAMAS06

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)1
Reflects downloads up to 21 Dec 2024


Citations

Cited By

  • (2021) The Multi-Dimensional Actions Control Approach for Obstacle Avoidance Based on Reinforcement Learning. Symmetry, 13(8):1335. DOI: 10.3390/sym13081335. Online publication date: 24-Jul-2021.
  • (2019) Escape Room. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 2123-2125. DOI: 10.5555/3306127.3332031. Online publication date: 8-May-2019.
  • (2016) Routing an Autonomous Taxi with Reinforcement Learning. Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 2421-2424. DOI: 10.1145/2983323.2983379. Online publication date: 24-Oct-2016.
  • (2016) Context-switching and adaptation: Brain-inspired mechanisms for handling environmental changes. 2016 International Joint Conference on Neural Networks (IJCNN), pages 3522-3529. DOI: 10.1109/IJCNN.2016.7727651. Online publication date: Jul-2016.
  • (2015) Online Planning for Large Markov Decision Processes with Hierarchical Decomposition. ACM Transactions on Intelligent Systems and Technology, 6(4):1-28. DOI: 10.1145/2717316. Online publication date: 15-Jul-2015.
  • (2014) Q*-based state abstraction and knowledge discovery in reinforcement learning. Intelligent Data Analysis, 18(6):1153-1175. DOI: 10.5555/2691107.2691117. Online publication date: 1-Nov-2014.
  • (2012) Multiresolution State-Space Discretization Method for Q-Learning for One or More Regions of Interest. Advances in Intelligent and Autonomous Aerospace Systems, pages 273-308. DOI: 10.2514/5.9781600868962.0273.0308. Online publication date: 10-Jan-2012.
  • (2010) Combined task and motion planning for mobile manipulation. Proceedings of the Twentieth International Conference on Automated Planning and Scheduling, pages 254-257. DOI: 10.5555/3037334.3037373. Online publication date: 12-May-2010.
  • (2008) Automatic discovery and transfer of MAXQ hierarchies. Proceedings of the 25th International Conference on Machine Learning, pages 648-655. DOI: 10.1145/1390156.1390238. Online publication date: 5-Jul-2008.
  • (2008) Hierarchical model-based reinforcement learning. Proceedings of the 25th International Conference on Machine Learning, pages 432-439. DOI: 10.1145/1390156.1390211. Online publication date: 5-Jul-2008.
