A hierarchical approach to efficient reinforcement learning in deterministic domains

DOI: 10.1145/1160633.1160686

Published: 08 May 2006

Abstract

Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large-scale state spaces. We bring these three ideas together in a new algorithm that tackles two open problems from the reinforcement-learning literature and solves both in deterministic domains. First, we show how models can improve learning speed in the hierarchy-based MaxQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity for learning but provably efficient planning as well. We illustrate the resulting performance gains in example domains and prove polynomial bounds on the computational effort needed to attain near-optimal performance within the hierarchy.
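
For readers coming to the abstract cold, the exploration idea it builds on is easy to state in code. Below is a minimal sketch, assuming nothing from the paper itself: optimistic model-based learning in a deterministic domain, in the spirit of R-MAX, with the hierarchy and factored structure omitted. The class name, method names, and parameters are all invented for illustration.

    from collections import defaultdict

    class DeterministicRmax:
        """Optimistic model-based learning in a deterministic MDP.
        Because transitions are deterministic, a single observation of a
        (state, action) pair fixes that entry of the model, so each pair
        needs to be tried at most once."""

        def __init__(self, actions, r_max=1.0, gamma=0.95, sweeps=100):
            self.actions = list(actions)
            self.r_max = r_max    # upper bound on one-step reward
            self.gamma = gamma    # discount factor
            self.sweeps = sweeps  # value-iteration sweeps per decision
            self.model = {}       # (s, a) -> (next_state, reward)

        def update(self, s, a, s_next, r):
            # One sample pins down a deterministic transition exactly.
            self.model[(s, a)] = (s_next, r)

        def _q(self, s, a, v):
            if (s, a) not in self.model:
                # Untried pair: assume the best achievable return, which
                # makes unexplored regions look attractive to the planner.
                return self.r_max / (1.0 - self.gamma)
            s_next, r = self.model[(s, a)]
            return r + self.gamma * v[s_next]

        def choose(self, s):
            # Value-iterate over the states seen so far, then act greedily;
            # unknown pairs carry the optimistic value computed in _q.
            states = {s}
            states |= {sa[0] for sa in self.model}
            states |= {outcome[0] for outcome in self.model.values()}
            v = defaultdict(float)
            for _ in range(self.sweeps):
                for x in states:
                    v[x] = max(self._q(x, a, v) for a in self.actions)
            return max(self.actions, key=lambda a: self._q(s, a, v))

Driven in the usual agent-environment loop (act, observe s_next and r, call update), this learner attempts each untried state-action pair at most once before settling on an optimal policy. Roughly speaking, the paper's contribution is to keep this kind of sample efficiency while adding a MaxQ-style task hierarchy and factored state representations, so that the planning step also stays polynomial rather than scaling with the flat state space.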

Information

Published In

AAMAS '06: Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
May 2006
1631 pages
ISBN: 1595933034
DOI: 10.1145/1160633
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 May 2006


Author Tags

  1. factored representations
  2. hierarchical reinforcement learning
  3. reinforcement learning
  4. sample complexity

Qualifiers

  • Article

Conference

AAMAS06

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)1
Reflects downloads up to 21 Dec 2024


Citations

Cited By

  • (2021) The Multi-Dimensional Actions Control Approach for Obstacle Avoidance Based on Reinforcement Learning. Symmetry, 13(8):1335. DOI: 10.3390/sym13081335. Online publication date: 24-Jul-2021.
  • (2019) Escape Room. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 2123-2125. DOI: 10.5555/3306127.3332031. Online publication date: 8-May-2019.
  • (2016) Routing an Autonomous Taxi with Reinforcement Learning. Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 2421-2424. DOI: 10.1145/2983323.2983379. Online publication date: 24-Oct-2016.
  • (2016) Context-switching and adaptation: Brain-inspired mechanisms for handling environmental changes. 2016 International Joint Conference on Neural Networks (IJCNN), pages 3522-3529. DOI: 10.1109/IJCNN.2016.7727651. Online publication date: Jul-2016.
  • (2015) Online Planning for Large Markov Decision Processes with Hierarchical Decomposition. ACM Transactions on Intelligent Systems and Technology, 6(4):1-28. DOI: 10.1145/2717316. Online publication date: 15-Jul-2015.
  • (2014) Q*-based state abstraction and knowledge discovery in reinforcement learning. Intelligent Data Analysis, 18(6):1153-1175. DOI: 10.5555/2691107.2691117. Online publication date: 1-Nov-2014.
  • (2012) Multiresolution State-Space Discretization Method for Q-Learning for One or More Regions of Interest. Advances in Intelligent and Autonomous Aerospace Systems, pages 273-308. DOI: 10.2514/5.9781600868962.0273.0308. Online publication date: 10-Jan-2012.
  • (2010) Combined task and motion planning for mobile manipulation. Proceedings of the Twentieth International Conference on Automated Planning and Scheduling, pages 254-257. DOI: 10.5555/3037334.3037373. Online publication date: 12-May-2010.
  • (2008) Automatic discovery and transfer of MAXQ hierarchies. Proceedings of the 25th International Conference on Machine Learning, pages 648-655. DOI: 10.1145/1390156.1390238. Online publication date: 5-Jul-2008.
  • (2008) Hierarchical model-based reinforcement learning. Proceedings of the 25th International Conference on Machine Learning, pages 432-439. DOI: 10.1145/1390156.1390211. Online publication date: 5-Jul-2008.
