DOI: 10.1145/1102351.1102402

A causal approach to hierarchical decomposition of factored MDPs

Published: 07 August 2005

Abstract

We present Variable Influence Structure Analysis, an algorithm that dynamically performs hierarchical decomposition of factored Markov decision processes. Our algorithm determines causal relationships between state variables and introduces temporally-extended actions that cause the values of state variables to change. Each temporally-extended action corresponds to a subtask that is significantly easier to solve than the overall task. Results from experiments show great promise in scaling to larger tasks.
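The abstract's core idea is that causal relationships between state variables, read off a factored (DBN) model, induce a hierarchy: variables that influence others sit higher, and changing a variable's value becomes a subtask. The sketch below is not the paper's VISA implementation; it is a minimal illustration of that decomposition step, assuming a hypothetical `dbn_parents` mapping (each variable to the variables its next value depends on) and a toy taxi-like domain invented for the example. Variables are grouped into strongly connected components of the causal graph, which come out in reverse topological order: leaf components (influenced but not influencing) first.

```python
# Illustrative sketch only -- not the paper's VISA algorithm.
# Derive a causal graph over state variables from hypothetical DBN
# dependencies, then group variables into strongly connected components;
# reversing the component order yields the influence hierarchy.
from collections import defaultdict


def causal_graph(dbn_parents):
    """dbn_parents maps each variable to the variables its next value
    depends on. Returns directed edges parent -> child, ignoring
    self-loops (a variable depending on its own previous value)."""
    edges = defaultdict(set)
    for child, parents in dbn_parents.items():
        for p in parents:
            if p != child:
                edges[p].add(child)
    return edges


def sccs(nodes, edges):
    """Tarjan's algorithm: strongly connected components, emitted in
    reverse topological order (sink components first)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    comps, counter = [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            strongconnect(v)
    return comps


# Toy taxi-like domain (made up for illustration): the passenger's
# location depends on the taxi's position, and reaching the destination
# depends on the passenger's location.
parents = {
    "taxi_pos": {"taxi_pos"},
    "passenger": {"taxi_pos", "passenger"},
    "at_dest": {"passenger", "at_dest"},
}
g = causal_graph(parents)
layers = sccs(list(parents), g)  # leaves first: at_dest, passenger, taxi_pos
```

Reversing `layers` gives the influence order `taxi_pos -> passenger -> at_dest`: in VISA's terms, a temporally-extended action that changes `taxi_pos` is a natural subtask for changing `passenger`, which in turn serves `at_dest`.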



Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning
August 2005, 1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
