DOI: 10.5555/2073796.2073828
Article
Free access

SPUDD: stochastic planning using decision diagrams

Published: 30 July 1999

Abstract

Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have recently been proposed to allow dynamic programming to be applied without the need for complete state enumeration. We propose and examine a new value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies, assuming an ADD input representation of the MDP. Dynamic programming is implemented via ADD manipulation. We demonstrate our method on a class of large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions).
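To make the setting concrete, the sketch below shows the kind of Bellman backup that value iteration performs on a factored MDP whose dynamics are specified variable by variable, which is the style of input representation the abstract assumes. This is a minimal, hypothetical illustration only: the tiny two-variable MDP, the action and function names, and the use of plain Python dictionaries are all invented for the example, and the states are enumerated explicitly. SPUDD's contribution is precisely to carry out these same backups symbolically on ADDs (for instance with an ADD package such as CUDD) instead of enumerating states, so the code shows what is computed, not how the paper computes it.

from itertools import product

# A tiny factored MDP over two boolean state variables (hypothetical example).
VARS = ("x", "y")
ACTIONS = ("fix_x", "noop")
STATES = list(product([False, True], repeat=len(VARS)))
REWARD = {s: 1.0 if all(s) else 0.0 for s in STATES}   # reward when x and y both hold
GAMMA = 0.9

def p_next_true(action, var, state):
    """P(var' = True | state, action): one entry of a per-variable CPT."""
    x, y = state
    persist = {"x": 0.9 if x else 0.1, "y": 0.9 if y else 0.1}
    if action == "fix_x" and var == "x":
        return 0.9          # fix_x makes x likely to become true
    return persist[var]     # otherwise each variable tends to persist

def transition_prob(action, s, s2):
    """P(s2 | s, action) as a product of the per-variable CPT entries."""
    p = 1.0
    for var, v2 in zip(VARS, s2):
        pt = p_next_true(action, var, s)
        p *= pt if v2 else (1.0 - pt)
    return p

def value_iteration(eps=1e-4):
    """Plain enumerative value iteration; SPUDD performs these backups on ADDs."""
    V = {s: 0.0 for s in STATES}
    while True:
        Q = {(s, a): REWARD[s] + GAMMA * sum(transition_prob(a, s, s2) * V[s2]
                                             for s2 in STATES)
             for s in STATES for a in ACTIONS}
        V_new = {s: max(Q[(s, a)] for a in ACTIONS) for s in STATES}
        if max(abs(V_new[s] - V[s]) for s in STATES) < eps:
            policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
            return V_new, policy
        V = V_new

if __name__ == "__main__":
    V, pi = value_iteration()
    for s in STATES:
        print(dict(zip(VARS, s)), round(V[s], 3), pi[s])

In the paper's setting, the value function V and each conditional probability table would be an ADD over the state variables, and the sum and max in the backup would be implemented by ADD operations, so states that share structure never need to be touched individually.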


Published In

UAI'99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
July 1999
703 pages
ISBN: 1558606149
  • Editors: Kathryn B. Laskey, Henri Prade

Sponsors

  • Rockwell Science Center
  • Hugin Expert A/S
  • Information Extraction and Transportation
  • Microsoft Research
  • AT&T Labs Research

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States

Publication History

Published: 30 July 1999

Qualifiers

  • Article


Article Metrics

  • Downloads (Last 12 months)45
  • Downloads (Last 6 weeks)7
Reflects downloads up to 14 Nov 2024


Cited By

  • (2021) A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70:789-870. DOI: 10.1613/jair.1.12136. Online publication date: 1-May-2021.
  • (2020) Runtime Adaptation in Wireless Sensor Nodes Using Structured Learning. ACM Transactions on Cyber-Physical Systems, 4(4):1-28. DOI: 10.1145/3372153. Online publication date: 6-Jul-2020.
  • (2019) A Memory-Efficient Markov Decision Process Computation Framework Using BDD-based Sampling Representation. Proceedings of the 56th Annual Design Automation Conference 2019, 1-6. DOI: 10.1145/3316781.3317748. Online publication date: 2-Jun-2019.
  • (2018) High-dimensional stochastic optimal control using continuous tensor decompositions. International Journal of Robotics Research, 37(2-3):340-377. DOI: 10.1177/0278364917753994. Online publication date: 1-Feb-2018.
  • (2018) Human-Agent Interaction Model Learning based on Crowdsourcing. Proceedings of the 6th International Conference on Human-Agent Interaction, 20-28. DOI: 10.1145/3284432.3284471. Online publication date: 4-Dec-2018.
  • (2018) Two Approximate Dynamic Programming Algorithms for Managing Complete SIS Networks. Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, 1-10. DOI: 10.1145/3209811.3209814. Online publication date: 20-Jun-2018.
  • (2017) The symbolic interior point method. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 1199-1205. DOI: 10.5555/3298239.3298415. Online publication date: 4-Feb-2017.
  • (2017) Logistic Markov decision processes. Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2486-2493. DOI: 10.5555/3172077.3172234. Online publication date: 19-Aug-2017.
  • (2017) Exact quantitative probabilistic model checking through rational search. Proceedings of the 17th Conference on Formal Methods in Computer-Aided Design, 92-99. DOI: 10.5555/3168451.3168475. Online publication date: 2-Oct-2017.
  • (2017) The MADP toolbox. The Journal of Machine Learning Research, 18(1):3112-3116. DOI: 10.5555/3122009.3176833. Online publication date: 1-Jan-2017.
