
Efficient solution algorithms for factored MDPs

Published: 01 October 2003

Abstract

This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second uses approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
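
As an illustration of the approximate linear programming formulation mentioned above, the following minimal Python sketch solves the ALP for a tiny, randomly generated MDP by enumerating every state-action constraint with scipy. It is not the paper's factored algorithm: the transition model, rewards, and basis functions below are hypothetical toy data, and the paper's central contribution is to avoid exactly this exhaustive enumeration by decomposing the LP, in a manner analogous to variable elimination, into a provably equivalent, polynomial-sized program.

# Approximate linear programming (ALP) sketch on a toy MDP.
# Hypothetical data; constraints are enumerated exhaustively, unlike the
# factored LP decomposition developed in the paper.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 4, 2, 0.95
rng = np.random.default_rng(0)
# P[a, s, s'] = transition probability, R[s, a] = reward (toy data)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Basis functions h_i(s); the approximate value is V(s) = sum_i w_i h_i(s).
H = np.column_stack([
    np.ones(n_states),               # constant basis function
    np.arange(n_states) / n_states,  # toy "index" feature
    np.arange(n_states) % 2,         # toy parity feature
])

# Objective: minimize sum_s alpha(s) V(s) = (alpha @ H) @ w,
# with uniform state-relevance weights alpha.
alpha = np.full(n_states, 1.0 / n_states)
c = alpha @ H

# Constraints: V(s) >= R(s,a) + gamma * E[V(s') | s, a] for every s and a,
# rewritten as -(H - gamma * P[a] @ H) @ w <= -R[:, a] for linprog.
A_ub, b_ub = [], []
for a in range(n_actions):
    A_ub.append(-(H - gamma * P[a] @ H))
    b_ub.append(-R[:, a])

res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
              bounds=[(None, None)] * H.shape[1])
w = res.x
print("basis weights:", w)
print("approximate value function:", H @ w)

Because a constant basis function is included, the LP is always feasible, and satisfying the Bellman constraints for every action forces the approximation to upper-bound the optimal value function; minimizing the state-relevance-weighted sum of values therefore selects the tightest upper bound representable in the basis.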

Published In

Journal of Artificial Intelligence Research, Volume 19, Issue 1
July 2003
652 pages

Publisher

AI Access Foundation

El Segundo, CA, United States

Publication History

Published: 01 October 2003
Received: 01 January 2002
Published in JAIR Volume 19, Issue 1

Cited By

  • (2024) Factored MDP based Moving Target Defense with Dynamic Threat Modeling. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 2165-2167. DOI: 10.5555/3635637.3663095. Online publication date: 6-May-2024
  • (2023) Learning dynamic attribute-factored world models for efficient multi-object reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 19117-19144. DOI: 10.5555/3666122.3666960. Online publication date: 10-Dec-2023
  • (2023) Differential privacy in cooperative multiagent planning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 347-357. DOI: 10.5555/3625834.3625867. Online publication date: 31-Jul-2023
  • (2023) Scalable safe policy improvement via Monte Carlo tree search. Proceedings of the 40th International Conference on Machine Learning, pp. 3732-3756. DOI: 10.5555/3618408.3618560. Online publication date: 23-Jul-2023
  • (2023) Planning to chronicle. International Journal of Robotics Research, 42(6), pp. 412-432. DOI: 10.1177/02783649211069154. Online publication date: 1-May-2023
  • (2023) Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. ACM SIGMETRICS Performance Evaluation Review, 51(1), pp. 83-84. DOI: 10.1145/3606376.3593545. Online publication date: 27-Jun-2023
  • (2023) Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(1), pp. 1-51. DOI: 10.1145/3579443. Online publication date: 2-Mar-2023
  • (2023) Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 83-84. DOI: 10.1145/3578338.3593545. Online publication date: 19-Jun-2023
  • (2023) Invariant Policy Learning: A Causal Perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), pp. 8606-8620. DOI: 10.1109/TPAMI.2022.3232363. Online publication date: 1-Jul-2023
  • (2023) Teamwork Reinforcement Learning With Concave Utilities. IEEE Transactions on Mobile Computing, 23(5), pp. 5709-5721. DOI: 10.1109/TMC.2023.3315120. Online publication date: 13-Sep-2023
