Abstract
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. In many cases, we have developed new ways of viewing the problem that are, perhaps, more consistent with the AI perspective. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). We then outline a novel algorithm for solving POMDPs off-line and show how, in many cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a simple example.
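In the POMDP setting the abstract describes, the agent cannot observe the world state directly, so it maintains a belief state, a probability distribution over states, updated by Bayes' rule after each action and observation. The sketch below illustrates only this standard belief update, not the paper's off-line solution algorithm; the dictionary-based model encoding and the two-state "listen" domain in the usage example are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumed encoding, not the authors' algorithm):
# Bayesian belief-state update for a POMDP with
#   T[s][a][s2] = Pr(s2 | s, a)   (transition model)
#   O[a][s2][o] = Pr(o | a, s2)   (observation model)

def belief_update(b, a, o, T, O, states):
    """Return the posterior belief after taking action a and observing o.

    b'(s') is proportional to O[a][s'][o] * sum_s T[s][a][s'] * b[s].
    """
    new_b = {}
    for s2 in states:
        new_b[s2] = O[a][s2][o] * sum(T[s][a][s2] * b[s] for s in states)
    norm = sum(new_b.values())  # Pr(o | b, a); zero means o was impossible
    if norm == 0.0:
        raise ValueError("observation has zero probability under belief b")
    return {s2: p / norm for s2, p in new_b.items()}

if __name__ == "__main__":
    # Hypothetical two-state domain: a "listen" action leaves the state
    # unchanged and reports the correct state 85% of the time.
    states = ["left", "right"]
    b = {"left": 0.5, "right": 0.5}  # uniform prior belief
    T = {s: {"listen": {s2: 1.0 if s2 == s else 0.0 for s2 in states}}
         for s in states}
    O = {"listen": {s: {"hear-left": 0.85 if s == "left" else 0.15,
                        "hear-right": 0.15 if s == "left" else 0.85}
                    for s in states}}
    # After hearing "left" once, belief in "left" rises to 0.85.
    print(belief_update(b, "listen", "hear-left", T, O, states))
```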
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
Cite this paper
Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. (1996), Partially observable Markov decision processes for artificial intelligence, in L. Dorst, M. van Lambalgen & F. Voorbraak (eds), Reasoning with Uncertainty in Robotics (RUR 1995), Lecture Notes in Computer Science, vol. 1093, Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0013957
Print ISBN: 978-3-540-61376-3
Online ISBN: 978-3-540-68506-7