Robust partially observable Markov decision process
Pages 106 - 115
Abstract
We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to the belief state. Based on this convexity, we design a value-iteration algorithm for finding the robust policy. We prove that our value iteration converges for an infinite horizon. We also design a point-based value iteration for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.
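To make the idea of robust backups concrete, below is a minimal, hypothetical sketch (not the paper's algorithm): it approximates the uncertainty region by a finite set of candidate (transition, observation) models, represents the value function by its values on a fixed grid of belief points with nearest-neighbour lookup, and performs robust Bellman backups that take the worst case over candidate models before maximizing over actions. All names and the finite-model approximation are assumptions for illustration; the paper instead exploits the convexity of the robust value function to work with alpha-vector representations, which this sketch omits.

```python
import numpy as np

GAMMA = 0.95  # discount factor (illustrative choice)

def belief_update(b, a, o, T, O):
    """Bayes update of belief b after taking action a and observing o.
    T[a] is an |S| x |S| transition matrix, O[a] an |S| x |Z| observation matrix."""
    unnorm = O[a][:, o] * (b @ T[a])   # P(s', o | b, a) for each next state s'
    p_o = unnorm.sum()                 # P(o | b, a)
    if p_o == 0.0:
        return None, 0.0
    return unnorm / p_o, p_o

def nearest_value(b, belief_points, values):
    """Point-based approximation: read V at the closest stored belief point."""
    return values[np.argmin(np.linalg.norm(belief_points - b, axis=1))]

def robust_backup(b, belief_points, values, models, R, n_obs):
    """One robust Bellman backup at belief b: max over actions of the
    worst-case (min over candidate models) one-step lookahead value."""
    n_actions = R.shape[1]
    best = -np.inf
    for a in range(n_actions):
        worst = np.inf
        for T, O in models:            # finite approximation of the uncertainty region
            val = float(b @ R[:, a])   # expected immediate reward
            for o in range(n_obs):
                b_next, p_o = belief_update(b, a, o, T, O)
                if b_next is not None:
                    val += GAMMA * p_o * nearest_value(b_next, belief_points, values)
            worst = min(worst, val)    # adversary picks the worst model
        best = max(best, worst)        # agent picks the best action against it
    return best

def robust_value_iteration(belief_points, models, R, n_obs, n_iters=100):
    """Apply robust backups at every belief point for a finite horizon n_iters."""
    values = np.zeros(len(belief_points))
    for _ in range(n_iters):
        values = np.array([robust_backup(b, belief_points, values, models, R, n_obs)
                           for b in belief_points])
    return values
```

As a usage illustration, for a two-state problem one would pass a reward matrix R of shape (2, n_actions), a list of candidate (T, O) model pairs drawn from the given parameter region, and a grid of belief points over the simplex; the returned array gives the approximate worst-case expected cumulative reward at each grid point.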
Information
Publisher: JMLR.org
Published: 06 July 2015