DOI: 10.5555/3045118.3045131
Article

Robust partially observable Markov decision process

Published: 06 July 2015

Abstract

We seek a robust policy that maximizes the expected cumulative reward in the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are known only to lie in a given region. We prove that the robust value function, which represents the expected cumulative reward obtainable with the robust policy, is convex with respect to the belief state. Based on this convexity, we design a value-iteration algorithm for finding the robust policy, and we prove that the value iteration converges for an infinite horizon. We also design a point-based value iteration for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.

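The abstract describes a max-min recursion: the agent chooses actions to maximize expected cumulative reward while the uncertain parameters are resolved adversarially within the given region. As an illustration only, the Python sketch below performs one robust backup over a fixed set of belief points, under two simplifying assumptions that are not taken from the paper: the uncertainty region is approximated by a finite list of candidate models (T, O, R), and next-step values are read off the nearest stored belief point rather than from alpha-vectors. All names and structure here are hypothetical.

    import numpy as np

    def robust_pbvi_backup(beliefs, models, gamma, V):
        """One robust (worst-case) backup over a fixed set of belief points.

        beliefs : list of belief vectors over states (each of length |S|)
        models  : list of (T, O, R) candidates approximating the uncertainty region,
                  with T[a, s, s'] transition probabilities, O[a, s', o] observation
                  probabilities, and R[a, s] rewards
        gamma   : discount factor in [0, 1)
        V       : current value estimate, one scalar per belief point
        Returns the updated value at each belief point.
        """
        n_actions = models[0][0].shape[0]
        V_new = np.empty(len(beliefs))
        for i, b in enumerate(beliefs):
            action_values = []
            for a in range(n_actions):
                worst = np.inf
                for T, O, R in models:  # adversary picks the worst candidate model
                    immediate = b @ R[a]
                    future = 0.0
                    for o in range(O.shape[2]):
                        # Unnormalized belief update for action a and observation o.
                        b_next = (b @ T[a]) * O[a, :, o]
                        p_o = b_next.sum()
                        if p_o > 1e-12:
                            b_next /= p_o
                            # Crude approximation: use the value of the nearest stored belief.
                            j = int(np.argmin([np.abs(b_next - bp).sum() for bp in beliefs]))
                            future += p_o * V[j]
                    worst = min(worst, immediate + gamma * future)
                action_values.append(worst)
            V_new[i] = max(action_values)  # agent maximizes, nature minimizes
        return V_new

Iterating such a backup until the values stop changing mirrors the value-iteration loop described above; the paper's own point-based method instead maintains a representation of the robust value function whose use is justified by its proved convexity in the belief state.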



Published In

ICML'15: Proceedings of the 32nd International Conference on Machine Learning - Volume 37
July 2015
2558 pages

Publisher

JMLR.org

