Abstract
Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for planning tasks in partially observable stochastic environments. The notion of the covering number, a measure that captures the size of the search space of a POMDP planning problem, has been proposed as a complexity measure of approximate POMDP planning. Existing theoretical results are restricted to POMDPs with finite, discrete state spaces and to the l1-metric space, and heuristics, where considered, are assumed to be admissible. This paper extends the theoretical results on the covering numbers of different search spaces, including the newly defined space reachable under inadmissible heuristics, to ln-metric spaces. We also provide a simple but scalable algorithm for estimating covering numbers. Experimentally, we report estimated covering numbers of the search spaces reachable by following different policies on several benchmark problems, and analyze how well they predict the runtime of POMDP planning algorithms.
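To make the estimation task concrete, the following Python sketch shows one simple, scalable way to estimate a covering number: greedily cover a set of sampled belief points with delta-balls, here under the l1 metric. This is a minimal illustration under stated assumptions, not the paper's algorithm; the function name greedy_cover_size and the Dirichlet belief sampling are hypothetical choices made for the example.

    # Minimal sketch: greedy delta-cover of sampled beliefs (illustrative,
    # not the paper's algorithm). Assumes beliefs are vectors on the
    # probability simplex and uses the l1 metric.
    import numpy as np

    def l1(b1, b2):
        """l1 distance between two belief vectors."""
        return np.abs(b1 - b2).sum()

    def greedy_cover_size(beliefs, delta, dist=l1):
        """Greedy estimate of the delta-covering number of a belief set:
        every sampled belief ends up within delta of some chosen center."""
        centers = []
        for b in beliefs:
            if all(dist(b, c) > delta for c in centers):
                centers.append(b)  # b is uncovered; make it a new center
        return len(centers)

    # Example: 5000 beliefs sampled uniformly from the simplex over 10 states.
    rng = np.random.default_rng(0)
    beliefs = rng.dirichlet(np.ones(10), size=5000)
    print(greedy_cover_size(beliefs, delta=0.5))

The greedy construction scans each point once against the current centers, so it scales to large sampled belief sets; in practice one would sample beliefs reachable under a given policy rather than uniformly.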
Author information
Zongzhang Zhang received his PhD degree in computer science from the University of Science and Technology of China in 2012. He is currently an associate professor at Soochow University, China. He worked as a research fellow at the National University of Singapore from 2012 to 2014 and as a visiting scholar at Rutgers University, USA from 2010 to 2011. His research interests include POMDPs, reinforcement learning, and multi-agent systems.
Qiming Fu received his PhD degree in computer science from Soochow University, China in 2014. He is a lecturer at Suzhou University of Science and Technology, China. His main research interests include reinforcement learning and Bayesian methods.
Xiaofang Zhang received her PhD degree in software engineering from Southeast University, China in 2008. She is now an associate professor in the School of Computer Science and Technology at Soochow University, China. Her main research interests include software testing and reinforcement learning.
Quan Liu is a professor and PhD supervisor in the School of Computer Science and Technology at Soochow University, China. He received his PhD degree from Jilin University, China in 2004 and worked as a postdoctoral researcher at Nanjing University, China from 2006 to 2008. He is a senior member of the China Computer Federation. His main research interests include reinforcement learning, intelligent information processing, and automated reasoning.
Cite this article
Zhang, Z., Fu, Q., Zhang, X. et al. Reasoning and predicting POMDP planning complexity via covering numbers. Front. Comput. Sci. 10, 726–740 (2016). https://doi.org/10.1007/s11704-015-5038-5