Abstract
Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for planning tasks in partially observable stochastic environments. The notion of the covering number, a measure that captures the size of the search space of a POMDP planning problem, has been proposed as a complexity measure of approximate POMDP planning. Existing theoretical results are restricted to POMDPs with finite, discrete state spaces and to the l1-metric space, and heuristics, where considered, are assumed to be admissible. This paper extends the theoretical results on the covering numbers of different search spaces, including the newly defined space reachable under inadmissible heuristics, to ln-metric spaces. We also provide a simple but scalable algorithm for estimating covering numbers. Experimentally, we report estimated covering numbers of the search spaces reachable by following different policies on several benchmark problems, and analyze how well they predict the runtime of POMDP planning algorithms.
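To make the estimation task concrete, the following Python sketch shows one simple, scalable way to estimate a covering number: greedily cover a set of sampled belief points with delta-balls, here under the l1 metric. This is a minimal illustration under stated assumptions, not the paper's algorithm; the function name greedy_cover_size and the Dirichlet belief sampling are hypothetical choices made for the example.

    # Minimal sketch: greedy delta-cover of sampled beliefs (illustrative,
    # not the paper's algorithm). Assumes beliefs are vectors on the
    # probability simplex and uses the l1 metric.
    import numpy as np

    def l1(b1, b2):
        """l1 distance between two belief vectors."""
        return np.abs(b1 - b2).sum()

    def greedy_cover_size(beliefs, delta, dist=l1):
        """Greedy estimate of the delta-covering number of a belief set:
        every sampled belief ends up within delta of some chosen center."""
        centers = []
        for b in beliefs:
            if all(dist(b, c) > delta for c in centers):
                centers.append(b)  # b is uncovered; make it a new center
        return len(centers)

    # Example: 5000 beliefs sampled uniformly from the simplex over 10 states.
    rng = np.random.default_rng(0)
    beliefs = rng.dirichlet(np.ones(10), size=5000)
    print(greedy_cover_size(beliefs, delta=0.5))

The greedy construction scans each point once against the current centers, so it scales to large sampled belief sets; in practice one would sample beliefs reachable under a given policy rather than uniformly.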
Author information
Zongzhang Zhang received his PhD degree in computer science from the University of Science and Technology of China in 2012. He is currently an associate professor at Soochow University, China. He worked as a research fellow at the National University of Singapore from 2012 to 2014 and as a visiting scholar at Rutgers University, USA from 2010 to 2011. His research interests include POMDPs, reinforcement learning, and multi-agent systems.
Qiming Fu received his PhD degree in computer science from Soochow University, China in 2014. He is a lecturer at Suzhou University of Science and Technology, China. His main research interests include reinforcement learning and Bayesian methods.
Xiaofang Zhang received her PhD degree in software engineering from Southeast University, China in 2008. She is now an associate professor in the School of Computer Science and Technology at Soochow University, China. Her main research interests include software testing and reinforcement learning.
Quan Liu is a professor and PhD supervisor in the School of Computer Science and Technology at Soochow University, China. He received his PhD degree from Jilin University, China in 2004 and worked as a postdoctoral researcher at Nanjing University, China from 2006 to 2008. He is a senior member of the China Computer Federation. His main research interests include reinforcement learning, intelligent information processing, and automated reasoning.
Cite this article
Zhang, Z., Fu, Q., Zhang, X. et al. Reasoning and predicting POMDP planning complexity via covering numbers. Front. Comput. Sci. 10, 726–740 (2016). https://doi.org/10.1007/s11704-015-5038-5