Reasoning and predicting POMDP planning complexity via covering numbers

  • Research Article
  • Frontiers of Computer Science

Abstract

Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for planning tasks in partially observable stochastic environments. The covering number, a metric that captures the size of the search space of a POMDP planning problem, has been proposed as a complexity measure for approximate POMDP planning. Existing theoretical results apply to POMDPs with finite, discrete state spaces, are stated in the l_1-metric space, and assume that any heuristics used are admissible. This paper extends the theoretical results on the covering numbers of different search spaces, including the newly defined space reachable under inadmissible heuristics, to l_n-metric spaces. We also provide a simple but scalable algorithm for estimating covering numbers. Experimentally, we report estimated covering numbers of the search spaces reachable by following different policies on several benchmark problems, and analyze how well they predict the runtime of POMDP planning algorithms.
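For intuition, the covering number of a set at resolution δ is the minimum number of δ-balls, in a chosen metric, needed to cover the set; in POMDP planning the set of interest is a sampled reachable belief space. The sketch below is not the estimation algorithm proposed in the paper; it is a minimal greedy construction over sampled belief points (all names are hypothetical), written in Python: repeatedly promote an uncovered sample to a ball center, then discard every sample the new ball covers.

    import numpy as np

    def greedy_covering_number(beliefs, delta, p=1):
        # Greedy estimate of the covering number of a finite point set:
        # the number of delta-balls (in the l_p metric) needed to cover it.
        # Each new center is more than delta from all previous centers, so
        # the returned count lies between the true delta-covering number
        # and the delta/2-covering number of the sampled set.
        uncovered = [np.asarray(b, dtype=float) for b in beliefs]
        centers = []
        while uncovered:
            center = uncovered.pop(0)  # promote an uncovered sample to a center
            centers.append(center)
            # drop every remaining sample that the new ball covers
            uncovered = [b for b in uncovered
                         if np.linalg.norm(b - center, ord=p) > delta]
        return len(centers)

    # Toy usage: beliefs sampled from the simplex of a hypothetical 4-state POMDP.
    rng = np.random.default_rng(0)
    beliefs = rng.dirichlet(np.ones(4), size=2000)
    print(greedy_covering_number(beliefs, delta=0.5, p=1))

Each chosen center eliminates every sample it covers in one linear pass, so the estimate scales gracefully with the number of sampled beliefs, which is the property one needs when covering numbers are used to predict planner runtime on large benchmark problems.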



Author information


Corresponding author

Correspondence to Quan Liu.

Additional information

Zongzhang Zhang received his PhD degree in computer science from the University of Science and Technology of China in 2012. He is currently an associate professor at Soochow University, China. He worked as a research fellow at the National University of Singapore from 2012 to 2014 and as a visiting scholar at Rutgers University, USA from 2010 to 2011. His research interests include POMDPs, reinforcement learning, and multi-agent systems.

Qiming Fu received his PhD degree in computer science from Soochow University, China in 2014. He works as a lecturer at Suzhou University of Science and Technology, China. His main research interests include reinforcement learning and Bayesian methods.

Xiaofang Zhang received her PhD degree in software engineering from Southeast University, China in 2008. She is now an associate professor in the School of Computer Science and Technology at Soochow University, China. Her main research interests include software testing and reinforcement learning.

Quan Liu is a professor and PhD supervisor in the School of Computer Science and Technology at Soochow University, China. He received his PhD degree from Jilin University, China in 2004 and worked as a postdoctoral researcher at Nanjing University, China from 2006 to 2008. He is a senior member of the China Computer Federation. His main research interests include reinforcement learning, intelligent information processing, and automated reasoning.


Cite this article

Zhang, Z., Fu, Q., Zhang, X. et al. Reasoning and predicting POMDP planning complexity via covering numbers. Front. Comput. Sci. 10, 726–740 (2016). https://doi.org/10.1007/s11704-015-5038-5
