Abstract
POMDPs and their decentralized multiagent counterparts, DEC-POMDPs, offer a rich framework for sequential decision making under uncertainty. Their high computational complexity, however, presents an important research challenge. One way to address the intractable memory requirements of current algorithms is to represent agent policies as finite-state controllers. Using this representation, we propose a new approach that formulates the problem as a nonlinear program (NLP), which defines an optimal policy of a desired size for each agent. This formulation allows a wide range of powerful nonlinear programming algorithms to be applied to POMDPs and DEC-POMDPs. Although solving the NLP optimally is often intractable, the results we obtain with an off-the-shelf optimization method are competitive with state-of-the-art POMDP algorithms and outperform state-of-the-art DEC-POMDP algorithms. Our approach is easy to implement and opens up promising research directions for solving POMDPs and DEC-POMDPs with nonlinear programming methods.
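To make the formulation concrete, the single-agent (POMDP) case can be sketched as follows. The notation here is our gloss, chosen for illustration rather than copied from the paper: controller nodes q with a fixed start node q_0, action-selection variables P(a|q), node-transition variables P(q'|q,a,o), and value variables V(q,s); T, Z, R, γ, and b_0 denote the POMDP's transition model, observation model, reward function, discount factor, and initial belief.

```latex
% Sketch of the fixed-size stochastic-controller NLP for a POMDP
% (assumed notation, not a verbatim reproduction of the paper's program).
\begin{align*}
\max_{P(a \mid q),\; P(q' \mid q,a,o),\; V} \quad
  & \sum_{s} b_0(s)\, V(q_0, s) \\
\text{s.t.} \quad
  & V(q,s) = \sum_{a} P(a \mid q) \Big[ R(s,a)
    + \gamma \sum_{s'} T(s' \mid s,a) \sum_{o} Z(o \mid s',a)
      \sum_{q'} P(q' \mid q,a,o)\, V(q',s') \Big]
    && \forall\, q, s \\
  & \sum_{a} P(a \mid q) = 1, \qquad
    \sum_{q'} P(q' \mid q,a,o) = 1
    && \forall\, q, a, o \\
  & P(a \mid q) \ge 0, \qquad P(q' \mid q,a,o) \ge 0
\end{align*}
```

The products of policy variables and value variables in the Bellman constraint are what make the program nonlinear and nonconvex; this is why solving it optimally is often intractable and why locally optimal solvers are used in practice.

The sketch below is a minimal runnable illustration of the same idea, not the paper's implementation: instead of keeping V(q,s) as constrained variables, it evaluates a candidate stochastic controller exactly by solving the linear Bellman system and hands the resulting value to a general-purpose SciPy optimizer. The two-state POMDP, the softmax parameterization, and the solver choice are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout): optimize a fixed-size
# stochastic controller for a tiny, made-up POMDP with an off-the-shelf
# optimizer. The paper's NLP instead keeps V(q,s) as variables under Bellman
# equality constraints; here we evaluate V exactly for each candidate policy.
import numpy as np
from scipy.optimize import minimize

S, A, O, Q = 2, 2, 2, 2          # states, actions, observations, nodes
gamma = 0.95
T = np.array([[[0.9, 0.1], [0.2, 0.8]],     # T[s, a, s'] = P(s' | s, a)
              [[0.3, 0.7], [0.6, 0.4]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],     # Z[a, s', o] = P(o | s', a)
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])      # R[s, a]
b0 = np.array([0.6, 0.4])                   # initial belief; start node q0 = 0

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unpack(theta):
    """Map unconstrained parameters to controller distributions."""
    n_act = Q * A
    act = softmax(theta[:n_act].reshape(Q, A))           # P(a | q)
    trans = softmax(theta[n_act:].reshape(Q, A, O, Q))   # P(q' | q, a, o)
    return act, trans

def controller_value(theta):
    act, trans = unpack(theta)
    # The controller induces a Markov chain over joint (q, s) pairs;
    # its fixed discounted value solves (I - gamma * M) V = r exactly.
    n = Q * S
    M = np.zeros((n, n))
    r = np.zeros(n)
    for q in range(Q):
        for s in range(S):
            i = q * S + s
            for a in range(A):
                r[i] += act[q, a] * R[s, a]
                for s2 in range(S):
                    for o in range(O):
                        p = act[q, a] * T[s, a, s2] * Z[a, s2, o]
                        for q2 in range(Q):
                            M[i, q2 * S + s2] += p * trans[q, a, o, q2]
    V = np.linalg.solve(np.eye(n) - gamma * M, r)
    return V.reshape(Q, S)

def neg_value(theta):
    return -float(b0 @ controller_value(theta)[0])  # value from node q0 = 0

rng = np.random.default_rng(0)
theta0 = rng.normal(size=Q * A + Q * A * O * Q)
res = minimize(neg_value, theta0, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
print("controller value:", -res.fun)
```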
Cite this article
Amato, C., Bernstein, D.S. & Zilberstein, S. Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Auton Agent Multi-Agent Syst 21, 293–320 (2010). https://doi.org/10.1007/s10458-009-9103-z