A Stochastic Point-Based Algorithm for POMDPs

François Laviolette¹ &
Ludovic Tobin¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5032))

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence

1472 Accesses

Abstract

We introduce a new backup operator for point-based POMDP algorithms which performs a look-ahead search at depth greater than one. We apply this operator into a new algorithm, called Stochastic Search Value Iteration (SSVI). This new algorithm relies on stochastic exploration of the environment in order to update the value function. This is in opposition with existing POMDP point-based algorithms. The underlying ideas on which SSVI is based are very similar to temporal difference learning algorithms for MDPs. In particular, SSVI takes advantage of a soft-max action selection function and of the random character of the environment itself. Empirical results on usual benchmark problems show that our algorithm performs a bit better and a bit faster than HSVI2, the state of the art algorithm. This suggests that stochastic algorithms are an alternative for solving large POMDPs.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm

A Fast Elimination Method for Pruning in POMDPs

Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models

References

Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for pomdps. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, pp. 1025–1032 (2003)
Google Scholar
Smith, T., Simmons, R.: Heuristic search value iteration for pomdps. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI 2004), Banff, Canada (2004)
Google Scholar
Vlassis, N., Spaan, M.T.J.: A fast point-based algorithm for POMDPs. In: Benelearn 2004: Proceedings of the Annual Machine Learning Conference of Belgium and the Netherlands, Brussels, Belgium, January 2004, pp. 170–176 (2003); (Also presented at the NIPS-16 workshop ‘Planning for the Real-World’, Whistler, Canada, December 2003)
Google Scholar
Paquet, S., Tobin, L., Chaib-draa, B.: An Online POMDP Algorithm for Complex Multiagent Environments. In: Proceedings of The fourth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS 2005), Utrecht, The Netherlands (2005)
Google Scholar
Paquet, S., Chaib-draa, B., Ross, S.: Hybrid POMDP Algorithms. In: Proceedings of The Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSMD 2006), Hakodate, Hokkaido, Japan (2006)
Google Scholar
Ross, S., Chaib-draa, B.: AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs. In: Proceedings of The 20th Joint Conference in Artificial Intelligence (IJCAI 2007), Hyderabad, India (2007)
Google Scholar
Sondik, E.J.: The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University (1971)
Google Scholar
Smith, T., Simmons, R.: Point-based POMDP Algorithms: Improved Analysis and Implementation. In: Proceedings of the 21th Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, Scotland (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science Department, Laval University, Quebec, Canada
François Laviolette & Ludovic Tobin

Authors

François Laviolette
View author publications
You can also search for this author in PubMed Google Scholar
Ludovic Tobin
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Sabine Bergler

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Laviolette, F., Tobin, L. (2008). A Stochastic Point-Based Algorithm for POMDPs. In: Bergler, S. (eds) Advances in Artificial Intelligence. Canadian AI 2008. Lecture Notes in Computer Science(), vol 5032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68825-9_31

Download citation

DOI: https://doi.org/10.1007/978-3-540-68825-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68821-1
Online ISBN: 978-3-540-68825-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Stochastic Point-Based Algorithm for POMDPs

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm

A Fast Elimination Method for Pruning in POMDPs

Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Stochastic Point-Based Algorithm for POMDPs

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

POPVI: A Probability-Based Optimal Policy Value Iteration Algorithm

A Fast Elimination Method for Pruning in POMDPs

Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation