Abstract
We introduce a new backup operator for point-based POMDP algorithms which performs a look-ahead search at depth greater than one. We apply this operator into a new algorithm, called Stochastic Search Value Iteration (SSVI). This new algorithm relies on stochastic exploration of the environment in order to update the value function. This is in opposition with existing POMDP point-based algorithms. The underlying ideas on which SSVI is based are very similar to temporal difference learning algorithms for MDPs. In particular, SSVI takes advantage of a soft-max action selection function and of the random character of the environment itself. Empirical results on usual benchmark problems show that our algorithm performs a bit better and a bit faster than HSVI2, the state of the art algorithm. This suggests that stochastic algorithms are an alternative for solving large POMDPs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for pomdps. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, pp. 1025–1032 (2003)
Smith, T., Simmons, R.: Heuristic search value iteration for pomdps. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI 2004), Banff, Canada (2004)
Vlassis, N., Spaan, M.T.J.: A fast point-based algorithm for POMDPs. In: Benelearn 2004: Proceedings of the Annual Machine Learning Conference of Belgium and the Netherlands, Brussels, Belgium, January 2004, pp. 170–176 (2003); (Also presented at the NIPS-16 workshop ‘Planning for the Real-World’, Whistler, Canada, December 2003)
Paquet, S., Tobin, L., Chaib-draa, B.: An Online POMDP Algorithm for Complex Multiagent Environments. In: Proceedings of The fourth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS 2005), Utrecht, The Netherlands (2005)
Paquet, S., Chaib-draa, B., Ross, S.: Hybrid POMDP Algorithms. In: Proceedings of The Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSMD 2006), Hakodate, Hokkaido, Japan (2006)
Ross, S., Chaib-draa, B.: AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs. In: Proceedings of The 20th Joint Conference in Artificial Intelligence (IJCAI 2007), Hyderabad, India (2007)
Sondik, E.J.: The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University (1971)
Smith, T., Simmons, R.: Point-based POMDP Algorithms: Improved Analysis and Implementation. In: Proceedings of the 21th Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, Scotland (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laviolette, F., Tobin, L. (2008). A Stochastic Point-Based Algorithm for POMDPs. In: Bergler, S. (eds) Advances in Artificial Intelligence. Canadian AI 2008. Lecture Notes in Computer Science(), vol 5032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68825-9_31
Download citation
DOI: https://doi.org/10.1007/978-3-540-68825-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68821-1
Online ISBN: 978-3-540-68825-9
eBook Packages: Computer ScienceComputer Science (R0)