DOI: 10.5555/3171837.3171894
Article

Factorized Asymptotic Bayesian Policy Search for POMDPs

Published: 19 August 2017

Abstract

This paper proposes a novel direct policy search (DPS) method with model selection for partially observable Markov decision processes (POMDPs). DPS methods have become standard for learning in POMDPs because of their computational efficiency and their natural ability to maximize total reward. An important open challenge in making the best use of DPS methods is model selection: determining the proper dimensionality of the hidden states and the complexity of the policy functions, so as to mitigate overfitting in highly flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization by deriving a marginalized weighted log-likelihood (MWL) for POMDPs that combines the advantages of Bayesian model selection and DPS. We then propose factorized asymptotic Bayesian policy search (FABPS), which finds the model and the policy that maximize MWL by extending recently developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, both in model selection and in expected total reward.
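
For orientation, the MWL objective builds on the reward-weighted likelihood view of policy search. The paper's exact construction is not reproduced here; the display below is a hedged sketch of that general idea, with illustrative notation chosen for this summary: $\tau_n$ a trajectory of observations and actions, $R(\tau_n)$ its (non-negative) total reward, $\theta$ the policy parameters, and $M$ the model (hidden-state dimensionality and policy class).

\[
\mathcal{L}_{\mathrm{WL}}(\theta; M) \;=\; \sum_{n=1}^{N} R(\tau_n)\,\log p(\tau_n \mid \theta, M),
\qquad
\mathcal{L}_{\mathrm{MWL}}(M) \;=\; \log \int \exp\!\big(\mathcal{L}_{\mathrm{WL}}(\theta; M)\big)\, p(\theta \mid M)\, d\theta .
\]

Maximizing $\mathcal{L}_{\mathrm{WL}}$ over $\theta$ is a reward-maximization step, while marginalizing $\theta$ out and comparing $\mathcal{L}_{\mathrm{MWL}}$ across candidate models $M$ is a Bayesian model selection step; factorized asymptotic Bayesian inference supplies a tractable asymptotic approximation to marginal quantities of this kind, which is what FABPS exploits.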

    Published In

    IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
    August 2017
    5253 pages
ISBN: 9780999241103

    Sponsors

• Australian Computer Society
• National Science Foundation (NSF)
• Griffith University
• University of Technology Sydney
• AI Journal

    Publisher

    AAAI Press
