DOI: 10.5555/3171837.3171894
Article

Factorized Asymptotic Bayesian Policy Search for POMDPs

Published: 19 August 2017

Abstract

This paper proposes a novel direct policy search (DPS) method with model selection for partially observable Markov decision processes (POMDPs). DPS methods have become standard for learning in POMDPs because of their computational efficiency and their natural ability to maximize total reward. An important open challenge in making the best use of DPS methods is model selection: determining the proper dimensionality of the hidden states and the complexity of the policy functions, so as to mitigate overfitting in highly flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization by deriving a marginalized weighted log-likelihood (MWL) for POMDPs that combines the advantages of Bayesian model selection and DPS. We then propose factorized asymptotic Bayesian policy search (FABPS), which finds the model and the policy that maximize MWL by extending recently developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, both in model selection and in expected total reward.
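
For orientation, the MWL objective builds on the reward-weighted likelihood view of policy search. The paper's exact construction is not reproduced here; the display below is a hedged sketch of that general idea, with illustrative notation chosen for this summary: $\tau_n$ a trajectory of observations and actions, $R(\tau_n)$ its (non-negative) total reward, $\theta$ the policy parameters, and $M$ the model (hidden-state dimensionality and policy class).

\[
\mathcal{L}_{\mathrm{WL}}(\theta; M) \;=\; \sum_{n=1}^{N} R(\tau_n)\,\log p(\tau_n \mid \theta, M),
\qquad
\mathcal{L}_{\mathrm{MWL}}(M) \;=\; \log \int \exp\!\big(\mathcal{L}_{\mathrm{WL}}(\theta; M)\big)\, p(\theta \mid M)\, d\theta .
\]

Maximizing $\mathcal{L}_{\mathrm{WL}}$ over $\theta$ is a reward-maximization step, while marginalizing $\theta$ out and comparing $\mathcal{L}_{\mathrm{MWL}}$ across candidate models $M$ is a Bayesian model selection step; factorized asymptotic Bayesian inference supplies a tractable asymptotic approximation to marginal quantities of this kind, which is what FABPS exploits.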

    Published In

    IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
    August 2017
    5253 pages
ISBN: 9780999241103

    Sponsors

• Australian Computer Society
• National Science Foundation (NSF)
• Griffith University
• University of Technology Sydney
• AI Journal

    Publisher

    AAAI Press
