Gaussian Processes for POMDP-Based Dialogue Manager Optimization

Published: 01 January 2014

Abstract

A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy, and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real-world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude, thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space, thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step towards fully automatic dialog policy optimization in real-world systems.
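To make the idea concrete, the sketch below illustrates the core mechanism in Python: a Gaussian process prior is placed on the Q-function over (belief, action) pairs, so every value estimate comes with a posterior variance that can drive exploration. This is a minimal illustration under stated assumptions, not the paper's implementation: the names (GPQFunction, belief_kernel, choose_action) are invented for this example, the kernel is a simple dot product between belief vectors rather than the kernels on probability distributions studied in the paper, and plain GP regression on observed returns stands in for the GP-SARSA temporal-difference update actually used.

```python
import numpy as np


def belief_kernel(b1, b2):
    """Dot-product kernel between belief vectors: a deliberately simple
    stand-in for the kernels on probability distributions discussed in
    the paper."""
    return float(np.dot(b1, b2))


class GPQFunction:
    """Toy GP regression over (belief, action) pairs.

    Actions are compared with a delta kernel and beliefs with
    belief_kernel, so data from one action never influences another.
    """

    def __init__(self, noise=0.1):
        self.noise = noise    # assumed observation noise on returns
        self.points = []      # visited (belief, action) pairs
        self.returns = []     # observed returns for those pairs

    def _k(self, p, q):
        (b1, a1), (b2, a2) = p, q
        return belief_kernel(b1, b2) if a1 == a2 else 0.0

    def _gram(self):
        # Kernel matrix over all stored points, with noise on the diagonal.
        n = len(self.points)
        K = np.array([[self._k(p, q) for q in self.points]
                      for p in self.points])
        return K + self.noise ** 2 * np.eye(n)

    def add(self, belief, action, ret):
        """Record one observed return for a (belief, action) pair."""
        self.points.append((np.asarray(belief, dtype=float), action))
        self.returns.append(float(ret))

    def predict(self, belief, action):
        """Posterior mean and variance of Q(belief, action)."""
        belief = np.asarray(belief, dtype=float)
        prior_var = belief_kernel(belief, belief)
        if not self.points:
            return 0.0, prior_var
        k_star = np.array([self._k((belief, action), p)
                           for p in self.points])
        K = self._gram()
        mean = float(k_star @ np.linalg.solve(K, np.array(self.returns)))
        var = prior_var - float(k_star @ np.linalg.solve(K, k_star))
        return mean, max(var, 0.0)


def choose_action(q, belief, actions, beta=2.0):
    """Upper-confidence action selection: the GP posterior variance
    drives exploration toward poorly explored (belief, action) regions."""
    scores = [m + beta * np.sqrt(v)
              for m, v in (q.predict(belief, a) for a in actions)]
    return actions[int(np.argmax(scores))]
```

In a real dialogue manager, belief would be the belief-state vector produced by the dialogue-state tracker and the recorded return the per-dialogue reward; it is the variance term in choose_action that enables the uncertainty-driven exploration which, per the abstract, makes learning from live human interaction sample-efficient enough to be practical.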



Published In

IEEE/ACM Transactions on Audio, Speech and Language Processing  Volume 22, Issue 1
January 2014
282 pages
ISSN: 2329-9290
EISSN: 2329-9304

Publisher

IEEE Press

Publication History

Published: 01 January 2014
Published in TASLP Volume 22, Issue 1

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 16 Nov 2024


Cited By

  • (2024) MARLUI: Multi-Agent Reinforcement Learning for Adaptive Point-and-Click UIs. Proceedings of the ACM on Human-Computer Interaction, 8(EICS), pp. 1-27. DOI: 10.1145/3661147. Online publication date: 17-Jun-2024.
  • (2023) On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, pp. 13923-13931. DOI: 10.1609/aaai.v37i11.26630. Online publication date: 7-Feb-2023.
  • (2021) 'Could You Describe the Reason for the Transfer?'. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4214-4223. DOI: 10.1145/3459637.3481906. Online publication date: 26-Oct-2021.
  • (2019) Learning Cooperative Personalized Policies from Gaze Data. Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, pp. 197-208. DOI: 10.1145/3332165.3347933. Online publication date: 17-Oct-2019.
  • (2019) AgentGraph. IEEE/ACM Transactions on Audio, Speech and Language Processing, 27(9), pp. 1378-1391. DOI: 10.1109/TASLP.2019.2919872. Online publication date: 1-Sep-2019.
  • (2019) A Corpus-Free State2Seq User Simulator for Task-Oriented Dialogue. Chinese Computational Linguistics, pp. 689-702. DOI: 10.1007/978-3-030-32381-3_55. Online publication date: 18-Oct-2019.
  • (2019) Graph Neural Net-Based User Simulator. Chinese Computational Linguistics, pp. 638-650. DOI: 10.1007/978-3-030-32381-3_51. Online publication date: 18-Oct-2019.
  • (2018) Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 8157-8167. DOI: 10.5555/3327757.3327909. Online publication date: 3-Dec-2018.
  • (2018) Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 5512-5522. DOI: 10.5555/3327345.3327455. Online publication date: 3-Dec-2018.
  • (2018) Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces. IEEE/ACM Transactions on Audio, Speech and Language Processing, 26(11), pp. 2083-2097. DOI: 10.1109/TASLP.2018.2851664. Online publication date: 1-Nov-2018.
