Gaussian Processes for POMDP-Based Dialogue Manager Optimization

Published: 01 January 2014

Abstract

A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy, and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real-world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude, thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space, thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step towards fully automatic dialog policy optimization in real-world systems.
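To make the idea concrete, the sketch below illustrates the core mechanism in Python: a Gaussian process prior is placed on the Q-function over (belief, action) pairs, so every value estimate comes with a posterior variance that can drive exploration. This is a minimal illustration under stated assumptions, not the paper's implementation: the names (GPQFunction, belief_kernel, choose_action) are invented for this example, the kernel is a simple dot product between belief vectors rather than the kernels on probability distributions studied in the paper, and plain GP regression on observed returns stands in for the GP-SARSA temporal-difference update actually used.

```python
import numpy as np


def belief_kernel(b1, b2):
    """Dot-product kernel between belief vectors: a deliberately simple
    stand-in for the kernels on probability distributions discussed in
    the paper."""
    return float(np.dot(b1, b2))


class GPQFunction:
    """Toy GP regression over (belief, action) pairs.

    Actions are compared with a delta kernel and beliefs with
    belief_kernel, so data from one action never influences another.
    """

    def __init__(self, noise=0.1):
        self.noise = noise    # assumed observation noise on returns
        self.points = []      # visited (belief, action) pairs
        self.returns = []     # observed returns for those pairs

    def _k(self, p, q):
        (b1, a1), (b2, a2) = p, q
        return belief_kernel(b1, b2) if a1 == a2 else 0.0

    def _gram(self):
        # Kernel matrix over all stored points, with noise on the diagonal.
        n = len(self.points)
        K = np.array([[self._k(p, q) for q in self.points]
                      for p in self.points])
        return K + self.noise ** 2 * np.eye(n)

    def add(self, belief, action, ret):
        """Record one observed return for a (belief, action) pair."""
        self.points.append((np.asarray(belief, dtype=float), action))
        self.returns.append(float(ret))

    def predict(self, belief, action):
        """Posterior mean and variance of Q(belief, action)."""
        belief = np.asarray(belief, dtype=float)
        prior_var = belief_kernel(belief, belief)
        if not self.points:
            return 0.0, prior_var
        k_star = np.array([self._k((belief, action), p)
                           for p in self.points])
        K = self._gram()
        mean = float(k_star @ np.linalg.solve(K, np.array(self.returns)))
        var = prior_var - float(k_star @ np.linalg.solve(K, k_star))
        return mean, max(var, 0.0)


def choose_action(q, belief, actions, beta=2.0):
    """Upper-confidence action selection: the GP posterior variance
    drives exploration toward poorly explored (belief, action) regions."""
    scores = [m + beta * np.sqrt(v)
              for m, v in (q.predict(belief, a) for a in actions)]
    return actions[int(np.argmax(scores))]
```

In a real dialogue manager, belief would be the belief-state vector produced by the dialogue-state tracker and the recorded return the per-dialogue reward; it is the variance term in choose_action that enables the uncertainty-driven exploration which, per the abstract, makes learning from live human interaction sample-efficient enough to be practical.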



Published In

IEEE/ACM Transactions on Audio, Speech and Language Processing  Volume 22, Issue 1
January 2014
282 pages
ISSN: 2329-9290
EISSN: 2329-9304

Publisher

IEEE Press

Publication History

Published: 01 January 2014
Published in TASLP Volume 22, Issue 1

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 16 Nov 2024


Cited By

  • (2024) MARLUI: Multi-Agent Reinforcement Learning for Adaptive Point-and-Click UIs. Proceedings of the ACM on Human-Computer Interaction, 8(EICS), pp. 1-27. DOI: 10.1145/3661147. Online publication date: 17-Jun-2024.
  • (2023) On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, pp. 13923-13931. DOI: 10.1609/aaai.v37i11.26630. Online publication date: 7-Feb-2023.
  • (2021) 'Could You Describe the Reason for the Transfer?'. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4214-4223. DOI: 10.1145/3459637.3481906. Online publication date: 26-Oct-2021.
  • (2019) Learning Cooperative Personalized Policies from Gaze Data. Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, pp. 197-208. DOI: 10.1145/3332165.3347933. Online publication date: 17-Oct-2019.
  • (2019) AgentGraph. IEEE/ACM Transactions on Audio, Speech and Language Processing, 27(9), pp. 1378-1391. DOI: 10.1109/TASLP.2019.2919872. Online publication date: 1-Sep-2019.
  • (2019) A Corpus-Free State2Seq User Simulator for Task-Oriented Dialogue. Chinese Computational Linguistics, pp. 689-702. DOI: 10.1007/978-3-030-32381-3_55. Online publication date: 18-Oct-2019.
  • (2019) Graph Neural Net-Based User Simulator. Chinese Computational Linguistics, pp. 638-650. DOI: 10.1007/978-3-030-32381-3_51. Online publication date: 18-Oct-2019.
  • (2018) Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 8157-8167. DOI: 10.5555/3327757.3327909. Online publication date: 3-Dec-2018.
  • (2018) Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 5512-5522. DOI: 10.5555/3327345.3327455. Online publication date: 3-Dec-2018.
  • (2018) Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces. IEEE/ACM Transactions on Audio, Speech and Language Processing, 26(11), pp. 2083-2097. DOI: 10.1109/TASLP.2018.2851664. Online publication date: 1-Nov-2018.
