Learning to control listening-oriented dialogue using partially observable Markov decision processes

Authors: Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka

ACM Transactions on Speech and Language Processing (TSLP), Volume 10, Issue 4

Article No.: 15, Pages 1 - 20

https://doi.org/10.1145/2513145

Published: 03 January 2014

Abstract

Our aim is to build listening agents that attentively listen to their users and satisfy their desire to speak and be heard. This article investigates how to automatically create the dialogue control component of such a listening agent. We collected a large number of listening-oriented dialogues with user satisfaction ratings and used them to create a dialogue control component, based on Partially Observable Markov Decision Processes (POMDPs), that aims to satisfy users. We evaluated our dialogue control method in a Wizard-of-Oz experiment using a hybrid dialogue controller, in which high-level dialogue acts are chosen by a statistical policy and low-level slot values are filled in by a wizard. The experimental results show that our POMDP-based method achieves significantly higher user satisfaction than other stochastic models, confirming the validity of our approach. This article is the first to verify with human users the usefulness of POMDP-based dialogue control for improving user satisfaction in non-task-oriented dialogue systems.
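
For readers unfamiliar with POMDP-based control, the following is a minimal sketch of the general mechanism, not the authors' model. It tracks a belief over a hidden user state and picks the system's next dialogue act from that belief. The state names, dialogue acts, observations, probabilities, and rewards are all hypothetical values chosen for illustration, and the one-step greedy choice stands in for a learned policy.

```python
# Toy POMDP belief tracking for dialogue control.
# Every state, dialogue act, observation, and number here is hypothetical.

STATES = ["satisfied", "unsatisfied"]
ACTIONS = ["question", "sympathy"]

# T[a][s][s2] = P(s2 | s, a): how the hidden user state evolves
# after the system performs dialogue act a.
T = {
    "question": {
        "satisfied": {"satisfied": 0.8, "unsatisfied": 0.2},
        "unsatisfied": {"satisfied": 0.3, "unsatisfied": 0.7},
    },
    "sympathy": {
        "satisfied": {"satisfied": 0.9, "unsatisfied": 0.1},
        "unsatisfied": {"satisfied": 0.5, "unsatisfied": 0.5},
    },
}

# O[s2][o] = P(o | s2): probability of observing user dialogue act o
# when the hidden state is s2.
O = {
    "satisfied": {"self_disclosure": 0.7, "short_reply": 0.3},
    "unsatisfied": {"self_disclosure": 0.2, "short_reply": 0.8},
}

# R[s2] = reward for reaching hidden state s2.
R = {"satisfied": 1.0, "unsatisfied": -1.0}


def update_belief(belief, action, observation):
    """Bayes filter: b'(s2) is proportional to O(o|s2) * sum_s T(s2|s,a) * b(s)."""
    unnormalized = {}
    for s2 in STATES:
        predicted = sum(T[action][s][s2] * belief[s] for s in STATES)
        unnormalized[s2] = O[s2][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


def greedy_action(belief):
    """Pick the act with the highest expected one-step reward.

    This is a simplification; a POMDP policy would normally optimize
    long-term reward rather than a single step.
    """
    def expected_reward(a):
        return sum(
            belief[s] * T[a][s][s2] * R[s2]
            for s in STATES
            for s2 in STATES
        )
    return max(ACTIONS, key=expected_reward)


if __name__ == "__main__":
    belief = {"satisfied": 0.5, "unsatisfied": 0.5}
    act = greedy_action(belief)
    belief = update_belief(belief, act, "self_disclosure")
    print(act, belief)
```

In a hybrid setup like the one the abstract describes, a function such as `greedy_action` would supply only the high-level dialogue act, leaving the wording and slot values to the human wizard.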

Index Terms

1. Information systems
  1. Information systems applications
    1. Decision support systems
      1. Expert systems

Information & Contributors

Information

Published In

ACM Transactions on Speech and Language Processing Volume 10, Issue 4

December 2013

206 pages

ISSN:1550-4875

EISSN:1550-4883

DOI:10.1145/2560566

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 January 2014

Accepted: 01 March 2013

Revised: 01 November 2012

Received: 01 August 2012

Published in TSLP Volume 10, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Ministry of Education, Culture, Sports, Science, and Technology

Citations

Cited By

Zeng J, Nakano Y, Sakato T (2024) Implementation and Evaluation of Interviewer’s Response Generation Model Considering Semantic Content. Transactions of the Japanese Society for Artificial Intelligence 39:3 (IDS6-A_1-15). Online publication date: 1-May-2024.
https://doi.org/10.1527/tjsai.39-3_IDS6-A

Mitsuda K, Higashinaka R, Oga Y, Yoshida S (2023) Recording and Analyzing the Process of Building Common Ground in Dialogues in a Collaborative Task. Journal of Natural Language Processing 30:3 (907-934). Online publication date: 2023.
https://doi.org/10.5715/jnlp.30.907

Mitsuda K, Higashinaka R, Tomita J (2020) Collection and Analysis of Perceived Information in Chat-oriented Dialogue. Transactions of the Japanese Society for Artificial Intelligence 35:1 (DSI-E_1-10). Online publication date: 1-Jan-2020.
https://doi.org/10.1527/tjsai.DSI-E

Tsunomori Y, Higashinaka R, Yoshimura T, Isoda Y (2020) An Evaluation of a Chat-oriented Dialogue System that Remembers and Uses User Information over Multiple Days. Transactions of the Japanese Society for Artificial Intelligence 35:1 (DSI-B_1-10). Online publication date: 1-Jan-2020.
https://doi.org/10.1527/tjsai.DSI-B

Lubis N, Sakti S, Yoshino K, Nakamura S (2019) Positive Emotion Elicitation in Chat-Based Dialogue Systems. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 27:4 (866-877). Online publication date: 17-May-2019.
https://dl.acm.org/doi/10.1109/TASLP.2019.2900910

Lubis N, Heck M, Sakti S, Yoshino K, Nakamura S (2017) Processing negative emotions through social communication: Multimodal database construction and analysis. 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII) (79-85). Online publication date: Oct-2017.
https://doi.org/10.1109/ACII.2017.8273582

Hirano T, Kobayashi N, Higashinaka R, Makino T, Matsuo Y (2016) User Information Extraction for Personalized Dialogue Systems. Transactions of the Japanese Society for Artificial Intelligence 31:1 (DSF-B_1-10). Online publication date: 2016.
https://doi.org/10.1527/tjsai.DSF-512

Han S, Kim Y, Lee G (2015) Micro-Counseling Dialog System Based on Semantic Content. Natural Language Dialog Systems and Intelligent Assistants (63-72). Online publication date: 2015.
https://doi.org/10.1007/978-3-319-19291-8_6
