Research article
DOI: 10.1145/3340555.3353730

Multitask Prediction of Exchange-level Annotations for Multimodal Dialogue Systems

Published: 14 October 2019

Abstract

This paper presents multimodal computational models of three labels, each annotated independently per exchange, to support an adaptive dialogue strategy for spoken dialogue systems that recognize user sentiment through multimodal signal processing. The three labels are (1) the user's interest in the current topic, (2) the user's sentiment, and (3) topic continuance, denoting whether the system should continue the current topic or change it. Predicting these three labels, which capture different aspects of the user's sentiment level and of the system's next action, contributes to adapting the dialogue strategy to the user's sentiment. To this end, we enhanced a shared multimodal dialogue corpus by annotating impressed sentiment labels and topic continuance labels. Using this corpus, we developed a multimodal prediction model for the three labels. A multitask learning technique is applied to the three binary classification tasks to exploit the partial similarities among them; thanks to this framework, the prediction model was trained efficiently even on a small data set (fewer than 2,000 samples). Experimental results show that a multitask deep neural network (DNN) trained on multimodal features, including linguistic, facial expression, body and head motion, and acoustic features, outperformed single-task DNNs by up to 1.6 points.
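The paper itself publishes no code; as a rough illustration of the multitask setup the abstract describes, below is a minimal PyTorch sketch: a shared trunk over concatenated multimodal feature vectors with three binary heads, one per exchange-level label. All names and dimensions (MultitaskDNN, in_dim=512, hidden sizes, dropout rate) are hypothetical placeholders, not the authors' architecture.

```python
# Minimal sketch (not the authors' code) of a multitask DNN with a shared
# trunk and three binary heads: interest, sentiment, topic continuance.
# Feature dimension and layer sizes are assumed for illustration only.
import torch
import torch.nn as nn

class MultitaskDNN(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 128, p_drop: float = 0.5):
        super().__init__()
        # Shared layers: the three related tasks regularize each other,
        # which is what makes training feasible on <2,000 samples.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        # One binary logit per exchange-level label.
        self.heads = nn.ModuleDict({
            "interest": nn.Linear(hidden, 1),
            "sentiment": nn.Linear(hidden, 1),
            "topic_continuance": nn.Linear(hidden, 1),
        })

    def forward(self, x: torch.Tensor) -> dict:
        h = self.trunk(x)
        return {name: head(h).squeeze(-1) for name, head in self.heads.items()}

# Toy usage: x stands in for concatenated linguistic, facial, motion, and
# acoustic features (the 512-dim input is an assumption).
model = MultitaskDNN(in_dim=512)
x = torch.randn(8, 512)
labels = {k: torch.randint(0, 2, (8,)).float() for k in model.heads}
loss_fn = nn.BCEWithLogitsLoss()
logits = model(x)
loss = sum(loss_fn(logits[k], labels[k]) for k in model.heads)  # summed task losses
loss.backward()
```

A single summed loss over the three heads is the simplest multitask objective; per-task weighting is a natural variant when the tasks differ in difficulty or label balance.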




Published In

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019
601 pages
ISBN: 9781450368605
DOI: 10.1145/3340555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '19

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 3
Reflects downloads up to 16 Nov 2024


Cited By

  • (2024) Adaptive Interview Strategy Based on Interviewees’ Speaking Willingness Recognition for Interview Robots. IEEE Transactions on Affective Computing 15(3), 942–957. DOI: 10.1109/TAFFC.2023.3309640. Online publication date: Jul-2024.
  • (2024) Empirical Analysis of Individual Differences Based on Sentiment Estimation Performance Toward Speaker Adaptation for Social Signal Processing. Social Computing and Social Media, 359–371. DOI: 10.1007/978-3-031-61281-7_26. Online publication date: 1-Jun-2024.
  • (2023) A multilayer perceptron-based model applied to histopathology image classification of lung adenocarcinoma subtypes. Frontiers in Oncology 13. DOI: 10.3389/fonc.2023.1172234. Online publication date: 18-May-2023.
  • (2023) Effects of Physiological Signals in Different Types of Multimodal Sentiment Estimation. IEEE Transactions on Affective Computing 14(3), 2443–2457. DOI: 10.1109/TAFFC.2022.3155604. Online publication date: 1-Jul-2023.
  • (2022) Multimodal Analysis for Communication Skill and Self-Efficacy Level Estimation in Job Interview Scenario. Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, 110–120. DOI: 10.1145/3568444.3568461. Online publication date: 27-Nov-2022.
  • (2022) Investigating the relationship between dialogue and exchange-level impression. Proceedings of the 2022 International Conference on Multimodal Interaction, 359–367. DOI: 10.1145/3536221.3556602. Online publication date: 7-Nov-2022.
  • (2021) Multimodal User Satisfaction Recognition for Non-task Oriented Dialogue Systems. Proceedings of the 2021 International Conference on Multimodal Interaction, 586–594. DOI: 10.1145/3462244.3479928. Online publication date: 18-Oct-2021.
  • (2021) Recognizing Social Signals with Weakly Supervised Multitask Learning for Multimodal Dialogue Systems. Proceedings of the 2021 International Conference on Multimodal Interaction, 141–149. DOI: 10.1145/3462244.3479927. Online publication date: 18-Oct-2021.
  • (2021) Multimodal Human-Agent Dialogue Corpus with Annotations at Utterance and Dialogue Levels. 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), 1–8. DOI: 10.1109/ACII52823.2021.9597447. Online publication date: 28-Sep-2021.
  • (2020) Packing, Stacking, and Tracking: An Empirical Study of Online User Adaptation. Conversational Dialogue Systems for the Next Decade, 319–336. DOI: 10.1007/978-981-15-8395-7_24. Online publication date: 25-Oct-2020.
