DOI: 10.1007/978-3-540-85483-8_18
Article

Predicting Listener Backchannels: A Probabilistic Multimodal Approach

Published: 01 September 2008

Abstract

During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
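To make the modeling idea concrete, the sketch below shows one way a frame-level linear-chain Conditional Random Field could be trained on speaker features to predict listener backchannel labels. It is a minimal illustration only, not the authors' implementation: the feature names, labels, and toy data are hypothetical, and it uses the third-party sklearn-crfsuite package rather than the toolkit used in the paper.

```python
# Illustrative sketch: a linear-chain CRF over per-frame speaker features,
# in the spirit of the sequential models described in the abstract.
# Feature names and data are made up; this is not the authors' code.
import sklearn_crfsuite

# Each dialogue is a sequence of frames; each frame is a dict of
# speaker-side multimodal features (prosody, words, gaze).
train_X = [
    [
        {"pitch_low": True,  "pause": False, "word": "and",   "gaze_at_listener": True},
        {"pitch_low": True,  "pause": True,  "word": "<sil>", "gaze_at_listener": True},
        {"pitch_low": False, "pause": False, "word": "so",    "gaze_at_listener": False},
    ],
]
# Frame-level labels: does the listener produce a backchannel (e.g., head nod) here?
train_y = [["no_bc", "bc", "no_bc"]]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # L-BFGS training
    c1=0.1, c2=0.1,      # L1 / L2 regularization strengths
    max_iterations=100,
)
crf.fit(train_X, train_y)

# Predict backchannel opportunities for a new speaker feature sequence.
test_X = [[{"pitch_low": True, "pause": True, "word": "<sil>", "gaze_at_listener": True}]]
print(crf.predict(test_X))
```

In practice the frame-level features and their representation (binary, step, or ramp encodings of prosodic and lexical events) would be selected from the human-to-human interaction corpus, which is the feature-selection and representation problem the paper addresses.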




Information

Published In

IVA '08: Proceedings of the 8th international conference on Intelligent Virtual Agents
September 2008
553 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 September 2008

Qualifiers

  • Article


Cited By

  • (2024) Synlogue with Aizuchi-bot: Investigating the Co-Adaptive and Open-Ended Interaction Paradigm. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 10.1145/3613904.3642046, pp. 1-21. Online publication date: 11-May-2024
  • (2023) Multimodal Voice Activity Prediction: Turn-taking Events Detection in Expert-Novice Conversation. Proceedings of the 11th International Conference on Human-Agent Interaction, 10.1145/3623809.3623837, pp. 13-21. Online publication date: 4-Dec-2023
  • (2022) Improving Meeting Inclusiveness using Speech Interruption Analysis. Proceedings of the 30th ACM International Conference on Multimedia, 10.1145/3503161.3548379, pp. 887-895. Online publication date: 10-Oct-2022
  • (2022) The Eye in Extended Reality: A Survey on Gaze Interaction and Eye Tracking in Head-worn Extended Reality. ACM Computing Surveys 55(3), 10.1145/3491207, pp. 1-39. Online publication date: 25-Mar-2022
  • (2022) TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 10.1145/3491102.3502005, pp. 1-19. Online publication date: 29-Apr-2022
  • (2021) Rapport Between Humans and Socially Interactive Agents. The Handbook on Socially Interactive Agents, 10.1145/3477322.3477335, pp. 433-462. Online publication date: 10-Sep-2021
  • (2021) A Multimodal Model for Predicting Conversational Feedbacks. Text, Speech, and Dialogue, 10.1007/978-3-030-83527-9_46, pp. 537-549. Online publication date: 6-Sep-2021
  • (2019) Collaborative user responses in multiparty interaction with a couples counselor robot. Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 10.5555/3378680.3378723, pp. 294-303. Online publication date: 11-Mar-2019
  • (2019) Multimodal conversational interaction with robots. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3233795.3233799, pp. 77-104. Online publication date: 1-Jul-2019
  • (2017) A Multifaceted Study on Eye Contact based Speaker Identification in Three-party Conversations. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 10.1145/3025453.3025644, pp. 3011-3021. Online publication date: 2-May-2017
