DOI: 10.1007/978-3-540-85483-8_18
Article

Predicting Listener Backchannels: A Probabilistic Multimodal Approach

Published: 01 September 2008

Abstract

During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
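To make the modeling idea concrete, the sketch below shows one way a frame-level linear-chain Conditional Random Field could be trained on speaker features to predict listener backchannel labels. It is a minimal illustration only, not the authors' implementation: the feature names, labels, and toy data are hypothetical, and it uses the third-party sklearn-crfsuite package rather than the toolkit used in the paper.

```python
# Illustrative sketch: a linear-chain CRF over per-frame speaker features,
# in the spirit of the sequential models described in the abstract.
# Feature names and data are made up; this is not the authors' code.
import sklearn_crfsuite

# Each dialogue is a sequence of frames; each frame is a dict of
# speaker-side multimodal features (prosody, words, gaze).
train_X = [
    [
        {"pitch_low": True,  "pause": False, "word": "and",   "gaze_at_listener": True},
        {"pitch_low": True,  "pause": True,  "word": "<sil>", "gaze_at_listener": True},
        {"pitch_low": False, "pause": False, "word": "so",    "gaze_at_listener": False},
    ],
]
# Frame-level labels: does the listener produce a backchannel (e.g., head nod) here?
train_y = [["no_bc", "bc", "no_bc"]]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # L-BFGS training
    c1=0.1, c2=0.1,      # L1 / L2 regularization strengths
    max_iterations=100,
)
crf.fit(train_X, train_y)

# Predict backchannel opportunities for a new speaker feature sequence.
test_X = [[{"pitch_low": True, "pause": True, "word": "<sil>", "gaze_at_listener": True}]]
print(crf.predict(test_X))
```

In practice the frame-level features and their representation (binary, step, or ramp encodings of prosodic and lexical events) would be selected from the human-to-human interaction corpus, which is the feature-selection and representation problem the paper addresses.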




Information

Published In

IVA '08: Proceedings of the 8th international conference on Intelligent Virtual Agents
September 2008
553 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 September 2008

Qualifiers

  • Article


Cited By

  • (2024) Synlogue with Aizuchi-bot: Investigating the Co-Adaptive and Open-Ended Interaction Paradigm. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 10.1145/3613904.3642046, pp. 1-21. Online publication date: 11-May-2024
  • (2023) Multimodal Voice Activity Prediction: Turn-taking Events Detection in Expert-Novice Conversation. Proceedings of the 11th International Conference on Human-Agent Interaction, 10.1145/3623809.3623837, pp. 13-21. Online publication date: 4-Dec-2023
  • (2022) Improving Meeting Inclusiveness using Speech Interruption Analysis. Proceedings of the 30th ACM International Conference on Multimedia, 10.1145/3503161.3548379, pp. 887-895. Online publication date: 10-Oct-2022
  • (2022) The Eye in Extended Reality: A Survey on Gaze Interaction and Eye Tracking in Head-worn Extended Reality. ACM Computing Surveys 55(3), 10.1145/3491207, pp. 1-39. Online publication date: 25-Mar-2022
  • (2022) TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 10.1145/3491102.3502005, pp. 1-19. Online publication date: 29-Apr-2022
  • (2021) Rapport Between Humans and Socially Interactive Agents. The Handbook on Socially Interactive Agents, 10.1145/3477322.3477335, pp. 433-462. Online publication date: 10-Sep-2021
  • (2021) A Multimodal Model for Predicting Conversational Feedbacks. Text, Speech, and Dialogue, 10.1007/978-3-030-83527-9_46, pp. 537-549. Online publication date: 6-Sep-2021
  • (2019) Collaborative user responses in multiparty interaction with a couples counselor robot. Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 10.5555/3378680.3378723, pp. 294-303. Online publication date: 11-Mar-2019
  • (2019) Multimodal conversational interaction with robots. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3233795.3233799, pp. 77-104. Online publication date: 1-Jul-2019
  • (2017) A Multifaceted Study on Eye Contact based Speaker Identification in Three-party Conversations. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 10.1145/3025453.3025644, pp. 3011-3021. Online publication date: 2-May-2017
