Abstract
Artificial listeners are virtual agents that can listen attentively to a human speaker in a dialog. In this paper, we present two experiments in which we investigate the perception of rule-based backchannel strategies for artificial listeners. In both, we collect subjective judgments from human observers who watch a video of a speaker together with a corresponding animation of an artificial listener. In the first experiment, we evaluate six rule-based strategies that differ in the types of features (e.g. prosody, gaze) they consider. The ratings are given at the level of a speech turn and can be regarded as a measure of how human-like the generated listening behavior is perceived to be. In the second experiment, we systematically investigate the effect of the quantity, type and timing of backchannels within the discourse of the speaker. Additionally, we ask human observers to press a button whenever they consider a generated backchannel occurrence inappropriate. Together, the two experiments give insight into the factors, from both an observation and a generation point of view, that influence how backchannel strategies for artificial listeners are perceived.
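To make the notion of a rule-based strategy concrete, the sketch below shows one way a prosody-based backchannel rule could be implemented, loosely in the spirit of pitch-and-pause rules (cf. Ward and Tsukahara; Truong et al.). It assumes 10 ms prosody frames with voice activity and a pitch estimate (e.g. extracted with Praat). The thresholds, the ProsodyFrame structure and the backchannel_times function are illustrative assumptions, not the rules evaluated in the paper.

```python
# Hypothetical sketch of a prosody-based backchannel rule.
# Thresholds and feature names are illustrative, not taken from the paper.

from dataclasses import dataclass


@dataclass
class ProsodyFrame:
    time: float        # frame timestamp in seconds
    is_speech: bool    # voice activity for the speaker
    pitch_hz: float    # fundamental frequency estimate (0.0 if unvoiced)


def backchannel_times(frames, min_speech=1.0, min_pause=0.4, low_pitch_hz=140.0):
    """Return timestamps at which a rule-based listener would backchannel.

    Rule (illustrative): after at least `min_speech` seconds of continuous
    speech ending in a low-pitch region, wait for a pause of `min_pause`
    seconds and then trigger a backchannel (e.g. a head nod or 'uh-huh').
    """
    triggers = []
    speech_run, pause_run, last_pitch = 0.0, 0.0, 0.0
    frame_step = 0.01  # assume 10 ms frames

    for f in frames:
        if f.is_speech:
            speech_run += frame_step
            pause_run = 0.0
            if f.pitch_hz > 0.0:
                last_pitch = f.pitch_hz
        else:
            pause_run += frame_step
            if (speech_run >= min_speech
                    and pause_run >= min_pause
                    and 0.0 < last_pitch <= low_pitch_hz):
                triggers.append(f.time)
                speech_run = 0.0  # avoid repeated triggers in the same pause
    return triggers
```

In such a setup, varying the pause and pitch thresholds, or adding further cues such as speaker gaze, would yield the kind of alternative strategies whose perception the two experiments compare.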
Acknowledgments
The authors would like to thank Dennis Reidsma for his assistance in preparing the stimuli, and Iwan de Kok for interesting discussions.
Cite this article
Poppe, R., Truong, K.P. & Heylen, D. Perceptual evaluation of backchannel strategies for artificial listeners. Auton Agent Multi-Agent Syst 27, 235–253 (2013). https://doi.org/10.1007/s10458-013-9219-z