Abstract
We propose a computational model that enables conversational agents to coordinate their speaking turns (turn-taking management) in mixed-initiative two-party dialogs. In human conversation, participants continuously adjust their verbal and non-verbal productions to coordinate speaking turns effectively. In our model, decision making is a continuous process driven by the agent's current intrinsic goal with respect to turn-taking, namely its motivation to keep or to leave its current role (speaker or listener), and by its perception of its partner's intentions. Concurrently, the agent produces signals indicating its willingness to maintain or leave its current role. The model builds on two models from cognitive psychology: the drift-diffusion model and the theory of behavioral dynamics. After presenting simulations that show how coordination emerges from the interaction, we propose a SAIBA-compliant architecture, named BeAware, designed to support the implementation of the model. Finally, using the model, we investigate how an agent's turn-taking strategy may affect the user's experience and the effectiveness of the coordination.
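To make the drift-diffusion idea concrete, the following is a minimal sketch of a generic two-boundary drift-diffusion accumulator. It is not the model's actual formulation, which this abstract does not give. The function name `simulate_ddm`, the parameters, the decision labels, and the reading of the drift as a combination of the agent's own motivation and its perception of the partner's intentions are all assumptions made for illustration.

```python
import random


def simulate_ddm(drift, threshold=1.0, noise=0.1, dt=0.01,
                 max_steps=10_000, seed=0):
    """Generic two-boundary drift-diffusion accumulator (Euler-Maruyama).

    Hypothetical mapping, NOT taken from the paper: `drift` stands for the
    net evidence in favour of changing role, for instance the agent's own
    motivation to leave its role plus the perceived willingness of the
    partner to take it over.
    Returns (decision, elapsed_time).
    """
    rng = random.Random(seed)
    x = 0.0  # accumulated evidence, starting midway between the boundaries
    for step in range(1, max_steps + 1):
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if x >= threshold:
            return "leave_role", step * dt
        if x <= -threshold:
            return "keep_role", step * dt
    return "undecided", max_steps * dt


if __name__ == "__main__":
    # Assumed example values: a listener moderately motivated to speak.
    decision, elapsed = simulate_ddm(drift=0.8, noise=0.3)
    print(decision, round(elapsed, 2))
```

In this sketch a larger drift crosses a boundary sooner, while a larger noise term makes both the outcome and its timing more variable. A continuous model would presumably update the drift at every step from perceived signals instead of holding it constant as done here.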
Cite this article
Jégou, M., Chevaillier, P. A computational model for the emergence of turn-taking behaviors in user-agent interactions. J Multimodal User Interfaces 12, 199–223 (2018). https://doi.org/10.1007/s12193-018-0265-3
DOI: https://doi.org/10.1007/s12193-018-0265-3