Natural Turn-Taking Needs No Manual: Computational Theory and Model, from Perception to Action

Kristinn R. Thórisson

Part of the book series: Text, Speech and Language Technology (TLTB, volume 19)

Abstract

Beth and Alan are sitting at an outdoor restaurant on Fifth Avenue in Manhattan. Alan is telling Beth an exciting story about his vacation in Nice, presenting it through gesture and speech. Then Beth’s arm starts moving and her neck stiffens.

We, the viewers, know that she is surprised to see an elephant in the middle of Manhattan, and that in 460 milliseconds her arm and hand motion will turn into a well-defined deictic gesture, her eyebrows will rise, and her mouth will open with surprise, at which point Alan will most certainly recognize the signs and look over at the elephant. But right now, at t-minus-460 milliseconds, Beth’s gesture is barely recognizable as a communicative action, so Alan doesn’t know for sure. Before any of that happens, within the next 460 milliseconds, Alan has to decide what to do about Beth’s behavior. Should he stop telling his story? Or should he go on, in case Beth is simply adjusting her jacket?

Decisions like these are made by dialogue participants as often as 2–3 times per second. For a 30-minute conversation, that is over 5000 decisions at the upper rate. And that is just a fraction of what goes on. How do we do it? Face-to-face dialogue consists of interaction between several complex, dynamic systems: visual and auditory display of information, internal processing, knee-jerk reactions, thought-out rhetoric, learned patterns, social convention, and so on. One could postulate that the power of dialogue is a direct result of this fact. However, combining a multitude of systems in one place does not guarantee a coherent outcome such as goal-directed dialogue. For this to happen, the systems need to be architected in a way that guides their interaction and ensures that, complex as it may be, the interaction tends towards homeostasis in the face of errors and uncertainties, and towards the set of goals shared by participants.

Author information

Authors and Affiliations

Communicative Machines Inc., 131 E 23rd St., suite 2C, New York, NY, 10010, USA
Kristinn R. Thórisson

Editor information

Editors and Affiliations

Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
Björn Granström, David House & Inger Karlsson

About this chapter

Cite this chapter

Thórisson, K.R. (2002). Natural Turn-Taking Needs No Manual: Computational Theory and Model, from Perception to Action. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_8

DOI: https://doi.org/10.1007/978-94-017-2367-1_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6024-2
Online ISBN: 978-94-017-2367-1
eBook Packages: Springer Book Archive
