Abstract
Beth and Alan are sitting at an outdoor restaurant on Fifth Avenue in Manhattan. Alan is telling Beth an exciting story about his vacation in Nice, presenting it through gesture and speech. Then Beth's arm starts moving and her neck stiffens.
We, the viewers, know that she's surprised to see an elephant in the middle of Manhattan, and that in 460 milliseconds her arm and hand motion will turn into a well-defined deictic gesture, her eyebrows will rise, and her mouth will open in surprise, at which point Alan will most certainly recognize the signs and look over at the elephant. But right now, at t-minus-460 milliseconds, Beth's gesture is barely recognizable as a communicative action, so Alan can't be sure. Within those 460 milliseconds, before any of this happens, Alan has to decide what to do about Beth's behavior. Should he stop telling his story? Or should he go on, in case Beth is simply adjusting her jacket?
Decisions like these are made by dialogue participants as often as 2–3 times per second. Over a 30-minute conversation, that can add up to more than 5,000 decisions, and that's just a fraction of what goes on. How do we do it? Face-to-face dialogue arises from the interaction of several complex, dynamic systems: visual and auditory display of information, internal processing, knee-jerk reactions, thought-out rhetoric, learned patterns, social convention, and more. One could postulate that the power of dialogue is a direct result of this fact. However, combining a multitude of systems in one place does not guarantee a coherent outcome such as goal-directed dialogue. For that to happen, the systems must be architected so that their interaction is guided and, however complex it may be, tends towards homeostasis in the face of errors and uncertainty, that is, towards the goals the participants share.
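The figure of more than 5,000 decisions in a 30-minute conversation follows directly from the stated rate of 2–3 decisions per second; the arithmetic for both ends of that range is:

```latex
\[
30\ \text{min} \times 60\ \tfrac{\text{s}}{\text{min}} = 1800\ \text{s}, \qquad
1800\ \text{s} \times 2\ \tfrac{\text{decisions}}{\text{s}} = 3600\ \text{decisions}, \qquad
1800\ \text{s} \times 3\ \tfrac{\text{decisions}}{\text{s}} = 5400\ \text{decisions}
\]
```

The count therefore exceeds 5,000 only at the upper rate of three decisions per second; at two per second it is 3,600.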
Copyright information
© 2002 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Thórisson, K.R. (2002). Natural Turn-Taking Needs No Manual: Computational Theory and Model, from Perception to Action. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_8
DOI: https://doi.org/10.1007/978-94-017-2367-1_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6024-2
Online ISBN: 978-94-017-2367-1
eBook Packages: Springer Book Archive