
Working with robots and objects: revisiting deictic reference for achieving spatial common ground

Published: 02 March 2006
DOI: 10.1145/1121241.1121292

Abstract

Robust joint visual attention is necessary for establishing a common frame of reference between humans and robots that interact multimodally while working together on real-world spatial tasks involving objects. We comprehensively examine one component of this process that is otherwise often implemented in an ad hoc fashion: the ability to correctly determine the object referent from deictic reference, including pointing gestures and speech. From this we describe the development of a modular spatial reasoning framework based on the decomposition and resynthesis of speech and gesture into a language of pointing and object labeling. This framework supports multimodal and unimodal access in both real-world and mixed-reality workspaces; accounts for the need to discriminate and sequence identical and proximate objects; assists in overcoming the inherent precision limitations of deictic gesture; and assists in the extraction of those gestures. We further discuss an implementation of the framework that has been deployed on two humanoid robot platforms to date.
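
To make the referent-resolution idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a pointing ray estimated upstream (e.g., from body tracking) and a spoken label from speech recognition, and it ranks candidate objects by combining a Gaussian model of pointing imprecision with a crude word-overlap label match. All names and parameters (WorldObject, resolve_referent, ANGULAR_SIGMA_DEG) are hypothetical stand-ins.

```python
# A minimal sketch of deictic referent resolution, NOT the authors'
# implementation: every name and parameter here (WorldObject,
# resolve_referent, ANGULAR_SIGMA_DEG, ...) is a hypothetical stand-in.
# Candidate objects are ranked by combining (a) angular proximity to a
# pointing ray, via a Gaussian model of pointing imprecision, and
# (b) a crude word-overlap match against the spoken object label.

import math
from dataclasses import dataclass

ANGULAR_SIGMA_DEG = 10.0  # assumed std. dev. of pointing imprecision


@dataclass
class WorldObject:
    name: str        # object label, e.g. "red block"
    position: tuple  # (x, y, z) in the shared workspace frame


def angle_from_ray(origin, direction, target):
    """Angle (degrees) between the pointing ray and the ray to the target."""
    v = tuple(t - o for t, o in zip(target, origin))
    dot = sum(a * b for a, b in zip(direction, v))
    norms = (math.sqrt(sum(a * a for a in direction))
             * math.sqrt(sum(a * a for a in v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def gesture_score(angle_deg):
    """Gaussian falloff: objects near the ray remain plausible referents."""
    return math.exp(-0.5 * (angle_deg / ANGULAR_SIGMA_DEG) ** 2)


def label_score(spoken_label, obj):
    """Fraction of spoken words that appear in the object's label."""
    if not spoken_label:
        return 1.0  # unimodal pointing: speech imposes no constraint
    words = spoken_label.lower().split()
    return sum(1 for w in words if w in obj.name.lower()) / len(words)


def resolve_referent(origin, direction, spoken_label, objects):
    """Rank candidate referents by combined gesture and speech evidence."""
    scored = [(gesture_score(angle_from_ray(origin, direction, o.position))
               * label_score(spoken_label, o), o) for o in objects]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored  # caller can confirm verbally if the top two are close


if __name__ == "__main__":
    workspace = [
        WorldObject("red block", (0.30, 0.00, 0.0)),
        WorldObject("red block", (0.32, 0.05, 0.0)),  # identical, proximate
        WorldObject("blue block", (0.10, 0.40, 0.0)),
    ]
    for score, obj in resolve_referent((0.0, 0.0, 0.3), (1.0, 0.1, -0.3),
                                       "red block", workspace):
        print(f"{score:.3f}  {obj.name} at {obj.position}")
```

Returning a ranking rather than a single winner reflects the abstract's point about identical and proximate objects: when the top two scores are close, the robot can fall back on a confirmation or sequencing dialogue instead of guessing.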


Published In

HRI '06: Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction
March 2006
376 pages
ISBN: 1595932941
DOI: 10.1145/1121241
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. human-robot interaction
  2. multimodal interfaces
  3. natural gesture understanding
  4. spatial behavior

Qualifiers

  • Article

Conference

HRI06: International Conference on Human Robot Interaction
March 2-3, 2006
Salt Lake City, Utah, USA

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%


