
Working with robots and objects: revisiting deictic reference for achieving spatial common ground

Published: 02 March 2006
DOI: 10.1145/1121241.1121292

Abstract

Robust joint visual attention is necessary for establishing a common frame of reference between humans and robots that interact multimodally while working together on real-world spatial tasks involving objects. We comprehensively examine one component of this process that is otherwise often implemented in an ad hoc fashion: the ability to correctly determine the object referent from deictic reference, including pointing gestures and speech. From this we describe the development of a modular spatial reasoning framework based on the decomposition and resynthesis of speech and gesture into a language of pointing and object labeling. This framework supports multimodal and unimodal access in both real-world and mixed-reality workspaces; accounts for the need to discriminate and sequence identical and proximate objects; assists in overcoming the inherent precision limitations of deictic gesture; and assists in the extraction of those gestures. We further discuss an implementation of the framework that has been deployed on two humanoid robot platforms to date.
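
To make the referent-resolution idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a pointing ray estimated upstream (e.g., from body tracking) and a spoken label from speech recognition, and it ranks candidate objects by combining a Gaussian model of pointing imprecision with a crude word-overlap label match. All names and parameters (WorldObject, resolve_referent, ANGULAR_SIGMA_DEG) are hypothetical stand-ins.

```python
# A minimal sketch of deictic referent resolution, NOT the authors'
# implementation: every name and parameter here (WorldObject,
# resolve_referent, ANGULAR_SIGMA_DEG, ...) is a hypothetical stand-in.
# Candidate objects are ranked by combining (a) angular proximity to a
# pointing ray, via a Gaussian model of pointing imprecision, and
# (b) a crude word-overlap match against the spoken object label.

import math
from dataclasses import dataclass

ANGULAR_SIGMA_DEG = 10.0  # assumed std. dev. of pointing imprecision


@dataclass
class WorldObject:
    name: str        # object label, e.g. "red block"
    position: tuple  # (x, y, z) in the shared workspace frame


def angle_from_ray(origin, direction, target):
    """Angle (degrees) between the pointing ray and the ray to the target."""
    v = tuple(t - o for t, o in zip(target, origin))
    dot = sum(a * b for a, b in zip(direction, v))
    norms = (math.sqrt(sum(a * a for a in direction))
             * math.sqrt(sum(a * a for a in v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def gesture_score(angle_deg):
    """Gaussian falloff: objects near the ray remain plausible referents."""
    return math.exp(-0.5 * (angle_deg / ANGULAR_SIGMA_DEG) ** 2)


def label_score(spoken_label, obj):
    """Fraction of spoken words that appear in the object's label."""
    if not spoken_label:
        return 1.0  # unimodal pointing: speech imposes no constraint
    words = spoken_label.lower().split()
    return sum(1 for w in words if w in obj.name.lower()) / len(words)


def resolve_referent(origin, direction, spoken_label, objects):
    """Rank candidate referents by combined gesture and speech evidence."""
    scored = [(gesture_score(angle_from_ray(origin, direction, o.position))
               * label_score(spoken_label, o), o) for o in objects]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored  # caller can confirm verbally if the top two are close


if __name__ == "__main__":
    workspace = [
        WorldObject("red block", (0.30, 0.00, 0.0)),
        WorldObject("red block", (0.32, 0.05, 0.0)),  # identical, proximate
        WorldObject("blue block", (0.10, 0.40, 0.0)),
    ]
    for score, obj in resolve_referent((0.0, 0.0, 0.3), (1.0, 0.1, -0.3),
                                       "red block", workspace):
        print(f"{score:.3f}  {obj.name} at {obj.position}")
```

Returning a ranking rather than a single winner reflects the abstract's point about identical and proximate objects: when the top two scores are close, the robot can fall back on a confirmation or sequencing dialogue instead of guessing.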


Published In

HRI '06: Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction
March 2006
376 pages
ISBN: 1595932941
DOI: 10.1145/1121241
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. human-robot interaction
  2. multimodal interfaces
  3. natural gesture understanding
  4. spatial behavior

Qualifiers

  • Article

Conference

HRI06: International Conference on Human Robot Interaction
March 2-3, 2006
Salt Lake City, Utah, USA

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%


