DOI: 10.1145/2556288.2556989

Combining body pose, gaze, and gesture to determine intention to interact in vision-based interfaces

Published: 26 April 2014

Abstract

Vision-based interfaces, such as those made popular by the Microsoft Kinect, suffer from the Midas Touch problem: every user motion can be interpreted as an interaction. In response, we developed an algorithm that combines facial features, body pose, and motion to approximate a user's intention to interact with the system. We show how this can be used to determine when to pay attention to a user's actions and when to ignore them. To demonstrate the value of our approach, we present results from a 30-person lab study conducted to compare four engagement algorithms in single- and multi-user scenarios. We found that combining intention to interact with a 'raise an open hand in front of you' gesture yielded the best results: this combined approach offers a 12% improvement in accuracy and a 20% reduction in time to engage over a baseline 'wave to engage' gesture currently used on the Xbox 360.
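The abstract describes fusing several per-frame cues (face direction, body pose, motion, and an explicit open-hand gesture) into an estimate of a user's intention to interact, which then gates whether the system acts on that user's motions. As a rough, illustrative sketch only (the paper uses a learned model; the feature names, weights, smoothing factor, and threshold below are assumptions for demonstration, not the authors' values), the following Python fragment shows one way such cues could be combined into a smoothed engagement score with a decision threshold:

# Illustrative sketch only: a hand-rolled, rule-based stand-in for the paper's
# learned intention-to-interact score. The feature names, weights, smoothing
# factor, and threshold are hypothetical values chosen for demonstration.
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    face_toward_sensor: float   # 0..1, from a face/gaze estimate
    torso_toward_sensor: float  # 0..1, body-pose orientation cue
    motion_energy: float        # 0..1, how much the user is currently moving
    open_hand_raised: bool      # explicit "raise an open hand" gesture cue


class EngagementEstimator:
    """Fuses per-frame cues into a smoothed engagement score in [0, 1]."""

    def __init__(self, threshold: float = 0.6, alpha: float = 0.2):
        self.threshold = threshold  # score above which the user counts as engaging
        self.alpha = alpha          # exponential-smoothing factor
        self.score = 0.0

    def update(self, f: FrameFeatures) -> bool:
        # Weighted combination of cues; the weights are made up for illustration.
        raw = (0.4 * f.face_toward_sensor
               + 0.3 * f.torso_toward_sensor
               + 0.1 * f.motion_energy
               + (0.2 if f.open_hand_raised else 0.0))
        # Exponential smoothing keeps a single noisy frame from flipping the decision.
        self.score = self.alpha * raw + (1 - self.alpha) * self.score
        return self.score >= self.threshold


# Example: one frame of cues for a user facing the sensor with an open hand raised.
estimator = EngagementEstimator()
engaged = estimator.update(FrameFeatures(0.9, 0.8, 0.2, True))

In a real pipeline, one FrameFeatures estimate would be produced per sensor frame for each tracked user, and a user's motions would be treated as input only while update() returns True.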

Supplementary Material

• Supplemental video: suppl.mov (pn0235-file3.mp4)
• MP4 file: p3443-sidebyside.mp4





    Published In

    CHI '14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
    April 2014
    4206 pages
    ISBN:9781450324731
    DOI:10.1145/2556288
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. free-space interaction
    2. input segmentation
    3. learned models
    4. user engagement
    5. vision-based input

    Qualifiers

    • Research-article

    Conference

CHI '14: CHI Conference on Human Factors in Computing Systems
April 26 - May 1, 2014
Toronto, Ontario, Canada

    Acceptance Rates

CHI '14 paper acceptance rate: 465 of 2,043 submissions (23%)
Overall acceptance rate: 6,199 of 26,314 submissions (24%)


Article Metrics

• Downloads (last 12 months): 103
• Downloads (last 6 weeks): 5

Reflects downloads up to 16 Nov 2024

    Cited By

• (2024) GestureGPT: Toward Zero-Shot Free-Form Hand Gesture Understanding with Large Language Model Agents. Proceedings of the ACM on Human-Computer Interaction 8(ISS), 462-499. DOI: 10.1145/3698145. Online publication date: 24 Oct 2024.
• (2024) Sign Language-Based versus Touch-Based Input for Deaf Users with Interactive Personal Assistants in Simulated Kitchen Environments. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-9. DOI: 10.1145/3613905.3651075. Online publication date: 11 May 2024.
• (2024) Assessment of Sign Language-Based versus Touch-Based Input for Deaf Users Interacting with Intelligent Personal Assistants. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3613904.3642094. Online publication date: 11 May 2024.
• (2023) Uncertainty-Aware Gaze Tracking for Assisted Living Environments. IEEE Transactions on Image Processing 32, 2335-2347. DOI: 10.1109/TIP.2023.3253253. Online publication date: 2023.
• (2023) XR Input Error Mediation for Hand-Based Input: Task and Context Influences a User's Preference. 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 1006-1015. DOI: 10.1109/ISMAR59233.2023.00117. Online publication date: 16 Oct 2023.
• (2022) Modeling Feedback in Interaction With Conversational Agents—A Review. Frontiers in Computer Science 4. DOI: 10.3389/fcomp.2022.744574. Online publication date: 15 Mar 2022.
• (2022) Investigating Clutching Interactions for Touchless Medical Imaging Systems. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-14. DOI: 10.1145/3491102.3517512. Online publication date: 29 Apr 2022.
• (2022) FaceEngage: Robust Estimation of Gameplay Engagement from User-Contributed (YouTube) Videos. IEEE Transactions on Affective Computing 13(2), 651-665. DOI: 10.1109/TAFFC.2019.2945014. Online publication date: 1 Apr 2022.
• (2022) Touchless Tactile Interaction with Unconventional Permeable Displays. Ultrasound Mid-Air Haptics for Touchless Interfaces, 207-223. DOI: 10.1007/978-3-031-04043-6_8. Online publication date: 17 Sep 2022.
• (2021) A Software Toolbox for Behavioral Analysis in Robot-Assisted Special Education. 2021 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 1-5. DOI: 10.23919/SoftCOM52868.2021.9559093. Online publication date: 23 Sep 2021.
