DOI: 10.1145/2556288.2556989

Combining body pose, gaze, and gesture to determine intention to interact in vision-based interfaces

Published: 26 April 2014

Abstract

Vision-based interfaces, such as those made popular by the Microsoft Kinect, suffer from the Midas Touch problem: every user motion can be interpreted as an interaction. In response, we developed an algorithm that combines facial features, body pose, and motion to approximate a user's intention to interact with the system. We show how this can be used to determine when to pay attention to a user's actions and when to ignore them. To demonstrate the value of our approach, we present results from a 30-person lab study conducted to compare four engagement algorithms in single- and multi-user scenarios. We found that combining intention to interact with a 'raise an open hand in front of you' gesture yielded the best results: this combined approach offers a 12% improvement in accuracy and a 20% reduction in time to engage over a baseline 'wave to engage' gesture currently used on the Xbox 360.
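The abstract describes fusing several per-frame cues (face direction, body pose, motion, and an explicit open-hand gesture) into an estimate of a user's intention to interact, which then gates whether the system acts on that user's motions. As a rough, illustrative sketch only (the paper uses a learned model; the feature names, weights, smoothing factor, and threshold below are assumptions for demonstration, not the authors' values), the following Python fragment shows one way such cues could be combined into a smoothed engagement score with a decision threshold:

# Illustrative sketch only: a hand-rolled, rule-based stand-in for the paper's
# learned intention-to-interact score. The feature names, weights, smoothing
# factor, and threshold are hypothetical values chosen for demonstration.
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    face_toward_sensor: float   # 0..1, from a face/gaze estimate
    torso_toward_sensor: float  # 0..1, body-pose orientation cue
    motion_energy: float        # 0..1, how much the user is currently moving
    open_hand_raised: bool      # explicit "raise an open hand" gesture cue


class EngagementEstimator:
    """Fuses per-frame cues into a smoothed engagement score in [0, 1]."""

    def __init__(self, threshold: float = 0.6, alpha: float = 0.2):
        self.threshold = threshold  # score above which the user counts as engaging
        self.alpha = alpha          # exponential-smoothing factor
        self.score = 0.0

    def update(self, f: FrameFeatures) -> bool:
        # Weighted combination of cues; the weights are made up for illustration.
        raw = (0.4 * f.face_toward_sensor
               + 0.3 * f.torso_toward_sensor
               + 0.1 * f.motion_energy
               + (0.2 if f.open_hand_raised else 0.0))
        # Exponential smoothing keeps a single noisy frame from flipping the decision.
        self.score = self.alpha * raw + (1 - self.alpha) * self.score
        return self.score >= self.threshold


# Example: one frame of cues for a user facing the sensor with an open hand raised.
estimator = EngagementEstimator()
engaged = estimator.update(FrameFeatures(0.9, 0.8, 0.2, True))

In a real pipeline, one FrameFeatures estimate would be produced per sensor frame for each tracked user, and a user's motions would be treated as input only while update() returns True.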

Supplementary Material

• Supplemental video: suppl.mov (pn0235-file3.mp4)
• MP4 file: p3443-sidebyside.mp4





    Published In

    CHI '14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
    April 2014
    4206 pages
    ISBN:9781450324731
    DOI:10.1145/2556288
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. free-space interaction
    2. input segmentation
    3. learned models
    4. user engagement
    5. vision-based input

    Qualifiers

    • Research-article

    Conference

CHI '14: CHI Conference on Human Factors in Computing Systems
April 26 - May 1, 2014
Toronto, Ontario, Canada

    Acceptance Rates

CHI '14 paper acceptance rate: 465 of 2,043 submissions (23%)
Overall acceptance rate: 6,199 of 26,314 submissions (24%)


Article Metrics

• Downloads (last 12 months): 103
• Downloads (last 6 weeks): 5

Reflects downloads up to 16 Nov 2024

    Cited By

• (2024) GestureGPT: Toward Zero-Shot Free-Form Hand Gesture Understanding with Large Language Model Agents. Proceedings of the ACM on Human-Computer Interaction 8(ISS), 462-499. DOI: 10.1145/3698145. Online publication date: 24 Oct 2024.
• (2024) Sign Language-Based versus Touch-Based Input for Deaf Users with Interactive Personal Assistants in Simulated Kitchen Environments. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-9. DOI: 10.1145/3613905.3651075. Online publication date: 11 May 2024.
• (2024) Assessment of Sign Language-Based versus Touch-Based Input for Deaf Users Interacting with Intelligent Personal Assistants. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3613904.3642094. Online publication date: 11 May 2024.
• (2023) Uncertainty-Aware Gaze Tracking for Assisted Living Environments. IEEE Transactions on Image Processing 32, 2335-2347. DOI: 10.1109/TIP.2023.3253253. Online publication date: 2023.
• (2023) XR Input Error Mediation for Hand-Based Input: Task and Context Influences a User's Preference. 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 1006-1015. DOI: 10.1109/ISMAR59233.2023.00117. Online publication date: 16 Oct 2023.
• (2022) Modeling Feedback in Interaction With Conversational Agents—A Review. Frontiers in Computer Science 4. DOI: 10.3389/fcomp.2022.744574. Online publication date: 15 Mar 2022.
• (2022) Investigating Clutching Interactions for Touchless Medical Imaging Systems. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-14. DOI: 10.1145/3491102.3517512. Online publication date: 29 Apr 2022.
• (2022) FaceEngage: Robust Estimation of Gameplay Engagement from User-Contributed (YouTube) Videos. IEEE Transactions on Affective Computing 13(2), 651-665. DOI: 10.1109/TAFFC.2019.2945014. Online publication date: 1 Apr 2022.
• (2022) Touchless Tactile Interaction with Unconventional Permeable Displays. Ultrasound Mid-Air Haptics for Touchless Interfaces, 207-223. DOI: 10.1007/978-3-031-04043-6_8. Online publication date: 17 Sep 2022.
• (2021) A Software Toolbox for Behavioral Analysis in Robot-Assisted Special Education. 2021 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 1-5. DOI: 10.23919/SoftCOM52868.2021.9559093. Online publication date: 23 Sep 2021.
