
DOI: 10.1145/2663204.2663251
ICMI '14 Conference Proceedings · Poster

The Relation of Eye Gaze and Face Pose: Potential Impact on Speech Recognition

Published: 12 November 2014

Abstract

We are interested in using context to improve speech recognition and speech understanding. Knowing what the user is attending to visually helps us predict their utterances and thus makes speech recognition easier. Eye gaze is one way to access this signal, but it is often unavailable, or expensive to gather, at longer distances. In this paper we examine joint eye-gaze and facial-pose information while users perform a speech-reading task. We hypothesize, and verify experimentally, that the eyes lead and the face follows. Face pose may not be as fast or as accurate a signal of visual attention as eye gaze, but based on experiments correlating eye gaze with speech recognition, we conclude that face pose provides useful information for biasing a recognizer toward higher accuracy.
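The abstract's central idea, biasing a recognizer toward words consistent with the user's visual attention, is commonly realized in the literature by interpolating a general language model with a context-conditioned distribution (e.g. Klakow's log-linear interpolation of language models). A minimal sketch of that combination follows; the vocabulary, probabilities, interpolation weight `lam`, and function name are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch: bias word probabilities toward what the user is
# visually attending to, via log-linear interpolation of two distributions.
# All numbers and names here are assumptions for illustration.

def log_linear_interpolate(p_general, p_visual, lam=0.5):
    """Combine distributions: p(w) proportional to
    p_general(w)**(1 - lam) * p_visual(w)**lam, renormalized."""
    scores = {
        w: (p_general[w] ** (1 - lam)) * (p_visual.get(w, 1e-9) ** lam)
        for w in p_general
    }
    z = sum(scores.values())  # normalization constant
    return {w: s / z for w, s in scores.items()}

# Toy vocabulary: suppose the user is looking at a picture of a dog.
p_general = {"dog": 0.2, "cat": 0.3, "car": 0.5}   # language model alone
p_visual = {"dog": 0.9, "cat": 0.05, "car": 0.05}  # visual-attention prior

biased = log_linear_interpolate(p_general, p_visual, lam=0.5)
best = max(biased, key=biased.get)  # "dog" now outranks "car"
```

With a weaker weight (`lam` closer to 0) the general language model dominates, which is how a slower or noisier attention signal such as face pose could still contribute without overriding the recognizer.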




    Published In

    ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
    November 2014
    558 pages
    ISBN:9781450328852
    DOI:10.1145/2663204

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. eye gaze
    2. face pose
    3. language models
    4. speech recognition

    Qualifiers

    • Poster

    Conference

    ICMI '14

    Acceptance Rates

    ICMI '14 Paper Acceptance Rate: 51 of 127 submissions, 40%
    Overall Acceptance Rate: 453 of 1,080 submissions, 42%


    Cited By

    • (2022) Integrating Gaze and Speech for Enabling Implicit Interactions. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 10.1145/3491102.3502134, pp. 1-14. Online publication date: 29-Apr-2022.
    • (2021) Watch Where You're Going! Gaze and Head Orientation as Predictors for Social Robot Navigation. 2021 IEEE International Conference on Robotics and Automation (ICRA), 10.1109/ICRA48506.2021.9561286, pp. 3553-3559. Online publication date: 30-May-2021.
    • (2019) Interaction Techniques for Cinematic Virtual Reality. 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 10.1109/VR.2019.8798189, pp. 1733-1737. Online publication date: Mar-2019.
    • (2018) GazeRecall. Proceedings of the 17th International Conference on Mobile and Ubiquitous Multimedia, 10.1145/3282894.3282903, pp. 115-119. Online publication date: 25-Nov-2018.
    • (2018) Cinematic Narration in VR – Rethinking Film Conventions for 360 Degrees. Virtual, Augmented and Mixed Reality: Applications in Health, Cultural Heritage, and Industry, 10.1007/978-3-319-91584-5_15, pp. 184-201. Online publication date: 2-Jun-2018.
    • (2017) Behavioral cues help predict impact of advertising on future sales. Image and Vision Computing, 10.1016/j.imavis.2017.03.002, 65:C, pp. 49-57. Online publication date: 1-Sep-2017.
    • (2016) Data Collection and Synchronisation: Towards a Multiperspective Multimodal Dialogue System with Metacognitive Abilities. Dialogues with Social Robots, 10.1007/978-981-10-2585-3_19, pp. 245-256. Online publication date: 25-Dec-2016.
    • (2015) An Empirical Evaluation of a Vocal User Interface for Programming by Voice. International Journal of Information Technologies and Systems Approach, 10.4018/IJITSA.2015070104, 8:2, pp. 47-63. Online publication date: 1-Jul-2015.
    • (2015) Probabilistic features for connecting eye gaze to spoken language understanding. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 10.1109/ICASSP.2015.7178985, pp. 5311-5315. Online publication date: Apr-2015.
    • (2015) Elderly Speech-Gaze Interaction. Universal Access in Human-Computer Interaction. Access to Today's Technologies, 10.1007/978-3-319-20678-3_1, pp. 3-12. Online publication date: 18-Jul-2015.
