DOI: 10.1145/3448018.3458011
Short Paper

Gaze+Lip: Rapid, Precise and Expressive Interactions Combining Gaze Input and Silent Speech Commands for Hands-free Smart TV Control

Published: 25 May 2021

Abstract

As eye-tracking technologies mature, gaze is becoming an increasingly popular input modality. However, in situations that require fast and precise object selection, gaze alone is difficult to use because of its limited accuracy. We present Gaze+Lip, a hands-free interface that combines gaze and lip reading to enable rapid and precise remote control of large displays. Gaze+Lip uses gaze for target selection and leverages silent speech for accurate and reliable command execution in noisy scenarios such as watching TV or playing videos on a computer. For evaluation, we implemented the system on a TV and conducted an experiment comparing our method with a dwell-based, gaze-only input method. Results showed that Gaze+Lip outperformed the gaze-only approach in both accuracy and input speed. Furthermore, subjective evaluations indicated that Gaze+Lip is easy to understand, easy to use, and perceived as faster than the gaze-only approach.
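To make the division of labor concrete (gaze chooses the target, a silently mouthed word chooses the command), the sketch below outlines one way such a control loop could be structured. It is an illustrative approximation, not the authors' implementation: the gaze_tracker, mouth_camera, lip_model, and tv_ui objects, the command vocabulary, and the confidence threshold are all hypothetical placeholders.

    import time
    import numpy as np

    # Hypothetical command vocabulary for a smart TV; the paper's actual
    # command set is not reproduced here.
    COMMANDS = ["play", "pause", "volume up", "volume down", "back"]

    def gaze_lip_loop(gaze_tracker, mouth_camera, lip_model, tv_ui, threshold=0.8):
        """Gaze picks what to act on; a silently mouthed word picks what to do."""
        while True:
            # 1. Gaze continuously indicates the candidate target. No dwell
            #    timeout is needed, because the mouthed command itself
            #    confirms the selection.
            x, y = gaze_tracker.current_point()
            target = tv_ui.widget_at(x, y)

            # 2. When lip motion is detected, buffer a short clip of the
            #    mouth region and classify it with a lip-reading network.
            if target is not None and mouth_camera.mouth_is_moving():
                clip = mouth_camera.record_mouth_region(seconds=1.5)  # T x H x W frames
                probs = lip_model.predict(np.asarray(clip))           # one score per command
                best = int(np.argmax(probs))

                # 3. Execute only confident predictions, so background TV audio
                #    or other speakers cannot trigger spurious actions.
                if float(probs[best]) >= threshold:
                    tv_ui.dispatch(target, COMMANDS[best])

            time.sleep(1 / 30)  # poll at roughly the camera frame rate

The property this loop illustrates is that neither modality acts alone: gaze without a recognized command does nothing, and a recognized command is always routed to the currently gazed-at target.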


      Published In

      ETRA '21 Short Papers: ACM Symposium on Eye Tracking Research and Applications
      May 2021
      232 pages
      ISBN: 9781450383455
      DOI: 10.1145/3448018

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Gaze Input
      2. Hands-free Interaction
      3. Lip Reading
      4. Multimodal Interface
      5. Silent Speech

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      ETRA '21

      Acceptance Rates

      Overall Acceptance Rate: 69 of 137 submissions, 50%


      Article Metrics

      • Downloads (last 12 months): 75
      • Downloads (last 6 weeks): 17
      Reflects downloads up to 30 Nov 2024


      Cited By

      • (2024) MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. https://doi.org/10.1145/3613904.3642348. Online publication date: 11-May-2024.
      • (2024) Watch Your Mouth: Silent Speech Recognition with Depth Sensing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-15. https://doi.org/10.1145/3613904.3642092. Online publication date: 11-May-2024.
      • (2024) GazePuffer: Hands-Free Input Method Leveraging Puff Cheeks for VR. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 331-341. https://doi.org/10.1109/VR58804.2024.00055. Online publication date: 16-Mar-2024.
      • (2023) Advantage of Gaze-Only Content Browsing in VR using Cumulative Dwell Time Compared to Hand Controller. Proceedings of the 2023 ACM Symposium on Spatial User Interaction, 1-8. https://doi.org/10.1145/3607822.3614513. Online publication date: 13-Oct-2023.
      • (2023) Analyzing lower half facial gestures for lip reading applications. Computer Vision and Image Understanding 233:C. https://doi.org/10.1016/j.cviu.2023.103738. Online publication date: 1-Aug-2023.
      • (2023) Development of a System for Controlling IoT Devices Using Gaze Tracking. Intelligent Sustainable Systems, 157-171. https://doi.org/10.1007/978-981-99-1726-6_12. Online publication date: 16-Jun-2023.
      • (2023) Gaze Tracking: A Survey of Devices, Libraries and Applications. Modelling and Development of Intelligent Systems, 18-41. https://doi.org/10.1007/978-3-031-27034-5_2. Online publication date: 26-Feb-2023.
