DOI: 10.1145/3448018.3458011
Short Paper

Gaze+Lip: Rapid, Precise and Expressive Interactions Combining Gaze Input and Silent Speech Commands for Hands-free Smart TV Control

Published: 25 May 2021

Abstract

As eye-tracking technologies mature, gaze is becoming an increasingly popular input modality. However, in situations that require fast and precise object selection, gaze alone is difficult to use because of its limited accuracy. We present Gaze+Lip, a hands-free interface that combines gaze and lip reading to enable rapid and precise remote control of large displays. Gaze+Lip uses gaze for target selection and leverages silent speech for accurate and reliable command execution in noisy scenarios such as watching TV or playing videos on a computer. For evaluation, we implemented the system on a TV and conducted an experiment comparing our method with a dwell-based, gaze-only input method. Results showed that Gaze+Lip outperformed the gaze-only approach in both accuracy and input speed. Furthermore, subjective evaluations indicated that Gaze+Lip is easy to understand, easy to use, and perceived as faster than the gaze-only approach.
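To make the division of labor concrete (gaze chooses the target, a silently mouthed word chooses the command), the sketch below outlines one way such a control loop could be structured. It is an illustrative approximation, not the authors' implementation: the gaze_tracker, mouth_camera, lip_model, and tv_ui objects, the command vocabulary, and the confidence threshold are all hypothetical placeholders.

    import time
    import numpy as np

    # Hypothetical command vocabulary for a smart TV; the paper's actual
    # command set is not reproduced here.
    COMMANDS = ["play", "pause", "volume up", "volume down", "back"]

    def gaze_lip_loop(gaze_tracker, mouth_camera, lip_model, tv_ui, threshold=0.8):
        """Gaze picks what to act on; a silently mouthed word picks what to do."""
        while True:
            # 1. Gaze continuously indicates the candidate target. No dwell
            #    timeout is needed, because the mouthed command itself
            #    confirms the selection.
            x, y = gaze_tracker.current_point()
            target = tv_ui.widget_at(x, y)

            # 2. When lip motion is detected, buffer a short clip of the
            #    mouth region and classify it with a lip-reading network.
            if target is not None and mouth_camera.mouth_is_moving():
                clip = mouth_camera.record_mouth_region(seconds=1.5)  # T x H x W frames
                probs = lip_model.predict(np.asarray(clip))           # one score per command
                best = int(np.argmax(probs))

                # 3. Execute only confident predictions, so background TV audio
                #    or other speakers cannot trigger spurious actions.
                if float(probs[best]) >= threshold:
                    tv_ui.dispatch(target, COMMANDS[best])

            time.sleep(1 / 30)  # poll at roughly the camera frame rate

The property this loop illustrates is that neither modality acts alone: gaze without a recognized command does nothing, and a recognized command is always routed to the currently gazed-at target.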


      Published In

      ETRA '21 Short Papers: ACM Symposium on Eye Tracking Research and Applications
      May 2021
      232 pages
      ISBN: 9781450383455
      DOI: 10.1145/3448018

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Gaze Input
      2. Hands-free Interaction
      3. Lip Reading
      4. Multimodal Interface
      5. Silent Speech

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      ETRA '21

      Acceptance Rates

      Overall Acceptance Rate: 69 of 137 submissions, 50%


      Article Metrics

      • Downloads (last 12 months): 75
      • Downloads (last 6 weeks): 17
      Reflects downloads up to 30 Nov 2024


      Cited By

      • (2024) MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. https://doi.org/10.1145/3613904.3642348. Online publication date: 11-May-2024.
      • (2024) Watch Your Mouth: Silent Speech Recognition with Depth Sensing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-15. https://doi.org/10.1145/3613904.3642092. Online publication date: 11-May-2024.
      • (2024) GazePuffer: Hands-Free Input Method Leveraging Puff Cheeks for VR. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 331-341. https://doi.org/10.1109/VR58804.2024.00055. Online publication date: 16-Mar-2024.
      • (2023) Advantage of Gaze-Only Content Browsing in VR using Cumulative Dwell Time Compared to Hand Controller. Proceedings of the 2023 ACM Symposium on Spatial User Interaction, 1-8. https://doi.org/10.1145/3607822.3614513. Online publication date: 13-Oct-2023.
      • (2023) Analyzing lower half facial gestures for lip reading applications. Computer Vision and Image Understanding 233:C. https://doi.org/10.1016/j.cviu.2023.103738. Online publication date: 1-Aug-2023.
      • (2023) Development of a System for Controlling IoT Devices Using Gaze Tracking. Intelligent Sustainable Systems, 157-171. https://doi.org/10.1007/978-981-99-1726-6_12. Online publication date: 16-Jun-2023.
      • (2023) Gaze Tracking: A Survey of Devices, Libraries and Applications. Modelling and Development of Intelligent Systems, 18-41. https://doi.org/10.1007/978-3-031-27034-5_2. Online publication date: 26-Feb-2023.
