DOI: 10.1145/2254556.2254585

SpeeG: a multimodal speech- and gesture-based text input solution

Published: 21 May 2012

Abstract

We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft's Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
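The abstract describes the core interaction loop: a speech recogniser proposes candidate words, and a Dasher-style zoomable interface, steered by Kinect hand tracking, lets the user confirm or correct them continuously. The sketch below illustrates one plausible reading of a single selection step in Python. It is not the authors' implementation: `recogniser_hypotheses` and `hand_position` are hypothetical stubs standing in for the real CMU Sphinx and Kinect interfaces, and the proportional layout is a simplified stand-in for Dasher's probability-weighted boxes.

```python
# Minimal sketch of a SpeeG-style selection step (illustrative only).
# Stubs stand in for the ASR (CMU Sphinx) and the hand tracker (Kinect);
# none of these names are the actual SpeeG, Sphinx, or Kinect APIs.

import random
from dataclasses import dataclass


@dataclass
class Hypothesis:
    word: str
    confidence: float  # recogniser score, assumed in [0, 1]


def recogniser_hypotheses() -> list[Hypothesis]:
    """Hypothetical stub for an ASR n-best list."""
    return [Hypothesis("speech", 0.6),
            Hypothesis("speed", 0.3),
            Hypothesis("peach", 0.1)]


def hand_position() -> float:
    """Hypothetical stub for a normalised vertical hand position in [0, 1)."""
    return random.random()


def layout(hyps: list[Hypothesis]) -> list[tuple[float, float, str]]:
    """Give each candidate a vertical slot proportional to its confidence,
    mirroring how Dasher sizes targets by probability so that likely
    words are easier to hit."""
    total = sum(h.confidence for h in hyps)
    slots, top = [], 0.0
    for h in hyps:
        height = h.confidence / total
        slots.append((top, top + height, h.word))
        top += height
    return slots


def select(slots: list[tuple[float, float, str]], y: float) -> str:
    """Pick the candidate whose slot contains the current hand position."""
    for lo, hi, word in slots:
        if lo <= y < hi:
            return word
    return slots[-1][2]  # fall back to the last slot at the upper boundary


if __name__ == "__main__":
    slots = layout(recogniser_hypotheses())
    y = hand_position()
    print(f"hand at {y:.2f} -> selected '{select(slots, y)}'")
```

In the actual system this step would run continuously: the layout re-zooms as the hand moves and as the recogniser revises its hypotheses, which is what allows correction to happen in real time rather than in a separate detection-then-correction phase.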




    Information

    Published In

    AVI '12: Proceedings of the International Working Conference on Advanced Visual Interfaces
    May 2012
    846 pages
    ISBN: 9781450312875
    DOI: 10.1145/2254556
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    • Consulta Umbria SRL
    • University of Salerno

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Kinect sensor
    2. SpeeG
    3. gesture input
    4. multimodal text input
    5. speech recognition

    Qualifiers

    • Research-article

    Conference

    AVI'12
    Sponsor: University of Salerno

    Acceptance Rates

    Overall acceptance rate: 128 of 490 submissions (26%)


    Article Metrics

    • Downloads (Last 12 months): 17
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 09 Nov 2024


    Cited By

    • (2024) Verbal and Nonverbal Communication Differences between In-Person and Live-Streamed Group Physical Activity: A Specific Investigation into Yoga Instruction. SSRN Electronic Journal. DOI: 10.2139/ssrn.4802390. Online publication date: 2024.
    • (2024) Virtual Reality Bank: A Novel Banking Experience in Immersive World. Proceedings of the 26th Symposium on Virtual and Augmented Reality, 193-202. DOI: 10.1145/3691573.3691576. Online publication date: 30-Sep-2024.
    • (2024) TouchEditor. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(4), 1-29. DOI: 10.1145/3631454. Online publication date: 12-Jan-2024.
    • (2023) MEinVR: Multimodal interaction techniques in immersive exploration. Visual Informatics, 7(3), 37-48. DOI: 10.1016/j.visinf.2023.06.001. Online publication date: Sep-2023.
    • (2023) Extended Reality for Knowledge Work in Everyday Environments. Everyday Virtual and Augmented Reality, 21-56. DOI: 10.1007/978-3-031-05804-2_2. Online publication date: 19-Feb-2023.
    • (2022) Multimodal Error Correction for Speech-to-Text in a Mobile Office Automated Vehicle: Results From a Remote Study. Proceedings of the 27th International Conference on Intelligent User Interfaces, 496-505. DOI: 10.1145/3490099.3511131. Online publication date: 22-Mar-2022.
    • (2022) Hand Gesture Interpretation Model for Indian Sign Language using Neural Networks. 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), 1-5. DOI: 10.1109/I2CT54291.2022.9825322. Online publication date: 7-Apr-2022.
    • (2022) Climbing Keyboard: A Tilt-Based Selection Keyboard Entry for Virtual Reality. International Journal of Human–Computer Interaction, 40(5), 1327-1338. DOI: 10.1080/10447318.2022.2144120. Online publication date: 15-Nov-2022.
    • (2021) Text Entry in Virtual Environments using Speech and a Midair Keyboard. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2648-2658. DOI: 10.1109/TVCG.2021.3067776. Online publication date: May-2021.
    • (2020) MRCAT: In Situ Prototyping of Interactive AR Environments. Virtual, Augmented and Mixed Reality. Design and Interaction, 235-255. DOI: 10.1007/978-3-030-49695-1_16. Online publication date: 10-Jul-2020.
