DOI: 10.1145/2254556.2254585

SpeeG: a multimodal speech- and gesture-based text input solution

Published: 21 May 2012

Abstract

We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft's Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
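The abstract describes the core interaction loop: a speech recogniser proposes candidate words, and a Dasher-style zoomable interface, steered by Kinect hand tracking, lets the user confirm or correct them continuously. The sketch below illustrates one plausible reading of a single selection step in Python. It is not the authors' implementation: `recogniser_hypotheses` and `hand_position` are hypothetical stubs standing in for the real CMU Sphinx and Kinect interfaces, and the proportional layout is a simplified stand-in for Dasher's probability-weighted boxes.

```python
# Minimal sketch of a SpeeG-style selection step (illustrative only).
# Stubs stand in for the ASR (CMU Sphinx) and the hand tracker (Kinect);
# none of these names are the actual SpeeG, Sphinx, or Kinect APIs.

import random
from dataclasses import dataclass


@dataclass
class Hypothesis:
    word: str
    confidence: float  # recogniser score, assumed in [0, 1]


def recogniser_hypotheses() -> list[Hypothesis]:
    """Hypothetical stub for an ASR n-best list."""
    return [Hypothesis("speech", 0.6),
            Hypothesis("speed", 0.3),
            Hypothesis("peach", 0.1)]


def hand_position() -> float:
    """Hypothetical stub for a normalised vertical hand position in [0, 1)."""
    return random.random()


def layout(hyps: list[Hypothesis]) -> list[tuple[float, float, str]]:
    """Give each candidate a vertical slot proportional to its confidence,
    mirroring how Dasher sizes targets by probability so that likely
    words are easier to hit."""
    total = sum(h.confidence for h in hyps)
    slots, top = [], 0.0
    for h in hyps:
        height = h.confidence / total
        slots.append((top, top + height, h.word))
        top += height
    return slots


def select(slots: list[tuple[float, float, str]], y: float) -> str:
    """Pick the candidate whose slot contains the current hand position."""
    for lo, hi, word in slots:
        if lo <= y < hi:
            return word
    return slots[-1][2]  # fall back to the last slot at the upper boundary


if __name__ == "__main__":
    slots = layout(recogniser_hypotheses())
    y = hand_position()
    print(f"hand at {y:.2f} -> selected '{select(slots, y)}'")
```

In the actual system this step would run continuously: the layout re-zooms as the hand moves and as the recogniser revises its hypotheses, which is what allows correction to happen in real time rather than in a separate detection-then-correction phase.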




    Information

    Published In

    AVI '12: Proceedings of the International Working Conference on Advanced Visual Interfaces
    May 2012
    846 pages
    ISBN: 9781450312875
    DOI: 10.1145/2254556
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    • Consulta Umbria SRL
    • University of Salerno

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Kinect sensor
    2. SpeeG
    3. gesture input
    4. multimodal text input
    5. speech recognition

    Qualifiers

    • Research-article

    Conference

    AVI'12
    Sponsor: University of Salerno

    Acceptance Rates

    Overall acceptance rate: 128 of 490 submissions (26%)


    Article Metrics

    • Downloads (Last 12 months): 17
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 09 Nov 2024


    Cited By

    • (2024) Verbal and Nonverbal Communication Differences between In-Person and Live-Streamed Group Physical Activity: A Specific Investigation into Yoga Instruction. SSRN Electronic Journal. DOI: 10.2139/ssrn.4802390. Online publication date: 2024.
    • (2024) Virtual Reality Bank: A Novel Banking Experience in Immersive World. Proceedings of the 26th Symposium on Virtual and Augmented Reality, 193-202. DOI: 10.1145/3691573.3691576. Online publication date: 30-Sep-2024.
    • (2024) TouchEditor. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(4), 1-29. DOI: 10.1145/3631454. Online publication date: 12-Jan-2024.
    • (2023) MEinVR: Multimodal interaction techniques in immersive exploration. Visual Informatics, 7(3), 37-48. DOI: 10.1016/j.visinf.2023.06.001. Online publication date: Sep-2023.
    • (2023) Extended Reality for Knowledge Work in Everyday Environments. Everyday Virtual and Augmented Reality, 21-56. DOI: 10.1007/978-3-031-05804-2_2. Online publication date: 19-Feb-2023.
    • (2022) Multimodal Error Correction for Speech-to-Text in a Mobile Office Automated Vehicle: Results From a Remote Study. Proceedings of the 27th International Conference on Intelligent User Interfaces, 496-505. DOI: 10.1145/3490099.3511131. Online publication date: 22-Mar-2022.
    • (2022) Hand Gesture Interpretation Model for Indian Sign Language using Neural Networks. 2022 IEEE 7th International Conference for Convergence in Technology (I2CT), 1-5. DOI: 10.1109/I2CT54291.2022.9825322. Online publication date: 7-Apr-2022.
    • (2022) Climbing Keyboard: A Tilt-Based Selection Keyboard Entry for Virtual Reality. International Journal of Human–Computer Interaction, 40(5), 1327-1338. DOI: 10.1080/10447318.2022.2144120. Online publication date: 15-Nov-2022.
    • (2021) Text Entry in Virtual Environments using Speech and a Midair Keyboard. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2648-2658. DOI: 10.1109/TVCG.2021.3067776. Online publication date: May-2021.
    • (2020) MRCAT: In Situ Prototyping of Interactive AR Environments. Virtual, Augmented and Mixed Reality. Design and Interaction, 235-255. DOI: 10.1007/978-3-030-49695-1_16. Online publication date: 10-Jul-2020.
