DOI: 10.1145/1452392.1452439

Robust gesture processing for multimodal interaction

Published: 20 October 2008

Abstract

With the explosive growth in mobile computing and communication over the past few years, it is possible to access almost any information from virtually anywhere. However, the efficiency and effectiveness of this interaction are severely limited by the inherent characteristics of mobile devices, including small screen size and the lack of a viable keyboard or mouse. This paper concerns the use of multimodal language processing techniques to enable interfaces combining speech and gesture input that overcome these limitations. Specifically, we focus on robust processing of pen gesture inputs in a local search application and demonstrate that edit-based techniques that have proven effective in spoken language processing can also be used to overcome unexpected or errorful gesture input. We also examine the use of a bottom-up gesture aggregation technique to improve the coverage of multimodal understanding.
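The abstract names two techniques: edit-based robust processing of pen gesture input, and bottom-up gesture aggregation. As a rough illustration of the first idea, the sketch below maps a rejected gesture symbol sequence to the nearest sequence the grammar accepts under Levenshtein edit distance. This is a minimal sketch under stated assumptions, not the paper's implementation: the paper's approach is built on finite-state methods (see the author tags), whereas this version simply enumerates a small hypothetical catalogue of in-grammar sequences, and the symbol names (`G`, `area`, `sel`, `rest`) are invented for illustration.

```python
# A minimal sketch (assumptions, not the authors' implementation) of
# edit-based recovery for gesture streams: when the multimodal grammar
# rejects a recognized sequence of gesture symbols, map it to the
# in-grammar sequence reachable with the fewest insertions, deletions,
# and substitutions. The symbol vocabulary below is illustrative.

def edit_distance(a, b):
    """Levenshtein distance over two symbol sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def closest_in_grammar(recognized, grammar):
    """Pick the grammar sequence closest to the recognized one."""
    return min(grammar, key=lambda g: edit_distance(recognized, g))

# Hypothetical gesture symbol sequences for a local-search map task:
# G = gesture, area/point/line = shape, sel = selection, rest = restaurant.
GRAMMAR = [
    ("G", "area", "sel", "rest"),   # circle around restaurants
    ("G", "point", "sel", "rest"),  # tap on a single restaurant
    ("G", "line", "sel", "rest"),   # underline / scratch-out
]

# An errorful recognition with a spuriously duplicated shape symbol.
recognized = ("G", "area", "area", "sel", "rest")
print(closest_in_grammar(recognized, GRAMMAR))
# -> ('G', 'area', 'sel', 'rest')
```

The second technique, bottom-up gesture aggregation, can be pictured as folding a run of individual selection gestures into one aggregate selection whose plurality can then combine with plural spoken expressions such as "these three restaurants". Again a hedged sketch; the selection record format here is an assumption, not the paper's representation.

```python
# A hedged sketch of bottom-up gesture aggregation: fold adjacent
# selection gestures into one aggregate selection so that plural
# speech can combine with several individual taps. The record format
# is an illustrative assumption.

def aggregate(selections):
    """Combine a run of selection gestures into one selection."""
    items = [item for s in selections for item in s["items"]]
    return {"type": "selection", "items": items, "number": len(items)}

taps = [{"type": "selection", "items": ["r12"]},
        {"type": "selection", "items": ["r47"]},
        {"type": "selection", "items": ["r03"]}]
print(aggregate(taps))
# -> {'type': 'selection', 'items': ['r12', 'r47', 'r03'], 'number': 3}
```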




Published In

ICMI '08: Proceedings of the 10th international conference on Multimodal interfaces
October 2008
322 pages
ISBN: 9781605581989
DOI: 10.1145/1452392


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. finite-state methods
  2. local search
  3. mobile
  4. multimodal interfaces
  5. robustness
  6. speech-gesture integration

Qualifiers

  • Research-article

Conference

ICMI '08: International Conference on Multimodal Interfaces
October 20-22, 2008
Chania, Crete, Greece

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
