Article

A multimodal learning interface for sketch, speak and point creation of a schedule chart

Authors:

Ed Kaiser,

David Demirdjian,

Alexander Gruenstein,

Xiaoguang Li,

John Niekrasz,

Matt Wesson,

Sanjeev Kumar

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces

Pages 329 - 330

https://doi.org/10.1145/1027933.1027992

Published: 13 October 2004


Abstract

We present a video demonstration of an agent-based test bed application for ongoing research into multi-user, multimodal, computer-assisted meetings. The system tracks a two-person scheduling meeting: one person stands at a touch-sensitive whiteboard creating a Gantt chart, while another person looks on in view of a calibrated stereo camera. The stereo camera performs real-time, untethered, vision-based tracking of the onlooker's head, torso and limb movements, which in turn are routed to a 3D-gesture recognition agent. Using speech, 3D deictic gesture and 2D object de-referencing, the system is able to track the onlooker's suggestion to move a specific milestone. The system also has a speech recognition agent capable of recognizing out-of-vocabulary (OOV) words as phonetic sequences. Thus, when a user at the whiteboard speaks an OOV label name for a chart constituent while also writing it, the OOV speech is combined with letter sequences hypothesized by the handwriting recognizer to yield an orthography, pronunciation and semantics for the new label. These are then learned dynamically by the system and become immediately available for future recognition.
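The abstract does not say how the OOV phonetic sequence and the handwriting hypotheses are combined, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes a toy one-letter-to-one-phone table (`LETTER_TO_PHONE`), a handwriting n-best list with scores, and a plain dictionary as the lexicon; all of these names and values are hypothetical. Each handwriting hypothesis is scored by the edit distance between its naive pronunciation and the recognized phone sequence, and the winner is enrolled with its orthography, pronunciation and a semantic tag.

```python
from typing import Dict, List, Tuple

# Toy one-letter-to-one-phone table. This is an assumption made purely for
# illustration; a real system would use a proper letter-to-sound model.
LETTER_TO_PHONE: Dict[str, str] = {
    "a": "ae", "b": "b", "d": "d", "e": "eh", "f": "f", "i": "ih",
    "n": "n", "o": "aa", "r": "r", "s": "s", "t": "t",
}


def naive_phones(word: str) -> List[str]:
    """Map each known letter of the word to a single phone."""
    return [LETTER_TO_PHONE[ch] for ch in word.lower() if ch in LETTER_TO_PHONE]


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def fuse(handwriting_nbest: List[Tuple[str, float]],
         spoken_phones: List[str]) -> str:
    """Pick the handwriting hypothesis that best matches the spoken phones.

    Ties on phone distance are broken by the higher handwriting score.
    """
    def cost(item: Tuple[str, float]) -> Tuple[int, float]:
        word, hw_score = item
        return (edit_distance(naive_phones(word), spoken_phones), -hw_score)

    return min(handwriting_nbest, key=cost)[0]


def enroll(lexicon: Dict[str, dict], word: str,
           phones: List[str], semantics: str) -> None:
    """Store orthography, pronunciation and semantics for later recognition."""
    lexicon[word] = {"pronunciation": list(phones), "semantics": semantics}


# Hypothetical inputs: the handwriting recognizer ranks a misreading first,
# but the phone sequence from the speech recognizer disambiguates it.
lexicon: Dict[str, dict] = {}
nbest = [("fned", 0.5), ("fred", 0.4), ("tred", 0.1)]
phones = ["f", "r", "eh", "d"]
label = fuse(nbest, phones)
enroll(lexicon, label, phones, "task_label")
```

In this sketch the handwriting recognizer's top hypothesis is not the one selected, because the spoken phones match a lower-ranked spelling exactly. That is the kind of mutual disambiguation between modalities the abstract alludes to, shown here in a deliberately simplified form.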





Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces

October 2004

368 pages

ISBN: 1581139950

DOI: 10.1145/1027933

General Chairs: Rajeev Sharma (Advanced Interfaces), Trevor Darrell (Massachusetts Institute of Technology)

Program Chairs: Mary Harper (Purdue University, West Lafayette, IN), Gianni Lazzari (ITC-IRST), Matthew Turk (University of California, Santa Barbara, CA)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Article

Conference

ICMI04

ICMI04: Sixth International Conference on Multimodal Interfaces 2004

October 13 - 15, 2004

State College, PA, USA

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


