DOI: 10.1145/1027933.1027992

Article
A multimodal learning interface for sketch, speak and point creation of a schedule chart

Published: 13 October 2004

Abstract

We present a video demonstration of an agent-based test-bed application for ongoing research into multi-user, multimodal, computer-assisted meetings. The system tracks a two-person scheduling meeting: one person stands at a touch-sensitive whiteboard creating a Gantt chart while another looks on in view of a calibrated stereo camera. The stereo camera performs real-time, untethered, vision-based tracking of the onlooker's head, torso and limb movements, which are routed to a 3D-gesture recognition agent. Using speech, 3D deictic gesture and 2D object de-referencing, the system tracks the onlooker's suggestion to move a specific milestone. The system also has a speech recognition agent capable of recognizing out-of-vocabulary (OOV) words as phonetic sequences. Thus, when a user at the whiteboard speaks an OOV label name for a chart constituent while also writing it, the OOV speech is combined with letter sequences hypothesized by the handwriting recognizer to yield an orthography, pronunciation and semantics for the new label. These are then learned dynamically by the system and become immediately available for future recognition.
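The OOV label-learning step described above could be sketched as follows. This is a hypothetical illustration only, not the system's actual implementation: the function names, the toy letter-to-sound table, and the similarity-based scoring are all assumptions. The abstract states only that OOV speech (a phone sequence) is combined with letter sequences hypothesized by the handwriting recognizer to yield an orthography and pronunciation; one simple way to combine them is to score each candidate spelling's naive letter-to-sound expansion against the recognized phones.

```python
from difflib import SequenceMatcher

# Toy letter-to-sound rules (an assumption; a real system would use a
# trained grapheme-to-phoneme model rather than a one-letter lookup).
LETTER_TO_PHONES = {
    "a": ["ae"], "b": ["b"], "c": ["k"], "d": ["d"], "e": ["eh"],
    "f": ["f"], "g": ["g"], "h": ["hh"], "i": ["ih"], "j": ["jh"],
    "k": ["k"], "l": ["l"], "m": ["m"], "n": ["n"], "o": ["ow"],
    "p": ["p"], "q": ["k"], "r": ["r"], "s": ["s"], "t": ["t"],
    "u": ["ah"], "v": ["v"], "w": ["w"], "x": ["k", "s"],
    "y": ["y"], "z": ["z"],
}

def spell_to_phones(spelling):
    """Expand a candidate spelling into a naive phone sequence."""
    phones = []
    for ch in spelling.lower():
        phones.extend(LETTER_TO_PHONES.get(ch, []))
    return phones

def learn_oov_label(phone_seq, handwriting_hypotheses):
    """Pick the handwriting hypothesis whose letter-to-sound expansion
    best matches the OOV phone sequence decoded from speech, yielding a
    new lexicon entry (orthography, pronunciation)."""
    def score(spelling):
        return SequenceMatcher(None, spell_to_phones(spelling), phone_seq).ratio()
    best = max(handwriting_hypotheses, key=score)
    return best, phone_seq

# Example: speech decodes an OOV word as phones; handwriting offers
# ranked spelling hypotheses; fusion selects the best-matching spelling.
orthography, pronunciation = learn_oov_label(
    ["d", "eh", "m", "ow"],        # phones decoded for the OOV word
    ["demo", "derno", "clemo"],    # handwriting letter-sequence hypotheses
)
print(orthography, pronunciation)
```

Once selected, the (spelling, phone sequence) pair could be added to the recognizer's lexicon, which mirrors the abstract's claim that newly learned labels become immediately available for future recognition.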



Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004
368 pages
ISBN: 1581139950
DOI: 10.1145/1027933
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. multimodal interaction
  2. vision-based body-tracking
  3. vocabulary learning

Qualifiers

  • Article

Conference

ICMI '04

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Cited By

  • (2009) Skipping spare information in multimodal inputs during multimodal input fusion. Proceedings of the 14th International Conference on Intelligent User Interfaces, 451-456. DOI: 10.1145/1502650.1502717
  • (2008) An agent-based framework for sketched symbol interpretation. Journal of Visual Languages and Computing 19(2), 225-257. DOI: 10.1016/j.jvlc.2007.04.002
  • (2008) Multimodal Interfaces. HCI Beyond the GUI, 391-444. DOI: 10.1016/B978-0-12-374017-5.00012-2
  • (2008) A multimodal annotated corpus of consensus decision making meetings. Language Resources and Evaluation 41(3-4), 409-429. DOI: 10.1007/s10579-007-9060-6
  • (2008) Multimodal support to group dynamics. Personal and Ubiquitous Computing 12(3), 181-195. DOI: 10.1007/s00779-007-0144-5
  • (2008) Pro-active meeting assistants: attention please! AI & Society 23(2), 213-231. DOI: 10.1007/s00146-007-0135-0
  • (2007) An input-parsing algorithm supporting integration of deictic gesture in natural language interface. Proceedings of the 12th International Conference on Human-Computer Interaction, 206-215. DOI: 10.5555/1769590.1769613
  • (2007) An efficient unification-based multimodal language processor in multimodal input fusion. Proceedings of the 19th Australasian Conference on Computer-Human Interaction, 215-218. DOI: 10.1145/1324892.1324936
  • (2007) Multimodal redundancy across handwriting and speech during computer mediated human-human interactions. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1009-1018. DOI: 10.1145/1240624.1240778
  • (2007) Magic Paper. Computer 40(9), 34-41. DOI: 10.1109/MC.2007.324
