DOI: 10.1145/1240624.1240778

Multimodal redundancy across handwriting and speech during computer mediated human-human interactions

Published: 29 April 2007

Abstract

Lecturers, presenters and meeting participants often say what they publicly handwrite. In this paper, we report on three empirical explorations of such multimodal redundancy -- during whiteboard presentations, during a spontaneous brainstorming meeting, and during the informal annotation and discussion of photographs. We show that redundantly presented words, compared to other words used during a presentation or meeting, tend to be topic specific and thus are likely to be out-of-vocabulary. We also show that they have significantly higher tf-idf (term frequency-inverse document frequency) weights than other words, which we argue supports the hypothesis that they are dialogue-critical words. We frame the import of these empirical findings by describing SHACER, our recently introduced Speech and HAndwriting reCognizER, which can combine information from instances of redundant handwriting and speech to dynamically learn new vocabulary.
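
To make the tf-idf measure concrete: the weight of a word w in a transcript d is tf(w, d) × log(N / df(w)), where N is the number of transcripts and df(w) counts how many of them contain w. The sketch below is illustrative only (the paper reports results, not code, and the toy data here is hypothetical); it shows why redundantly presented, topic-specific words score high: they are frequent in one transcript but absent from the others.

    # Illustrative tf-idf sketch; not the authors' implementation.
    # Each "document" is the word sequence of one presentation or meeting.
    import math
    from collections import Counter

    def tfidf_weights(documents):
        """Per-document weights: tf(w, d) * log(N / df(w))."""
        n_docs = len(documents)
        # df(w): number of documents containing w at least once.
        df = Counter(word for doc in documents for word in set(doc))
        return [
            {w: tf * math.log(n_docs / df[w]) for w, tf in Counter(doc).items()}
            for doc in documents
        ]

    # Toy data: "shacer" occurs in only one transcript, so it scores high;
    # a word occurring in every transcript would score log(N/N) = 0.
    docs = [
        "shacer combines speech and handwriting recognition".split(),
        "the meeting discussed the quarterly schedule".split(),
        "photo annotation over digital paper".split(),
    ]
    print(round(tfidf_weights(docs)[0]["shacer"], 3))  # 1.099 = log(3)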



Published In

CHI '07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2007
1654 pages
ISBN:9781595935939
DOI:10.1145/1240624

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. handwriting
  2. multimodal
  3. speech

Qualifiers

  • Article

Conference

CHI 2007: CHI Conference on Human Factors in Computing Systems
April 28 - May 3, 2007
San Jose, California, USA

Acceptance Rates

CHI '07 Paper Acceptance Rate: 182 of 840 submissions, 22%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%

Cited By

  • (2017) Multimodal speech and pen interfaces. The Handbook of Multimodal-Multisensor Interfaces, 403-447. DOI: 10.1145/3015783.3015795. Online publication date: 24-Apr-2017.
  • (2011) Design of human-centric adaptive multimodal interfaces. International Journal of Human-Computer Studies 69, 12, 854-869. DOI: 10.1016/j.ijhcs.2011.07.006. Online publication date: 1-Dec-2011.
  • (2010) Effects of automated transcription quality on non-native speakers' comprehension in real-time computer-mediated communication. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1725-1734. DOI: 10.1145/1753326.1753584. Online publication date: 10-Apr-2010.
  • (2009) Graph-Based Partial Hypothesis Fusion for Pen-Aided Speech Input. IEEE Transactions on Audio, Speech, and Language Processing 17, 3, 478-485. DOI: 10.1109/TASL.2009.2013409. Online publication date: Mar-2009.
  • (2008) Multimodal Interfaces. HCI Beyond the GUI, 391-444. DOI: 10.1016/B978-0-12-374017-5.00012-2. Online publication date: 2008.
  • (2007) Speech and sketching. Proceedings of the 4th Eurographics workshop on Sketch-based interfaces and modeling, 83-90. DOI: 10.1145/1384429.1384449. Online publication date: 2-Aug-2007.
  • (2007) Cross-domain matching for automatic tag extraction across redundant handwriting and speech events. Proceedings of the 2007 workshop on Tagging, mining and retrieval of human related activity information, 55-62. DOI: 10.1145/1330588.1330597. Online publication date: 15-Nov-2007.
  • (2007) Toward content-aware multimodal tagging of personal photo collections. Proceedings of the 9th international conference on Multimodal interfaces, 122-125. DOI: 10.1145/1322192.1322215. Online publication date: 12-Nov-2007.
