DOI: 10.3115/1218955.1218956

Optimization in multimodal interpretation

Published: 21 July 2004

Abstract

In a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback). These dependencies form a rich set of constraints from various perspectives, such as temporal alignment between different modalities, coherence of the conversation, and domain semantics. There is strong evidence that the competition and ranking of these constraints are important for achieving an optimal interpretation. Thus, we have developed an optimization approach for multimodal interpretation, particularly for interpreting multimodal references. A preliminary evaluation indicates the effectiveness of this approach, especially for complex user inputs that involve multiple referring expressions in a speech utterance and multiple gestures.
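The idea of ranked, competing constraints can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not specify. It is a toy under stated assumptions: the input data, the three constraint functions, the 1-second threshold, and the ranking order are all hypothetical. Each candidate interpretation pairs every referring expression with one gesture, and the candidate with the lexicographically smallest tuple of violation counts wins, so a higher-ranked constraint always dominates lower-ranked ones.

```python
from itertools import product

# Hypothetical toy input: two referring expressions from one utterance
# and two pointing gestures, each with a timestamp in seconds.
expressions = [
    {"text": "this house", "type": "house", "time": 1.0},
    {"text": "that condo", "type": "condo", "time": 2.5},
]
gestures = [
    {"object": "condo_7", "type": "condo", "time": 2.6},
    {"object": "house_3", "type": "house", "time": 1.1},
]


def semantic_violations(assignment):
    """Count expression/gesture pairs whose semantic types disagree."""
    return sum(e["type"] != g["type"] for e, g in assignment)


def reuse_violations(assignment):
    """Count how many times a gesture is reused across expressions."""
    objects = [g["object"] for _, g in assignment]
    return len(objects) - len(set(objects))


def temporal_violations(assignment):
    """Count pairs whose speech and gesture times differ by over 1 second."""
    return sum(abs(e["time"] - g["time"]) > 1.0 for e, g in assignment)


# Assumed ranking, highest priority first (an illustration only).
RANKED_CONSTRAINTS = [semantic_violations, reuse_violations, temporal_violations]


def profile(assignment):
    """Violation counts in ranking order; tuples compare lexicographically."""
    return tuple(constraint(assignment) for constraint in RANKED_CONSTRAINTS)


def best_interpretation(expressions, gestures):
    """Enumerate every expression-to-gesture assignment and keep the best."""
    candidates = [
        list(zip(expressions, choice))
        for choice in product(gestures, repeat=len(expressions))
    ]
    return min(candidates, key=profile)


best = best_interpretation(expressions, gestures)
resolved = {e["text"]: g["object"] for e, g in best}
print(resolved)
```

Exhaustive enumeration grows exponentially with the number of referring expressions, so it is only suitable for illustration; a practical system would need a more efficient search.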


Published In

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Cited By

  • (2014) Latent Semantic Analysis for Multimodal User Input With Speech and Gestures. IEEE/ACM Transactions on Audio, Speech and Language Processing 22:2 (417-429). DOI: 10.1109/TASLP.2013.2294586. Online publication date: 1-Feb-2014
  • (2012) Towards mediating shared perceptual basis in situated dialogue. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue (140-149). DOI: 10.5555/2392800.2392827. Online publication date: 5-Jul-2012
  • (2012) Integrating word acquisition and referential grounding towards physical world interaction. Proceedings of the 14th ACM international conference on Multimodal interaction (109-116). DOI: 10.1145/2388676.2388703. Online publication date: 22-Oct-2012
  • (2010) Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions. Proceedings of the 15th international conference on Intelligent user interfaces (129-138). DOI: 10.1145/1719970.1719989. Online publication date: 7-Feb-2010
  • (2009) Between linguistic attention and gaze fixations in multimodal conversational interfaces. Proceedings of the 2009 international conference on Multimodal interfaces (143-150). DOI: 10.1145/1647314.1647339. Online publication date: 2-Nov-2009
  • (2008) Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research 31:1 (353-398). DOI: 10.5555/1622655.1622666. Online publication date: 1-Feb-2008
  • (2008) An integrative recognition method for speech and gestures. Proceedings of the 10th international conference on Multimodal interfaces (93-96). DOI: 10.1145/1452392.1452411. Online publication date: 20-Oct-2008
  • (2008) What's in a gaze? Proceedings of the 13th international conference on Intelligent user interfaces (20-29). DOI: 10.1145/1378773.1378777. Online publication date: 13-Jan-2008
  • (2007) Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research 30:1 (413-456). DOI: 10.5555/1622637.1622648. Online publication date: 1-Nov-2007
  • (2006) Cognitive principles in robust multimodal interpretation. Journal of Artificial Intelligence Research 27:1 (55-83). DOI: 10.5555/1622572.1622575. Online publication date: 1-Sep-2006