DOI: 10.1145/958432.958438
Article

Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality

Published: 05 November 2003

Abstract

We describe an approach to 3D multimodal interaction in immersive augmented and virtual reality environments that accounts for the uncertain nature of the information sources. The resulting multimodal system fuses symbolic and statistical information from a set of 3D gesture, spoken language, and referential agents. The referential agents employ visible or invisible volumes that can be attached to 3D trackers in the environment, and which use a time-stamped history of the objects that intersect them to derive statistics for ranking potential referents. We discuss the means by which the system supports mutual disambiguation of these modalities and information sources, and show through a user study how mutual disambiguation accounts for over 45% of the successful 3D multimodal interpretations. An accompanying video demonstrates the system in action.
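The two mechanisms the abstract describes, ranking potential referents from a time-stamped history of objects intersecting a selection volume, and mutually disambiguating that ranking against an uncertain speech hypothesis list, can be sketched in a few lines. This is an illustrative assumption, not the authors' implementation: the class and function names, the time window, and the multiplicative fusion rule are hypothetical stand-ins for the paper's statistical ranking and symbolic/statistical fusion.

```python
from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class IntersectionHistory:
    """Hypothetical time-stamped record of objects intersecting a
    selection volume attached to a 3D tracker."""
    events: list = field(default_factory=list)  # (timestamp, object_id)

    def add(self, t, obj):
        self.events.append((t, obj))

    def rank(self, t_now, window=2.0):
        """Score each object by how often it appeared in the volume
        within a recent time window, normalized to sum to 1."""
        scores = defaultdict(float)
        for t, obj in self.events:
            if t_now - t <= window:
                scores[obj] += 1.0
        total = sum(scores.values()) or 1.0
        return {obj: s / total for obj, s in scores.items()}

def fuse(speech_nbest, gesture_scores):
    """Mutual disambiguation sketch: combine ranked hypotheses from
    two uncertain sources, so a referent ranked low by one modality
    can still win if the other modality supports it strongly."""
    fused = {obj: p * gesture_scores.get(obj, 0.0)
             for obj, p in speech_nbest.items()}
    # Fall back to the speech ranking if the volume saw no candidate.
    if not any(fused.values()):
        return max(speech_nbest, key=speech_nbest.get)
    return max(fused, key=fused.get)
```

In this sketch, a speech recognizer's n-best list that slightly prefers a wrong referent is overridden when the gesture volume's intersection history strongly favors another candidate, which is the kind of error recovery the reported 45% figure measures.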





Published In

ICMI '03: Proceedings of the 5th international conference on Multimodal interfaces
November 2003
318 pages
ISBN:1581136218
DOI:10.1145/958432
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. augmented/virtual reality
  2. evaluation
  3. multimodal interaction

Qualifiers

  • Article

Conference

ICMI-PUI03
Sponsor:
ICMI-PUI03: International Conference on Multimodal User Interfaces
November 5 - 7, 2003
Vancouver, British Columbia, Canada

Acceptance Rates

ICMI '03 Paper Acceptance Rate: 45 of 130 submissions, 35%
Overall Acceptance Rate: 453 of 1,080 submissions, 42%


Article Metrics

  • Downloads (Last 12 months): 90
  • Downloads (Last 6 weeks): 11
Reflects downloads up to 14 Feb 2025


Cited By

  • (2024) Investigating the Effects of Avatarization and Interaction Techniques on Near-field Mixed Reality Interactions with Physical Components. IEEE Transactions on Visualization and Computer Graphics, 30(5), 2756–2766. DOI: 10.1109/TVCG.2024.3372050. Online publication date: 4 Mar 2024.
  • (2024) Personalized decision-making for agents in face-to-face interaction in virtual reality. Multimedia Systems, 31(1). DOI: 10.1007/s00530-024-01591-7. Online publication date: 24 Dec 2024.
  • (2024) Neuro-Symbolic Reasoning for Multimodal Referring Expression Comprehension in HMI Systems. New Generation Computing, 42(4), 579–598. DOI: 10.1007/s00354-024-00243-8. Online publication date: 1 Nov 2024.
  • (2023) Recording multimodal pair-programming dialogue for reference resolution by conversational agents. Proceedings of the 25th International Conference on Multimodal Interaction, 731–735. DOI: 10.1145/3577190.3614231. Online publication date: 9 Oct 2023.
  • (2023) A Human-Computer Collaborative Editing Tool for Conceptual Diagrams. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–29. DOI: 10.1145/3544548.3580676. Online publication date: 19 Apr 2023.
  • (2023) Give Me a Hand: Improving the Effectiveness of Near-field Augmented Reality Interactions By Avatarizing Users' End Effectors. IEEE Transactions on Visualization and Computer Graphics, 29(5), 2412–2422. DOI: 10.1109/TVCG.2023.3247105. Online publication date: 1 May 2023.
  • (2023) MEinVR: Multimodal interaction techniques in immersive exploration. Visual Informatics, 7(3), 37–48. DOI: 10.1016/j.visinf.2023.06.001. Online publication date: Sep 2023.
  • (2023) Mixed Reality Interaction Techniques. Springer Handbook of Augmented Reality, 109–129. DOI: 10.1007/978-3-030-67822-7_5. Online publication date: 1 Jan 2023.
  • (2022) Research on Equipment and Algorithm of a Multimodal Perception Gameplay Virtual and Real Fusion Intelligent Experiment. Applied Sciences, 12(23), 12184. DOI: 10.3390/app122312184. Online publication date: 28 Nov 2022.
  • (2022) MoonBuddy: A Voice-based Augmented Reality User Interface That Supports Astronauts During Extravehicular Activities. Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1–4. DOI: 10.1145/3526114.3558690. Online publication date: 29 Oct 2022.
