Research article · Open access
DOI: 10.1145/3469595.3469609
“You, Move There!”: Investigating the Impact of Feedback on Voice Control in Virtual Environments

Published: 27 July 2021

Abstract

Current virtual environment (VE) input techniques often overlook speech as a useful control modality. Speech could improve interaction in multimodal VEs by enabling users to address objects, locations, and agents, yet research on how to design effective speech interaction for VEs is limited. Our paper investigates the effect of agent feedback on speech-based VE experiences. In a lab study, users commanded agents to navigate a VE, receiving either auditory, visual, or behavioural feedback. Based on post-interaction semi-structured interviews, we find that the type of feedback given by agents is critical to user experience. Specifically, auditory mechanisms were preferred, allowing users to engage with other modalities seamlessly during interaction. Although command-like utterances were frequently used, they were perceived as contextually appropriate, ensuring users were understood. Many participants also found it difficult to discover speech-based functionality. Drawing on these findings, we discuss key challenges for designing speech input for VEs.




    Published In

    CUI '21: Proceedings of the 3rd Conference on Conversational User Interfaces
    July 2021
    262 pages
    ISBN: 9781450389983
    DOI: 10.1145/3469595
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 July 2021


    Author Tags

    1. Gesture Input
    2. Speech Input
    3. Virtual Environments

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CUI '21

    Acceptance Rates

    Overall acceptance rate: 34 of 100 submissions (34%)

    Article Metrics

    • Downloads (last 12 months): 195
    • Downloads (last 6 weeks): 28
    Reflects downloads up to 20 Nov 2024


    Cited By

    • (2024) VirtuWander: Enhancing Multi-modal Interaction for Virtual Tour Guidance through Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. https://doi.org/10.1145/3613904.3642235. Online publication date: 11-May-2024.
    • (2024) Voice user interfaces for effortless navigation in medical virtual reality environments. Computers & Graphics, 104069. https://doi.org/10.1016/j.cag.2024.104069. Online publication date: Sep-2024.
    • (2023) Privacy-Enhancing Technology and Everyday Augmented Reality. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(4), 1-35. https://doi.org/10.1145/3569501. Online publication date: 11-Jan-2023.
    • (2023) Tell Me Where To Go: Voice-Controlled Hands-Free Locomotion for Virtual Reality Systems. 2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR), 123-134. https://doi.org/10.1109/VR55154.2023.00028. Online publication date: Mar-2023.
    • (2023) Context Matters: Understanding the Effect of Usage Contexts on Users’ Modality Selection in Multimodal Systems. International Journal of Human–Computer Interaction, 40(20), 6287-6302. https://doi.org/10.1080/10447318.2023.2250606. Online publication date: 29-Aug-2023.
    • (2023) Introduction to this special issue: guiding the conversation: new theory and design perspectives for conversational user interfaces. Human–Computer Interaction, 38(3-4), 159-167. https://doi.org/10.1080/07370024.2022.2161905. Online publication date: 5-Jan-2023.
    • (2022) Sympathy for the digital: Influence of synthetic voice on affinity, social presence and empathy for photorealistic virtual humans. Computers & Graphics, 104, 116-128. https://doi.org/10.1016/j.cag.2022.03.009. Online publication date: May-2022.
