
DOI: 10.1145/3490099.3511129
research-article
Open access

Does Using Voice Authentication in Multimodal Systems Correlate With Increased Speech Interaction During Non-critical Routine Tasks?

Published: 22 March 2022

Abstract

Multimodal systems offer their functionality through multiple communication channels. A messenger application, for example, may accept either keyboard or voice input and present incoming messages as text or audio output. This allows users to communicate with their devices through the modality that best suits their context and personal preference. Authentication is often the first interaction with an application, so the users’ login behavior could be used to immediately adapt the communication channel to their preferences. Yet given the sensitive nature of authentication, this interaction may not be representative of the users’ inclination to use speech input in non-critical routine tasks. In this paper, we test whether the interactions during authentication differ from those during non-critical routine tasks in a smart home application. Our findings indicate that, even in such a private space, authentication behavior correlates neither with the use of speech input during non-critical tasks nor with its perceived usability. We further find that short interactions with the system are not indicative of the user’s attitude towards audio output, regardless of whether authentication or non-critical tasks are performed. Since security concerns are minimized in the secure environment of private spaces, our findings can be generalized to other contexts where security threats are even more apparent.
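
To make the adaptation idea concrete, the following Python snippet is a minimal, hypothetical sketch of the naive strategy the abstract describes: observe which input modality the user picks (for example, at login) and use that choice as the default output channel for later interactions. The class and method names are invented for illustration and are not taken from the paper's prototype; notably, the results reported here suggest that exactly this kind of inference from authentication behavior is unreliable for non-critical routine tasks.

from dataclasses import dataclass

@dataclass
class ModalityProfile:
    # Hypothetical sketch of the naive adaptation strategy described in the
    # abstract; names and logic are invented for illustration and are not the
    # system evaluated in the paper.
    voice_interactions: int = 0
    manual_interactions: int = 0

    def record(self, modality: str) -> None:
        # Count each observed interaction by its input modality.
        if modality == "voice":
            self.voice_interactions += 1
        elif modality == "manual":
            self.manual_interactions += 1

    def preferred_output(self) -> str:
        # Naive rule: if the user has mostly spoken to the system so far
        # (e.g. a voice login), answer with audio; otherwise fall back to text.
        if self.voice_interactions > self.manual_interactions:
            return "audio"
        return "text"

# A single voice authentication would already flip the default output to
# audio, which is exactly the kind of premature inference the study questions.
profile = ModalityProfile()
profile.record("voice")            # user authenticates by voice
print(profile.preferred_output())  # -> "audio"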


Cited By

  • (2024) Aligning smart home technology attributes with users’ preferences: a literature review. Intelligent Buildings International 16(3), 129–143. https://doi.org/10.1080/17508975.2024.2418648. Online publication date: 12-Nov-2024.
  • (2024) Talking Buildings: Interactive Human-Building Smart-Bot for Smart Buildings. Web Information Systems Engineering – WISE 2024, 399–415. https://doi.org/10.1007/978-981-96-0579-8_28. Online publication date: 29-Nov-2024.
  • (2023) Reviewing and Reflecting on Smart Home Research from the Human-Centered Perspective. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–21. https://doi.org/10.1145/3544548.3580842. Online publication date: 19-Apr-2023.


Published In

        IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
        March 2022
        888 pages
ISBN: 9781450391443
DOI: 10.1145/3490099
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. modality arbitration
        2. multimodal interfaces
        3. smart home application

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        IUI '22

        Acceptance Rates

        Overall Acceptance Rate 746 of 2,811 submissions, 27%


        Article Metrics

        • Downloads (Last 12 months)127
        • Downloads (Last 6 weeks)24
        Reflects downloads up to 08 Feb 2025

