
“I Can’t Talk Now”: Speaking with Voice Output Communication Aid Using Text-to-Speech Synthesis During Multiparty Video Conference

Authors:

Sangsu Lee

CHI EA '21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems

Article No.: 288, Pages 1 - 6

https://doi.org/10.1145/3411763.3451745

Published: 08 May 2021

Abstract

COVID-19 has led to the rapid popularization of video conferencing. A growing number of users must now find suitable places for video conferencing, but they sometimes have no choice but to participate from unsuitable settings, such as noisy or very quiet public spaces. However, how the user's environment shapes the video conference experience has not been sufficiently discussed. In particular, no research has examined the occasions on which video conferencing participants feel unable to speak aloud because of their surroundings, or how to address these situations. In this study, we propose a voice output communication aid (VOCA) for video conferencing that allows users to chat without making a sound. We built a technology probe and conducted a user test. Users who felt unable to speak aloud could participate more actively with VOCA. Based on the results, we describe the effects and potential of VOCA for video conferencing.




Published In


CHI EA '21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems

May 2021

2965 pages

ISBN: 9781450380959

DOI: 10.1145/3411763

Editors:
Yoshifumi Kitamura (Tohoku University, Japan)
Aaron Quigley (University of New South Wales, Australia)
Katherine Isbister (University of California Santa Cruz, USA)
Takeo Igarashi (The University of Tokyo, Japan)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Poster
Research
Refereed limited

Conference

CHI '21

Sponsor:

SIGCHI

CHI '21: CHI Conference on Human Factors in Computing Systems

May 8 - 13, 2021

Yokohama, Japan

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

