Research article • Open access

ClearSpeech: Improving Voice Quality of Earbuds Using Both In-Ear and Out-Ear Microphones

Published: 12 January 2024

Abstract

Wireless earbuds have been gaining popularity, and using them to make phone calls or issue voice commands requires the earbud microphones to pick up human speech. When the speaker is in a noisy environment, speech quality degrades significantly, calling for speech enhancement (SE). In this paper, we present ClearSpeech, a novel deep-learning-based SE system designed for wireless earbuds. Specifically, by jointly using the earbud's in-ear and out-ear microphones, we devised a suite of techniques to effectively fuse the two signals and enhance both the magnitude and the phase of the speech spectrogram. We built an earbud prototype to evaluate ClearSpeech under various settings with data collected from 20 subjects. Our results suggest that ClearSpeech improves SE performance significantly compared to conventional approaches that use the out-ear microphone only. We also show that ClearSpeech can process user speech in real time on smartphones.
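The fusion idea described above, feeding the in-ear and out-ear microphone spectrograms jointly to a model and enhancing the noisy spectrogram's magnitude while handling its phase, can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual network: `stft`, `fuse_and_enhance`, and the pass-through `mask_fn` are all assumptions standing in for ClearSpeech's learned components.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames -> complex spectrogram (frames x bins)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def fuse_and_enhance(in_ear, out_ear, mask_fn):
    """Illustrative fusion: stack the magnitudes of both microphones as the
    model input, apply the predicted mask to the out-ear magnitude, and
    recombine with the out-ear phase."""
    S_in, S_out = stft(in_ear), stft(out_ear)
    feats = np.stack([np.abs(S_in), np.abs(S_out)], axis=0)  # 2 x frames x bins
    mask = mask_fn(feats)                                    # frames x bins in [0, 1]
    return mask * np.abs(S_out) * np.exp(1j * np.angle(S_out))

# Toy usage: one second of audio and an identity "model" (all-ones mask).
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)                # stand-in for the in-ear signal
noisy = clean + 0.1 * np.random.randn(fs)          # stand-in for the out-ear signal
enhanced = fuse_and_enhance(clean, noisy, lambda f: np.ones_like(f[1]))
```

With the identity mask, the output simply reproduces the out-ear spectrogram; a trained network would instead predict a mask (and phase correction) from the stacked two-microphone features.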

Supplementary Material

ma.zip: supplemental movie, appendix, image, and software files for "ClearSpeech: Improving Voice Quality of Earbuds Using Both In-Ear and Out-Ear Microphones"


Cited By

  • Functional Now, Wearable Later: Examining the Design Practices of Wearable Technologists. In Proceedings of the 2024 ACM International Symposium on Wearable Computers (2024), 71-81. DOI: 10.1145/3675095.3676615.
  • Enabling Hands-Free Voice Assistant Activation on Earphones. In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (2024), 155-168. DOI: 10.1145/3643832.3661890.
  • Understanding Real-Time Collaborative Programming: A Study of Visual Studio Live Share. ACM Transactions on Software Engineering and Methodology 33, 4 (2024), 1-28. DOI: 10.1145/3643672.


Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 4
December 2023, 1613 pages
EISSN: 2474-9567
DOI: 10.1145/3640795

Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. Audio Processing
  2. Earables
  3. Smart Earbuds
  4. Speech Enhancement

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 Grant

Article Metrics

  • Downloads (last 12 months): 689
  • Downloads (last 6 weeks): 114

Reflects downloads up to 20 Nov 2024.

