Abstract
Acquiring speech data is a crucial step in developing speech recognition systems and related speech-based machine learning models. However, protecting speaker privacy is a growing concern that must be addressed. This study investigates voice conversion (VC) as a strategy for anonymizing the speech of individuals with dysarthria. We focus on training a variety of VC models using self-supervised speech representations, such as Wav2Vec and the multilingual variant of Wav2Vec2.0 (XLSR). The converted voices maintain a word error rate within 1% of the original recordings. The equal error rate (EER) increased substantially, from 1.52% to 41.18% on the LibriSpeech test set and from 3.75% to 42.19% on speakers from the VCTK corpus, indicating a considerable decrease in speaker verification performance. A similar trend is observed with dysarthric speech, where the EER rose from 16.45% to 43.46%. Additionally, our study includes classification experiments on dysarthric vs. healthy speech data to demonstrate that anonymized voices can still yield the speech features essential for distinguishing between healthy and pathological speech. The impact of voice conversion is analyzed across articulation, prosody, phonation, and phonology.
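The anonymization effect reported above is quantified by the equal error rate (EER): the operating point at which a speaker-verification system's false-acceptance rate (impostors accepted) equals its false-rejection rate (genuine speakers rejected); the higher the EER after conversion, the harder it is to re-identify the speaker. As a minimal illustrative sketch (the function name and toy trial scores below are hypothetical, not from the paper), the EER can be computed from verification scores like this:

```python
import numpy as np

def equal_error_rate(labels, scores):
    """Return the EER given binary trial labels (1 = same speaker,
    0 = impostor) and the verifier's similarity scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    fars, frrs = [], []
    for t in np.sort(np.unique(scores)):
        accept = scores >= t                  # trials accepted at threshold t
        fars.append(np.mean(accept[labels == 0]))   # impostors wrongly accepted
        frrs.append(np.mean(~accept[labels == 1]))  # genuine trials wrongly rejected
    fars, frrs = np.array(fars), np.array(frrs)
    i = np.argmin(np.abs(fars - frrs))        # threshold where FAR and FRR meet
    return (fars[i] + frrs[i]) / 2

# Toy example: three genuine trials, three impostor trials
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
print(round(equal_error_rate(labels, scores), 3))  # → 0.333
```

On such a small toy set the EER is coarse; in practice it is estimated over many thousands of trial pairs, as in the LibriSpeech and VCTK evaluations above.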
Acknowledgement
This work was partially funded by the EVUK programme ("Next-generation AI for Integrated Diagnostics") of the Free State of Bavaria and by CODI at UdeA grant # PI2023-58010.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hernandez, A. et al. (2024). Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation. In: Nöth, E., Horák, A., Sojka, P. (eds) Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science(), vol 15049. Springer, Cham. https://doi.org/10.1007/978-3-031-70566-3_14
Print ISBN: 978-3-031-70565-6
Online ISBN: 978-3-031-70566-3