
Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation

  • Conference paper
  • Text, Speech, and Dialogue (TSD 2024)

Abstract

Acquiring speech data is a crucial step in developing speech recognition systems and related speech-based machine learning models. However, protecting speaker privacy is a growing concern that must be addressed. This study investigates voice conversion (VC) as a strategy for anonymizing the speech of individuals with dysarthria. We focus on training a variety of VC models using self-supervised speech representations, such as wav2vec 2.0 and its multi-lingual variant, XLSR. The converted voices maintain a word error rate within 1% of that of the original recordings. The equal error rate (EER) increased substantially, from 1.52% to 41.18% on the LibriSpeech test set and from 3.75% to 42.19% on speakers from the VCTK corpus, indicating a marked decrease in speaker verification performance. A similar trend is observed with dysarthric speech, where the EER rose from 16.45% to 43.46%. Additionally, our study includes classification experiments on dysarthric vs. healthy speech to demonstrate that anonymized voices can still yield the speech features essential for distinguishing between healthy and pathological speech. The impact of voice conversion is examined across articulation, prosody, phonation, and phonology.
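As context for the EER figures reported above: the equal error rate is the operating point of a speaker verification system where the false-rejection rate (genuine trials rejected) equals the false-acceptance rate (impostor trials accepted). A minimal sketch of how it can be estimated from trial scores is shown below; this is illustrative only, not the authors' evaluation code, and the function and variable names are hypothetical:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by scanning candidate thresholds and
    returning the point where FRR and FAR are closest."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine trials rejected
        far = np.mean(impostor_scores >= t)  # impostor trials accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy example: well-separated score distributions give a low EER,
# analogous to verification on un-anonymized speech; heavily
# overlapping distributions push the EER toward 50% (chance).
genuine = np.array([0.9, 0.8, 0.85, 0.95])
impostor = np.array([0.1, 0.2, 0.15, 0.05])
print(equal_error_rate(genuine, impostor))  # → 0.0
```

A successful anonymization system moves the EER away from the low values of the original recordings (1.52–3.75% here) toward the ~50% chance level, which is the trend the reported 41–43% figures reflect.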


Notes

  1. https://github.com/jcvasquezc/DisVoice.


Acknowledgement

This work was partially funded by the EVUK programme (“Next-generation AI for Integrated Diagnostics”) of the Free State of Bavaria and by CODI at UdeA grant # PI2023-58010.

Author information


Correspondence to Abner Hernandez or Paula Andrea Perez-Toro.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hernandez, A. et al. (2024). Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation. In: Nöth, E., Horák, A., Sojka, P. (eds) Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science, vol. 15049. Springer, Cham. https://doi.org/10.1007/978-3-031-70566-3_14

  • DOI: https://doi.org/10.1007/978-3-031-70566-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70565-6

  • Online ISBN: 978-3-031-70566-3

  • eBook Packages: Computer Science, Computer Science (R0)
