Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

Abstract

While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are under-represented in the pre-training data. Continued pre-training on 70–200 hours of untranscribed speech in these languages can help, but what about languages without that much recorded data? For such cases, we show that supplementing the target language with data from a similar, higher-resource ‘donor’ language can help. For example, continued pre-training on only 10 hours of low-resource Punjabi supplemented with 60 hours of donor Hindi is almost as good as continued pre-training on 70 hours of Punjabi. By contrast, sourcing supplemental data from less similar donors like Bengali does not improve ASR performance. To inform donor language selection, we propose a novel similarity metric based on the sequence distribution of induced acoustic units: the Acoustic Token Distribution Similarity (ATDS). Across a set of typologically different target languages (Punjabi, Galician, Iban, Setswana), we show that the ATDS between the target language and its candidate donors precisely predicts target language ASR performance.

Anthology ID: 2024.sigtyp-1.13
Volume: Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month: March
Year: 2024
Address: St. Julian's, Malta
Editors: Michael Hahn, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Yulia Otmakhova, Jinrui Yang, Oleg Serikov, Priya Rani, Edoardo M. Ponti, Saliha Muradoğlu, Rena Gao, Ryan Cotterell, Ekaterina Vylomova
Venues: SIGTYP | WS
Publisher: Association for Computational Linguistics
Pages: 100–112
URL: https://aclanthology.org/2024.sigtyp-1.13
Cite (ACL): Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, and Dan Jurafsky. 2024. Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 100–112, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal): Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens (San et al., SIGTYP-WS 2024)
PDF: https://aclanthology.org/2024.sigtyp-1.13.pdf