Can Synthetic Speech Improve End-to-End Conversational Speech Translation?

Bismarck Bamfo Odoom, Nathaniel Robinson, Elijah Rippeth, Luis Tavarez-Arce, Kenton Murray, Matthew Wiesner, Paul McNamee, Philipp Koehn, Kevin Duh

Abstract

Conversational speech translation is an important technology that fosters communication among people of different language backgrounds. Three-way parallel data in the form of source speech, source transcript, and target translation is usually required to train end-to-end systems. However, such datasets are not readily available and are expensive to create as this involves multiple annotation stages. In this paper, we investigate the use of synthetic data from generative models, namely machine translation and text-to-speech synthesis, for training conversational speech translation systems. We show that adding synthetic data to the training recipe increasingly improves end-to-end training performance, especially when limited real data is available. However, when no real data is available, no amount of synthetic data helps.

Anthology ID: 2024.amta-research.15
Volume: Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Month: September
Year: 2024
Address: Chicago, USA
Editors: Rebecca Knowles, Akiko Eriguchi, Shivali Goel
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 167–177
URL: https://aclanthology.org/2024.amta-research.15
Cite (ACL): Bismarck Bamfo Odoom, Nathaniel Robinson, Elijah Rippeth, Luis Tavarez-Arce, Kenton Murray, Matthew Wiesner, Paul McNamee, Philipp Koehn, and Kevin Duh. 2024. Can Synthetic Speech Improve End-to-End Conversational Speech Translation?. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 167–177, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal): Can Synthetic Speech Improve End-to-End Conversational Speech Translation? (Bamfo Odoom et al., AMTA 2024)
PDF: https://aclanthology.org/2024.amta-research.15.pdf