We introduce a joint acoustic and text-only decoder (JATD) into the LAS decoder, which allows the LAS decoder to be trained on a much larger text corpus.
May 4, 2020 · We find that the JATD model obtains a 3-10% relative improvement in WER compared to a LAS decoder trained only on supervised audio-text pairs ...
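As a reading aid for the "relative improvement in WER" figure, here is a minimal sketch of how a relative WER reduction is usually computed. The function name `relative_wer_improvement` and the WER values are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent (illustrative only, not taken from the paper).
baseline = 10.0
improved = 9.3
print(f"relative improvement: {relative_wer_improvement(baseline, improved):.1%}")
```

Under these made-up numbers, an absolute drop of 0.7 WER points from a baseline of 10.0 is a relative improvement of about 7%, which would fall inside the quoted 3-10% range.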
A joint acoustic and text decoder (JATD) into the LAS decoder, which makes it possible to incorporate a much larger text corpus into training and obtains ...
Recently, we introduced a two-pass on-device end-to-end (E2E) speech recognition model, which runs RNN-T in the first-pass and then rescores/redecodes the ...
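The general idea of second-pass rescoring can be sketched as follows. This is a toy illustration under stated assumptions, not the model from the snippet: the helper names (`rescore_nbest`, `toy_second_pass`), the linear interpolation of log-scores, and all scores are hypothetical.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, log-score)


def rescore_nbest(
    nbest: List[Hypothesis],
    second_pass: Callable[[str], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank first-pass hypotheses by interpolating in a second-pass score.

    Assumption: scores are log-domain and combined linearly; the snippet
    does not specify how the second pass combines scores.
    """
    rescored = [
        (text, (1.0 - weight) * first_score + weight * second_pass(text))
        for text, first_score in nbest
    ]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def toy_second_pass(text: str) -> float:
    # Hypothetical stand-in scorer: penalizes each word by 0.1.
    return -0.1 * len(text.split())


# Hypothetical first-pass n-best list; the first pass prefers the second entry.
nbest = [("recognize speech", -1.1), ("wreck a nice beach", -1.0)]
best_text, best_score = rescore_nbest(nbest, toy_second_pass)[0]
print(best_text)
```

In this toy setup the second pass overturns the first-pass ranking, which is the purpose of a rescoring stage; a real system would replace `toy_second_pass` with scores from the second-pass model.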
May 6, 2020 · E2E models are trained on audio-text pairs, which is a fraction of data compared to a conventional ASR model. ○ E2E models lag behind ...
Strohman, “An attention-based joint acoustic and text on-device end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal ...
Strohman, "An Attention-Based Joint Acoustic and Text On-Device End-to-End Model," in Proc. ICASSP, 2020. B. Li, S. Chang, T.N. Sainath, R. Pang, Y. He T ...
Sep 14, 2023 · Our HAED model separates the acoustic and language models, allowing for the use of conventional text-based language model adaptation techniques.
Sep 14, 2024 · In this work, we propose a novel hybrid attention-based encoder-decoder model that enables efficient text adaptation in an end-to-end speech ...