We introduce a joint acoustic and text decoder (JATD) into the LAS decoder, which makes it possible to incorporate a much larger text corpus into training. We find that the JATD model obtains a 3-10% relative improvement in WER compared to a LAS decoder trained only on supervised audio-text pairs.
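How one decoder can consume both kinds of data is easiest to see in code. The following is a minimal sketch, not the paper's exact recipe: the dot-product attention, the zeroed acoustic context for text-only batches, and all layer sizes are assumptions made for illustration. Paired batches attend over encoder features, text-only batches feed a dummy context, and both losses update the same decoder weights, which is how the larger text corpus reaches the decoder.

import torch
import torch.nn as nn

class JointDecoder(nn.Module):
    """LAS-style decoder that also accepts text-only batches (sketch)."""
    def __init__(self, vocab_size, hidden=256, enc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden + enc_dim, hidden, batch_first=True)
        self.attn_proj = nn.Linear(hidden, enc_dim)
        self.out = nn.Linear(hidden, vocab_size)
        self.enc_dim = enc_dim

    def forward(self, tokens, enc=None):
        # tokens: (B, U) target history; enc: (B, T, enc_dim) encoder features or None.
        emb = self.embed(tokens)
        if enc is None:
            # Text-only mode: no acoustics, so feed a zero context and the
            # decoder trains as a language model on the text corpus.
            ctx = emb.new_zeros(emb.size(0), emb.size(1), self.enc_dim)
        else:
            # Paired mode: simple dot-product attention over encoder features.
            query = self.attn_proj(emb)                          # (B, U, enc_dim)
            scores = torch.bmm(query, enc.transpose(1, 2))       # (B, U, T)
            ctx = torch.bmm(scores.softmax(dim=-1), enc)         # (B, U, enc_dim)
        h, _ = self.rnn(torch.cat([emb, ctx], dim=-1))
        return self.out(h)                                       # (B, U, vocab)

# Toy usage: one paired batch and one text-only batch share every decoder weight.
dec = JointDecoder(vocab_size=100)
tok = torch.randint(0, 100, (4, 12))
paired = dec(tok[:, :-1], enc=torch.randn(4, 50, 256))
text_only = dec(tok[:, :-1])
loss = nn.functional.cross_entropy(paired.reshape(-1, 100), tok[:, 1:].reshape(-1)) \
     + nn.functional.cross_entropy(text_only.reshape(-1, 100), tok[:, 1:].reshape(-1))
loss.backward()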
Recently, we introduced a two-pass on-device end-to-end (E2E) speech recognition model, which runs RNN-T in the first pass and then rescores/redecodes the first-pass hypotheses with a LAS decoder in the second pass. E2E models are trained on audio-text pairs, which is a fraction of the data available to a conventional ASR model, and as a result E2E models lag behind conventional models.
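A hedged sketch of the second pass: given an n-best list and per-hypothesis scores from the first pass, a LAS-style decoder (the JointDecoder above works as the las_decoder argument) rescores each hypothesis, and the two scores are combined log-linearly. The interpolation weight lam and the exact score combination are assumptions for illustration, not the production recipe.

import torch

def rescore_nbest(nbest, first_pass_scores, enc, las_decoder, lam=0.5):
    # nbest: list of token-id sequences; first_pass_scores: matching first-pass log scores;
    # enc: (1, T, enc_dim) encoder features; las_decoder: callable(tokens, enc) -> logits.
    best_hyp, best_score = None, float("-inf")
    for hyp, fp_score in zip(nbest, first_pass_scores):
        tokens = torch.tensor([hyp])                              # (1, U)
        logp = las_decoder(tokens[:, :-1], enc).log_softmax(-1)   # (1, U-1, V)
        # Sum the log-probability the second pass assigns to each token of the hypothesis.
        las_score = logp.gather(-1, tokens[:, 1:].unsqueeze(-1)).sum().item()
        score = lam * fp_score + (1 - lam) * las_score
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp, best_score

# Example: two first-pass hypotheses rescored with the decoder defined earlier.
best, _ = rescore_nbest([[1, 5, 7, 2], [1, 5, 9, 2]], [-3.1, -3.4],
                        torch.randn(1, 50, 256), dec)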
Strohman, "An Attention-Based Joint Acoustic and Text On-Device End-to-End Model," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
B. Li, S. Chang, T. N. Sainath, R. Pang, Y. He, T. ...
In this work, we propose a novel hybrid attention-based encoder-decoder (HAED) model that enables efficient text adaptation in end-to-end speech recognition. The HAED model separates the acoustic and language models, allowing for the use of conventional text-based language model adaptation techniques.
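As a rough illustration of why that separation helps (a sketch under assumptions; HAED's actual factorization and layer choices differ, and the names below are made up for the example), consider a decoder whose logits are an explicit sum of an acoustic term and an internal language-model term, so the LM term can be fine-tuned on in-domain text alone while the acoustic term is left untouched.

import torch
import torch.nn as nn

class FactoredDecoder(nn.Module):
    """Decoder with an explicit internal LM component (illustrative, not HAED itself)."""
    def __init__(self, vocab=100, hidden=256, enc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lm = nn.LSTM(hidden, hidden, batch_first=True)   # text-only (language model) part
        self.lm_out = nn.Linear(hidden, vocab)
        self.acoustic = nn.Linear(enc_dim, vocab)              # acoustic part

    def lm_logits(self, tokens):
        h, _ = self.lm(self.embed(tokens))
        return self.lm_out(h)                                   # (B, U, V)

    def forward(self, tokens, ctx):
        # ctx: (B, U, enc_dim) per-step acoustic context (e.g., from attention).
        return self.lm_logits(tokens) + self.acoustic(ctx)

haed_dec = FactoredDecoder()
# Text-only adaptation: freeze the acoustic part and update only the LM-side
# parameters on in-domain text, as conventional LM adaptation would.
adapt_params = [p for n, p in haed_dec.named_parameters() if not n.startswith("acoustic")]
opt = torch.optim.Adam(adapt_params, lr=1e-4)
text = torch.randint(0, 100, (8, 16))
loss = nn.functional.cross_entropy(
    haed_dec.lm_logits(text[:, :-1]).reshape(-1, 100), text[:, 1:].reshape(-1))
loss.backward()
opt.step()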