Nothing Special

Mar 27, 2018 · Our best MMDA setup obtains small improvements on character error rate (CER), and as much as 7-10% relative word error rate (WER) improvement ...
We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained using symbolic input in ...
The MMDA architecture attempts to eliminate the need for an external LM, by enabling seamless mixing of large text datasets with significantly smaller ...
An E2E ASR model with an extra text encoder network is a commonly used architecture to integrate more linguistic information into the ASR encoder. ...
Dec 8, 2021 · In summary, the ChannelAugment technique can be easily applied to multi-channel end-to-end ASR model during training, improving robustness ...
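The ChannelAugment snippet above can be illustrated with a short sketch. This is a hedged reconstruction, not the paper's implementation: it assumes the technique amounts to randomly dropping a subset of microphone channels from each multi-channel utterance during training, and all names (`channel_augment`, the `(channels, frames, feat_dim)` layout) are illustrative assumptions.

```python
import numpy as np

def channel_augment(features, min_channels=1, rng=None):
    """Sketch of a ChannelAugment-style training transform: randomly
    keep a random subset of microphone channels from a multi-channel
    utterance. `features` is assumed to have shape
    (channels, frames, feat_dim); names are illustrative, not the
    paper's API."""
    rng = rng or np.random.default_rng()
    n_channels = features.shape[0]
    # Choose how many channels to keep (at least min_channels).
    n_keep = rng.integers(min_channels, n_channels + 1)
    keep = rng.choice(n_channels, size=n_keep, replace=False)
    # Preserve the original channel ordering of the kept subset.
    return features[np.sort(keep)]

# Usage: a 4-channel utterance of 100 frames, 80-dim features.
x = np.zeros((4, 100, 80))
y = channel_augment(x, rng=np.random.default_rng(0))
assert 1 <= y.shape[0] <= 4 and y.shape[1:] == (100, 80)
```

Applied only at training time, this exposes the model to varying channel counts and orderings, which is consistent with the robustness claim in the snippet.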
Jun 4, 2023 · We propose an on-the-fly data augmentation strategy that transforms single speaker training data into multiple speaker data by appending together multiple ...
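The on-the-fly strategy described above can be sketched as follows, under the assumption that "appending together" means concatenating the audio and transcripts of several single-speaker utterances into one synthetic multi-speaker example; the function and argument names are hypothetical.

```python
import numpy as np

def make_multispeaker(utts, labels, max_concat=3, rng=None):
    """Sketch of on-the-fly multi-speaker augmentation: concatenate
    the waveforms (and transcripts) of several randomly chosen
    single-speaker utterances. `utts` is assumed to be a list of 1-D
    waveform arrays, `labels` the matching transcripts."""
    rng = rng or np.random.default_rng()
    # Pick 2..max_concat distinct utterances to append together.
    k = rng.integers(2, max_concat + 1)
    idx = rng.choice(len(utts), size=k, replace=False)
    audio = np.concatenate([utts[i] for i in idx])
    text = " ".join(labels[i] for i in idx)
    return audio, text

# Usage: five toy single-speaker "utterances" of 10 samples each.
utts = [np.full(10, i, dtype=float) for i in range(5)]
labels = [f"utt{i}" for i in range(5)]
audio, text = make_multispeaker(utts, labels, rng=np.random.default_rng(1))
assert len(audio) % 10 == 0 and len(audio) >= 20
assert len(text.split()) == len(audio) // 10
```

Because the combination happens inside the data loader, no extra data needs to be stored on disk, which is the usual motivation for doing this on the fly.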
In this paper we present a data augmentation scheme tailored for low-resource ASR in diverse languages. Across 3 test languages, our approach resulted in a 20% ...
Multimodal speech recognition aims to improve the performance of automatic speech recognition (ASR) systems by leveraging additional visual information ...
Figure 1: The two proposed fusion mechanisms of the audio and visual modalities: emb, fuses along the embedding dimension (left); seq, fuses along the ...
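The two fusion mechanisms named in the figure caption can be sketched with plain array concatenation. This assumes both modalities are already encoded to `(time, dim)` arrays of matching sizes; how the original work aligns the two streams is not visible in the snippet, so this is only an illustration of the axis choice.

```python
import numpy as np

def fuse(audio_emb, visual_emb, mode="emb"):
    """Sketch of the two fusion mechanisms: 'emb' concatenates the
    audio and visual encodings along the embedding dimension, 'seq'
    concatenates them along the sequence (time) dimension. Both
    inputs are assumed to be (time, dim) arrays."""
    if mode == "emb":
        # Same length, wider embedding per frame.
        return np.concatenate([audio_emb, visual_emb], axis=1)
    if mode == "seq":
        # Same embedding width, longer sequence.
        return np.concatenate([audio_emb, visual_emb], axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")

a = np.zeros((50, 256))  # audio encoding: 50 frames, 256-dim
v = np.zeros((50, 256))  # visual encoding: 50 frames, 256-dim
assert fuse(a, v, "emb").shape == (50, 512)
assert fuse(a, v, "seq").shape == (100, 256)
```

The downstream decoder sees either a wider per-frame representation (`emb`) or a longer token stream (`seq`), which is the essential trade-off between the two designs.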
Dec 10, 2018 · We explore training attention-based encoder-decoder ASR for low-resource languages and present techniques that result in a 50 character ...