Nothing Special   »   [go: up one dir, main page]

Li et al., 2018 - Google Patents

Exploring layer trajectory LSTM with depth processing units and attention

Li et al., 2018

View PDF
Document ID
5865692694548562496
Author
Li J
Lu L
Liu C
Gong Y
Publication year
Publication venue
2018 IEEE Spoken Language Technology Workshop (SLT)

External Links

Snippet

Traditional LSTM model and its variants normally work in a frame-by-frame and layer-by- layer fashion, which deals with the temporal modeling and target classification problems at the same time. In this paper, we extend our recently proposed layer trajectory LSTM (ltLSTM) …
Continue reading at scholar.archive.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • G06N3/04Architectures, e.g. interconnection topology
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6296Graphical models, e.g. Bayesian networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/06Decision making techniques; Pattern matching strategies

Similar Documents

Publication Publication Date Title
US12086704B2 (en) Machine learning model with depth processing units
Li et al. Developing RNN-T models surpassing high-performance hybrid models with customization capability
Li et al. Improving RNN transducer modeling for end-to-end speech recognition
Li et al. Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition
Li et al. Learning small-size DNN with output-distribution-based criteria
Chen et al. Progressive joint modeling in unsupervised single-channel overlapped speech recognition
Deng et al. Ensemble deep learning for speech recognition
Deng et al. Deep convex net: A scalable architecture for speech pattern classification
US9235799B2 (en) Discriminative pretraining of deep neural networks
Tjandra et al. Deja-vu: Double feature presentation and iterated loss in deep transformer networks
Li et al. Layer trajectory LSTM
Hsu et al. A prioritized grid long short-term memory RNN for speech recognition
Li et al. Improving layer trajectory LSTM with future context frames
Long et al. Large-scale semi-supervised training in deep learning acoustic model for ASR
Meng et al. Internal language model adaptation with text-only data for end-to-end speech recognition
Das et al. Multi-dialect speech recognition in english using attention on ensemble of experts
Li et al. High-accuracy and low-latency speech recognition with two-head contextual layer trajectory LSTM model
Takashima et al. Feature extraction using pre-trained convolutive bottleneck nets for dysarthric speech recognition
Sun et al. Improving low-resource speech recognition based on improved NN-HMM structures
Kang et al. Advanced recurrent network-based hybrid acoustic models for low resource speech recognition
Ansari et al. Toward growing modular deep neural networks for continuous speech recognition
Mimura et al. Forward-Backward Attention Decoder.
Li et al. Exploring layer trajectory LSTM with depth processing units and attention
Cai et al. Convolutional maxout neural networks for low-resource speech recognition
Liu et al. Graph based emotion recognition with attention pooling for variable-length utterances