Han et al., 2021 - Google Patents
Speech emotion recognition with a resnet-cnn-transformer parallel neural networkHan et al., 2021
- Document ID
- 7869938574125994962
- Author
- Han S
- Leng F
- Jin Z
- Publication year
- Publication venue
- 2021 International conference on communications, information system and computer engineering (CISCE)
External Links
Snippet
As a challenging pattern recognition task, speech emotion recognition has attracted more and more attention in recent years and is widely used in medical, Affective Computing, and other fields. In this paper, we proposed a parallel network of ResNet-CNN-Transformer …
- 230000001537 neural 0 title abstract description 22
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Han et al. | Speech emotion recognition with a resnet-cnn-transformer parallel neural network | |
Issa et al. | Speech emotion recognition with deep convolutional neural networks | |
Chatziagapi et al. | Data Augmentation Using GANs for Speech Emotion Recognition. | |
Sun et al. | Speech emotion recognition based on DNN-decision tree SVM model | |
Latif et al. | Direct modelling of speech emotion from raw speech | |
Ma et al. | Emotion recognition from variable-length speech segments using deep learning on spectrograms. | |
Li et al. | Dilated residual network with multi-head self-attention for speech emotion recognition | |
CN108597539B (en) | Speech emotion recognition method based on parameter migration and spectrogram | |
Chen et al. | A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. | |
Cai et al. | A novel learnable dictionary encoding layer for end-to-end language identification | |
CN111583964B (en) | Natural voice emotion recognition method based on multimode deep feature learning | |
CN110675859B (en) | Multi-emotion recognition method, system, medium, and apparatus combining speech and text | |
CN110534133B (en) | Voice emotion recognition system and voice emotion recognition method | |
CN111429947B (en) | Speech emotion recognition method based on multi-stage residual convolutional neural network | |
Parthasarathy et al. | Convolutional neural network techniques for speech emotion recognition | |
Lin et al. | An efficient temporal modeling approach for speech emotion recognition by mapping varied duration sentences into fixed number of chunks | |
Liu et al. | Dual-tbnet: Improving the robustness of speech features via dual-transformer-bilstm for speech emotion recognition | |
Parthasarathy et al. | Improving emotion classification through variational inference of latent variables | |
Gao et al. | ToneNet: A CNN Model of Tone Classification of Mandarin Chinese. | |
Jiang et al. | Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit. | |
He et al. | Speech emotion recognition using XGBoost and CNN BLSTM with attention | |
Hu et al. | Speech emotion recognition based on attention mcnn combined with gender information | |
Singh et al. | Emoception: An inception inspired efficient speech emotion recognition network | |
Li et al. | Emotion recognition from speech with StarGAN and Dense‐DCNN | |
Shankar et al. | A comparative study of data augmentation techniques for deep learning based emotion recognition |