Han et al., 2021 - Google Patents

Speech emotion recognition with a ResNet-CNN-Transformer parallel neural network

Document ID: 7869938574125994962
Author: Han S; Leng F; Jin Z
Publication year: 2021
Publication venue: 2021 International conference on communications, information system and computer engineering (CISCE)

Snippet

As a challenging pattern recognition task, speech emotion recognition has attracted growing attention in recent years and is widely used in medicine, affective computing, and other fields. In this paper, we propose a parallel network of ResNet-CNN-Transformer …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06K9/6232—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
- G06K9/6247—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Similar Documents

Publication	Publication Date	Title
Han et al.	2021	Speech emotion recognition with a ResNet-CNN-Transformer parallel neural network
Issa et al.	2020	Speech emotion recognition with deep convolutional neural networks
Chatziagapi et al.	2019	Data Augmentation Using GANs for Speech Emotion Recognition
Sun et al.	2019	Speech emotion recognition based on DNN-decision tree SVM model
Latif et al.	2019	Direct modelling of speech emotion from raw speech
Ma et al.	2018	Emotion recognition from variable-length speech segments using deep learning on spectrograms
Li et al.	2019	Dilated residual network with multi-head self-attention for speech emotion recognition
CN108597539B (en)	2021-09-03	Speech emotion recognition method based on parameter migration and spectrogram
Chen et al.	2020	A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition
Cai et al.	2018	A novel learnable dictionary encoding layer for end-to-end language identification
CN111583964B (en)	2023-07-21	Natural voice emotion recognition method based on multimode deep feature learning
CN110675859B (en)	2021-11-23	Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN110534133B (en)	2022-03-25	Voice emotion recognition system and voice emotion recognition method
CN111429947B (en)	2022-06-10	Speech emotion recognition method based on multi-stage residual convolutional neural network
Parthasarathy et al.	2018	Convolutional neural network techniques for speech emotion recognition
Lin et al.	2020	An efficient temporal modeling approach for speech emotion recognition by mapping varied duration sentences into fixed number of chunks
Liu et al.	2023	Dual-tbnet: Improving the robustness of speech features via dual-transformer-bilstm for speech emotion recognition
Parthasarathy et al.	2019	Improving emotion classification through variational inference of latent variables
Gao et al.	2019	ToneNet: A CNN Model of Tone Classification of Mandarin Chinese
Jiang et al.	2019	Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit
He et al.	2021	Speech emotion recognition using XGBoost and CNN BLSTM with attention
Hu et al.	2023	Speech emotion recognition based on attention mcnn combined with gender information
Singh et al.	2019	Emoception: An inception inspired efficient speech emotion recognition network
Li et al.	2022	Emotion recognition from speech with StarGAN and Dense‐DCNN
Shankar et al.	2022	A comparative study of data augmentation techniques for deep learning based emotion recognition