Nothing Special   »   [go: up one dir, main page]

Han et al., 2021 - Google Patents

Speech emotion recognition with a resnet-cnn-transformer parallel neural network

Han et al., 2021

Document ID
7869938574125994962
Author
Han S
Leng F
Jin Z
Publication year
Publication venue
2021 International conference on communications, information system and computer engineering (CISCE)

External Links

Snippet

As a challenging pattern recognition task, speech emotion recognition has attracted more and more attention in recent years and is widely used in medical, Affective Computing, and other fields. In this paper, we proposed a parallel network of ResNet-CNN-Transformer …
Continue reading at ieeexplore.ieee.org (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6232Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G06K9/6247Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods based on an approximation criterion, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268Feature extraction; Face representation
    • G06K9/00281Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Similar Documents

Publication Publication Date Title
Han et al. Speech emotion recognition with a resnet-cnn-transformer parallel neural network
Issa et al. Speech emotion recognition with deep convolutional neural networks
Chatziagapi et al. Data Augmentation Using GANs for Speech Emotion Recognition.
Sun et al. Speech emotion recognition based on DNN-decision tree SVM model
Latif et al. Direct modelling of speech emotion from raw speech
Ma et al. Emotion recognition from variable-length speech segments using deep learning on spectrograms.
Li et al. Dilated residual network with multi-head self-attention for speech emotion recognition
CN108597539B (en) Speech emotion recognition method based on parameter migration and spectrogram
Chen et al. A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition.
Cai et al. A novel learnable dictionary encoding layer for end-to-end language identification
CN111583964B (en) Natural voice emotion recognition method based on multimode deep feature learning
CN110675859B (en) Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN110534133B (en) Voice emotion recognition system and voice emotion recognition method
CN111429947B (en) Speech emotion recognition method based on multi-stage residual convolutional neural network
Parthasarathy et al. Convolutional neural network techniques for speech emotion recognition
Lin et al. An efficient temporal modeling approach for speech emotion recognition by mapping varied duration sentences into fixed number of chunks
Liu et al. Dual-tbnet: Improving the robustness of speech features via dual-transformer-bilstm for speech emotion recognition
Parthasarathy et al. Improving emotion classification through variational inference of latent variables
Gao et al. ToneNet: A CNN Model of Tone Classification of Mandarin Chinese.
Jiang et al. Speech Emotion Recognition Using Deep Convolutional Neural Network and Simple Recurrent Unit.
He et al. Speech emotion recognition using XGBoost and CNN BLSTM with attention
Hu et al. Speech emotion recognition based on attention mcnn combined with gender information
Singh et al. Emoception: An inception inspired efficient speech emotion recognition network
Li et al. Emotion recognition from speech with StarGAN and Dense‐DCNN
Shankar et al. A comparative study of data augmentation techniques for deep learning based emotion recognition