- research-article, November 2024
Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition
Expert Systems with Applications: An International Journal (EXWA), Volume 258, Issue C. https://doi.org/10.1016/j.eswa.2024.125162
Abstract: Unsupervised cross-corpus speech emotion recognition (SER) is the task where the labeled training (source) and unlabeled testing (target) speech come from different corpora. Subspace transfer learning is one of the mainstream technologies for ...
Highlights:
- This study is the first work to explore which acoustic features are corpus-invariant.
- We propose a new transfer subspace learning framework for cross-corpus SER, i.e., CAFS.
- Through experiments, some acoustic features reveal their ...
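The snippet does not say how corpus invariance is quantified; one generic proxy (not necessarily the CAFS criterion) is the maximum mean discrepancy (MMD) between source- and target-corpus feature distributions, where features with low MMD are candidate corpus-invariant features. A minimal NumPy sketch with an RBF kernel, purely illustrative:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma=1.0):
    # Squared MMD between two feature matrices (n_samples x n_features)
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2 * k_st

# Example: score each acoustic feature dimension by its cross-corpus discrepancy.
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 12))   # e.g. 12 handcrafted features, source corpus
tgt = rng.normal(size=(180, 12))   # same features, target corpus
scores = [mmd2(src[:, [j]], tgt[:, [j]]) for j in range(src.shape[1])]
print("most corpus-invariant feature index:", int(np.argmin(scores)))
```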
- research-article, October 2024
Speech emotion recognition based on multi-feature speed rate and LSTM
Abstract: Correctly recognizing speech emotions is of significant importance in various fields, such as healthcare and human–computer interaction (HCI). However, the complexity of speech signal features poses challenges for speech emotion recognition. This ...
Highlights:
- Combined rhythmic features and short-time features to mitigate overfitting.
- Restructured speech signals based on speech rate for LSTM network training.
- Addressed challenges posed by limited dataset to improve trained models.
- ...
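The speech-rate restructuring step is not detailed in the snippet; as a rough baseline for the LSTM side, frame-level acoustic features can be classified with a small bidirectional LSTM, e.g. in PyTorch. Layer sizes, the feature dimension, and the mean-pooling readout below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LSTMEmotionClassifier(nn.Module):
    def __init__(self, n_features=39, hidden=128, n_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, x):          # x: (batch, frames, n_features)
        out, _ = self.lstm(x)      # out: (batch, frames, 2*hidden)
        pooled = out.mean(dim=1)   # average over time frames
        return self.head(pooled)   # emotion logits

model = LSTMEmotionClassifier()
dummy = torch.randn(8, 300, 39)   # 8 utterances, 300 frames, 39-dim features (e.g. MFCC + deltas)
logits = model(dummy)
print(logits.shape)               # torch.Size([8, 4])
```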
- research-article, October 2024
Speech emotion recognition based on bi-directional acoustic–articulatory conversion
Abstract: Acoustic and articulatory signals are naturally coupled and complementary. The challenge of acquiring articulatory data and the nonlinear ill-posedness of acoustic–articulatory conversions have resulted in previous studies on speech emotion ...
- research-article, October 2024
Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation
Computers in Biology and Medicine (CBIM), Volume 179, Issue C. https://doi.org/10.1016/j.compbiomed.2024.108841
Abstract: Speech emotion recognition (SER) stands as a prominent and dynamic research field in data science due to its extensive application in various domains such as psychological assessment, mobile services, and computer games. In ...
Highlights:
- 18 distinct features were derived prior to the data preprocessing and augmentation.
- The feature set obtained by feature selection was given to model emotion recognition.
- This research used cross-corpus speech files common emotions ...
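The 18 handcrafted features are not listed in the snippet; as a rough illustration of how such features are typically derived from a speech file before preprocessing and augmentation, a librosa-based sketch (the specific feature set below is an assumption, not the paper's):

```python
import numpy as np
import librosa

def handcrafted_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # spectral envelope
    zcr = librosa.feature.zero_crossing_rate(y)               # voicing / noisiness cue
    rms = librosa.feature.rms(y=y)                            # energy contour
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # spectral brightness
    # Utterance-level statistics: mean of each frame-level feature track.
    return np.concatenate([mfcc.mean(axis=1),
                           zcr.mean(axis=1),
                           rms.mean(axis=1),
                           centroid.mean(axis=1)])

# feats = handcrafted_features("utterance.wav")   # 16-dimensional vector in this sketch
```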
- research-article, August 2024
Speech emotion recognition for human–computer interaction
International Journal of Speech Technology (SPIJST), Volume 27, Issue 3, Pages 817–830. https://doi.org/10.1007/s10772-024-10138-0
Abstract: Speech emotion recognition (SER) is a vital component of the human–computer interaction system. Traditional deep-learning-based SER schemes show poor time-domain representation and class-imbalance issues due to uneven samples in the ...
- Article, August 2024
PCQ: Emotion Recognition in Speech via Progressive Channel Querying
Advanced Intelligent Computing Technology and Applications, Pages 264–275. https://doi.org/10.1007/978-981-97-5588-2_23
Abstract: In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions. Traditional SER methods struggle to effectively capture the long-term temporal correlations and dynamic ...
- Article, August 2024
MBDA: A Multi-scale Bidirectional Perception Approach for Cross-Corpus Speech Emotion Recognition
Advanced Intelligent Computing Technology and Applications, Pages 329–341. https://doi.org/10.1007/978-981-97-5669-8_27
Abstract: Effectively combining context to extract emotion-related features poses a significant challenge in the task of speech emotion recognition (SER). To address this challenge, this paper proposes the Bidirectional Temporal Multi-Scale Attention ...
- research-article, August 2024
Squeeze-and-excitation 3D convolutional attention recurrent network for end-to-end speech emotion recognition
Abstract: Speech emotion recognition (SER) is difficult since emotions are complex and dynamic processes involving multiple dimensions and sub-dimensions. Feature extraction is a challenging step in SER, where relevant features are extracted from the ...
Highlights:
- Proposed 3DCNN, Squeeze-and-Excitation (SE) and attention GRU model for SER.
- To capture spatial and temporal features, SE-3DCNN model is applied to 3D-MFCC.
- Attention GRU is applied to seamlessly learn important temporal ...
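A minimal sketch of a squeeze-and-excitation block for 3D feature maps such as stacked 3D-MFCC, assuming a generic channel count, reduction ratio, and input shape (not the paper's configuration):

```python
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Squeeze-and-excitation: re-weight the channels of a 3D feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)          # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # excitation: per-channel gates in (0, 1)
        )

    def forward(self, x):                            # x: (batch, C, D, H, W)
        b, c = x.shape[:2]
        gates = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * gates                             # channel-wise re-weighting

x = torch.randn(2, 32, 8, 40, 40)                    # e.g. a 3D-MFCC-like tensor after a conv layer
print(SEBlock3D(32)(x).shape)                        # torch.Size([2, 32, 8, 40, 40])
```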
- research-article, July 2024
A Novel Dual Kernel Support Vector-Based Levy Dung Beetle Algorithm for Accurate Speech Emotion Detection
Circuits, Systems, and Signal Processing (CSSP), Volume 43, Issue 11, Pages 7249–7284. https://doi.org/10.1007/s00034-024-02791-2
Abstract: Human emotions are easy to identify through facial expressions, body movements, and gestures. Speech carries a lot of emotional cues, including variations in pitch, tone, intensity, and rhythm. In recent years, the increasing demand for human–...
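The dual-kernel SVM part of the title can be illustrated generically: scikit-learn's SVC accepts a custom kernel callable, so a weighted combination of two standard kernels can be plugged in directly. The kernel choices and weight below are assumptions, and the Levy dung beetle metaheuristic (which would tune such hyperparameters) is not sketched:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def dual_kernel(X, Y, w=0.6, gamma=0.1, degree=2):
    """Convex combination of an RBF and a polynomial kernel (still a valid kernel)."""
    return w * rbf_kernel(X, Y, gamma=gamma) + (1 - w) * polynomial_kernel(X, Y, degree=degree)

# Placeholder data: 200 utterances, 30 acoustic features, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 4, size=200)

clf = SVC(kernel=dual_kernel).fit(X, y)
print(clf.score(X, y))
```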
- research-article, July 2024
Joint enhancement and classification constraints for noisy speech emotion recognition
Abstract: In natural environments, the received speech signal is often corrupted by noise, which degrades the performance of a speech emotion recognition (SER) system. To this end, a noisy SER method based on joint constraints, including enhancement ...
Highlights:
- Extracting MDSF enhances speech emotional classification feature robustness.
- Proposed noisy SER model based on CNN-ALSTM integrates SE and AVC task constraints.
- SE and AVC tasks reduce noise interference and improve emotion feature ...
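Joint enhancement and classification constraints of this kind are commonly expressed as a weighted multi-task loss; a generic PyTorch sketch (the weighting scheme is an assumption, and the AVC constraint mentioned in the highlights is omitted):

```python
import torch
import torch.nn.functional as F

def joint_loss(enhanced, clean, emotion_logits, emotion_labels, alpha=0.5):
    """Weighted sum of a speech-enhancement loss and an emotion-classification loss."""
    se_loss = F.mse_loss(enhanced, clean)                       # enhancement constraint
    cls_loss = F.cross_entropy(emotion_logits, emotion_labels)  # classification constraint
    return alpha * se_loss + (1.0 - alpha) * cls_loss

# Dummy example: 4 utterances, 16000-sample waveforms, 6 emotion classes.
enhanced = torch.randn(4, 16000, requires_grad=True)
clean = torch.randn(4, 16000)
logits = torch.randn(4, 6, requires_grad=True)
labels = torch.randint(0, 6, (4,))
loss = joint_loss(enhanced, clean, logits, labels)
loss.backward()
```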
- research-article, July 2024
ADMRF: Elucidation of deep feature extraction and adaptive deep Markov random fields with improved heuristic algorithm for speech emotion recognition
International Journal of Speech Technology (SPIJST), Volume 27, Issue 3, Pages 569–597. https://doi.org/10.1007/s10772-024-10115-7
Abstract: In human–computer interaction, the recognition of emotion from speech has developed from a niche topic into an essential component. However, it remains one of the most challenging problems because of the unclear features that are required ...
- research-article, July 2024
Speech emotion recognition using the novel SwinEmoNet (Shifted Window Transformer Emotion Network)
International Journal of Speech Technology (SPIJST), Volume 27, Issue 3, Pages 551–568. https://doi.org/10.1007/s10772-024-10123-7
Abstract: Understanding human emotions is necessary for various tasks, including interpersonal interaction, knowledge acquisition, and determining courses of action. Recognizing emotions, particularly in speech, poses significant challenges due to ...
- research-article, July 2024
Exploring the potential of Wav2vec 2.0 for speech emotion recognition using classifier combination and attention-based feature fusion
The Journal of Supercomputing (JSCO), Volume 80, Issue 16, Pages 23667–23688. https://doi.org/10.1007/s11227-024-06158-x
Abstract: Self-supervised learning models, such as Wav2vec 2.0, extract efficient features for speech processing applications, including speech emotion recognition. In this study, we propose a Dimension Reduction Module (DRM) to apply to the output of each ...
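A rough sketch of obtaining per-layer Wav2vec 2.0 hidden states with Hugging Face transformers and fusing them with learned layer weights; the simple weighted fusion and mean pooling below are illustrative stand-ins, not the paper's DRM or attention-based fusion:

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base", output_hidden_states=True)

waveform = torch.randn(16000)                         # 1 s of dummy audio at 16 kHz
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden = backbone(**inputs).hidden_states          # tuple of (batch, frames, 768), one per layer

layer_weights = nn.Parameter(torch.zeros(len(hidden)))                     # learned fusion weights
stacked = torch.stack(hidden, dim=0)                                        # (layers, batch, frames, 768)
fused = (torch.softmax(layer_weights, 0)[:, None, None, None] * stacked).sum(0)
utterance_vec = fused.mean(dim=1)                                           # pool over frames -> (batch, 768)
print(utterance_vec.shape)
```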
- research-article, July 2024
MVIB-DVA: Learning minimum sufficient multi-feature speech emotion embeddings under dual-view aware
Expert Systems with Applications: An International Journal (EXWA), Volume 246, Issue C. https://doi.org/10.1016/j.eswa.2023.123110
Abstract: Speech emotion recognition (SER) is a crucial topic in human–computer interaction. However, there are still many challenges in extracting emotional embeddings. Emotional embeddings extracted by network models often contain noise and incomplete ...
- review-article, October 2024
Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review
Abstract: Emotion recognition from speech has become crucial in human-computer interaction and affective computing applications. This review paper examines the complex relationship between two critical factors: the selection of speech data corpora and the ...
Highlights:
- A thorough analysis of the speech data corpora used in various studies, highlighting their size, diversity, and relevance to real-world applications.
- An examination of the different speech features employed for emotion classification, ...
- research-article, June 2024
Multi-level attention fusion network assisted by relative entropy alignment for multimodal speech emotion recognition
Applied Intelligence (KLU-APIN), Volume 54, Issue 17-18, Pages 8478–8490. https://doi.org/10.1007/s10489-024-05630-8
Abstract: Multimodal speech emotion recognition can utilize features from different modalities simultaneously to improve the modeling capabilities in affective computing. However, the rough feature-combining method may not effectively promote interaction ...
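Without the paper's details, the relative-entropy alignment idea can be illustrated as a symmetric KL divergence between the class distributions predicted from two modalities; a generic PyTorch sketch in which the class count and batch size are placeholders:

```python
import torch
import torch.nn.functional as F

def kl_alignment(audio_logits, text_logits):
    """Symmetric KL divergence between per-utterance class distributions of two modalities."""
    p = F.log_softmax(audio_logits, dim=-1)
    q = F.log_softmax(text_logits, dim=-1)
    kl_pq = F.kl_div(p, q, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(q, p, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)

audio_logits = torch.randn(8, 6)   # 8 utterances, 6 emotion classes (audio branch)
text_logits = torch.randn(8, 6)    # same utterances (text branch)
print(kl_alignment(audio_logits, text_logits))
```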
- research-article, June 2024
Single Modality and Joint Fusion for Emotion Recognition on RAVDESS Dataset
Abstract: Over the last decade, emotion recognition has become a widely researched area worth considering in any project related to affective computing. Due to the limitless applications of this new discipline, the development of emotion recognition systems ...
- research-article, July 2024
MA-CapsNet-DA: Speech emotion recognition based on MA-CapsNet using data augmentation
Expert Systems with Applications: An International Journal (EXWA), Volume 244, Issue C. https://doi.org/10.1016/j.eswa.2023.122939
Abstract: Speech emotion recognition (SER) plays a crucial role in human–computer interaction (HCI) applications. However, it has two challenges: the lack of effectiveness of deep learning models and data scarcity issues. As a result, the deep learning ...
Highlights:
- The Max-avg-pooling capsule network is more robust.
- Multi-domain integration enhances the diversity of data.
- Noisy data augmentation can significantly increase data scale.
- Multiple SNR consolidation completely solves the ...
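Noise augmentation at multiple SNRs usually means scaling a noise clip so that each mixture reaches a target signal-to-noise ratio; a small NumPy sketch, independent of the MA-CapsNet pipeline:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB."""
    noise = np.resize(noise, speech.shape)                 # loop/trim noise to the speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.normal(size=16000)                            # placeholder 1 s utterance
noise = rng.normal(size=8000)                              # placeholder noise clip
augmented = [mix_at_snr(speech, noise, snr) for snr in (0, 5, 10, 15)]  # multi-SNR copies
```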
- research-article, June 2024
Speech emotion recognition with transfer learning and multi-condition training for noisy environments
International Journal of Speech Technology (SPIJST), Volume 27, Issue 2, Pages 353–365. https://doi.org/10.1007/s10772-024-10109-5
Abstract: This paper explores the use of transfer learning techniques to develop robust speech emotion recognition (SER) models capable of handling noise in real-world environments. Two SER frameworks have been proposed in this work: Framework-1 is a two-...
- research-article, May 2024
Unveiling hidden factors: explainable AI for feature boosting in speech emotion recognition
Applied Intelligence (KLU-APIN), Volume 54, Issue 11-12, Pages 7046–7069. https://doi.org/10.1007/s10489-024-05536-5
Abstract: Speech emotion recognition (SER) has gained significant attention due to its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature ...
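The snippet does not name the explainability technique used for feature boosting; as one stable, model-agnostic stand-in, features can be ranked by permutation importance and only the top-ranked subset kept. The data, model, and cut-off below are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Placeholder data: 300 utterances, 40 acoustic features, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = rng.integers(0, 4, size=300)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by how much shuffling each one hurts accuracy, keep the top 15.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
top_features = np.argsort(result.importances_mean)[::-1][:15]
X_boosted = X[:, top_features]
print(top_features)
```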