Speech emotion recognition using the novel SwinEmoNet (Shifted Window Transformer Emotion Network)

Published in: International Journal of Speech Technology

Abstract

Understanding human emotions is necessary for many tasks, including interpersonal interaction, knowledge acquisition, and decision making. Recognizing emotions in speech is particularly challenging because of variation across languages, regions, genders, generations, and cultures. Deep learning methods are promising for automating this task, but previous approaches frequently rely on a single type of feature representation, limiting the efficacy of Speech Emotion Recognition (SER). To address these limitations, this paper proposes a comprehensive approach based on Shifted Window Transformers that accounts for the many facets of emotional expression in speech and exploits diverse feature representations to improve SER performance. The proposed Shifted Window Transformer Emotion Network (SwinEmoNet) incorporates shifted window attention mechanisms for efficient emotion classification. Instead of the global attention used in traditional transformer architectures, SwinEmoNet applies local window attention, which allows the model to concentrate on salient information in small, localized regions of the input speech signal. The architecture is evaluated on three distinct speech spectrogram representations. The effectiveness of the proposed SER method is demonstrated on the Berlin Emotional Database (EMODB) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), using accuracy, precision, recall, and F1-score as metrics. SwinEmoNet achieves accuracies of 94.93% and 96.51% on EMODB and RAVDESS, respectively, significantly outperforming existing transformer models and current state-of-the-art methods.
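
The abstract does not give the exact layer configuration, so the following is only a minimal PyTorch sketch of the shifted-window idea it describes: self-attention is computed within small non-overlapping windows of the spectrogram feature map, and alternating blocks cyclically shift the partition so information can flow across window boundaries. The module name, dimensions, and window size are illustrative assumptions, not the paper's settings, and the boundary attention mask used in the full Swin design is omitted for brevity.

```python
# Minimal sketch of Swin-style windowed self-attention over a spectrogram
# feature map. All names and hyperparameters here are illustrative; the
# paper's exact SwinEmoNet configuration is not given in the abstract.
import torch
import torch.nn as nn

class WindowAttentionBlock(nn.Module):
    """Self-attention restricted to non-overlapping windows of a 2-D feature map."""
    def __init__(self, dim: int, window: int, heads: int, shift: bool):
        super().__init__()
        self.window, self.shift = window, shift
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, height, width, dim); height and width divisible by window
        b, h, w, d = x.shape
        W = self.window
        if self.shift:  # cyclic shift so windows straddle the previous partition
            x = torch.roll(x, shifts=(-W // 2, -W // 2), dims=(1, 2))
        # Partition into (W x W) windows: (batch * num_windows, W * W, dim).
        win = x.view(b, h // W, W, w // W, W, d)
        win = win.permute(0, 1, 3, 2, 4, 5).reshape(-1, W * W, d)
        y = self.norm(win)
        y, _ = self.attn(y, y, y)  # attention only within each window
        win = win + y              # residual connection
        # Reverse the window partition back to (batch, height, width, dim).
        win = win.view(b, h // W, w // W, W, W, d)
        x = win.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, d)
        if self.shift:  # undo the cyclic shift
            x = torch.roll(x, shifts=(W // 2, W // 2), dims=(1, 2))
        # Note: the full Swin design also masks attention across the
        # wrap-around boundary after the shift; that mask is omitted here.
        return x

# Usage: a (freq x time) spectrogram embedded to 64 channels, passed through
# a regular-window block followed by a shifted-window block.
x = torch.randn(2, 64, 64, 64)  # (batch, freq, time, channels)
blocks = nn.Sequential(WindowAttentionBlock(64, 8, 4, shift=False),
                       WindowAttentionBlock(64, 8, 4, shift=True))
print(blocks(x).shape)  # torch.Size([2, 64, 64, 64])
```

Because each window attends only to its own W x W patch, the attention cost grows linearly with the number of windows rather than quadratically with the full sequence length, which is what makes this design efficient for long spectrograms.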

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to K. Mohanaprasad.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ramesh, R., Prahaladhan, V.B., Nithish, P. et al. Speech emotion recognition using the novel SwinEmoNet (Shifted Window Transformer Emotion Network). Int J Speech Technol 27, 551–568 (2024). https://doi.org/10.1007/s10772-024-10123-7
