DOI: 10.1145/3563137.3563170
Research article · Open access

Emotion Classification from Speech by an Ensemble Strategy

Published: 25 May 2023

Abstract

Humans naturally comprehend each other's emotions through subtle body movements and speech expressions, and adjust how they deliver and interpret messages accordingly. Socially assistive robots need to strengthen their ability to recognize emotions so they can adapt their interaction with humans, especially with older adults. This paper presents a framework for speech emotion prediction supported by an ensemble of distinct out-of-the-box methods; its main contribution is the integration of those methods' outputs into a single prediction consistent with the expression presented by the system's user. Results show a classification accuracy of 75.56% on the RAVDESS dataset and 86.43% on a combined dataset comprising RAVDESS, SAVEE, and TESS.
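As a rough illustration of the ensemble idea described above, the sketch below trains several off-the-shelf classifiers on speech features and averages their per-class probabilities into one prediction. It is a minimal sketch, not the paper's exact pipeline: the MFCC feature summary (via librosa), the choice of base models, and the placeholder file paths and labels are all illustrative assumptions.

```python
import numpy as np
import librosa
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

def mfcc_features(path, n_mfcc=40):
    """Load one utterance and summarize it as a fixed-length MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time frames -> shape (n_mfcc,)

# Placeholder files and labels; a real run would walk the RAVDESS/SAVEE/TESS
# directories and map filename codes to emotion labels.
paths = ["happy_01.wav", "sad_01.wav"]
labels = ["happy", "sad"]
X = np.stack([mfcc_features(p) for p in paths])

# Soft voting averages each base model's class probabilities, producing the
# kind of single integrated prediction the abstract describes.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X, labels)
print(ensemble.predict(X))
```

Soft voting is one of several plausible aggregation strategies; hard (majority) voting or a learned meta-classifier over the base models' outputs would fit the same framework.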


Cited By

  • (2023) Multimodal Emotion Classification Supported in the Aggregation of Pre-trained Classification Models. Computational Science – ICCS 2023, pp. 433-447. DOI: 10.1007/978-3-031-36030-5_35. Online publication date: 26-Jun-2023.
  • (2023) Body-Focused Expression Analysis: A Conceptual Framework. Universal Access in Human-Computer Interaction, pp. 596-608. DOI: 10.1007/978-3-031-35897-5_42. Online publication date: 9-Jul-2023.


Information

Published In

DSAI '22: Proceedings of the 10th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion
August 2022, 237 pages
ISBN: 9781450398077
DOI: 10.1145/3563137

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Emotions
  2. Ensembles
  3. Machine Learning
  4. Speech Emotion Recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Portuguese Foundation for Science and Technology (FCT), project LARSyS - FCT Project

Conference

DSAI 2022

Acceptance Rates

Overall Acceptance Rate 17 of 23 submissions, 74%

Article Metrics

  • Downloads (last 12 months): 264
  • Downloads (last 6 weeks): 32

Reflects downloads up to 30 Sep 2024

