Abstract
This study addresses speech emotion recognition using a hierarchical classification scheme. It aims to overcome the low accuracy that arises when a task involves a large number of emotions. In the proposed method, the emotions are grouped according to their position on the two-dimensional valence-arousal map, and models are trained for each group. A first pass selects the group; a second pass then performs within-group recognition for the group selected in the previous stage.
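The two-pass scheme can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the emotion labels, their assignment to valence-arousal groups, the toy two-dimensional features, and the nearest-centroid classifier, which merely stands in for whatever models the authors actually trained.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class NearestCentroid:
    """Toy stand-in classifier: predicts the label with the closest mean vector."""

    def fit(self, X, y):
        rows_by_label = defaultdict(list)
        for x, label in zip(X, y):
            rows_by_label[label].append(x)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in rows_by_label.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))


class HierarchicalEmotionClassifier:
    """Pass 1 picks a valence-arousal group; pass 2 picks an emotion inside it."""

    def __init__(self, emotion_to_group):
        self.emotion_to_group = emotion_to_group

    def fit(self, X, y):
        group_labels = [self.emotion_to_group[e] for e in y]
        # First-pass model: trained on group labels over all samples.
        self.group_model = NearestCentroid().fit(X, group_labels)
        # Second-pass models: one per group, trained only on that group's samples.
        by_group = defaultdict(lambda: ([], []))
        for x, emotion, group in zip(X, y, group_labels):
            by_group[group][0].append(x)
            by_group[group][1].append(emotion)
        self.within_models = {
            g: NearestCentroid().fit(xs, es) for g, (xs, es) in by_group.items()
        }
        return self

    def predict(self, x):
        group = self.group_model.predict(x)
        return group, self.within_models[group].predict(x)


# Hypothetical grouping of four emotions into two valence-arousal regions.
EMOTION_TO_GROUP = {
    "angry": "negative_valence_high_arousal",
    "fearful": "negative_valence_high_arousal",
    "sad": "negative_valence_low_arousal",
    "bored": "negative_valence_low_arousal",
}

# Toy 2-D feature vectors (illustrative values, not real acoustic features).
X_train = [
    [-0.8, 0.9], [-0.7, 0.8],    # angry
    [-0.3, 0.9], [-0.2, 0.8],    # fearful
    [-0.8, -0.8], [-0.7, -0.9],  # sad
    [-0.2, -0.7], [-0.3, -0.8],  # bored
]
y_train = ["angry", "angry", "fearful", "fearful", "sad", "sad", "bored", "bored"]

model = HierarchicalEmotionClassifier(EMOTION_TO_GROUP).fit(X_train, y_train)
print(model.predict([-0.75, 0.7]))
print(model.predict([-0.25, -0.6]))
```

In this sketch each second-pass model only has to separate the emotions inside one group, which is the intuition behind splitting a many-class problem into smaller ones. An error in the first pass cannot be corrected in the second.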
Dr. Panikos Heracleous is currently with the Artificial Intelligence Research Center (AIRC), AIST, Japan.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Heracleous, P., Takai, K., Yasuda, K., Yoneyama, A. (2021). A Hierarchical Classification Scheme for Efficient Speech Emotion Recognition. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2021 - Late Breaking Posters. HCII 2021. Communications in Computer and Information Science, vol 1499. Springer, Cham. https://doi.org/10.1007/978-3-030-90179-0_12
DOI: https://doi.org/10.1007/978-3-030-90179-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90178-3
Online ISBN: 978-3-030-90179-0
eBook Packages: Computer Science (R0)