Abstract
To make human–computer interaction more natural and friendly, computers must be able to understand human affective states in much the same way that humans do. People express their feelings through several modalities, such as facial expression, body gesture, and speech. In this study, we simulate human perception of emotion by combining emotion-related information from facial expressions and speech. The speech emotion recognition system is based on prosodic features and mel-frequency cepstral coefficients (a representation of the short-term power spectrum of a sound), while facial expression recognition is based on the integrated time motion image (ITMI) and quantized image matrix (QIM), which can be seen as extensions of temporal templates. Experimental results show that combining hybrid features with decision-level fusion improves on the unimodal systems: the recognition rate increases by about 15 % with respect to the speech-only system and by about 30 % with respect to the facial expression-only system. The proposed multi-classifier system, an improved hybrid system, further raises the recognition rate by up to 7.5 % over hybrid features with decision-level fusion using an RBF network, up to 22.7 % over the speech-based system, and up to 38 % over the facial expression-based system.
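The sketch below illustrates the general idea of decision-level fusion described above: one classifier is trained per modality and their class posteriors are combined before the final decision. It is a minimal illustration, not the paper's implementation; the feature dimensions, the MLP classifiers standing in for the paper's neural networks, and the fusion weights are all assumptions chosen for the example.

```python
# Minimal sketch of decision-level fusion of two unimodal emotion classifiers
# (speech and facial expression). All features, sizes, and weights below are
# illustrative placeholders, not the configuration used in the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 50, 6          # e.g. six basic emotions

# Placeholder features: prosody + MFCC statistics would play the role of the
# speech vector, ITMI/QIM-style spatio-temporal descriptors that of the face vector.
X_speech_tr = rng.normal(size=(n_train, 39))
X_speech_te = rng.normal(size=(n_test, 39))
X_face_tr = rng.normal(size=(n_train, 64))
X_face_te = rng.normal(size=(n_test, 64))
y_tr = rng.integers(0, n_classes, n_train)

# One classifier per modality (an MLP stands in for the neural-network classifiers).
clf_speech = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_speech_tr, y_tr)
clf_face = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_face_tr, y_tr)

# Decision-level fusion: weight and sum the class posteriors of the two
# modalities, then take the arg-max class. The weights here are arbitrary.
w_speech, w_face = 0.6, 0.4
posteriors = (w_speech * clf_speech.predict_proba(X_speech_te)
              + w_face * clf_face.predict_proba(X_face_te))
fused_labels = posteriors.argmax(axis=1)
print(fused_labels[:10])
```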