DOI: 10.1007/978-3-642-12397-9_21

Emotional vocal expressions recognition using the COST 2102 Italian database of emotional speech

Published: 23 March 2009

Abstract

This paper proposes a new speaker-independent approach to classifying emotional vocal expressions, using the COST 2102 Italian database of emotional speech. The audio recordings, extracted from video clips of Italian movies, possess a certain degree of spontaneity and are either noisy or slightly degraded by an interruption, which makes the collected stimuli more realistic than those in available emotional databases containing utterances recorded under studio conditions. The audio stimuli represent six basic emotional states: happiness, sarcasm/irony, fear, anger, surprise, and sadness. Under these more realistic conditions, and with a speaker-independent approach, the proposed system classifies the emotions under examination with 60.7% accuracy, using a hierarchical structure consisting of a Perceptron and fifteen Gaussian Mixture Models (GMMs) trained to distinguish between the emotions within each pair. The most discriminative features were selected with the Sequential Floating Forward Selection (SFFS) algorithm from a large set of spectral, prosodic, and voice quality features. The results were compared with the subjective evaluation of the stimuli provided by human subjects.
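The count of fifteen pairwise models follows from the six emotions: there are C(6, 2) = 15 unordered pairs. The sketch below only illustrates that pairwise decomposition. It is not the paper's system: the abstract says the pairwise GMMs are combined through a Perceptron, whereas this example substitutes a plain majority vote, and `classify_by_voting` and `toy_scores` are hypothetical names with made-up scores standing in for model outputs.

```python
from collections import Counter
from itertools import combinations

# The six emotional states named in the abstract.
EMOTIONS = ["happiness", "sarcasm/irony", "fear", "anger", "surprise", "sadness"]

# One binary model per unordered pair of emotions: C(6, 2) = 15.
PAIRS = list(combinations(EMOTIONS, 2))


def classify_by_voting(pairwise_scores):
    """Pick the emotion that wins the most pairwise contests.

    pairwise_scores maps (a, b) -> (score_a, score_b), for example the
    log-likelihoods of two class models for one utterance (assumed input).
    Majority voting is a stand-in here, not the Perceptron stage the
    abstract describes.
    """
    votes = Counter()
    for (a, b), (score_a, score_b) in pairwise_scores.items():
        votes[a if score_a >= score_b else b] += 1
    # Ties are broken by the order of EMOTIONS.
    return max(EMOTIONS, key=lambda emotion: votes[emotion])


# Made-up scores in which "anger" wins every contest it takes part in.
toy_scores = {}
for a, b in PAIRS:
    if a == "anger":
        toy_scores[(a, b)] = (1.0, 0.0)
    elif b == "anger":
        toy_scores[(a, b)] = (0.0, 1.0)
    else:
        toy_scores[(a, b)] = (0.5, 0.5)

predicted = classify_by_voting(toy_scores)
```

In a real system each entry of `pairwise_scores` would come from a trained pairwise model evaluated on the selected features, and the fusion stage would be learned rather than fixed.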


Cited By

  • (2011) Comparison of complementary spectral features of emotional speech for German, Czech, and Slovak. Proceedings of the 2011 international conference on Cognitive Behavioural Systems, pp. 236-250. DOI: 10.1007/978-3-642-34584-5_20. Online publication date: 21-Feb-2011
  • (2009) The new Italian audio and video emotional database. Proceedings of the Second international conference on Development of Multimodal Interfaces: active Listening and Synchrony, pp. 406-422. DOI: 10.5555/2162410.2162445. Online publication date: 23-Mar-2009
Published In

COST'09: Proceedings of the Second international conference on Development of Multimodal Interfaces: active Listening and Synchrony
March 2009
445 pages
ISBN: 3642123961
Editors: Anna Esposito, Nick Campbell, Carl Vogel, Amir Hussain, Anton Nijholt

Sponsors

  • Provincia di Salerno
  • International Institute for Advanced Scientific Studies "E.R. Caianiello"
  • European COST Action 2102
  • Second University of Naples
  • Regione Campania

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. Italian database
  2. emotion recognition
  3. high level features
  4. spectral features
  5. speech
