Enhanced face/audio emotion recognition: video and instance level classification using ConvNets and restricted Boltzmann Machines

Published: 23 August 2017 Publication History

Abstract

Face-based and audio-based emotion recognition modalities have been studied extensively, achieving successful classification rates for arousal/valence levels and multiple-emotion-category settings. However, recent studies focus only on classifying discrete emotion categories with a single image representation and/or a single set of audio feature descriptors. Face-based emotion recognition systems use single-channel image representations such as principal-components-analysis whitening, isotropic smoothing, or ZCA whitening. Similarly, audio emotion recognition systems use a standardized set of audio descriptors, often including only averaged Mel-Frequency Cepstral Coefficients. Both approaches imply the inclusion of decision-fusion modalities to compensate for the limited feature separability and to achieve high classification rates. In this paper, we propose two new methodologies for enhancing face-based and audio-based emotion recognition based on a single classifier decision, using the EU Emotion Stimulus dataset: (1) a combination of a Convolutional Neural Network for frame-level feature extraction with a k-Nearest Neighbors classifier for the subsequent frame-level aggregation and video-level classification, and (2) a shallow Restricted Boltzmann Machine network for arousal/valence classification.
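The abstract's first pipeline (per-frame CNN features aggregated into a video-level descriptor, then k-NN classification) can be sketched as follows. This is a minimal illustration, not the authors' architecture: the trained ConvNet is replaced by a fixed random projection (`cnn_frame_features`), and the feature dimensions, mean-pooling aggregation, synthetic data, and `k=3` are all assumptions made for the sake of a short runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM, FEAT_DIM = 100, 64
# Stand-in for a trained ConvNet: a fixed random projection. The paper's
# pipeline would use learned convolutional face features per frame instead.
W = rng.standard_normal((FRAME_DIM, FEAT_DIM)) * 0.01

def cnn_frame_features(frames):
    """Map each frame (row vector) to a hypothetical 64-d feature vector."""
    return frames @ W

def video_descriptor(frames):
    """Aggregate frame-level features into one video-level descriptor (mean pool)."""
    return cnn_frame_features(frames).mean(axis=0)

def knn_predict(train_X, train_y, x, k=3):
    """Classify a video descriptor by majority vote among its k nearest neighbours."""
    d = np.linalg.norm(train_X - x, axis=1)
    votes = train_y[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()

def make_video(label, n_frames=30):
    # Synthetic "videos": class-1 frames are shifted so the classes separate.
    return rng.standard_normal((n_frames, FRAME_DIM)) + 2.0 * label

train = [(make_video(y), y) for y in [0, 1] * 10]
train_X = np.array([video_descriptor(v) for v, _ in train])
train_y = np.array([y for _, y in train])

test_videos = [(make_video(y), y) for y in [0, 1] * 5]
preds = [knn_predict(train_X, train_y, video_descriptor(v)) for v, _ in test_videos]
acc = np.mean([p == y for p, (_, y) in zip(preds, test_videos)])
print(f"video-level accuracy: {acc:.2f}")
```

The design point is that classification happens once, at video level, on a single aggregated descriptor, rather than fusing multiple per-frame classifier decisions.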
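The abstract's second methodology, a shallow Restricted Boltzmann Machine, can be sketched with a generic textbook Bernoulli RBM trained by one-step contrastive divergence (CD-1). This is not the authors' network: the layer sizes, learning rate, and the toy binary vectors standing in for audio/arousal descriptors are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ShallowRBM:
    """Minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.standard_normal((n_visible, n_hidden)) * 0.01
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden units, reconstruct the visibles.
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error

# Toy binary "descriptors" with two prototypes (e.g. high vs. low arousal).
proto = np.zeros((2, 16)); proto[0, :8] = 1; proto[1, 8:] = 1
X = np.vstack([p for p in proto for _ in range(50)])
X = np.abs(X - (rng.random(X.shape) < 0.05))  # flip 5% of bits as noise

rbm = ShallowRBM(n_visible=16, n_hidden=8)
errs = [rbm.cd1_step(X) for _ in range(200)]
print(f"reconstruction error: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

In a full pipeline the learned hidden representation (or an added output layer) would feed the arousal/valence decision; here the sketch only shows the unsupervised CD-1 training loop driving reconstruction error down.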





Published In

WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN:9781450349512
DOI:10.1145/3106426

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ConvNets
  2. EU emotion stimulus
  3. RBM
  4. audio emotion
  5. face emotion

Qualifiers

  • Research-article

Conference

WI '17

Acceptance Rates

WI '17 paper acceptance rate: 118 of 178 submissions, 66% (also the overall acceptance rate).


Cited By

  • (2024)Transition Network-Based Analysis of Electrodermal Activity Signals for Emotion RecognitionIRBM10.1016/j.irbm.2024.100849(100849)Online publication date: Jul-2024
  • (2022)A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) DatabaseMultimodal Technologies and Interaction10.3390/mti60600476:6(47)Online publication date: 17-Jun-2022
  • (2020)A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared ImagesMultimodal Technologies and Interaction10.3390/mti40300464:3(46)Online publication date: 6-Aug-2020
  • (2020)A Survey on Automatic Multimodal Emotion Recognition in the WildAdvances in Data Science: Methodologies and Applications10.1007/978-3-030-51870-7_3(35-64)Online publication date: 27-Aug-2020
  • (2019)Audio based Emotion Detection and Recognizing Tool Using Mel Frequency based Cepstral CoefficientJournal of Physics: Conference Series10.1088/1742-6596/1362/1/0120631362(012063)Online publication date: 16-Nov-2019
  • (2019)Emotion recognition from geometric fuzzy membership functionsMultimedia Tools and Applications10.1007/s11042-018-6954-978:13(17847-17878)Online publication date: 1-Jul-2019
  • (2019)MQSMER: a mixed quadratic shape model with optimal fuzzy membership functions for emotion recognitionNeural Computing and Applications10.1007/s00521-018-3940-0Online publication date: 22-Jan-2019
  • (2019)Emotion Differentiation Based on Arousal Intensity Estimation from Facial ExpressionsInformation Science and Applications10.1007/978-981-15-1465-4_26(249-257)Online publication date: 19-Dec-2019
  • (2018)Enhanced Error Decoding from Error-Related Potentials using Convolutional Neural Networks2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)10.1109/EMBC.2018.8512183(360-363)Online publication date: Jul-2018
