Enhanced face/audio emotion recognition: video and instance level classification using ConvNets and restricted Boltzmann Machines

Published: 23 August 2017 Publication History

Abstract

Face-based and audio-based emotion recognition modalities have been studied extensively, achieving successful classification rates for arousal/valence levels and multiple-emotion-category settings. However, recent studies focus only on classifying discrete emotion categories with a single image representation and/or a single set of audio feature descriptors. Face-based emotion recognition systems use single-channel image representations such as principal-components-analysis whitening, isotropic smoothing, or ZCA whitening. Similarly, audio emotion recognition systems use a standardized set of audio descriptors, often including only averaged Mel-Frequency Cepstral Coefficients. Both approaches imply the inclusion of decision-fusion modalities to compensate for the limited feature separability and to achieve high classification rates. In this paper, we propose two new methodologies for enhancing face-based and audio-based emotion recognition based on a single classifier decision, using the EU Emotion Stimulus dataset: (1) a combination of a Convolutional Neural Network for frame-level feature extraction with a k-Nearest Neighbors classifier for the subsequent frame-level aggregation and video-level classification, and (2) a shallow Restricted Boltzmann Machine network for arousal/valence classification.
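The abstract's first pipeline (per-frame CNN features aggregated into a video-level descriptor, then k-NN classification) can be sketched as follows. This is a minimal illustration, not the authors' architecture: the trained ConvNet is replaced by a fixed random projection (`cnn_frame_features`), and the feature dimensions, mean-pooling aggregation, synthetic data, and `k=3` are all assumptions made for the sake of a short runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM, FEAT_DIM = 100, 64
# Stand-in for a trained ConvNet: a fixed random projection. The paper's
# pipeline would use learned convolutional face features per frame instead.
W = rng.standard_normal((FRAME_DIM, FEAT_DIM)) * 0.01

def cnn_frame_features(frames):
    """Map each frame (row vector) to a hypothetical 64-d feature vector."""
    return frames @ W

def video_descriptor(frames):
    """Aggregate frame-level features into one video-level descriptor (mean pool)."""
    return cnn_frame_features(frames).mean(axis=0)

def knn_predict(train_X, train_y, x, k=3):
    """Classify a video descriptor by majority vote among its k nearest neighbours."""
    d = np.linalg.norm(train_X - x, axis=1)
    votes = train_y[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()

def make_video(label, n_frames=30):
    # Synthetic "videos": class-1 frames are shifted so the classes separate.
    return rng.standard_normal((n_frames, FRAME_DIM)) + 2.0 * label

train = [(make_video(y), y) for y in [0, 1] * 10]
train_X = np.array([video_descriptor(v) for v, _ in train])
train_y = np.array([y for _, y in train])

test_videos = [(make_video(y), y) for y in [0, 1] * 5]
preds = [knn_predict(train_X, train_y, video_descriptor(v)) for v, _ in test_videos]
acc = np.mean([p == y for p, (_, y) in zip(preds, test_videos)])
print(f"video-level accuracy: {acc:.2f}")
```

The design point is that classification happens once, at video level, on a single aggregated descriptor, rather than fusing multiple per-frame classifier decisions.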
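The abstract's second methodology, a shallow Restricted Boltzmann Machine, can be sketched with a generic textbook Bernoulli RBM trained by one-step contrastive divergence (CD-1). This is not the authors' network: the layer sizes, learning rate, and the toy binary vectors standing in for audio/arousal descriptors are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ShallowRBM:
    """Minimal Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.standard_normal((n_visible, n_hidden)) * 0.01
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        # Negative phase: sample hidden units, reconstruct the visibles.
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)  # reconstruction error

# Toy binary "descriptors" with two prototypes (e.g. high vs. low arousal).
proto = np.zeros((2, 16)); proto[0, :8] = 1; proto[1, 8:] = 1
X = np.vstack([p for p in proto for _ in range(50)])
X = np.abs(X - (rng.random(X.shape) < 0.05))  # flip 5% of bits as noise

rbm = ShallowRBM(n_visible=16, n_hidden=8)
errs = [rbm.cd1_step(X) for _ in range(200)]
print(f"reconstruction error: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

In a full pipeline the learned hidden representation (or an added output layer) would feed the arousal/valence decision; here the sketch only shows the unsupervised CD-1 training loop driving reconstruction error down.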





Published In

WI '17: Proceedings of the International Conference on Web Intelligence
August 2017
1284 pages
ISBN:9781450349512
DOI:10.1145/3106426

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ConvNets
  2. EU emotion stimulus
  3. RBM
  4. audio emotion
  5. face emotion

Qualifiers

  • Research-article

Conference

WI '17

Acceptance Rates

WI '17 paper acceptance rate: 118 of 178 submissions, 66% (also the overall acceptance rate).


Cited By

  • (2024)Transition Network-Based Analysis of Electrodermal Activity Signals for Emotion RecognitionIRBM10.1016/j.irbm.2024.100849(100849)Online publication date: Jul-2024
  • (2022)A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) DatabaseMultimodal Technologies and Interaction10.3390/mti60600476:6(47)Online publication date: 17-Jun-2022
  • (2020)A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared ImagesMultimodal Technologies and Interaction10.3390/mti40300464:3(46)Online publication date: 6-Aug-2020
  • (2020)A Survey on Automatic Multimodal Emotion Recognition in the WildAdvances in Data Science: Methodologies and Applications10.1007/978-3-030-51870-7_3(35-64)Online publication date: 27-Aug-2020
  • (2019)Audio based Emotion Detection and Recognizing Tool Using Mel Frequency based Cepstral CoefficientJournal of Physics: Conference Series10.1088/1742-6596/1362/1/0120631362(012063)Online publication date: 16-Nov-2019
  • (2019)Emotion recognition from geometric fuzzy membership functionsMultimedia Tools and Applications10.1007/s11042-018-6954-978:13(17847-17878)Online publication date: 1-Jul-2019
  • (2019)MQSMER: a mixed quadratic shape model with optimal fuzzy membership functions for emotion recognitionNeural Computing and Applications10.1007/s00521-018-3940-0Online publication date: 22-Jan-2019
  • (2019)Emotion Differentiation Based on Arousal Intensity Estimation from Facial ExpressionsInformation Science and Applications10.1007/978-981-15-1465-4_26(249-257)Online publication date: 19-Dec-2019
  • (2018)Enhanced Error Decoding from Error-Related Potentials using Convolutional Neural Networks2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)10.1109/EMBC.2018.8512183(360-363)Online publication date: Jul-2018
