DOI: 10.1145/3340555.3353739
Research article

Continuous Emotion Recognition in Videos by Fusing Facial Expression, Head Pose and Eye Gaze

Published: 14 October 2019

Abstract

Continuous emotion recognition is of great significance in affective computing and human-computer interaction. Most existing methods for video-based continuous emotion recognition rely on facial expression. However, besides facial expression, other cues such as head pose and eye gaze are also closely related to human emotion, yet they have not been well explored in the continuous emotion recognition task. On the one hand, head pose and eye gaze affect the credibility of facial expression features; on the other hand, they carry emotional cues of their own that are complementary to facial expression. Accordingly, in this paper we propose two ways to incorporate these two cues into continuous emotion recognition: an attention mechanism that uses head pose and eye gaze cues to guide the utilization of facial features, and an auxiliary line that helps extract additional emotion information from head pose and eye gaze. Experiments are conducted on the RECOLA dataset, a benchmark for continuous emotion recognition, and the results show that our framework outperforms other state-of-the-art methods thanks to its full use of head pose and eye gaze cues in addition to facial expression.
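The abstract describes the two mechanisms only at a high level. As a purely illustrative aid, the sketch below shows one plausible way to realize them in PyTorch: an attention (credibility) weight computed from head-pose and eye-gaze features gates the per-frame facial features, and an auxiliary branch predicts emotion from head pose and eye gaze alone. All class names, feature dimensions, and layer choices here are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn


    class PoseGazeGatedFusion(nn.Module):
        # Illustrative sketch only, not the authors' architecture: head-pose and
        # eye-gaze features produce a per-frame attention (credibility) weight that
        # gates the facial-expression features, and an auxiliary branch predicts
        # emotion from head pose and eye gaze alone.
        def __init__(self, face_dim=256, pose_gaze_dim=8, hidden_dim=64):
            super().__init__()
            # Attention over facial features, driven by pose/gaze (sizes assumed).
            self.attention = nn.Sequential(
                nn.Linear(pose_gaze_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
                nn.Sigmoid(),
            )
            # Main regressor: gated facial features plus pose/gaze -> arousal, valence.
            self.main_head = nn.Linear(face_dim + pose_gaze_dim, 2)
            # Auxiliary branch (one reading of the "auxiliary line" in the abstract):
            # predicts arousal and valence from pose/gaze only.
            self.aux_head = nn.Sequential(
                nn.Linear(pose_gaze_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 2),
            )

        def forward(self, face_feat, pose_gaze):
            # face_feat: (batch, time, face_dim); pose_gaze: (batch, time, pose_gaze_dim)
            alpha = self.attention(pose_gaze)            # (batch, time, 1) credibility weight
            fused = torch.cat([alpha * face_feat, pose_gaze], dim=-1)
            main_pred = self.main_head(fused)            # prediction from fused features
            aux_pred = self.aux_head(pose_gaze)          # prediction from pose/gaze alone
            return main_pred, aux_pred


    # Example: 4 clips of 100 frames with hypothetical feature sizes.
    model = PoseGazeGatedFusion()
    face = torch.randn(4, 100, 256)
    pose_gaze = torch.randn(4, 100, 8)
    main_pred, aux_pred = model(face, pose_gaze)
    print(main_pred.shape, aux_pred.shape)  # torch.Size([4, 100, 2]) twice

In a complete system the per-frame outputs would typically be passed through a temporal model and trained against the continuous arousal and valence annotations of RECOLA, commonly with a concordance-correlation-based objective; those details are omitted from this sketch.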

Published In

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019, 601 pages
ISBN: 9781450368605
DOI: 10.1145/3340555

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Attention
  2. Continuous Emotion Recognition
  3. Eye Gaze
  4. Facial Expression
  5. Head Pose

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

