DOI: 10.1145/3340555.3353739
Research article

Continuous Emotion Recognition in Videos by Fusing Facial Expression, Head Pose and Eye Gaze

Published: 14 October 2019

Abstract

Continuous emotion recognition is of great significance in affective computing and human-computer interaction. Most existing methods for video-based continuous emotion recognition rely on facial expression. However, besides facial expression, other cues such as head pose and eye gaze are also closely related to human emotion, yet they have not been well explored in the continuous emotion recognition task. On the one hand, head pose and eye gaze affect the credibility of facial expression features; on the other hand, they carry emotional cues of their own that are complementary to facial expression. Accordingly, in this paper we propose two ways to incorporate these two cues into continuous emotion recognition: an attention mechanism that uses head pose and eye gaze cues to guide the utilization of facial features, and an auxiliary line that helps extract additional emotion information from head pose and eye gaze. Experiments are conducted on the RECOLA dataset, a benchmark for continuous emotion recognition, and the results show that our framework outperforms other state-of-the-art methods thanks to its full use of head pose and eye gaze cues in addition to facial expression.
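The abstract describes the two mechanisms only at a high level. As a purely illustrative aid, the sketch below shows one plausible way to realize them in PyTorch: an attention (credibility) weight computed from head-pose and eye-gaze features gates the per-frame facial features, and an auxiliary branch predicts emotion from head pose and eye gaze alone. All class names, feature dimensions, and layer choices here are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn


    class PoseGazeGatedFusion(nn.Module):
        # Illustrative sketch only, not the authors' architecture: head-pose and
        # eye-gaze features produce a per-frame attention (credibility) weight that
        # gates the facial-expression features, and an auxiliary branch predicts
        # emotion from head pose and eye gaze alone.
        def __init__(self, face_dim=256, pose_gaze_dim=8, hidden_dim=64):
            super().__init__()
            # Attention over facial features, driven by pose/gaze (sizes assumed).
            self.attention = nn.Sequential(
                nn.Linear(pose_gaze_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
                nn.Sigmoid(),
            )
            # Main regressor: gated facial features plus pose/gaze -> arousal, valence.
            self.main_head = nn.Linear(face_dim + pose_gaze_dim, 2)
            # Auxiliary branch (one reading of the "auxiliary line" in the abstract):
            # predicts arousal and valence from pose/gaze only.
            self.aux_head = nn.Sequential(
                nn.Linear(pose_gaze_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 2),
            )

        def forward(self, face_feat, pose_gaze):
            # face_feat: (batch, time, face_dim); pose_gaze: (batch, time, pose_gaze_dim)
            alpha = self.attention(pose_gaze)            # (batch, time, 1) credibility weight
            fused = torch.cat([alpha * face_feat, pose_gaze], dim=-1)
            main_pred = self.main_head(fused)            # prediction from fused features
            aux_pred = self.aux_head(pose_gaze)          # prediction from pose/gaze alone
            return main_pred, aux_pred


    # Example: 4 clips of 100 frames with hypothetical feature sizes.
    model = PoseGazeGatedFusion()
    face = torch.randn(4, 100, 256)
    pose_gaze = torch.randn(4, 100, 8)
    main_pred, aux_pred = model(face, pose_gaze)
    print(main_pred.shape, aux_pred.shape)  # torch.Size([4, 100, 2]) twice

In a complete system the per-frame outputs would typically be passed through a temporal model and trained against the continuous arousal and valence annotations of RECOLA, commonly with a concordance-correlation-based objective; those details are omitted from this sketch.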

Published In

ICMI '19: 2019 International Conference on Multimodal Interaction
October 2019, 601 pages
ISBN: 9781450368605
DOI: 10.1145/3340555

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Attention
  2. Continuous Emotion Recognition
  3. Eye Gaze
  4. Facial Expression
  5. Head Pose

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

