DOI: 10.1145/3242969.3264978

Video-based Emotion Recognition Using Deeply-Supervised Neural Networks

Published: 02 October 2018

Abstract

Emotion recognition (ER) from natural facial images and videos has been studied for years and remains an active topic in affective computing. However, ER in the wild is still challenging because of the noise introduced by head pose, face deformation, and illumination variation. To address this challenge, and motivated by recent progress in Convolutional Neural Networks (CNNs), we develop a novel deeply-supervised CNN (DSN) architecture that combines the multi-level, multi-scale features extracted from different convolutional layers into a richer representation for ER. By embedding a series of side-output layers, our DSN model provides class-wise supervision at intermediate depths and integrates the predictions from multiple layers. With this model, our team ranked 3rd in the EmotiW 2018 challenge, achieving an accuracy of 61.1%.
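
The side-output idea is the core of the approach: class-wise prediction heads are attached to intermediate convolutional stages, each supervised by its own classification loss, and their predictions are integrated into the final output. The sketch below is a minimal PyTorch-style illustration of that general technique, not the authors' actual EmotiW model; the number of stages, channel widths, pooling choices, and the averaging fusion scheme are illustrative assumptions.

# Minimal sketch of a deeply-supervised CNN with class-wise side-output
# layers. All layer sizes and the fusion scheme are illustrative
# assumptions, not the architecture used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeeplySupervisedCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # Three convolutional stages producing multi-level, multi-scale features.
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # One class-wise side-output head per stage (global pooling + linear).
        self.side_heads = nn.ModuleList(
            [nn.Linear(c, num_classes) for c in (32, 64, 128)])

    def forward(self, x):
        side_logits = []
        for stage, head in zip((self.stage1, self.stage2, self.stage3),
                               self.side_heads):
            x = stage(x)
            pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)
            side_logits.append(head(pooled))
        # Integrate the side outputs by simple averaging (one possible scheme).
        fused = torch.stack(side_logits).mean(dim=0)
        return side_logits, fused


def deeply_supervised_loss(side_logits, fused, target):
    # Each side output gets its own classification loss, plus one on the
    # fused prediction, so every stage receives a direct training signal.
    loss = F.cross_entropy(fused, target)
    for logits in side_logits:
        loss = loss + F.cross_entropy(logits, target)
    return loss


if __name__ == "__main__":
    model = DeeplySupervisedCNN(num_classes=7)
    frames = torch.randn(4, 3, 64, 64)        # a small batch of face crops
    labels = torch.randint(0, 7, (4,))
    side, fused = model(frames)
    print(deeply_supervised_loss(side, fused, labels))

The per-stage losses give each convolutional level its own supervision, which is what allows the earlier, finer-scale features to contribute directly to the final emotion prediction.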




Published In

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018
687 pages
ISBN:9781450356923
DOI:10.1145/3242969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

  • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. deeply-supervised
  3. emotion recognition
  4. emotiw 2018 challenge
  5. side-output layers

Qualifiers

  • Short-paper

Funding Sources

  • Research Grants Council of Hong Kong

Conference

ICMI '18
Sponsor:
  • SIGCHI

Acceptance Rates

  • ICMI '18 paper acceptance rate: 63 of 149 submissions (42%)
  • Overall acceptance rate: 453 of 1,080 submissions (42%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 1

Reflects downloads up to 18 Feb 2025

Cited By

  • (2025) POSTER++: A simpler and stronger facial expression recognition network. Pattern Recognition, 157:110951. DOI: 10.1016/j.patcog.2024.110951. Online publication date: Jan 2025.
  • (2025) xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network. Web and Big Data: APWeb-WAIM 2024 International Workshops, 249-259. DOI: 10.1007/978-981-96-0055-7_21. Online publication date: 31 Jan 2025.
  • (2024) Recognizing facial expressions based on pyramid multi-head grid and spatial attention network. Computer Vision and Image Understanding, 244:104010. DOI: 10.1016/j.cviu.2024.104010. Online publication date: Jul 2024.
  • (2024) Empower smart cities with sampling-wise dynamic facial expression recognition via frame-sequence contrastive learning. Computer Communications, 216:130-139. DOI: 10.1016/j.comcom.2023.12.032. Online publication date: Feb 2024.
  • (2023) LASTNet: A Swin Transformer with LANets Network for Video emotion recognition. Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, 291-294. DOI: 10.1145/3652628.3652676. Online publication date: 17 Nov 2023.
  • (2023) Spatial-Temporal Graphs Plus Transformers for Geometry-Guided Facial Expression Recognition. IEEE Transactions on Affective Computing, 14(4):2751-2767. DOI: 10.1109/TAFFC.2022.3181736. Online publication date: 1 Oct 2023.
  • (2023) Human Emotion Recognition With Relational Region-Level Analysis. IEEE Transactions on Affective Computing, 14(1):650-663. DOI: 10.1109/TAFFC.2021.3064918. Online publication date: 1 Jan 2023.
  • (2023) Enhancing Safety in Vehicles using Emotion Recognition with Artificial Intelligence. 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 1-10. DOI: 10.1109/I2CT57861.2023.10126274. Online publication date: 7 Apr 2023.
  • (2023) Frame Level Emotion Guided Dynamic Facial Expression Recognition with Emotion Grouping. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 5681-5691. DOI: 10.1109/CVPRW59228.2023.00602. Online publication date: Jun 2023.
  • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 97:103988. DOI: 10.1016/j.jvcir.2023.103988. Online publication date: Dec 2023.
