DOI: 10.1145/3242969.3264978

Video-based Emotion Recognition Using Deeply-Supervised Neural Networks

Published: 02 October 2018

Abstract

Emotion recognition (ER) from natural facial images and videos has been studied for years and remains an active topic in affective computing. However, ER in the wild is still challenging because of the noise introduced by head pose, face deformation, and illumination variation. To address this challenge, and motivated by recent progress in Convolutional Neural Networks (CNNs), we develop a novel deeply-supervised CNN (DSN) architecture that combines the multi-level, multi-scale features extracted from different convolutional layers into a richer representation for ER. By embedding a series of side-output layers, our DSN model provides class-wise supervision at intermediate depths and integrates the predictions from multiple layers. With this model, our team ranked 3rd in the EmotiW 2018 challenge, achieving an accuracy of 61.1%.
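
The side-output idea is the core of the approach: class-wise prediction heads are attached to intermediate convolutional stages, each supervised by its own classification loss, and their predictions are integrated into the final output. The sketch below is a minimal PyTorch-style illustration of that general technique, not the authors' actual EmotiW model; the number of stages, channel widths, pooling choices, and the averaging fusion scheme are illustrative assumptions.

# Minimal sketch of a deeply-supervised CNN with class-wise side-output
# layers. All layer sizes and the fusion scheme are illustrative
# assumptions, not the architecture used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeeplySupervisedCNN(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        # Three convolutional stages producing multi-level, multi-scale features.
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # One class-wise side-output head per stage (global pooling + linear).
        self.side_heads = nn.ModuleList(
            [nn.Linear(c, num_classes) for c in (32, 64, 128)])

    def forward(self, x):
        side_logits = []
        for stage, head in zip((self.stage1, self.stage2, self.stage3),
                               self.side_heads):
            x = stage(x)
            pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)
            side_logits.append(head(pooled))
        # Integrate the side outputs by simple averaging (one possible scheme).
        fused = torch.stack(side_logits).mean(dim=0)
        return side_logits, fused


def deeply_supervised_loss(side_logits, fused, target):
    # Each side output gets its own classification loss, plus one on the
    # fused prediction, so every stage receives a direct training signal.
    loss = F.cross_entropy(fused, target)
    for logits in side_logits:
        loss = loss + F.cross_entropy(logits, target)
    return loss


if __name__ == "__main__":
    model = DeeplySupervisedCNN(num_classes=7)
    frames = torch.randn(4, 3, 64, 64)        # a small batch of face crops
    labels = torch.randint(0, 7, (4,))
    side, fused = model(frames)
    print(deeply_supervised_loss(side, fused, labels))

The per-stage losses give each convolutional level its own supervision, which is what allows the earlier, finer-scale features to contribute directly to the final emotion prediction.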




Published In

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018
687 pages
ISBN:9781450356923
DOI:10.1145/3242969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

  • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. deeply-supervised
  3. emotion recognition
  4. emotiw 2018 challenge
  5. side-output layers

Qualifiers

  • Short-paper

Funding Sources

  • Research Grants Council of Hong Kong

Conference

ICMI '18
Sponsor:
  • SIGCHI

Acceptance Rates

  • ICMI '18 paper acceptance rate: 63 of 149 submissions (42%)
  • Overall acceptance rate: 453 of 1,080 submissions (42%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 1

Reflects downloads up to 18 Feb 2025

Cited By

  • (2025) POSTER++: A simpler and stronger facial expression recognition network. Pattern Recognition, 157:110951. DOI: 10.1016/j.patcog.2024.110951. Online publication date: Jan 2025.
  • (2025) xLSTM-FER: Enhancing Student Expression Recognition with Extended Vision Long Short-Term Memory Network. Web and Big Data: APWeb-WAIM 2024 International Workshops, 249-259. DOI: 10.1007/978-981-96-0055-7_21. Online publication date: 31 Jan 2025.
  • (2024) Recognizing facial expressions based on pyramid multi-head grid and spatial attention network. Computer Vision and Image Understanding, 244:104010. DOI: 10.1016/j.cviu.2024.104010. Online publication date: Jul 2024.
  • (2024) Empower smart cities with sampling-wise dynamic facial expression recognition via frame-sequence contrastive learning. Computer Communications, 216:130-139. DOI: 10.1016/j.comcom.2023.12.032. Online publication date: Feb 2024.
  • (2023) LASTNet: A Swin Transformer with LANets Network for Video emotion recognition. Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, 291-294. DOI: 10.1145/3652628.3652676. Online publication date: 17 Nov 2023.
  • (2023) Spatial-Temporal Graphs Plus Transformers for Geometry-Guided Facial Expression Recognition. IEEE Transactions on Affective Computing, 14(4):2751-2767. DOI: 10.1109/TAFFC.2022.3181736. Online publication date: 1 Oct 2023.
  • (2023) Human Emotion Recognition With Relational Region-Level Analysis. IEEE Transactions on Affective Computing, 14(1):650-663. DOI: 10.1109/TAFFC.2021.3064918. Online publication date: 1 Jan 2023.
  • (2023) Enhancing Safety in Vehicles using Emotion Recognition with Artificial Intelligence. 2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 1-10. DOI: 10.1109/I2CT57861.2023.10126274. Online publication date: 7 Apr 2023.
  • (2023) Frame Level Emotion Guided Dynamic Facial Expression Recognition with Emotion Grouping. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 5681-5691. DOI: 10.1109/CVPRW59228.2023.00602. Online publication date: Jun 2023.
  • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 97:103988. DOI: 10.1016/j.jvcir.2023.103988. Online publication date: Dec 2023.
