research-article

Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning

Authors: Viet Dung Nguyen, Vassilios Vonikakis, Stefan Winkler

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

Pages 443 - 449

https://doi.org/10.1145/2818346.2830593

Published: 09 November 2015

Abstract

This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning of the network in two stages: first on datasets relevant to facial expressions, then on the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results than single-stage fine-tuning on the combined datasets. Our best submission achieved an overall accuracy of 48.5% on the validation set and 55.6% on the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.
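
The cascading schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a small linear softmax classifier stands in for the CNN, randomly initialised weights stand in for the ImageNet pre-trained network, and synthetic 2-D blobs stand in for the facial-expression and contest datasets. All function names, learning rates, and dataset sizes below are assumptions chosen for illustration. The only point shown is the ordering: each stage continues training from the weights produced by the previous one.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict_proba(weights, x):
    """Class probabilities of a linear softmax classifier (last weight is the bias)."""
    xb = x + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])


def fine_tune(weights, data, lr, epochs, rng):
    """Continue SGD on cross-entropy starting from `weights`; returns new weights."""
    weights = [row[:] for row in weights]  # copy, so the starting point is preserved
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            probs = predict_proba(weights, x)
            xb = x + [1.0]
            for k, row in enumerate(weights):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    row[j] -= lr * grad * v
    return weights


def accuracy(weights, data):
    """Fraction of samples whose most probable class matches the label."""
    hits = 0
    for x, label in data:
        probs = predict_proba(weights, x)
        if probs.index(max(probs)) == label:
            hits += 1
    return hits / len(data)


def make_blobs(centers, n_per_class, noise, rng):
    """Synthetic labelled points: Gaussian noise around one center per class."""
    return [([rng.gauss(c, noise) for c in center], label)
            for label, center in enumerate(centers)
            for _ in range(n_per_class)]


def run_demo(seed=0):
    rng = random.Random(seed)
    target_centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    # Assumption: the auxiliary data is related to, but shifted from, the target data.
    aux_centers = [(cx + 0.5, cy + 0.5) for cx, cy in target_centers]
    auxiliary = make_blobs(aux_centers, 200, 0.5, rng)      # larger related dataset
    target_train = make_blobs(target_centers, 10, 0.5, rng)  # small target dataset
    target_test = make_blobs(target_centers, 100, 0.5, rng)

    # Stand-in for pre-trained weights (hypothetical; here just small random values).
    pretrained = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
    # Stage 1: fine-tune on the related auxiliary data.
    stage1 = fine_tune(pretrained, auxiliary, lr=0.05, epochs=5, rng=rng)
    # Stage 2: continue from the stage-1 weights on the small target set.
    stage2 = fine_tune(stage1, target_train, lr=0.01, epochs=5, rng=rng)
    return accuracy(stage2, target_test)


if __name__ == "__main__":
    print(f"target test accuracy after two-stage fine-tuning: {run_demo():.3f}")
```

The smaller learning rate in the second stage is a common choice when the final dataset is small, since it limits how far the weights drift from what the first stage learned. The sketch makes no claim about how the toy cascade compares with single-stage training; that comparison is the paper's experimental result on real data.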

Index Terms

Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches

Published In

ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction

November 2015

678 pages

ISBN:9781450339124

DOI:10.1145/2818346

General Chairs: Zhengyou Zhang (Microsoft Research, USA), Phil Cohen (VoiceBox Technologies, USA)

Program Chairs: Dan Bohus (Microsoft Research, USA), Radu Horaud (INRIA Grenoble Rhone-Alpes, France), Helen Meng (Chinese University of Hong Kong, China)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Agency for Science Technology and Research

Conference

ICMI '15

Sponsor:

SIGCHI

ICMI '15: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

November 9 - 13, 2015

Seattle, Washington, USA

Acceptance Rates

ICMI '15 Paper Acceptance Rate 52 of 127 submissions, 41%;

Overall Acceptance Rate 453 of 1,080 submissions, 42%
