DOI: 10.1145/2818346.2830593
research-article

Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning

Published: 09 November 2015

Abstract

This paper presents the techniques used in our team's submissions to the 2015 Emotion Recognition in the Wild (EmotiW) contest, for the Static Facial Expression Recognition in the Wild sub-challenge. The objective of this sub-challenge is to classify the emotion expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning in two stages: first on datasets relevant to facial expressions, then on the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results than single-stage fine-tuning on the combined datasets. Our best submission achieved an overall accuracy of 48.5% on the validation set and 55.6% on the test set, compared with 35.96% and 39.13%, respectively, for the challenge baseline.
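The difference between cascading and single-stage fine-tuning described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: a toy logistic-regression weight vector stands in for the ImageNet-pretrained CNN so the example runs with the Python standard library only, and the helper names (`sgd_train`, `cascade_finetune`, `accuracy`, `make_split`) and the synthetic datasets are hypothetical. What the sketch is meant to show is the schedule: in the cascade, each stage resumes from the weights produced by the previous stage, while the baseline runs a single fine-tuning stage on the pooled data.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy setup: a logistic-regression weight vector stands in for a
# pretrained CNN so the fine-tuning *schedule* can run with the standard library only.
Example = Tuple[List[float], int]  # (feature vector, binary label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of label 1 under the toy linear model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def sgd_train(weights: Sequence[float], data: Sequence[Example],
              epochs: int, lr: float, seed: int = 0) -> List[float]:
    """Fine-tune: continue SGD on log loss starting from `weights`, not from scratch."""
    w = list(weights)  # copy, so the caller's (e.g. pretrained) weights stay untouched
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            err = predict(w, x) - y  # d(log loss)/d(logit) = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def cascade_finetune(pretrained: Sequence[float],
                     stages: Sequence[Tuple[Sequence[Example], int, float]]) -> List[float]:
    """Cascading fine-tuning: each (data, epochs, lr) stage resumes from the previous one."""
    w = list(pretrained)
    for data, epochs, lr in stages:
        w = sgd_train(w, data, epochs, lr)
    return w


def accuracy(w: Sequence[float], data: Sequence[Example]) -> float:
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((predict(w, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(42)

    def make_split(n: int, shift: float) -> List[Example]:
        """Hypothetical synthetic domain; `shift` moves the boundary to mimic a domain gap."""
        out = []
        for _ in range(n):
            v = rng.uniform(-1.0, 1.0)
            out.append(([1.0, v + shift, rng.gauss(0.0, 1.0)], int(v > 0)))
        return out

    pretrained = [0.0, 0.5, 0.0]              # stand-in for ImageNet-pretrained weights
    auxiliary = make_split(500, shift=0.3)    # stand-in for facial-expression datasets
    target_train = make_split(40, shift=0.0)  # stand-in for the small contest training set
    target_val = make_split(200, shift=0.0)

    # Two-stage cascade: auxiliary data first, then the target data.
    cascade = cascade_finetune(pretrained, [(auxiliary, 5, 0.1), (target_train, 20, 0.05)])
    # Single-stage baseline: one fine-tuning stage on the pooled datasets.
    combined = sgd_train(pretrained, auxiliary + target_train, epochs=5, lr=0.1)

    print("cascade  validation accuracy:", accuracy(cascade, target_val))
    print("combined validation accuracy:", accuracy(combined, target_val))
```

Because `sgd_train` copies its input, the pretrained vector and each intermediate stage's weights remain available for comparison. The printed toy accuracies are illustrative only and say nothing about the CNN results reported above.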

    Information

    Published In

    ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
    November 2015
    678 pages
    ISBN:9781450339124
    DOI:10.1145/2818346
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 November 2015

    Author Tags

    1. deep learning networks
    2. emotion classification
    3. facial expression analysis

    Qualifiers

    • Research-article

    Funding Sources

    • Agency for Science, Technology and Research

    Conference

    ICMI '15: International Conference on Multimodal Interaction
    November 9-13, 2015
    Seattle, Washington, USA

    Acceptance Rates

    ICMI '15 Paper Acceptance Rate 52 of 127 submissions, 41%;
    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 359
    • Downloads (last 6 weeks): 40
    Reflects downloads up to 18 Nov 2024

    Cited By

    • (2024) Robust CNN for facial emotion recognition and real-time GUI. AIMS Electronics and Electrical Engineering, 8(2):217-236. DOI: 10.3934/electreng.2024010. Online publication date: 2024.
    • (2024) Deep Transfer Learning Method Using Self-Pixel and Global Channel Attentive Regularization. Sensors, 24(11):3522. DOI: 10.3390/s24113522. Online publication date: 30-May-2024.
    • (2024) DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection. EURASIP Journal on Audio, Speech, and Music Processing, 2024(1). DOI: 10.1186/s13636-024-00335-9. Online publication date: 1-Apr-2024.
    • (2024) Transfer Learning-Based Independent Component Analysis. IEEE Transactions on Automation Science and Engineering, 21(1):783-798. DOI: 10.1109/TASE.2022.3229294. Online publication date: Jan-2024.
    • (2024) Adversarial Domain Generalized Transformer for Cross-Corpus Speech Emotion Recognition. IEEE Transactions on Affective Computing, 15(2):697-708. DOI: 10.1109/TAFFC.2023.3290795. Online publication date: Apr-2024.
    • (2024) From the Lab to the Wild: Affect Modeling Via Privileged Information. IEEE Transactions on Affective Computing, 15(2):380-392. DOI: 10.1109/TAFFC.2023.3265072. Online publication date: Apr-2024.
    • (2024) Relevant Musical Schema-based Human Emotion Controller using Deep Learning Techniques. 2024 Third International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), pp. 1-5. DOI: 10.1109/INCOS59338.2024.10527687. Online publication date: 14-Mar-2024.
    • (2024) TweetFeel: Analyzing Emotions in the Twittersphere. 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS), pp. 1-6. DOI: 10.1109/ICKECS61492.2024.10617413. Online publication date: 18-Apr-2024.
    • (2024) Facial Emotion Recognition for Virtual Customer Service Agents. 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE), pp. 321-326. DOI: 10.1109/IC3SE62002.2024.10593310. Online publication date: 9-May-2024.
    • (2024) Optimized-CNN enabled Facial Emotion Recognition within Collaborative Edge Computing. 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 12-17. DOI: 10.1109/CSCWD61410.2024.10580276. Online publication date: 8-May-2024.
