Abstract
Facial expression recognition (FER) on real-world databases is an active and challenging research topic. Existing CNN-based facial expression classifiers usually perform well on common expressions, such as happiness and surprise, but have lower accuracy on difficult expressions, such as disgust and fear. Two main factors are responsible for this problem. First, intra-class variation makes difficult expressions harder to classify than other expressions. Second, difficult expressions are severely under-represented in most FER datasets, and this data imbalance leads to overfitting during training. In this work, a new network architecture is proposed to address the intra-class variation problem. The proposed model consists of a spatial enhancement module and a semantic aggregation module, which enhance fine-level expression features and high-level semantic features, respectively. To alleviate the data imbalance problem, an iterative learning method is introduced to collect additional samples of difficult expressions. New samples with inconsistent labels are classified using a fuzzy clustering algorithm. The proposed FER framework has been evaluated on three real-world expression datasets. Experimental results demonstrate that the proposed method significantly improves the recognition accuracy of difficult expressions and achieves top performance compared with state-of-the-art methods.
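The abstract does not say which fuzzy clustering algorithm is used to classify new samples with inconsistent labels. As an illustration only, the sketch below uses standard fuzzy c-means (FCM) as a stand-in, assuming each new sample is already represented by a feature vector (for example, a CNN embedding) and that each expression class starts from a prototype vector. The function names `fuzzy_c_means` and `relabel_inconsistent`, the fuzzifier `m = 2`, and the 0.6 membership threshold are hypothetical choices, not the authors'. Each sample is assigned to the class with the highest membership, and samples with no dominant cluster are left unlabeled (`None`).

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(
    points: List[Vector],
    init_centers: List[Vector],
    m: float = 2.0,
    iters: int = 50,
    eps: float = 1e-9,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard fuzzy c-means, started from the given centers.

    Returns (memberships, centers). memberships[i][k] is the degree to
    which point i belongs to cluster k; each row sums to 1.
    """
    centers = [list(c) for c in init_centers]
    n, c, dim = len(points), len(centers), len(points[0])
    power = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership step: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(_dist(p, ck), eps) for ck in centers]  # eps avoids division by zero
            u.append([1.0 / sum((d[k] / d[j]) ** power for j in range(c)) for k in range(c)])
        # Center step: mean of all points weighted by u_ik ** m.
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers[k] = [sum(w[i] * points[i][t] for i in range(n)) / total for t in range(dim)]
    return u, centers


def relabel_inconsistent(
    points: List[Vector],
    class_prototypes: List[Vector],
    class_names: List[str],
    min_membership: float = 0.6,
) -> List[Optional[str]]:
    """Hypothetical relabeling rule: assign each sample to the cluster with the
    highest membership, or None when no cluster is dominant enough."""
    u, _ = fuzzy_c_means(points, class_prototypes)
    labels: List[Optional[str]] = []
    for row in u:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(class_names[best] if row[best] >= min_membership else None)
    return labels


if __name__ == "__main__":
    # Toy 2-D "features": two tight groups plus one ambiguous point in between.
    samples = [[0.1, 0.0], [0.0, 0.2], [5.0, 5.1], [4.9, 5.0], [2.5, 2.5]]
    print(relabel_inconsistent(samples, [[0.0, 0.0], [5.0, 5.0]], ["disgust", "fear"]))
    # ['disgust', 'disgust', 'fear', 'fear', None]
```

Initializing the centers at class prototypes keeps cluster k aligned with class k, so the memberships can be read as soft class scores; with a random initialization, clusters would have to be matched to classes afterwards.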
Additional information
This work was supported in part by the National Natural Science Foundation of China under Grant 61461039.
About this article
Cite this article
Ma, Y., Wang, X. & Wei, L. Multi-level spatial and semantic enhancement network for expression recognition. Appl Intell 51, 8565–8578 (2021). https://doi.org/10.1007/s10489-021-02254-0
DOI: https://doi.org/10.1007/s10489-021-02254-0