Abstract
Facial Expression Recognition (FER) is crucial for human-computer interaction and has achieved satisfactory results on lab-collected datasets. However, occlusion and head pose variation in the real world make FER extremely challenging because they remove facial information. This paper proposes a novel Dual Stream Attention Network (DSAN) for FER that is robust to occlusion and head pose variation. Specifically, DSAN consists of a Global Feature Element-based Attention Network (GFE-AN) and a Multi-Feature Fusion-based Attention Network (MFF-AN). A sparse attention block and a feature recalibration loss in GFE-AN selectively emphasize feature elements meaningful for facial expression and suppress those unrelated to it. A lightweight local feature attention block in MFF-AN extracts rich semantic information from different representation sub-spaces. In addition, the DSAN architecture is designed to minimize computational overhead. Extensive experiments on public benchmarks demonstrate that DSAN outperforms state-of-the-art methods, achieving 89.70% on RAF-DB, 89.93% on FERPlus, 65.77% on AffectNet-7, and 62.13% on AffectNet-8. Moreover, DSAN has only 11.33M parameters, making it lightweight compared with most recent in-the-wild FER algorithms.
Data availability and access
All datasets supporting the findings of this study are publicly available and are described in Sect. 4.1.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61802184, 61972204, 62103110, 62103192, and 62102002.
Author information
Contributions
Conceptualization, Methodology, Data curation, Validation, Visualization, and Writing - original draft preparation: Hui Tang; Funding acquisition and Resources: Zhong Jin; Supervision: Yichang Li and Zhong Jin; Formal analysis, Investigation, and Writing - review and editing: Hui Tang, Yichang Li, and Zhong Jin.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, H., Li, Y. & Jin, Z. A dual stream attention network for facial expression recognition in the wild. Int. J. Mach. Learn. & Cyber. 15, 5863–5880 (2024). https://doi.org/10.1007/s13042-024-02287-0