Abstract
Facial Expression Recognition (FER) is crucial for human-computer interaction and has achieved satisfactory results on lab-collected datasets. However, occlusion and head pose variation in the real world make FER extremely challenging because they remove facial information. This paper proposes a novel Dual Stream Attention Network (DSAN) for FER that is robust to occlusion and head pose variation. Specifically, DSAN consists of a Global Feature Element-based Attention Network (GFE-AN) and a Multi-Feature Fusion-based Attention Network (MFF-AN). A sparse attention block and a feature recalibration loss in GFE-AN selectively emphasize feature elements meaningful for facial expression and suppress those unrelated to it. A lightweight local feature attention block in MFF-AN extracts rich semantic information from different representation sub-spaces. In addition, the DSAN architecture is designed to minimize computational overhead. Extensive experiments on public benchmarks demonstrate that DSAN outperforms state-of-the-art methods, achieving 89.70% on RAF-DB, 89.93% on FERPlus, 65.77% on AffectNet-7, and 62.13% on AffectNet-8. Moreover, DSAN has only 11.33M parameters, making it lightweight compared with most recent in-the-wild FER algorithms.
Data availability and access
All datasets supporting the findings of this study are publicly available and are described in Sect. 4.1.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61802184, 61972204, 62103110, 62103192, and 62102002.
Author information
Contributions
Conceptualization, Methodology, Data curation, Validation, Visualization, and Writing - original draft preparation: Hui Tang; Funding acquisition and Resources: Zhong Jin; Supervision: Yichang Li and Zhong Jin; Formal analysis, Investigation, and Writing - review and editing: Hui Tang, Yichang Li, and Zhong Jin.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, H., Li, Y. & Jin, Z. A dual stream attention network for facial expression recognition in the wild. Int. J. Mach. Learn. & Cyber. 15, 5863–5880 (2024). https://doi.org/10.1007/s13042-024-02287-0