A dual stream attention network for facial expression recognition in the wild

  • Original Article
  • Published in the International Journal of Machine Learning and Cybernetics

Abstract

Facial Expression Recognition (FER) is crucial for human-computer interaction and has achieved satisfactory results on lab-collected datasets. In the real world, however, occlusion and head-pose variation make FER extremely challenging because they deprive the model of facial information. This paper proposes a novel Dual Stream Attention Network (DSAN) for FER that is robust to occlusion and head-pose variation. Specifically, DSAN consists of a Global Feature Element-based Attention Network (GFE-AN) and a Multi-Feature Fusion-based Attention Network (MFF-AN). In GFE-AN, a sparse attention block and a feature recalibration loss selectively emphasize feature elements that are meaningful for facial expression and suppress those that are not. In MFF-AN, a lightweight local feature attention block extracts rich semantic information from different representation sub-spaces. In addition, the DSAN architecture is designed to minimize computational overhead. Extensive experiments on public benchmarks demonstrate that DSAN outperforms state-of-the-art methods, achieving 89.70% on RAF-DB, 89.93% on FERPlus, 65.77% on AffectNet-7, and 62.13% on AffectNet-8. Moreover, DSAN has only 11.33M parameters, which is lightweight compared to most recent in-the-wild FER algorithms.
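To give a concrete picture of the two-stream structure the abstract describes, below is a minimal PyTorch sketch (PyTorch is the framework named in the Notes). The module names (GFEAttention, LocalFeatureAttention, DSANSketch), the backbone choice, and all hyper-parameters are illustrative assumptions, not the authors' implementation: one stream applies element-wise attention to a pooled global feature vector (in the spirit of GFE-AN), the other applies multi-head attention over local spatial features (in the spirit of MFF-AN), and the two are fused for classification.

```python
# Minimal sketch of a dual-stream attention network for FER.
# All names and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn

class GFEAttention(nn.Module):
    """Element-wise attention over a pooled global feature vector
    (stands in for the paper's GFE-AN stream). The paper pairs this
    kind of gate with a feature recalibration loss that encourages
    sparse, expression-relevant gating; that loss is omitted here."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C) pooled backbone features; re-weight each element.
        return x * self.gate(x)

class LocalFeatureAttention(nn.Module):
    """Multi-head self-attention over spatial positions
    (stands in for the paper's MFF-AN stream)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # dim must be divisible by heads for nn.MultiheadAttention.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W) -> sequence of H*W local feature vectors.
        b, c, h, w = fmap.shape
        seq = fmap.flatten(2).transpose(1, 2)   # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)       # attend across positions
        return out.mean(dim=1)                  # (B, C) pooled summary

class DSANSketch(nn.Module):
    def __init__(self, backbone: nn.Module, dim: int, num_classes: int = 7):
        super().__init__()
        self.backbone = backbone                # any trunk yielding (B, C, H, W)
        self.global_stream = GFEAttention(dim)
        self.local_stream = LocalFeatureAttention(dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(images)                    # (B, C, H, W)
        g = self.global_stream(fmap.mean(dim=(2, 3)))   # global element attention
        l = self.local_stream(fmap)                     # local attention fusion
        return self.classifier(torch.cat([g, l], dim=1))

# Example usage with a truncated torchvision ResNet-18 as the trunk:
#   import torchvision
#   trunk = nn.Sequential(*list(torchvision.models.resnet18().children())[:-2])
#   model = DSANSketch(trunk, dim=512)          # trunk outputs (B, 512, 7, 7)
#   logits = model(torch.randn(2, 3, 224, 224)) # (2, num_classes)
```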


Data availability and access

All datasets supporting the findings of this study are available and are described in Sect. 4.1.

Notes

  1. https://pytorch.org/.


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61802184, 61972204, 62103110, 62103192, and 62102002.

Author information


Contributions

Conceptualization, Methodology, Data curation, Validation, Visualization, and Writing - original draft preparation: Hui Tang; Funding acquisition and Resources: Zhong Jin; Supervision: Yichang Li and Zhong Jin; Formal analysis, Investigation, and Writing - review and editing: Hui Tang, Yichang Li, and Zhong Jin.

Corresponding author

Correspondence to Zhong Jin.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tang, H., Li, Y. & Jin, Z. A dual stream attention network for facial expression recognition in the wild. Int. J. Mach. Learn. & Cyber. 15, 5863–5880 (2024). https://doi.org/10.1007/s13042-024-02287-0

