Two-pathway attention network for real-time facial expression recognition

  • Special Issue Paper
  • Published in Journal of Real-Time Image Processing

Abstract

Many scholars apply deep learning methods to facial expression recognition (FER). What began largely as a topic of psychology research has, in recent years, grown into a field that draws on physiology, psychology, cognitive science, and medicine. With rapid advances in computer vision, various convolutional neural network (CNN) architectures have been developed for real-time and accurate FER. Existing CNNs face two main problems when handling FER: over-fitting caused by insufficient training data, and expression-unrelated intra-class differences. In this paper, we propose a two-pathway attention network to address both problems. We suppress intra-class differences by extracting facial regions based on the facial muscle movements driven by expressions. We mitigate the shortage of training data by extensively extracting both global face images and local facial regions as training data for a two-pathway ensemble model. Furthermore, we introduce an attention-mechanism module that reweighs the feature maps from the global image and the local regions according to each part's contribution to FER. Real-time facial region extraction and multi-layer feature compression ensure the real-time performance of the algorithm and reduce the number of parameters in the ensemble model. Experiments on public datasets demonstrate the effectiveness of our method, which reaches human-level performance and outperforms current state-of-the-art methods, achieving 92.8% on CK+ and 87.0% on FERPlus.
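The abstract's architecture can be pictured as two convolutional pathways, one over the whole face and one over cropped expression-related regions, whose attention-reweighted features are fused for classification. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the backbone depths, the squeeze-and-excitation-style channel attention, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Reweigh each feature map by a learned importance score.
    Hypothetical stand-in for the paper's attention module."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # squeeze spatial dims
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                       # per-channel weight in (0, 1)
        )

    def forward(self, x):
        w = self.score(x).view(x.size(0), -1, 1, 1)
        return x * w                            # broadcast weights over H x W


class Pathway(nn.Module):
    """One small convolutional pathway; the real backbones are deeper."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            ChannelAttention(feat_dim),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (B, feat_dim)
        )

    def forward(self, x):
        return self.features(x)


class TwoPathwayFER(nn.Module):
    """Global pathway sees the whole face; local pathway sees cropped
    regions. Their features are concatenated and classified."""

    def __init__(self, num_classes=7, feat_dim=64):
        super().__init__()
        self.global_path = Pathway(feat_dim)
        self.local_path = Pathway(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, face, region):
        fused = torch.cat([self.global_path(face), self.local_path(region)], dim=1)
        return self.classifier(fused)


model = TwoPathwayFER()
face = torch.randn(2, 3, 64, 64)      # whole-face crops
region = torch.randn(2, 3, 32, 32)    # e.g. mouth/eye regions
logits = model(face, region)
print(logits.shape)                   # torch.Size([2, 7])
```

Because each pathway ends in an adaptive average pool, the global and local inputs may have different spatial sizes, which matches the idea of feeding whole faces and smaller cropped regions to separate branches of an ensemble.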



Funding

This work was supported by the National Key R&D Program of China under Grant No. 2020AAA0104500, and by Sichuan University under Grant 2020SCUNG205.

Author information


Corresponding author

Correspondence to Kai Liu.


Cite this article

Wang, L., He, Z., Meng, B. et al. Two-pathway attention network for real-time facial expression recognition. J Real-Time Image Proc 18, 1173–1182 (2021). https://doi.org/10.1007/s11554-021-01123-w
