Abstract
Many researchers now apply deep learning methods to facial expression recognition (FER). Once confined largely to psychology, FER has grown into a field that draws on physiology, psychology, cognitive science, and medicine. Advances in computer vision have produced a variety of convolutional neural network architectures for real-time, accurate FER. Existing convolutional neural networks face two main problems in FER: over-fitting caused by insufficient training data, and intra-class differences unrelated to expression. In this paper, we propose a two-pathway attention network that addresses both problems. We suppress intra-class differences by extracting facial regions based on the facial muscle movements driven by expressions, and we mitigate the shortage of training data by feeding both global structures and extracted local facial regions into a two-pathway ensemble model. Furthermore, we introduce an attention module that reweighs the feature maps from the global image and the local regions according to each part's contribution to FER. Real-time facial region extraction and multi-layer feature compression keep the algorithm fast and reduce the number of parameters in the ensemble model. Experiments on public datasets confirm the method's effectiveness: it reaches human-level performance and outperforms current state-of-the-art methods, achieving 92.8% on CK+ and 87.0% on FERPLUS.
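The core fusion step described above can be illustrated with a minimal NumPy sketch: features from the global pathway and from each extracted facial region are scored by a learned vector and reweighed by their contribution before fusion. The function name, the softmax scoring, and the parameter `w` are illustrative assumptions, not the paper's exact attention module.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(global_feat, local_feats, w):
    """Fuse a global feature vector with K local-region feature vectors.

    global_feat: (d,) feature from the whole-face pathway
    local_feats: list of (d,) features from extracted facial regions
    w:           (d,) learned scoring vector (hypothetical parameter)

    Returns the attention-weighted fused feature and the per-part weights.
    """
    feats = np.vstack([global_feat] + list(local_feats))  # (K+1, d)
    scores = feats @ w                                    # one scalar per part
    alpha = softmax(scores)                               # contribution weights
    fused = alpha @ feats                                 # weighted sum, (d,)
    return fused, alpha

# Example: one global feature and four region features of dimension 64.
rng = np.random.default_rng(0)
g = rng.normal(size=64)
regions = [rng.normal(size=64) for _ in range(4)]
fused, alpha = attention_fuse(g, regions, rng.normal(size=64))
```

In a trained model `w` would be learned end-to-end, so occluded or uninformative regions receive low weights, which is one plausible reading of how the attention module suppresses expression-unrelated intra-class variation.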
Funding
This work was supported by the National Key R&D Program of China under Grant No. 2020AAA0104500, and by Sichuan University under Grant 2020SCUNG205.
Cite this article
Wang, L., He, Z., Meng, B. et al. Two-pathway attention network for real-time facial expression recognition. J Real-Time Image Proc 18, 1173–1182 (2021). https://doi.org/10.1007/s11554-021-01123-w