Abstract
Facial expression recognition (FER) remains challenging because of the small inter-class discrepancy in facial expression data. Given the importance of crucial facial regions for FER, many existing studies exploit prior information from annotated facial key points to improve performance. However, manually annotating such key points is complicated and time-consuming, especially for vast numbers of in-the-wild expression images. Motivated by this, this paper proposes a local non-local joint network that adaptively enhances crucial facial regions during feature learning for FER. The proposed method consists of two parts built on facial local and non-local information: an ensemble of multiple local networks extracts features from multiple facial local regions, and a non-local attention network estimates the significance of each local region. In particular, the attention weights obtained by the non-local network are fed back into the local part, establishing interactive feedback between the facial global and local information. As training proceeds, the non-local weights of the local regions are gradually updated, and more crucial regions receive higher weights. Moreover, U-Net is employed to extract features that integrate the deep semantic information and the low-level detail information of expression images. Experimental results show that the proposed method achieves more competitive performance than several state-of-the-art methods on five benchmark datasets.
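To make the local/non-local interaction concrete, the following PyTorch sketch shows one way such a joint network could be wired up. It is a minimal illustration under assumed settings, not the authors' implementation: the region count, the branch architectures, the softmax-based region weighting, and the names (`LocalBranch`, `LocalNonLocalJoint`) are all illustrative, and the U-Net feature extractor mentioned above is omitted for brevity.

```python
# Minimal sketch of a local non-local joint network for FER.
# All layer sizes, the region count, and the weighting scheme are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class LocalBranch(nn.Module):
    """One small CNN per facial local region (e.g., eyes, mouth)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, x):                        # x: (B, 3, h, w) region crop
        return self.fc(self.conv(x).flatten(1))  # (B, out_dim)

class LocalNonLocalJoint(nn.Module):
    def __init__(self, n_regions=4, feat_dim=128, n_classes=7):
        super().__init__()
        # Local part: an ensemble of per-region branches.
        self.local_branches = nn.ModuleList(
            LocalBranch(feat_dim) for _ in range(n_regions))
        # Non-local part: scores every region from the whole face image,
        # so each region's weight depends on global context.
        self.global_net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_regions),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, image, regions):
        # regions: list of n_regions crops, each of shape (B, 3, h, w)
        weights = torch.softmax(self.global_net(image), dim=1)  # (B, n_regions)
        feats = torch.stack(
            [b(r) for b, r in zip(self.local_branches, regions)], dim=1)
        # Feed the non-local weights back into the local part:
        # crucial regions contribute more to the fused feature.
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)      # (B, feat_dim)
        return self.classifier(fused), weights

# Usage with dummy data:
model = LocalNonLocalJoint()
img = torch.randn(2, 3, 96, 96)
crops = [torch.randn(2, 3, 48, 48) for _ in range(4)]
logits, region_weights = model(img, crops)  # shapes: (2, 7), (2, 4)
```

Because the model is trained end-to-end, the softmax weights assigned to the more discriminative regions tend to grow over training, mirroring the behavior described above in which more crucial regions gradually receive higher weights.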
Ethics declarations
The authors declare that they have no conflict of interest with respect to this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Guanghui Shi received the B. Sc. degree in electronics and information engineering from Wuhan University of Technology, China in 2018, and the M. Sc. degree in electronics and communication engineering from Xidian University, China in 2021. He is currently a Ph. D. candidate in computer science and technology at Xidian University, China.
His research interests include deep learning, facial expression recognition and visual pattern mining.
Shasha Mao received the Ph. D. degree in circuits and systems from the Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, China in 2014. She is currently an associate professor at the Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, School of Artificial Intelligence, Xidian University, China. She worked as a research fellow at Nanyang Technological University and the Singapore University of Technology and Design, Singapore, from 2014 to 2018.
Her research interests include ensemble learning, deep learning, imbalanced learning, facial expression recognition and SAR image registration.
Shuiping Gou received the B. Sc. and M. Sc. degrees in computer science and technology from Xidian University, China in 2000 and 2003, respectively, and the Ph. D. degree in pattern recognition and intelligent system from Xidian University, China in 2008. She is currently a professor with Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, School of Artificial Intelligence, Xidian University, China.
Her research interests include machine learning, data mining, remote sensing image analysis and medical image analysis.
Dandan Yan received the B. Sc. degree in computer science and technology from Xi’an University of Technology, China in 2021. She is currently a master student in computer science and technology at School of Artificial Intelligence, Xidian University, China.
Her research interests include deep learning, facial expression recognition and label distribution learning.
Licheng Jiao received the B. Sc. degree in electronic engineering from Shanghai Jiao Tong University, China in 1982, and the M. Sc. and Ph. D. degrees in electronic engineering from Xi'an Jiaotong University, China in 1984 and 1990, respectively. From 1990 to 1991, he was a postdoctoral fellow with the National Key Laboratory for Radar Signal Processing, Xidian University, China. Since 1992, he has been a professor with the School of Electronic Engineering, Xidian University, China. Currently, he is a professor with the School of Artificial Intelligence, Xidian University, China, and he is also the director of the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education of China, Xidian University, China. He has been in charge of approximately 40 major scientific research projects, and has authored or coauthored more than 20 monographs and 100 papers in international journals and conferences. He is a member of the IEEE Xi'an Section Execution Committee, the chairman of its Awards and Recognition Committee, the vice board chairperson of the Chinese Association of Artificial Intelligence, a councilor of the Chinese Institute of Electronics, a committee member of the Chinese Committee of Neural Networks, and an expert of the Academic Degrees Committee of the State Council.
His research interests include image processing, natural computation, machine learning and intelligent information processing.
Lin Xiong received the Ph. D. degree in pattern recognition and intelligent systems from the Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, China in 2015. Currently, he works as a senior algorithm expert at SenseTime, China. He was a research scientist at JD Finance America Corporation from 2018 to 2022.
His research interests include neural implicit rendering, neural radiance field, distributed model parallelism, unconstrained/large-scale face recognition, deep learning architecture engineering, person reidentification, face recognition, Riemannian manifold optimization, sparse and low-rank matrix factorization.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shi, G., Mao, S., Gou, S. et al. Adaptively Enhancing Facial Expression Crucial Regions via a Local Non-local Joint Network. Mach. Intell. Res. 21, 331–348 (2024). https://doi.org/10.1007/s11633-023-1417-9