Toward Facial Expression Recognition in the Wild via Noise-Tolerant Network

Published: 01 May 2023

Abstract

Facial Expression Recognition (FER) has recently emerged as a crucial component of Human-Computer Interaction (HCI) systems for understanding a user's inner state and intentions. However, feature and label noise constitute the major challenge for FER in the wild, owing to the ambiguity of facial expressions worsened by low-quality images. To address this problem, in this paper we propose a simple but effective Facial Expression Noise-tolerant Network (FENN), which exploits inter-class correlations to mitigate the ambiguity that commonly arises between morphologically similar classes. Specifically, FENN leverages a multivariate normal distribution at the final hidden layer of the neural network to model these correlations and suppress the heteroscedastic uncertainty caused by inter-class label noise. Furthermore, because the discriminative ability of deep features is weakened by the subtle differences between expressions and by the presence of feature noise, FENN utilizes a feature-noise mitigation module to extract compact intra-class feature representations under feature noise while preserving the intrinsic inter-class relationships. We conduct extensive experiments on both the original annotations and synthetically noised annotations of the RAF-DB, AffectNet, and FERPlus in-the-wild facial expression datasets. The results show that FENN significantly outperforms state-of-the-art FER methods.
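To make the abstract's two ingredients concrete, below is a minimal PyTorch sketch of (a) a heteroscedastic classification head that places a low-rank-plus-diagonal multivariate normal over the logits, in the spirit of Kendall & Gal (2017) and Collier et al. (2021), and (b) a center-loss-style compactness penalty standing in for the feature-noise mitigation module. All class and parameter names here are hypothetical illustrations of the general technique, not FENN's actual implementation, whose exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeteroscedasticHead(nn.Module):
    """Places a multivariate normal over the logits, u ~ N(mu(x), Sigma(x)),
    with a low-rank-plus-diagonal covariance Sigma = V V^T + D^2 so that
    correlations between morphologically similar classes can be modeled
    cheaply. Class probabilities are Monte Carlo averages of softmax(u)."""

    def __init__(self, feat_dim: int, num_classes: int, rank: int = 2,
                 num_samples: int = 10):
        super().__init__()
        self.mu = nn.Linear(feat_dim, num_classes)             # mean logits mu(x)
        self.diag = nn.Linear(feat_dim, num_classes)           # diagonal scale (pre-softplus)
        self.factor = nn.Linear(feat_dim, num_classes * rank)  # low-rank factor V(x)
        self.rank, self.num_samples = rank, num_samples
        self.num_classes = num_classes

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        B, C, R, S = feats.size(0), self.num_classes, self.rank, self.num_samples
        mu = self.mu(feats)                                    # (B, C)
        d = F.softplus(self.diag(feats))                       # (B, C), positive
        V = self.factor(feats).view(B, C, R)                   # (B, C, R)

        # Sample logits u = mu + V e1 + d * e2 with e1 ~ N(0, I_R), e2 ~ N(0, I_C)
        e1 = torch.randn(S, B, R, 1, device=feats.device)
        e2 = torch.randn(S, B, C, device=feats.device)
        u = mu + (V @ e1).squeeze(-1) + d * e2                 # (S, B, C)

        # Log of the MC-averaged softmax; train with F.nll_loss on this output.
        return torch.log(F.softmax(u, dim=-1).mean(dim=0) + 1e-12)


class CompactnessLoss(nn.Module):
    """Center-loss-style penalty pulling each feature toward its class
    center, encouraging compact intra-class representations under feature
    noise (a stand-in for FENN's feature-noise mitigation module)."""

    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        return (feats - self.centers[labels]).pow(2).sum(dim=1).mean()


# Usage sketch (feats from any backbone; 7 basic-expression classes assumed):
# head = HeteroscedasticHead(feat_dim=512, num_classes=7)
# compact = CompactnessLoss(num_classes=7, feat_dim=512)
# log_probs = head(feats)
# loss = F.nll_loss(log_probs, labels) + 0.1 * compact(feats, labels)
```

Averaging the softmax over sampled logits, rather than softmaxing the mean logits, is what lets the learned covariance absorb label noise: ambiguous inputs can inflate their covariance and soften the training signal instead of forcing a confident wrong prediction.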


Cited By

  • (2024) "Learning Cognitive Features as Complementary for Facial Expression Recognition," International Journal of Intelligent Systems, 2024. doi: 10.1155/2024/7321175. Online publication date: 1-Jan-2024.
  • (2024) "Uncertain Facial Expression Recognition via Multi-Task Assisted Correction," IEEE Transactions on Multimedia, vol. 26, pp. 2531–2543. doi: 10.1109/TMM.2023.3301209. Online publication date: 1-Jan-2024.
  • (2024) "Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition," IEEE Transactions on Image Processing, vol. 33, pp. 2514–2529. doi: 10.1109/TIP.2024.3378459. Online publication date: 27-Mar-2024.
  • (2023) "AST-GCN: Augmented Spatial Temporal Graph Convolutional Neural Network for Gait Emotion Recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4581–4595. doi: 10.1109/TCSVT.2023.3341728. Online publication date: 12-Dec-2023.

Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 33, Issue 5 (May 2023), 524 pages

Publisher

IEEE Press

Qualifiers

  • Research-article

