Toward Facial Expression Recognition in the Wild via Noise-Tolerant Network

Published: 01 May 2023

Abstract

Facial Expression Recognition (FER) has recently emerged as a crucial component of Human-Computer Interaction (HCI) systems for understanding a user's inner state and intentions. However, feature and label noise constitute the major challenge for FER in the wild, owing to the ambiguity of facial expressions worsened by low-quality images. To address this problem, in this paper we propose a simple but effective Facial Expression Noise-tolerant Network (FENN), which exploits inter-class correlations to mitigate the ambiguity that commonly arises between morphologically similar classes. Specifically, FENN leverages a multivariate normal distribution at the final hidden layer of the neural network to model these correlations and suppress the heteroscedastic uncertainty caused by inter-class label noise. Furthermore, because the discriminative ability of deep features is weakened by the subtle differences between expressions and by the presence of feature noise, FENN utilizes a feature-noise mitigation module to extract compact intra-class feature representations under feature noise while preserving the intrinsic inter-class relationships. We conduct extensive experiments on both the original annotations and synthetically noised annotations of the RAF-DB, AffectNet, and FERPlus in-the-wild facial expression datasets. The results show that FENN significantly outperforms state-of-the-art FER methods.
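To make the abstract's two ingredients concrete, below is a minimal PyTorch sketch of (a) a heteroscedastic classification head that places a low-rank-plus-diagonal multivariate normal over the logits, in the spirit of Kendall & Gal (2017) and Collier et al. (2021), and (b) a center-loss-style compactness penalty standing in for the feature-noise mitigation module. All class and parameter names here are hypothetical illustrations of the general technique, not FENN's actual implementation, whose exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeteroscedasticHead(nn.Module):
    """Places a multivariate normal over the logits, u ~ N(mu(x), Sigma(x)),
    with a low-rank-plus-diagonal covariance Sigma = V V^T + D^2 so that
    correlations between morphologically similar classes can be modeled
    cheaply. Class probabilities are Monte Carlo averages of softmax(u)."""

    def __init__(self, feat_dim: int, num_classes: int, rank: int = 2,
                 num_samples: int = 10):
        super().__init__()
        self.mu = nn.Linear(feat_dim, num_classes)             # mean logits mu(x)
        self.diag = nn.Linear(feat_dim, num_classes)           # diagonal scale (pre-softplus)
        self.factor = nn.Linear(feat_dim, num_classes * rank)  # low-rank factor V(x)
        self.rank, self.num_samples = rank, num_samples
        self.num_classes = num_classes

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        B, C, R, S = feats.size(0), self.num_classes, self.rank, self.num_samples
        mu = self.mu(feats)                                    # (B, C)
        d = F.softplus(self.diag(feats))                       # (B, C), positive
        V = self.factor(feats).view(B, C, R)                   # (B, C, R)

        # Sample logits u = mu + V e1 + d * e2 with e1 ~ N(0, I_R), e2 ~ N(0, I_C)
        e1 = torch.randn(S, B, R, 1, device=feats.device)
        e2 = torch.randn(S, B, C, device=feats.device)
        u = mu + (V @ e1).squeeze(-1) + d * e2                 # (S, B, C)

        # Log of the MC-averaged softmax; train with F.nll_loss on this output.
        return torch.log(F.softmax(u, dim=-1).mean(dim=0) + 1e-12)


class CompactnessLoss(nn.Module):
    """Center-loss-style penalty pulling each feature toward its class
    center, encouraging compact intra-class representations under feature
    noise (a stand-in for FENN's feature-noise mitigation module)."""

    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        return (feats - self.centers[labels]).pow(2).sum(dim=1).mean()


# Usage sketch (feats from any backbone; 7 basic-expression classes assumed):
# head = HeteroscedasticHead(feat_dim=512, num_classes=7)
# compact = CompactnessLoss(num_classes=7, feat_dim=512)
# log_probs = head(feats)
# loss = F.nll_loss(log_probs, labels) + 0.1 * compact(feats, labels)
```

Averaging the softmax over sampled logits, rather than softmaxing the mean logits, is what lets the learned covariance absorb label noise: ambiguous inputs can inflate their covariance and soften the training signal instead of forcing a confident wrong prediction.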


Cited By

  • (2024) "Learning Cognitive Features as Complementary for Facial Expression Recognition," International Journal of Intelligent Systems, 2024. doi: 10.1155/2024/7321175. Online publication date: 1-Jan-2024.
  • (2024) "Uncertain Facial Expression Recognition via Multi-Task Assisted Correction," IEEE Transactions on Multimedia, vol. 26, pp. 2531–2543. doi: 10.1109/TMM.2023.3301209. Online publication date: 1-Jan-2024.
  • (2024) "Cross-Layer Contrastive Learning of Latent Semantics for Facial Expression Recognition," IEEE Transactions on Image Processing, vol. 33, pp. 2514–2529. doi: 10.1109/TIP.2024.3378459. Online publication date: 27-Mar-2024.
  • (2023) "AST-GCN: Augmented Spatial Temporal Graph Convolutional Neural Network for Gait Emotion Recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4581–4595. doi: 10.1109/TCSVT.2023.3341728. Online publication date: 12-Dec-2023.

Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 33, Issue 5 (May 2023), 524 pages

Publisher

IEEE Press

Qualifiers

  • Research-article

