research-article

Facial Structure Guided GAN for Identity-preserved Face Image De-occlusion

Authors:

Yiu-Ming Cheung,

Rong Zou

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

Pages 46 - 54

https://doi.org/10.1145/3460426.3463642

Published: 01 September 2021

Abstract

In practical scenarios such as video surveillance and personal identification, face recognition often has to cope with occluded faces, where non-face objects replace part of the facial content and leave only a partial appearance and an ambiguous representation. Under these circumstances, the performance of face recognition algorithms often deteriorates. In this paper, we address this problem by removing occlusions from face images and present a new two-stage Facial Structure Guided Generative Adversarial Network (FSG-GAN). In Stage I of the FSG-GAN, a variational auto-encoder predicts the facial structure. In Stage II, the predicted facial structure and the occluded image are concatenated and fed into a generative adversarial network (GAN) based model to synthesize the de-occluded face image. In this way, the facial structure knowledge is transferred to the synthesis network. In particular, so that the occluded face image is perceived well, the generator in the GAN-based synthesis network uses hybrid dilated convolution modules to extend the receptive field. Furthermore, to further reduce appearance ambiguity and unnatural texture, a multi-receptive-field discriminator is proposed that uses features from different levels. Experiments on benchmark datasets show the efficacy of the proposed FSG-GAN.
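As a rough illustration of why mixing dilation rates extends the receptive field, the following Python sketch computes the one-dimensional receptive field of a stack of stride-1 convolutions and the set of input positions that actually contribute to one output pixel. The 3x3 kernel size and the dilation rates (1, 2, 5) are assumptions chosen for illustration only; the abstract does not state the configuration used in FSG-GAN, and the function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation pixels.
        rf += (kernel_size - 1) * d
    return rf


def covered_offsets(kernel_size, dilations):
    """Input offsets (along one axis) that can influence one output pixel."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Every reachable offset fans out to its dilated kernel taps.
        offsets = {o + k * d for o in offsets for k in range(-half, half + 1)}
    return offsets


# Assumed configurations for illustration (not taken from the paper).
plain = [1, 1, 1]    # ordinary 3x3 convolutions
uniform = [2, 2, 2]  # same dilation in every layer
hybrid = [1, 2, 5]   # mixed dilation rates

for name, rates in [("plain", plain), ("uniform", uniform), ("hybrid", hybrid)]:
    rf = receptive_field(3, rates)
    used = len(covered_offsets(3, rates))
    print(f"{name}: receptive field {rf}, positions used {used}")
# prints:
# plain: receptive field 7, positions used 7
# uniform: receptive field 13, positions used 7
# hybrid: receptive field 17, positions used 17
```

Under these assumed rates, the mixed-rate stack sees 17 pixels per axis with no gaps, while the uniform-rate stack spans 13 pixels but samples only every second one. This is one plausible reading of why a generator would combine different dilation rates when it must infer content hidden behind a large occluder.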

Index Terms

Facial Structure Guided GAN for Identity-preserved Face Image De-occlusion
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object identification
        Object recognition
        Reconstruction

Information & Contributors

Information

Published In

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

August 2021

715 pages

ISBN:9781450384636

DOI:10.1145/3460426

General Chairs:
Wen-Huang Cheng, National Yang Ming Chiao Tung University, Taiwan
Mohan Kankanhalli, National University of Singapore, Singapore
Meng Wang, Hefei University of Technology, China

Program Chairs:
Wei-Ta Chu, National Cheng Kung University, Taiwan
Jiaying Liu, Peking University, China
Marcel Worring, University of Amsterdam, Netherlands

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2021

Qualifiers

Research-article

Funding Sources

The Innovation and Technology Fund of Innovation and Technology Commission of the Government of the Hong Kong
SZSTC
National Natural Science Foundation of China
Hong Kong Baptist University

Conference

ICMR '21

Sponsor:

SIGMM

ICMR '21: International Conference on Multimedia Retrieval

August 21 - 24, 2021

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 224
Downloads (Last 12 months): 43
Downloads (Last 6 weeks): 2

Reflects downloads up to 14 Nov 2024

Citations

Cited By

Li H, Zhang Y, Wang W, Zhang S, Zhang S (2025). SwapInpaint2: Towards high structural consistency in identity-guided inpainting via background-preserving GAN inversion. Pattern Recognition 158 (110969). Online publication date: Feb-2025.
https://doi.org/10.1016/j.patcog.2024.110969
Sharma A, Nath B, Kar T, Khasim D (2024). Masked GANs for Face Completion: A Novel Deep Learning Approach. EAI Endorsed Transactions on Pervasive Health and Technology 9. Online publication date: 15-Jan-2024.
https://doi.org/10.4108/eetpht.9.4850
Zhou Y, Pang M, Huang W, Wang B (2024). Early Diagnosing Parkinson's Disease Via a Deep Learning Model Based on Augmented Facial Expression Data. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1621-1625. Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10447406
Saleh K, Szénási S, Vámossy Z (2023). Generative Adversarial Network for Overcoming Occlusion in Images: A Survey. Algorithms 16:3 (175). Online publication date: 22-Mar-2023.
https://doi.org/10.3390/a16030175
