DOI: 10.1145/3460426.3463642
research-article

Facial Structure Guided GAN for Identity-preserved Face Image De-occlusion

Published: 01 September 2021

Abstract

In practical scenarios such as video surveillance and personal identification, face recognition systems often have to handle occluded faces, where non-face objects replace facial content and leave only a partial appearance and an ambiguous representation. Under these circumstances, the performance of face recognition algorithms often deteriorates. In this paper, we address the problem by removing occlusions from face images and present a new two-stage Facial Structure Guided Generative Adversarial Network (FSG-GAN). In Stage I of the FSG-GAN, a variational auto-encoder predicts the facial structure. In Stage II, the predicted facial structure and the occluded image are concatenated and fed into a generative adversarial network (GAN) based model to synthesize the de-occluded face image. In this way, facial structure knowledge is transferred to the synthesis network. In particular, so that the occluded face image is perceived well, the generator in the GAN-based synthesis network uses hybrid dilated convolution modules to extend the receptive field. Furthermore, to further reduce appearance ambiguity and unnatural texture, a multi-receptive-field discriminator is proposed that uses features from different levels. Experiments on benchmark datasets show the efficacy of the proposed FSG-GAN.
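
The receptive-field argument for hybrid dilated convolution can be illustrated with a small 1-D calculation. The sketch below is not taken from the paper: the kernel size of 3, the dilation rates (1, 2, 5) and (2, 2, 2), and all helper names are illustrative assumptions. It suggests how stacking layers with varied dilation rates can widen the receptive field without leaving unsampled gaps, whereas repeating a single rate samples only a sparse grid.

```python
from itertools import product


def receptive_field(kernel_size, dilations):
    """1-D receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def covered_offsets(kernel_size, dilations):
    """Input offsets that can influence one output position.

    Assumes an odd kernel size, so taps are symmetric around the centre.
    """
    half = kernel_size // 2
    # Tap offsets of each layer, scaled by that layer's dilation rate.
    taps_per_layer = [
        [tap * d for tap in range(-half, half + 1)] for d in dilations
    ]
    # An input offset is reachable if it is a sum of one tap per layer.
    return {sum(combo) for combo in product(*taps_per_layer)}


def has_gridding(kernel_size, dilations):
    """True if some position inside the receptive field is never sampled."""
    covered = len(covered_offsets(kernel_size, dilations))
    return covered < receptive_field(kernel_size, dilations)


if __name__ == "__main__":
    # Illustrative dilation schedules (assumptions, not the paper's settings).
    for rates in [(1, 2, 5), (2, 2, 2)]:
        print(
            rates,
            "receptive field:", receptive_field(3, rates),
            "covered:", len(covered_offsets(3, rates)),
            "gridding:", has_gridding(3, rates),
        )
```

Under these assumptions, the (1, 2, 5) stack covers all 17 positions in its receptive field, while the (2, 2, 2) stack reaches only 7 of 13.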

Published In

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021
715 pages
ISBN:9781450384636
DOI:10.1145/3460426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. face de-occlusion
  2. generative adversarial networks
  3. partial face recognition

Qualifiers

  • Research-article

Funding Sources

  • The Innovation and Technology Fund of Innovation and Technology Commission of the Government of the Hong Kong
  • SZSTC
  • National Natural Science Foundation of China
  • Hong Kong Baptist University

Conference

ICMR '21

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 43
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 14 Nov 2024

Cited By

  • (2025) SwapInpaint2: Towards high structural consistency in identity-guided inpainting via background-preserving GAN inversion. Pattern Recognition 158 (110969). DOI: 10.1016/j.patcog.2024.110969. Online publication date: Feb-2025
  • (2024) Masked GANs for Face Completion: A Novel Deep Learning Approach. EAI Endorsed Transactions on Pervasive Health and Technology 9. DOI: 10.4108/eetpht.9.4850. Online publication date: 15-Jan-2024
  • (2024) Early Diagnosing Parkinson's Disease Via a Deep Learning Model Based on Augmented Facial Expression Data. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1621-1625). DOI: 10.1109/ICASSP48485.2024.10447406. Online publication date: 14-Apr-2024
  • (2023) Generative Adversarial Network for Overcoming Occlusion in Images: A Survey. Algorithms 16:3 (175). DOI: 10.3390/a16030175. Online publication date: 22-Mar-2023
