A Two-Stage Deep Generative Model for Masked Face Synthesis
Figure 1. Overall framework of the proposed method. In this figure, an RGB face image of 80 × 80 pixels is used as an example input. The mask pattern generation stage produces a pose-alike face with a mask pattern that reflects the pose view of the input unmasked face. The mask region extraction stage localizes the mask pattern by producing a segmentation map. In the image masking and fusion stage, the mask pattern is fused with the input unmasked face, resulting in a realistic masked face.

Figure 2. (a) Input of the generator G_p. (b) Output of the generator G_p. (c) Output of the generator G_r. (d) The synthesized face.

Figure 3. Image pairs of unmasked and masked faces used for learning the mask pattern generator. The masked face images were obtained with the SNOW application. The pose views of the faces can be roughly categorized into seven groups (left to right): (1) −45~−60 degrees; (2) −15~−30 degrees; (3) frontal; (4) +15~+30 degrees; (5) +45~+60 degrees; (6) up; and (7) down.

Figure 4. Example pair of a masked face image (left) and the corresponding segmentation map image (right) used to learn G_r.

Figure 5. Example of a segmentation map ŝ (left) and the processed binary segmentation map ŝ_b (right).

Figure 6. Results for different facial poses and resolutions. (a) −45~−60 degrees; (b) −15~−30 degrees; (c) frontal; (d) +15~+30 degrees; (e) +45~+60 degrees; (f) down; (g) up.

Figure 7. Face retrieval result. Faces are sorted in ascending order of pose vector distance (i.e., the face with the smallest Euclidean distance is ranked 1st).

Figure 8. The CNN classifier used for measuring the face recognition (FR) rates. A face image of 80 × 80 pixels is used as the input image in this figure.

Figure 9. Recognition rates for different facial resolutions. The recognition rates measured on unmasked face images are also presented for comparison.

Figure 10. Comparison results on the LFW database. (a) Input. (b) Method in [12]. (c) Method in [12]. (d) Method in [16]. (e) Method in [17]. (f) Proposed method.

Figure 11. Results of the proposed masked face synthesis on the LFW database. The results are categorized into four different challenges.
Abstract
1. Introduction
- A two-stage generative model based on cascading two convolutional auto-encoders (CAEs) [18] is introduced in this paper. The former CAE obtains a virtual mask pattern suited to the pose view of an input face image; the latter CAE generates a segmentation map that localizes the mask pattern region. Based on the segmentation map, the mask pattern can be fused with the input face by means of simple image processing (e.g., a pixel-wise sum); a minimal sketch of this pipeline is given after this list. Unlike the methods in [12,16,17], the proposed generative model relies on face appearance reconstruction without any facial landmark detection or localization techniques.
- With the appearance-based approach described above, the proposed generative model can be used in practice to construct large datasets of masked face images. As demonstrated in the experiments, the proposed method can process low-resolution faces of 25 × 25 pixels with acceptable synthesis results and without a large loss in recognition rate (refer to Section 4.3). It also works well with moderately rotated face images. Learning the two complementary CAEs completes within tens of seconds on a PC with a single GPU (refer to Table 1); hence, the cascaded network can easily be extended to apply various mask patterns.
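To make the pipeline concrete, the following minimal sketch illustrates how the two trained generators could be chained at inference time. This is a sketch under assumptions, not the paper's exact implementation: the callable names G_p and G_r, the [0, 1] value range, and the 0.5 binarization threshold are all illustrative.

```python
import numpy as np

def synthesize_masked_face(x, G_p, G_r, threshold=0.5):
    """Fuse a generated mask pattern onto an unmasked face.

    x   : unmasked input face, float array in [0, 1], shape (H, W, 3)
    G_p : trained mask pattern generator (unmasked face -> pose-alike
          face wearing a mask pattern)
    G_r : trained mask region generator (masked face -> soft
          segmentation map of the mask region)
    """
    y_hat = G_p(x)                              # pose-alike face with mask pattern
    s_hat = G_r(y_hat)                          # soft segmentation map
    s_b = (s_hat >= threshold).astype(x.dtype)  # binarized map (cf. Figure 5)
    if s_b.ndim == 2:                           # broadcast over the RGB channels
        s_b = s_b[..., np.newaxis]

    # Pixel-wise fusion: keep the original face outside the mask region
    # and paste the generated mask pattern inside it.
    return (1.0 - s_b) * x + s_b * y_hat
```

Because the fusion is a simple per-pixel composite, no landmark detection is involved; the segmentation map alone decides where the pattern is pasted.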
2. Overview of Proposed Method
3. Masked Face Synthesis Using the Proposed Generative Model
3.1. Mask Pattern Generation
3.2. Mask Region Extraction
3.3. Image Masking and Fusion
4. Experiment
4.1. Results for Various Pose Views and Resolutions
- (1) The proposed method generated similar mask synthesis results across the three facial resolutions of an input face image, demonstrating robustness to variation in facial resolution.
- (2) The generated mask pattern fitted the faces accurately under moderate out-of-plane rotations in pitch or yaw. This is because the generator could learn to reconstruct pose-alike faces (refer to Section 4.2) by utilizing training faces with different pose views.
- (3) Thanks to the reconstruction ability of the appearance-based approach, the proposed model was able to generate masked faces even for faces occluded by a hand (see Figure 6c).
4.2. Analysis on Facial Pose
4.3. Analysis on Facial Resolution
4.4. Results on Real-World Face Images (LFW Database)
5. Conclusions
- (1) The generative model was very compact and easy to implement. The two generator networks did not require complicated network settings (recall that the two generators shared the same architecture, which has 16 feature maps in its convolution layer and 225 nodes in its hidden layer; a minimal architecture sketch is given after this list).
- (2) As demonstrated in the qualitative and quantitative experiments, the generative model was robust against moderate out-of-plane rotation (up to ±60 degrees) and resolution variation in face images (80 × 80, 40 × 40, and 25 × 25 pixels).
- (3) The generative model had very low computational cost for both training and inference: it took only 0.28 s to process 397 face images, so synthesized masked faces can be generated in real time. Furthermore, owing to the compact network architecture, the model can easily be extended to other mask patterns by simply retraining the generator networks.
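As a rough illustration of how small such a generator can be, the following PyTorch sketch builds a convolutional auto-encoder with the settings recalled in item (1): one convolution layer with 16 feature maps and a 225-node hidden layer. The kernel sizes, strides, activations, and output layer are assumptions for illustration, not the paper's exact configuration.

```python
import torch.nn as nn

class CAEGenerator(nn.Module):
    """Compact convolutional auto-encoder: 16 feature maps in the
    convolution layer and 225 nodes in the hidden layer (all other
    hyper-parameters are assumed)."""

    def __init__(self, img_size=80, channels=3):
        super().__init__()
        feat = img_size // 2  # spatial size after one stride-2 convolution
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, stride=2, padding=1),  # 16 feature maps
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * feat * feat, 225),                             # 225-node hidden layer
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(225, 16 * feat * feat),
            nn.ReLU(),
            nn.Unflatten(1, (16, feat, feat)),
            nn.ConvTranspose2d(16, channels, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),                                                 # outputs in [0, 1]
        )

    def forward(self, x):  # x: (N, 3, 80, 80)
        return self.decoder(self.encoder(x))

# G_p and G_r share this architecture; only their training targets differ
# (pose-alike masked faces for G_p, segmentation maps for G_r).
G_p, G_r = CAEGenerator(), CAEGenerator()
```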
Funding
Conflicts of Interest
References
- Alzu’bi, A.; Albalas, F.; Al-Hadhrami, T.; Younis, L.B.; Bashayreh, A. Masked face recognition using deep learning: A review. Electronics 2021, 10, 2666.
- Vu, H.N.; Nguyen, M.H.; Pham, C. Masked face recognition with convolutional neural networks and local binary patterns. Appl. Intell. 2022, 52, 5497–5512.
- Li, Y.; Guo, K.; Lu, Y.; Liu, L. Cropping and attention based approach for masked face recognition. Appl. Intell. 2021, 51, 3012–3025.
- Hariri, W. Efficient Masked Face Recognition Method during the COVID-19 Pandemic. 2020. Available online: https://www.researchsquare.com/article/rs-39289/v3 (accessed on 8 October 2022).
- Din, N.U.; Javed, K.; Bae, S.; Yi, J. A novel GAN-based network for unmasking of masked face. IEEE Access 2020, 8, 44276–44287.
- Priya, G.N.; Banu, R.W. Occlusion invariant face recognition using a mean based weight matrix and support vector machine. Sadhana 2014, 39, 303–315.
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- Wang, Z.; Wang, G.; Huang, B.; Xiong, Z.; Hong, Q.; Wu, H.; Yi, P.; Jiang, K.; Wang, N.; Pei, Y.; et al. Masked Face Recognition Dataset and Application. 2020. Available online: http://arxiv.org/abs/2003.09093 (accessed on 8 October 2022).
- Huang, G.B.; Ramesh, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17 October 2008.
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923.
- Boyko, N.; Basystiuk, O.; Shakhovska, N. Performance evaluation and comparison of software for face recognition, based on dlib and opencv library. In Proceedings of the IEEE Second International Conference on Data Stream Mining and Processing, Lviv, Ukraine, 21 August 2018.
- Ngan, M.; Grother, P.; Hanaoka, K. Ongoing Face Recognition Vendor Test (FRVT) Part 6A: Face Recognition Accuracy with Masks Using Pre-COVID-19 Algorithms; NIST: Gaithersburg, MD, USA, 2020.
- Anwar, A.; Raychowdhury, A. Masked face recognition for secure authentication. arXiv 2020, arXiv:2008.11104.
- Masci, J.; Meier, U.; Cireșan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the International Conference on Artificial Neural Networks, Berlin, Germany, 14 June 2011; pp. 52–59.
- Available online: https://en.wikipedia.org/wiki/Snow_(app) (accessed on 8 October 2022).
- Tarrés, F.; Rama, A. GTAV Face Database. Available online: http://gps-tsc.upc.es/GTAV/ResearchAreas/UPCFaceDatabase/GTAVFaceDatabase.htm (accessed on 8 October 2022).
- Wang, Z.; Bovik, A.C. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 2009, 26, 98–117.
- Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Available online: https://www.kaggle.com/datasets/akashguna/lfw-dataset-with-masks?resource=download (accessed on 8 October 2022).
- Available online: https://github.com/leesh903/masked-face-dataset-LFW- (accessed on 8 October 2022).
- Ullah, N.; Javed, A.; Ghazanfar, M.A.; Alsufyani, A.; Bourouis, S. A novel DeepMaskNet model for face mask detection and masked facial recognition. J. King Saud Univ.-Comput. Inf. Sci. 2022.
- Wang, H.; Wang, Y.; Cao, Y. Video-based face recognition: A survey. World Acad. Sci. Eng. Technol. 2009, 60, 293–302.
- Stallkamp, J.; Ekenel, H.K.; Stiefelhagen, R. Video-based face recognition on real-world dataset. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
- Bashbaghi, S.; Granger, E.; Sabourin, R.; Parchami, M. Deep Learning Architectures for Face Recognition in Video Surveillance; Springer: Singapore, 2019; pp. 133–154.
Table 1. Computation time of each generator (80 × 80 was selected as the facial resolution).

| Generator | Learning Time (s) | Inference Time (s) (for Processing 397 Face Images) |
|---|---|---|
| G_p | 25.66 | 0.14 |
| G_r | 25.04 | 0.14 |
| Performing Masked Face Synthesis | Performing Face Recognition | Recognition Rate (%) |
|---|---|---|
| at 80 × 80 pixels | at 80 × 80 pixels | 84.15 |
| at 40 × 40 pixels | at 40 × 40 pixels | 81.11 |
| at 25 × 25 pixels | at 25 × 25 pixels | 73.80 |
| at 80 × 80 pixels | at 25 × 25 pixels (synthesizing the masked face at 80 × 80 and resizing it to 25 × 25 pixels for face recognition) | 73.54 |
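The last row of the table above corresponds to a cross-resolution protocol: synthesize the masked face at 80 × 80 pixels, then downscale it before recognition. A hedged sketch of that protocol, assuming the synthesize_masked_face() helper from the Introduction and some trained classifier classify(), might look as follows (the INTER_AREA interpolation choice is also an assumption):

```python
import cv2

def evaluate_cross_resolution(face, G_p, G_r, classify):
    """Synthesize the masked face at 80 x 80 pixels, then resize it to
    25 x 25 pixels for face recognition (last table row)."""
    face_80 = cv2.resize(face, (80, 80))
    masked_80 = synthesize_masked_face(face_80, G_p, G_r)  # sketch from Section 1
    masked_25 = cv2.resize(masked_80, (25, 25), interpolation=cv2.INTER_AREA)
    return classify(masked_25)
```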