Abstract
Generating photorealistic images of human faces at scale remains a prohibitively difficult task for computer graphics approaches, because photorealism requires simulating light transport, which in turn requires physically accurate models of geometry, materials, and light sources for both the head and the surrounding scene. Non-photorealistic renders, however, are increasingly easy to produce. In contrast to computer graphics approaches, generative models learned from more readily available 2D image data have been shown to produce samples of human faces that are hard to distinguish from real data. This learning process, however, usually comes at the cost of control over the shape and appearance of the generated images: even a simple disentangling task such as modifying the hair independently of the face, trivial in a computer graphics pipeline, remains an open research question for generative models. In this work, we propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model; the model, in turn, maps this vector to a photorealistic image of a person with the same pose, expression, hair, and lighting. In contrast to most previous work, we require no synthetic training data. To the best of our knowledge, this is the first algorithm of its kind to work at a resolution of 1K, and it represents a significant leap forward in visual realism.
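At a high level, the matching step the abstract describes can be viewed as latent-space projection: optimize a latent code so that the generator's output reproduces the synthetic input, then read the photorealistic result off the same latent. The sketch below is a minimal illustration of that idea only; the generator handle G, the VGG-based perceptual loss, and the Adam schedule are assumptions made for the example and are not the paper's actual method.

```python
# Illustrative sketch of latent-space projection into a pretrained
# generator. `G` is a hypothetical stand-in for a pretrained StyleGAN2
# generator; the paper's actual losses, latent space, and optimization
# schedule are not specified in the abstract and may differ.
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen VGG16 feature extractor used as a simple perceptual loss.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(a, b):
    # Compare mid-level VGG features; inputs assumed ImageNet-normalized.
    return F.mse_loss(vgg(a), vgg(b))

def project(G, target, w_init, steps=500, lr=0.05):
    """Optimize a latent w so that G(w) matches the (synthetic) target."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w)  # synthesize an image from the current latent
        loss = perceptual_loss(img, target) + 0.1 * F.mse_loss(img, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()  # latent whose decoding approximates the input
```

In practice, such projections are often run in StyleGAN2's extended W+ space with additional regularizers; the abstract does not say which choices this work makes.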
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Garbin, S.J., Kowalski, M., Johnson, M., Shotton, J. (2020). High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_14
DOI: https://doi.org/10.1007/978-3-030-58604-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58603-4
Online ISBN: 978-3-030-58604-1
eBook Packages: Computer Science (R0)