Abstract
Generating photorealistic images of human faces at scale remains a prohibitively difficult task for computer graphics approaches, because photorealism requires simulating light transport, which in turn requires physically accurate models of geometry, materials, and light sources for both the head and the surrounding scene. Non-photorealistic renders, however, are increasingly easy to produce. In contrast to computer graphics approaches, generative models learned from more readily available 2D image data have been shown to produce samples of human faces that are hard to distinguish from real data. This learning process, however, usually comes at the cost of control over the shape and appearance of the generated images: even a simple disentangling task such as modifying the hair independently of the face, trivial in a computer graphics pipeline, remains an open research question for generative models. In this work, we propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model; the model, in turn, maps this vector to a photorealistic image of a person with the same pose, expression, hair, and lighting. In contrast to most previous work, we require no synthetic training data. To the best of our knowledge, this is the first algorithm of its kind to work at a resolution of 1K, and it represents a significant leap forward in visual realism.
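At a high level, the matching step the abstract describes can be viewed as latent-space projection: optimize a latent code so that the generator's output reproduces the synthetic input, then read the photorealistic result off the same latent. The sketch below is a minimal illustration of that idea only; the generator handle G, the VGG-based perceptual loss, and the Adam schedule are assumptions made for the example and are not the paper's actual method.

```python
# Illustrative sketch of latent-space projection into a pretrained
# generator. `G` is a hypothetical stand-in for a pretrained StyleGAN2
# generator; the paper's actual losses, latent space, and optimization
# schedule are not specified in the abstract and may differ.
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen VGG16 feature extractor used as a simple perceptual loss.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(a, b):
    # Compare mid-level VGG features; inputs assumed ImageNet-normalized.
    return F.mse_loss(vgg(a), vgg(b))

def project(G, target, w_init, steps=500, lr=0.05):
    """Optimize a latent w so that G(w) matches the (synthetic) target."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w)  # synthesize an image from the current latent
        loss = perceptual_loss(img, target) + 0.1 * F.mse_loss(img, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()  # latent whose decoding approximates the input
```

In practice, such projections are often run in StyleGAN2's extended W+ space with additional regularizers; the abstract does not say which choices this work makes.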
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Garbin, S.J., Kowalski, M., Johnson, M., Shotton, J. (2020). High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_14
DOI: https://doi.org/10.1007/978-3-030-58604-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58603-4
Online ISBN: 978-3-030-58604-1
eBook Packages: Computer Science (R0)