Abstract
Image synthesis using representations learned by deep neural networks has gained wide attention in recent years. Among the different categories of natural images, face images are especially important because of their broad range of applications. However, synthesizing face images is very challenging because of their highly complicated hierarchical structure and the uniqueness of the information contained in each individual face image. This paper provides a comprehensive review of recent developments in, and applications of, face synthesis and semantic manipulation using deep learning, and discusses future perspectives for improving face perception.
Reprinted from Zhmoginov and Sandler (2016) with permission
Reproduced using DFI source code (deepfeatinterp 2017) with permission
Reproduced and adapted using Deep Image Analogy source code (msracver 2018) copyrighted © 2018 MSRA CVer under a MIT license
Reprinted from Cheung et al. (2014) with permission
Reprinted from Progressive Growing of GANs source code (tkarras 2018) copyrighted © 2018, NVIDIA CORPORATION under a Creative Commons licence (Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible)
Reproduced using StyleGAN source code (NVlabs 2019) copyrighted © 2019, NVIDIA CORPORATION under a Creative Commons licence (Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible)
Reprinted from Berthelot et al. (2017) with permission
References
Antipov G, Baccouche M, Dugelay JL (2017) Face aging with conditional generative adversarial networks. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 2089–2093
Berthelot D, Schumm T, Metz L (2017) Began: boundary equilibrium generative adversarial networks. arXiv:170310717
Brock A, Lim T, Ritchie JM, Weston N (2016) Neural photo editing with introspective adversarial networks. arXiv:160907093
Bruce V, Young A (1986) Understanding face recognition. Br J Psychol 77(3):305–327
Brundage M, Avin S, Clark J, Toner H, Eckersley P, Garfinkel B, Dafoe A, Scharre P, Zeitzoff T, Filar B et al (2018) The malicious use of artificial intelligence: forecasting, prevention, and mitigation. arXiv:180207228
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in neural information processing systems, pp 2172–2180
Cheung B, Livezey JA, Bansal AK, Olshausen BA (2014) Discovering hidden factors of variation in deep networks. arXiv:14126583
Cole F, Belanger D, Krishnan D, Sarna A, Mosseri I, Freeman WT (2017) Synthesizing normalized faces from facial identity features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3703–3712
Dahl R, Norouzi M, Shlens J (2017) Pixel recursive super resolution. In: Proceedings of the IEEE international conference on computer vision, pp 5439–5448
deepfakes (2019) deepfakes/faceswap: deepfakes software for all. https://github.com/deepfakes/faceswap. Accessed 18 Apr 2020
deepfeatinterp (2017) paulu/deepfeatinterp: deep feature interpolation (cvpr 2017). https://github.com/paulu/deepfeatinterp
Dinh L, Krueger D, Bengio Y (2014) Nice: non-linear independent components estimation. arXiv:14108516
Dinh L, Sohl-Dickstein J, Bengio S (2016) Density estimation using real nvp. arXiv:160508803
Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M, Courville A (2016) Adversarially learned inference. arXiv:160600704
Edwards H, Storkey A (2016) Towards a neural statistician. arXiv:160602185
Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell 28(4):594–611
Frigo O, Sabater N, Delon J, Hellier P (2016) Split and match: example-based adaptive patch sampling for unsupervised style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 553–561
Frigo O, Sabater N, Delon J, Hellier P (2019) Video style transfer by consistent adaptive patch sampling. Vis Comput 35(3):429–443
Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. arXiv:150806576
Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2414–2423
Gauthier J (2014) Conditional generative adversarial nets for convolutional face generation. Class Proj Stanf CS231N Convolutional Neural Netw Vis Recognit Winter Semester 2014(5):2
glow (2018) Glow: better reversible generative models. https://openai.com/blog/glow/. Accessed 18 Apr 2020
Goodfellow (2019) Ian goodfellow on twitter: “4.5 years of gan progress on face generation. https://t.co/kiqkuyulmc, https://t.co/s4absu536b, https://t.co/8di6k6bxvc, https://t.co/uefhewds2m, https://t.co/s6hkqz9glz... https://t.co/bqyv6zgftb”. https://twitter.com/goodfellow_ian/status/1084973596236144640?lang=en
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
Gross R, Matthews I, Cohn J, Kanade T, Baker S (2010) Multi-pie. Image Vis Comput 28(5):807–813
Haxby JV, Hoffman EA, Gobbini MI (2000) The distributed human neural system for face perception. Trends Cogn Sci 4(6):223–233
Haxby JV, Hoffman EA, Gobbini MI (2002) Human neural systems for face recognition and social communication. Biol Psychiatry 51(1):59–67
Hinton (2018) Ovs — what’s wrong with convolutional nets? — video detail. https://techtv.mit.edu/videos/782615f9abc64fbbafb5d0d3c3387392/. Accessed 18 Apr 2020
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. University of Massachusetts, Amherst, Tech Rep 07-49
Inceptionism (2015) Google ai blog: Inceptionism: going deeper into neural networks. https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1125–1134
Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv:171010196
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4401–4410
Kaur P, Zhang H, Dana KJ (2017) Photo-realistic facial texture transfer. arXiv:170604306
Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with generative adversarial networks. In: Proceedings of the 34th international conference on machine learning-volume 70, JMLR.org, pp 1857–1865
Kim H, Garrido P, Tewari A, Xu W, Thies J, Niessner M, Pérez P, Richardt C, Zollhöfer M, Theobalt C (2018) Deep video portraits. ACM Trans Graph (TOG) 37(4):163
Kingma DP, Dhariwal P (2018) Glow: generative flow with invertible 1 x 1 convolutions. In: Advances in neural information processing systems, pp 10215–10224
Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv:13126114
Korshunova I, Shi W, Dambre J, Theis L (2017) Fast face-swap using convolutional neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 3677–3685
Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338
Lake BM, Ullman TD, Tenenbaum JB, Gershman SJ (2017) Building machines that learn and think like people. Behav Brain Sci 40:e253
Larsen ABL, Sønderby SK, Larochelle H, Winther O (2015) Autoencoding beyond pixels using a learned similarity metric. arXiv:151209300
Huang GB, Learned-Miller E (2014) Labeled faces in the wild: updates and new reporting procedures. University of Massachusetts, Amherst, Tech Rep UM-CS-2014-003
LeCun (2015) What’s wrong with deep learning?—techtalks.tv. http://techtalks.tv/talks/whats-wrong-with-deep-learning/61639/
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436
Li C, Wand M (2016) Combining Markov random fields and convolutional neural networks for image synthesis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2479–2486
Li X, Liu S, Kautz J, Yang MH (2019) Learning linear transformations for fast image and video style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3809–3817
Liao J, Yao Y, Yuan L, Hua G, Kang SB (2017) Visual attribute transfer through deep image analogy. arXiv:170501088
Liu MY, Tuzel O (2016) Coupled generative adversarial networks. In: Advances in neural information processing systems, pp 469–477
Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of international conference on computer vision (ICCV)
Liu S, Ou X, Qian R, Wang W, Cao X (2016) Makeup like a superstar: deep localized makeup transfer network. arXiv:160407102
Liu MY, Breuel T, Kautz J (2017) Unsupervised image-to-image translation networks. In: Advances in neural information processing systems, pp 700–708
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:14111784
msracver (2018) msracver/deep-image-analogy: the source code of ’visual attribute transfer through deep image analogy’. https://github.com/msracver/Deep-Image-Analogy
Natsume R, Yatagawa T, Morishima S (2018) Rsgan: face swapping and editing using face and hair representation in latent spaces. arXiv:180403447
Nirkin Y, Keller Y, Hassner T (2019) Fsgan: subject agnostic face swapping and reenactment. In: Proceedings of the IEEE international conference on computer vision, pp 7184–7193
NVlabs (2019) Nvlabs/stylegan: stylegan—official tensorflow implementation. https://github.com/NVlabs/stylegan
Perarnau G, Van De Weijer J, Raducanu B, Álvarez JM (2016) Invertible conditional gans for image editing. arXiv:161106355
Pumarola A, Agudo A, Martinez AM, Sanfeliu A, Moreno-Noguer F (2018) Ganimation: anatomically-aware facial animation from a single image. In: Proceedings of the European conference on computer vision (ECCV), pp 818–833
Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:151106434
Research F (2016) Experiments about face and voice perception. http://faceresearch.org. Accessed 18 Apr 2020
Rezende DJ, Mohamed S (2015) Variational inference with normalizing flows. arXiv:150505770
Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. arXiv:14014082
Rossion B (2014) Understanding face perception by means of human electrophysiology. Trends Cogn Sci 18(6):310–318
Royer A, Bousmalis K, Gouws S, Bertsch F, Mosseri I, Cole F, Murphy K (2017) Xgan: unsupervised image-to-image translation for many-to-many mappings. arXiv:171105139
Ruder M, Dosovitskiy A, Brox T (2016) Artistic style transfer for videos. In: German conference on pattern recognition. Springer, pp 26–36
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Advances in neural information processing systems, pp 3856–3866
Salimans T, Karpathy A, Chen X, Kingma DP (2017) Pixelcnn++: improving the pixelcnn with discretized logistic mixture likelihood and other modifications. arXiv:170105517
Sanchez E, Valstar M (2018) Triple consistency loss for pairing distributions in gan-based face synthesis. arXiv:181103492
Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823
Selim A, Elgharib M, Doyle L (2016) Painting style transfer for head portraits using convolutional neural networks. ACM Trans Graph (ToG) 35(4):129
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:14091556
Susskind JM, Anderson AK, Hinton GE (2010) The Toronto face database. Department of Computer Science, University of Toronto, Toronto, ON, Canada, Tech Rep 3
Suwajanakorn S, Seitz SM, Kemelmacher-Shlizerman I (2017) Synthesizing obama: learning lip sync from audio. ACM Trans Graph (TOG) 36(4):95
ThisPersonDoesNotExist (2019) This person does not exist. https://www.thispersondoesnotexist.com/. Accessed 18 Apr 2020
tkarras (2018) tkarras/progressive growing of gans for improved quality, stability, and variation. https://github.com/tkarras/progressive_growing_of_gans
Tran L, Yin X, Liu X (2017) Disentangled representation learning gan for pose-invariant face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1415–1424
Tran LQ, Yin X, Liu X (2018) Representation learning by rotating your faces. IEEE Trans Pattern Anal Mach Intell 41(12):3007–3021
Tsao DY, Livingstone MS (2008) Mechanisms of face perception. Annu Rev Neurosci 31:411–437
Upchurch P, Gardner J, Pleiss G, Pless R, Snavely N, Bala K, Weinberger K (2017) Deep feature interpolation for image content changes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7064–7073
Van den Oord A, Kalchbrenner N, Espeholt L, Vinyals O, Graves A et al (2016a) Conditional image generation with pixelcnn decoders. In: Advances in neural information processing systems, pp 4790–4798
Van den Oord A, Kalchbrenner N, Kavukcuoglu K (2016b) Pixel recurrent neural networks. arXiv:160106759
VanRullen R (2017) Perception science in the age of deep neural networks. Front Psychol 8:142
Vinyals O, Blundell C, Lillicrap T, Wierstra D et al (2016) Matching networks for one shot learning. In: Advances in neural information processing systems, pp 3630–3638
Wang TC, Liu MY, Zhu JY, Tao A, Kautz J, Catanzaro B (2018) High-resolution image synthesis and semantic manipulation with conditional gans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8798–8807
Wolf L, Hassner T, Maoz I (2011) Face recognition in unconstrained videos with matched background similarity. IEEE
Wu W, Zhang Y, Li C, Qian C, Change Loy C (2018) Reenactgan: learning to reenact faces via boundary transfer. In: Proceedings of the European conference on computer vision (ECCV), pp 603–619
Yan Z, Zhou XS (2017) How intelligent are convolutional neural networks? arXiv:170906126
Yi Z, Zhang H, Tan P, Gong M (2017) Dualgan: unsupervised dual learning for image-to-image translation. In: Proceedings of the IEEE international conference on computer vision, pp 2849–2857
Zhmoginov A, Sandler M (2016) Inverting face embeddings with convolutional neural networks. arXiv:160604189
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
Cite this article
Abdolahnejad, M., Liu, P.X. Deep learning for face image synthesis and semantic manipulations: a review and future perspectives. Artif Intell Rev 53, 5847–5880 (2020). https://doi.org/10.1007/s10462-020-09835-4
DOI: https://doi.org/10.1007/s10462-020-09835-4