HRInversion: High-Resolution GAN Inversion for Cross-Domain Image Synthesis

Published: 01 May 2023

Abstract

We investigate the GAN inversion problem of using pre-trained GANs to reconstruct real images. Recent methods for this problem typically employ a VGG perceptual loss to measure the difference between images. While the perceptual loss has achieved remarkable success in various computer vision tasks, it may cause unpleasant artifacts and is sensitive to changes in input scale. This paper delivers an important message: algorithmic details are crucial for achieving satisfying performance. In particular, we propose two important but undervalued design principles: (i) not down-sampling the input of the perceptual loss, to avoid high-frequency artifacts; and (ii) calculating the perceptual loss on convolutional features, which are robust to scale. Integrating these designs yields the proposed framework, HRInversion, which achieves superior performance in reconstructing image details. We validate the effectiveness of HRInversion on a cross-domain image synthesis task and propose a post-processing approach named local style optimization (LSO) to synthesize clean and controllable stylized images. To evaluate the cross-domain images, we introduce a metric named ID retrieval, which captures the similarity of the face identities of stylized images to those of the content images. We also test HRInversion on non-square images. Equipped with an implicit neural representation, HRInversion applies to ultra-high-resolution images with more than 10 million pixels. Furthermore, we show applications to style transfer and 3D-aware GAN inversion, paving the way for extending the application range of HRInversion.
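To make the two design principles concrete, the sketch below shows one way a full-resolution, convolution-feature perceptual loss could be set up. This is a minimal illustration under assumed choices (PyTorch with torchvision >= 0.13, VGG-16, and an illustrative set of layer indices), not the authors' implementation: the reconstruction and target are fed to the loss network at their native resolution instead of being down-sampled to a fixed size, and the loss compares convolutional feature maps rather than pooled or fully connected activations.

```python
# Minimal sketch (assumed PyTorch/torchvision setup, not the paper's released code)
# of a perceptual loss that (i) keeps its inputs at full resolution and
# (ii) compares convolutional feature maps, which are more robust to input scale.
import torch
import torch.nn.functional as F
from torchvision import models


class FullResConvPerceptualLoss(torch.nn.Module):
    def __init__(self, conv_ids=(2, 7, 14, 21, 28)):
        # conv_ids: indices of convolution layers in vgg16().features
        # (last conv of each block); an illustrative choice, not the paper's.
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.conv_ids = set(conv_ids)
        # ImageNet statistics expected by the pretrained VGG.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, recon, target):
        # recon, target: (N, 3, H, W) images in [0, 1]. No resizing is applied,
        # so the loss network sees the images at their native resolution.
        x = (recon - self.mean) / self.std
        y = (target - self.mean) / self.std
        loss = 0.0
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.conv_ids:
                loss = loss + F.mse_loss(x, y)
        return loss
```

Keeping the input at full resolution means the feature maps (and memory use) grow with image size, which is the practical cost of dropping the down-sampling step. The ID retrieval metric described above can likewise be sketched as a nearest-neighbor check in a face-identity embedding space; the exact protocol in the paper may differ, and the embedding function here is a placeholder for a pretrained face-recognition model such as ArcFace.

```python
# Hedged sketch of an "ID retrieval"-style score (the paper's exact protocol may
# differ): embed faces with a face-recognition network, then for each stylized
# image check whether its own content image is the nearest content image under
# cosine similarity.
import torch
import torch.nn.functional as F


def id_retrieval_top1(embed, stylized, content):
    # embed: callable mapping (N, 3, H, W) face crops to (N, D) identity features,
    #        assumed to come from a pretrained face-recognition model.
    # stylized[i] is assumed to be the stylization of content[i].
    e_s = F.normalize(embed(stylized), dim=-1)  # (N, D)
    e_c = F.normalize(embed(content), dim=-1)   # (N, D)
    sim = e_s @ e_c.t()                         # (N, N) cosine similarities
    pred = sim.argmax(dim=-1)                   # nearest content image per stylized image
    return (pred == torch.arange(len(sim), device=sim.device)).float().mean()
```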

Cited By

  • (2024) Lightweight dual-path octave generative adversarial networks for few-shot image generation. Multimedia Systems, vol. 30, no. 5. DOI: 10.1007/s00530-024-01484-9. Online publication date: 20-Sep-2024.
  • (2023) Rethinking Supervision in Document Unwarping: A Self-Consistent Flow-Free Approach. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4817–4828. DOI: 10.1109/TCSVT.2023.3336068. Online publication date: 23-Nov-2023.
  • (2023) Hybrid Transformers With Attention-Guided Spatial Embeddings for Makeup Transfer and Removal. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2876–2890. DOI: 10.1109/TCSVT.2023.3312790. Online publication date: 11-Sep-2023.


Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 33, Issue 5
May 2023
524 pages

Publisher

IEEE Press

Publication History

Published: 01 May 2023

Qualifiers

  • Research-article

