
Rethinking cross-domain semantic relation for few-shot image generation

Published in Applied Intelligence.

Abstract

Training well-performing Generative Adversarial Networks (GANs) with limited data has always been challenging. Existing methods either require sufficient data (over 100 training images) or generate images of low quality and low diversity. To address this problem, we propose a new Cross-domain Semantic Relation (CSR) loss. The CSR loss improves the performance of the generative model by preserving the relationships between instances in the source domain and the corresponding generated images. In addition, a perceptual similarity loss and a discriminative contrastive loss are designed to further enrich the diversity of generated images and stabilize model training. Experiments on nine publicly available few-shot datasets and comparisons with nine state-of-the-art methods show that our approach outperforms all baselines. Finally, ablation studies on the three proposed loss functions confirm that each is essential for few-shot image generation. Code is available at https://github.com/gouayao/CSR.
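The core CSR idea described above, preserving the pairwise relationships among source-domain instances in the corresponding generated images, can be sketched as a relational-consistency loss. The following is an illustrative reconstruction only, not the paper's exact formulation: the function names, the temperature `tau`, and the choice of cosine similarity over batch features are assumptions made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_cosine(feats):
    # (N, D) feature batch -> (N, N) cosine-similarity matrix.
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def csr_loss(source_feats, generated_feats, tau=0.1):
    """Illustrative cross-domain semantic-relation loss: penalise the KL
    divergence between the pairwise-similarity distributions of a batch of
    source-domain features and the corresponding generated-image features,
    so that inter-instance relations are preserved across domains."""
    n = source_feats.shape[0]
    mask = ~np.eye(n, dtype=bool)  # drop self-similarity entries
    p = softmax(pairwise_cosine(source_feats)[mask].reshape(n, n - 1) / tau)
    q = softmax(pairwise_cosine(generated_feats)[mask].reshape(n, n - 1) / tau)
    return float(np.sum(p * (np.log(p) - np.log(q))) / n)  # mean KL(p || q)
```

By construction the loss is zero when the two similarity structures match exactly and strictly positive otherwise, which is what drives the generator to keep the source domain's relational structure.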



Author information

Corresponding author

Correspondence to Yujie He.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gou, Y., Li, M., Lv, Y. et al. Rethinking cross-domain semantic relation for few-shot image generation. Appl Intell 53, 22391–22404 (2023). https://doi.org/10.1007/s10489-023-04602-8

