
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15064)


Abstract

This paper attempts to address the object repetition issue in patch-wise higher-resolution image generation. We propose AccDiffusion, an accurate, training-free method for patch-wise higher-resolution image generation. An in-depth analysis in this paper reveals that using an identical text prompt for different patches causes repeated object generation, while using no prompt compromises image details. Therefore, AccDiffusion, for the first time, proposes to decouple the vanilla image-content-aware prompt into a set of patch-content-aware prompts, each serving as a more precise description of its image patch. In addition, AccDiffusion introduces dilated sampling with window interaction for better global consistency in higher-resolution image generation. Experimental comparisons with existing methods demonstrate that AccDiffusion effectively addresses repeated object generation and achieves better performance in higher-resolution image generation. Our code is released at https://github.com/lzhxmu/AccDiffusion.
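As a rough illustration of the two ingredients named in the abstract, the sketch below shows (i) a hypothetical patch-content-aware prompt built by keeping only the prompt words whose cross-attention response inside a patch is strong, and (ii) dilated sampling in which strided sub-latents are denoised separately after a simple window interaction. All names here (patch_prompt, dilated_split, window_interaction, denoise_fn), the thresholding rule, and the averaging-based interaction are assumptions made for exposition only, not the authors' released implementation; the repository linked above is the authoritative reference.

```python
# Illustrative sketch (not the authors' released code) of the two ideas in the
# abstract: patch-content-aware prompts and dilated sampling with window interaction.
import torch
import torch.nn.functional as F


def patch_prompt(words, attn_maps, patch_box, thresh=0.3):
    """Hypothetical patch-content-aware prompt: keep only the prompt words whose
    cross-attention mass inside the patch exceeds a threshold.
    words: list of prompt tokens; attn_maps: list of (H, W) maps in [0, 1]."""
    y0, y1, x0, x1 = patch_box
    kept = [w for w, amap in zip(words, attn_maps)
            if amap[y0:y1, x0:x1].mean() > thresh]
    return " ".join(kept)


def dilated_split(latent, stride):
    """Split a (B, C, H, W) latent into stride*stride sub-latents by strided sampling."""
    return [latent[:, :, dy::stride, dx::stride]
            for dy in range(stride) for dx in range(stride)]


def dilated_merge(subs, stride, shape):
    """Inverse of dilated_split: scatter sub-latents back to their strided positions."""
    merged = torch.zeros(shape, device=subs[0].device, dtype=subs[0].dtype)
    i = 0
    for dy in range(stride):
        for dx in range(stride):
            merged[:, :, dy::stride, dx::stride] = subs[i]
            i += 1
    return merged


def window_interaction(latent, kernel=3):
    """Assumed window interaction: a depthwise box filter so neighbouring positions
    (and hence different sub-latents) exchange information before denoising."""
    c = latent.shape[1]
    weight = torch.ones(c, 1, kernel, kernel,
                        device=latent.device, dtype=latent.dtype) / (kernel * kernel)
    return F.conv2d(latent, weight, padding=kernel // 2, groups=c)


def dilated_denoise_step(latent, denoise_fn, t, stride=2):
    """One denoising step with dilated sampling: interact, split, denoise, merge.
    denoise_fn(x, t) stands in for a diffusion model's per-step update."""
    latent = window_interaction(latent)               # global information exchange
    subs = dilated_split(latent, stride)              # strided low-resolution views
    subs = [denoise_fn(s, t) for s in subs]           # denoise each sub-latent
    return dilated_merge(subs, stride, latent.shape)  # back to the full latent
```

A complete pipeline would interleave such a step with the patch-wise denoising path and schedule the interaction strength over timesteps; the details of how AccDiffusion does this are in the paper and the released code.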



Acknowledgements

This work was supported by the National Science and Technology Major Project (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. U23A20383, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2022J06001).

Author information


Corresponding author

Correspondence to Rongrong Ji.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6108 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, Z., Lin, M., Zhao, M., Ji, R. (2025). AccDiffusion: An Accurate Method for Higher-Resolution Image Generation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15064. Springer, Cham. https://doi.org/10.1007/978-3-031-72658-3_3


  • DOI: https://doi.org/10.1007/978-3-031-72658-3_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72657-6

  • Online ISBN: 978-3-031-72658-3

  • eBook Packages: Computer Science; Computer Science (R0)
