
Boundary-aware GAN for multiple overlapping objects in layout-to-image generation

  • Regular Paper
  • Published in Multimedia Systems (2024)

Abstract

Existing layout-to-image generation methods based on generative adversarial networks (GANs) have made great progress. However, one common problem remains unsolved: region missing, in which, as the number of target objects in an image grows, multiple overlapping objects cannot be accurately generated at their shared boundaries. How well these overlapping parts are generated largely determines how closely a generated image matches the real one. To address this problem, we propose the Boundary-Aware GAN (BAGAN), which generates the foreground and background separately. During foreground generation, BAGAN first uses attention regularization to accurately locate the boundaries of overlapping objects and then produces a clear foreground image by sharing the parameters of two transfer networks. During background generation, BAGAN uses conditional normalization to improve the quality of the generated background. To better assess how well overlapping regions are generated, we design the BoundaryFID evaluation metric. We validate and test our model on the COCO-Stuff and Visual Genome datasets; the experimental results show that our model achieves the best results among state-of-the-art methods on both the standard evaluation metrics and BoundaryFID.
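The abstract names BoundaryFID but does not define it here. The following is a minimal sketch, assuming the metric restricts FID scoring to the regions where layout bounding boxes overlap; the box format `(x0, y0, x1, y1)`, the helpers `box_intersection` and `overlap_crops`, and the `compute_fid` routine mentioned in the closing comment are all illustrative assumptions, not the paper's implementation.

```python
# Sketch of a BoundaryFID-style metric (assumed definition, not the paper's code):
# crop the regions where layout bounding boxes overlap, then score those crops
# with a standard FID implementation.
import numpy as np

def box_intersection(a, b):
    """Intersection of two boxes (x0, y0, x1, y1) in pixels; None if disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)

def overlap_crops(image, boxes):
    """Collect every pairwise overlap region of the layout boxes from one image."""
    crops = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            inter = box_intersection(boxes[i], boxes[j])
            if inter is None:
                continue
            x0, y0, x1, y1 = inter
            crops.append(image[y0:y1, x0:x1])
    return crops

# Example: two boxes sharing a 20x20 region yield one crop per image.
boxes = [(10, 10, 60, 60), (40, 40, 100, 100)]
real = np.zeros((128, 128, 3), dtype=np.uint8)
fake = np.zeros((128, 128, 3), dtype=np.uint8)
real_crops = overlap_crops(real, boxes)   # one (20, 20, 3) array
fake_crops = overlap_crops(fake, boxes)
# A BoundaryFID-style score would then be compute_fid(real_crops, fake_crops),
# where compute_fid is any off-the-shelf FID routine applied to resized crops.
```

Restricting the score to overlap crops isolates exactly the boundary regions where, per the abstract, region missing occurs; a whole-image FID would dilute that signal with the easier non-overlapping areas.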



Data availability

The data that support the findings of this study are available from the corresponding author, Fengnan Quan, upon reasonable request.


Acknowledgements

This work was supported in part by SKLSDE-2020ZX-02. The views presented in this paper are those of the authors and should not be interpreted as representing any funding agencies.

Author information


Contributions

Fengnan Quan designed the network structure, trained the model, and wrote the paper. Bo Lang advised on the network structure and revised the language of the paper.

Corresponding author

Correspondence to Fengnan Quan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Quan, F., Lang, B. Boundary-aware GAN for multiple overlapping objects in layout-to-image generation. Multimedia Systems 30, 88 (2024). https://doi.org/10.1007/s00530-024-01287-y

