
Boundary-aware GAN for multiple overlapping objects in layout-to-image generation

  • Regular Paper
  • Published in Multimedia Systems (2024)

Abstract

Existing layout-to-image generation methods based on generative adversarial networks (GANs) have made great progress. However, one common problem remains unsolved: region missing, in which, as the number of target objects in an image grows, multiple overlapping objects cannot be accurately generated at their shared boundaries. How well these overlapping parts are generated largely determines how closely a generated image matches the real one. To address this problem, we propose the Boundary-Aware GAN (BAGAN), which generates the foreground and background separately. During foreground generation, BAGAN first uses attention regularization to accurately locate the boundaries of overlapping objects and then produces a clear foreground image by sharing the parameters of two transfer networks. During background generation, BAGAN uses conditional normalization to improve the quality of the generated background. To better assess how well overlapping regions are generated, we design the BoundaryFID evaluation metric. We validate and test our model on the COCO-Stuff and Visual Genome datasets; the experimental results show that our model achieves the best results among state-of-the-art methods on both the standard evaluation metrics and BoundaryFID.
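The abstract names BoundaryFID but does not define it here. The following is a minimal sketch, assuming the metric restricts FID scoring to the regions where layout bounding boxes overlap; the box format `(x0, y0, x1, y1)`, the helpers `box_intersection` and `overlap_crops`, and the `compute_fid` routine mentioned in the closing comment are all illustrative assumptions, not the paper's implementation.

```python
# Sketch of a BoundaryFID-style metric (assumed definition, not the paper's code):
# crop the regions where layout bounding boxes overlap, then score those crops
# with a standard FID implementation.
import numpy as np

def box_intersection(a, b):
    """Intersection of two boxes (x0, y0, x1, y1) in pixels; None if disjoint."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)

def overlap_crops(image, boxes):
    """Collect every pairwise overlap region of the layout boxes from one image."""
    crops = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            inter = box_intersection(boxes[i], boxes[j])
            if inter is None:
                continue
            x0, y0, x1, y1 = inter
            crops.append(image[y0:y1, x0:x1])
    return crops

# Example: two boxes sharing a 20x20 region yield one crop per image.
boxes = [(10, 10, 60, 60), (40, 40, 100, 100)]
real = np.zeros((128, 128, 3), dtype=np.uint8)
fake = np.zeros((128, 128, 3), dtype=np.uint8)
real_crops = overlap_crops(real, boxes)   # one (20, 20, 3) array
fake_crops = overlap_crops(fake, boxes)
# A BoundaryFID-style score would then be compute_fid(real_crops, fake_crops),
# where compute_fid is any off-the-shelf FID routine applied to resized crops.
```

Restricting the score to overlap crops isolates exactly the boundary regions where, per the abstract, region missing occurs; a whole-image FID would dilute that signal with the easier non-overlapping areas.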



Data availability

The data that support the findings of this study are available from the corresponding author, Fengnan Quan, upon reasonable request.


Acknowledgements

This work was supported in part by SKLSDE-2020ZX-02. The views presented in this paper are those of the authors and should not be interpreted as representing any funding agencies.

Author information


Contributions

Fengnan Quan designed the network structure, trained the model, and wrote the paper. Bo Lang advised on the network structure and revised the language of the paper.

Corresponding author

Correspondence to Fengnan Quan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Quan, F., Lang, B. Boundary-aware GAN for multiple overlapping objects in layout-to-image generation. Multimedia Systems 30, 88 (2024). https://doi.org/10.1007/s00530-024-01287-y

