Abstract
Text-to-image synthesis is a promising technology that generates realistic images from textual descriptions using deep learning models. However, state-of-the-art text-to-image synthesis models often struggle to balance the overall integrity of objects with their local diversity and rich detail, leading to unsatisfactory results on some domain-specific images, such as those in industrial applications. To address this issue, we propose Co-GAN, a text-to-image synthesis model that introduces two modules to enhance local diversity and maintain overall structural integrity, respectively. The Local Feature Enhancement (LFE) module improves the local diversity of generated images, while the Integral Structural Maintenance (ISM) module ensures that integral structural information is preserved. Furthermore, a cascaded central loss is proposed to address instability during generative training. To tackle the problem of incomplete image types in existing datasets, we create a new text-to-image synthesis dataset containing seven types of industrial components and evaluate various existing methods on it. The results of comparative and ablation experiments show that, compared with other current methods, the images generated by Co-GAN incorporate more detail while maintaining structural integrity.
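The implementation details behind the LFE and ISM modules and the cascaded central loss are not included on this page. As a rough illustration only, the following PyTorch sketch shows one plausible way a local-enhancement branch, an integral-maintenance branch, and a cascaded center-style loss over multi-scale features could be wired together. All class names, internals, and the loss form here are assumptions for illustration, not the authors' Co-GAN code.

```python
# Hypothetical sketch (not the authors' implementation): a local-enhancement
# branch, an integral-maintenance branch, and a cascaded center-style loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatureEnhancement(nn.Module):
    """Assumed LFE stand-in: re-weight local conv features with a channel gate."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        local = self.conv(x)
        return x + local * self.gate(local)  # emphasize diverse local details

class IntegralStructureMaintenance(nn.Module):
    """Assumed ISM stand-in: inject a global (integral) descriptor back into the map."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, x):
        g = x.mean(dim=(2, 3))                       # global structure descriptor
        g = self.proj(g).unsqueeze(-1).unsqueeze(-1)
        return x + g                                 # broadcast structural cue

def cascaded_center_loss(stage_feats, stage_centers):
    """Assumed cascaded center-style loss: pull each stage's mean feature toward
    a learnable per-stage center, averaged across the generator stages."""
    losses = [
        F.mse_loss(f.mean(dim=(2, 3)), c.expand(f.size(0), -1))
        for f, c in zip(stage_feats, stage_centers)
    ]
    return torch.stack(losses).mean()
```

Under these assumptions, the two modules would be applied to intermediate feature maps of each generator stage, and the cascaded loss would be added to the usual adversarial objective to stabilize training across stages.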
Acknowledgements
This work was supported by the National Key Research and Development Plan (2020YFB1712301).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, L., Xie, Z., Chen, Y., Deng, Q. (2024). Co-GAN: A Text-to-Image Synthesis Model with Local and Integral Features. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1967. Springer, Singapore. https://doi.org/10.1007/978-981-99-8178-6_19
DOI: https://doi.org/10.1007/978-981-99-8178-6_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8177-9
Online ISBN: 978-981-99-8178-6
eBook Packages: Computer Science, Computer Science (R0)