Abstract
Text-to-image synthesis is a promising technology that generates realistic images from textual descriptions using deep learning models. However, state-of-the-art text-to-image synthesis models often struggle to balance the overall integrity of objects with their local diversity and rich detail, leading to unsatisfactory results on some domain-specific images, such as those in industrial applications. To address this issue, we propose Co-GAN, a text-to-image synthesis model that introduces two modules to enhance local diversity and maintain overall structural integrity, respectively. The Local Feature Enhancement (LFE) module improves the local diversity of generated images, while the Integral Structural Maintenance (ISM) module ensures that integral structural information is preserved. Furthermore, a cascaded central loss is proposed to address instability during generative training. To tackle the problem of incomplete image types in existing datasets, we create a new text-to-image synthesis dataset containing seven types of industrial components and evaluate various existing methods on it. The results of comparative and ablation experiments show that, compared with other current methods, the images generated by Co-GAN incorporate more detail while maintaining structural integrity.
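The implementation details behind the LFE and ISM modules and the cascaded central loss are not included on this page. As a rough illustration only, the following PyTorch sketch shows one plausible way a local-enhancement branch, an integral-maintenance branch, and a cascaded center-style loss over multi-scale features could be wired together. All class names, internals, and the loss form here are assumptions for illustration, not the authors' Co-GAN code.

```python
# Hypothetical sketch (not the authors' implementation): a local-enhancement
# branch, an integral-maintenance branch, and a cascaded center-style loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatureEnhancement(nn.Module):
    """Assumed LFE stand-in: re-weight local conv features with a channel gate."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        local = self.conv(x)
        return x + local * self.gate(local)  # emphasize diverse local details

class IntegralStructureMaintenance(nn.Module):
    """Assumed ISM stand-in: inject a global (integral) descriptor back into the map."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, x):
        g = x.mean(dim=(2, 3))                       # global structure descriptor
        g = self.proj(g).unsqueeze(-1).unsqueeze(-1)
        return x + g                                 # broadcast structural cue

def cascaded_center_loss(stage_feats, stage_centers):
    """Assumed cascaded center-style loss: pull each stage's mean feature toward
    a learnable per-stage center, averaged across the generator stages."""
    losses = [
        F.mse_loss(f.mean(dim=(2, 3)), c.expand(f.size(0), -1))
        for f, c in zip(stage_feats, stage_centers)
    ]
    return torch.stack(losses).mean()
```

Under these assumptions, the two modules would be applied to intermediate feature maps of each generator stage, and the cascaded loss would be added to the usual adversarial objective to stabilize training across stages.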
Acknowledgements
This work was supported by the National Key Research and Development Plan (2020YFB1712301).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, L., Xie, Z., Chen, Y., Deng, Q. (2024). Co-GAN: A Text-to-Image Synthesis Model with Local and Integral Features. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1967. Springer, Singapore. https://doi.org/10.1007/978-981-99-8178-6_19
DOI: https://doi.org/10.1007/978-981-99-8178-6_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8177-9
Online ISBN: 978-981-99-8178-6
eBook Packages: Computer Science, Computer Science (R0)