DOI: 10.1145/3607541.3616811 · Research article

TCGIS: Text and Contour Guided Controllable Image Synthesis

Published: 29 October 2023

Abstract

Text-to-image synthesis (T2I) has recently received extensive attention and achieved encouraging results. However, two challenges remain: 1) the quality of the synthesized images cannot be effectively guaranteed; 2) human control over the synthesis process is still limited. To address these challenges, we propose a text- and contour-guided, manually controllable image synthesis method. The method synthesizes an image from user-provided text and a simple contour, where the text determines the basic content and the contour determines the shape and position. Building on this idea, we design an efficient network architecture based on the attention mechanism and achieve strong synthesis results. We validate the proposed method on three widely used datasets, and both qualitative and quantitative experimental results show promising performance. In addition, we design a lightweight variant to further improve the model's practicality.
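The idea of injecting text content at the spatial positions a contour defines can be sketched with scaled dot-product attention, where contour feature locations attend over word embeddings. This is a hypothetical simplification for illustration, not the paper's actual module; the function name `attention_fuse` and all shapes are assumptions.

```python
import numpy as np

def attention_fuse(contour_feats, word_feats):
    """Each contour-feature location attends over the word embeddings,
    so text content is deposited at the spatial positions the contour
    defines (illustrative sketch only)."""
    d = word_feats.shape[-1]
    scores = contour_feats @ word_feats.T / np.sqrt(d)   # (HW, T)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over words
    context = weights @ word_feats                       # (HW, d) text context
    return contour_feats + context                       # residual fusion

rng = np.random.default_rng(0)
contour = rng.normal(size=(16, 32))   # 4x4 spatial grid, flattened, 32-d feats
words = rng.normal(size=(5, 32))      # 5 word embeddings, 32-d
fused = attention_fuse(contour, words)
print(fused.shape)  # (16, 32)
```

In the full method, such a fused feature map would feed a generator, so each output region reflects both the contour's geometry and the most relevant words of the description.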

Supplementary Material

MP4 File (TCGIS_fu.mp4)
McGE006: TCGIS: Text and Contour Guided Controllable Image Synthesis



Published In
      McGE '23: Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice
      October 2023
      151 pages
      ISBN:9798400702785
      DOI:10.1145/3607541
      • General Chairs:
      • Cheng Jin,
      • Liang He,
      • Mingli Song,
      • Rui Wang
      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. artificially controllable
      2. lightweight
      3. text and contour guided image synthesis


      Funding Sources

      • Hosei University
      • The National Key Research and Development Program from the Ministry of Science and Technology of the PRC
      • Sichuan Science and Technology Program

Conference: MM '23

