DOI: 10.1145/3607541.3616811 · Research article

TCGIS: Text and Contour Guided Controllable Image Synthesis

Published: 29 October 2023

Abstract

Text-to-image synthesis (T2I) has recently received extensive attention and achieved encouraging results. However, two challenges remain: 1) the quality of the synthesized images cannot be effectively guaranteed; 2) human control over the synthesis process is still limited. To address these challenges, we propose a text- and contour-guided, manually controllable image synthesis method. The method synthesizes an image from user-provided text and a simple contour, where the text determines the basic content and the contour determines the shape and position. Building on this idea, we design an efficient network architecture based on the attention mechanism and achieve strong synthesis results. We validate the proposed method on three widely used datasets, and both qualitative and quantitative experimental results show promising performance. In addition, we design a lightweight variant to further improve the model's practicality.
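The idea of injecting text content at the spatial positions a contour defines can be sketched with scaled dot-product attention, where contour feature locations attend over word embeddings. This is a hypothetical simplification for illustration, not the paper's actual module; the function name `attention_fuse` and all shapes are assumptions.

```python
import numpy as np

def attention_fuse(contour_feats, word_feats):
    """Each contour-feature location attends over the word embeddings,
    so text content is deposited at the spatial positions the contour
    defines (illustrative sketch only)."""
    d = word_feats.shape[-1]
    scores = contour_feats @ word_feats.T / np.sqrt(d)   # (HW, T)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over words
    context = weights @ word_feats                       # (HW, d) text context
    return contour_feats + context                       # residual fusion

rng = np.random.default_rng(0)
contour = rng.normal(size=(16, 32))   # 4x4 spatial grid, flattened, 32-d feats
words = rng.normal(size=(5, 32))      # 5 word embeddings, 32-d
fused = attention_fuse(contour, words)
print(fused.shape)  # (16, 32)
```

In the full method, such a fused feature map would feed a generator, so each output region reflects both the contour's geometry and the most relevant words of the description.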

Supplementary Material

MP4 File (TCGIS_fu.mp4)
McGE006: TCGIS: Text and Contour Guided Controllable Image Synthesis



Published In
      McGE '23: Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice
      October 2023
      151 pages
      ISBN:9798400702785
      DOI:10.1145/3607541
      • General Chairs:
      • Cheng Jin,
      • Liang He,
      • Mingli Song,
      • Rui Wang
      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. artificially controllable
      2. lightweight
      3. text and contour guided image synthesis


      Funding Sources

      • Hosei University
      • The National Key Research and Development Program from the Ministry of Science and Technology of the PRC
      • Sichuan Science and Technology Program

Conference: MM '23

