DOI: 10.1145/3571600.3571646

Controllable Image Synthesis via Feature Mask Coupling using Implicit Neural Representation✱

Published: 12 May 2023

Abstract

Implicit neural representation (INR) has emerged as a powerful paradigm for 2D image representation. Recent works such as INR-GAN have successfully adopted INRs for 2D image synthesis, but they lack the explicit control over the generated images achieved by 3D-aware counterparts such as GIRAFFE. Our work investigates INRs for the task of controllable image synthesis. We propose a novel framework that allows the foreground, the background, and their shape and appearance to be manipulated in the latent space. To achieve effective control over these attributes, we introduce a novel feature mask coupling technique that leverages the foreground and background masks for mutual learning. Extensive quantitative and qualitative analysis shows that our model successfully disentangles the latent space and allows the shape and/or appearance of the foreground and background to be changed independently. We further demonstrate that our network requires less training time than other INR-based image synthesis methods.

References

[1]
Jonas Adler and Sebastian Lunz. 2018. Banach wasserstein gan. Advances in Neural Information Processing Systems 31 (2018).
[2]
Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, and Denis Korzhenkov. 2021. Image generators with conditionally-independent pixel synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14278–14287.
[3]
Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 1798–1828.
[4]
Rohan Chabra, Jan E Lenssen, Eddy Ilg, Tanner Schmidt, Julian Straub, Steven Lovegrove, and Richard Newcombe. 2020. Deep local shapes: Learning local sdf priors for detailed 3d reconstruction. In European Conference on Computer Vision. Springer, 608–625.
[5]
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Advances in neural information processing systems 29 (2016).
[6]
Yinbo Chen, Sifei Liu, and Xiaolong Wang. 2021. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8628–8638.
[7]
Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5939–5948.
[8]
Julian Chibane, Gerard Pons-Moll, et al. 2020. Neural unsigned distance fields for implicit function learning. Advances in Neural Information Processing Systems 33 (2020), 21638–21652.
[9]
Emily L Denton and Vighnesh Birodkar. 2017. Unsupervised learning of disentangled representations from video. Advances in Neural Information Processing Systems 30 (2017).
[10]
Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, et al. 2021. Cogview: Mastering text-to-image generation via transformers. Advances in Neural Information Processing Systems 34 (2021), 19822–19835.
[11]
Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T Freeman, and Thomas Funkhouser. 2019. Learning shape templates with structured implicit functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7154–7164.
[12]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014).
[13]
Sonam Gupta, Arti Keshari, and Sukhendu Das. 2021. G3AN++ exploring wide GANs with complementary feature learning for video generation. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing. 1–9.
[14]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
[15]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017).
[16]
Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2016. beta-vae: Learning basic visual concepts with a constrained variational framework. (2016).
[17]
Qiyang Hu, Attila Szabó, Tiziano Portenier, Paolo Favaro, and Matthias Zwicker. 2018. Disentangling factors of variation by mixing them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3399–3407.
[18]
Xun Huang and Serge Belongie. 2017. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE international conference on computer vision. 1501–1510.
[19]
Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, and Thomas Funkhouser. 2020. Local implicit grid representations for 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6001–6010.
[20]
Songyao Jiang, Zhiqiang Tao, and Yun Fu. 2019. Segmentation guided image-to-image translation with adversarial networks. In 14th IEEE International Conference on Automatic Face & Gesture Recognition.
[21]
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems 34 (2021), 852–863.
[22]
Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 4401–4410.
[23]
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8110–8119.
[24]
Arti Keshari, Sonam Gupta, and Sukhendu Das. 2021. V3GAN: Decomposing Background, Foreground and Motion for Video Generation. (2021).
[25]
Wonkwang Lee, Donggyun Kim, Seunghoon Hong, and Honglak Lee. 2020. High-fidelity synthesis with disentangled representation. In European Conference on Computer Vision. Springer, 157–174.
[26]
Yuheng Li, Krishna Kumar Singh, Yang Xue, and Yong Jae Lee. 2021. Partgan: Weakly-supervised part decomposition for image generation and segmentation. In British Machine Vision Conference (BMVC).
[27]
Meichen Liu, Xin Yan, Chenhui Wang, and Kejun Wang. 2021. Segmentation mask-guided person image generation. Applied Intelligence 51, 2 (2021), 1161–1176.
[28]
Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which training methods for GANs do actually converge?. In International conference on machine learning. PMLR, 3481–3490.
[29]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2020. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision. Springer, 405–421.
[30]
Thu H Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, and Niloy Mitra. 2020. Blockgan: Learning 3d object-aware scene representations from unlabelled images. Advances in Neural Information Processing Systems 33 (2020), 6767–6778.
[31]
Michael Niemeyer and Andreas Geiger. 2021. Giraffe: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11453–11464.
[32]
Augustus Odena, Christopher Olah, and Jonathon Shlens. 2017. Conditional image synthesis with auxiliary classifier gans. In International conference on machine learning. PMLR, 2642–2651.
[33]
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 165–174.
[34]
William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, and Antonio Torralba. 2020. The hessian penalty: A weak prior for unsupervised disentanglement. In European Conference on Computer Vision. Springer, 581–597.
[35]
Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. 2019. On the spectral bias of neural networks. In International Conference on Machine Learning. PMLR, 5301–5310.
[36]
Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. 2019. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2304–2314.
[37]
Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. Graf: Generative radiance fields for 3d-aware image synthesis. Advances in Neural Information Processing Systems 33 (2020), 20154–20166.
[38]
Krishna Kumar Singh, Utkarsh Ojha, and Yong Jae Lee. 2019. Finegan: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 6490–6499.
[39]
Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. Implicit neural representations with periodic activation functions. arXiv preprint arXiv:2006.09661 (2020).
[40]
Ivan Skorokhodov, Savva Ignatyev, and Mohamed Elhoseiny. 2021. Adversarial generation of continuous images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10753–10764.
[41]
Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26–31.
[42]
Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2016. Generating videos with scene dynamics. Advances in Neural Information Processing Systems 29 (2016), 613–621.
[43]
Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems. 802–810.
[44]
Yang Xue, Yuheng Li, Krishna Kumar Singh, and Yong Jae Lee. 2022. GIRAFFE HD: A High-Resolution 3D-aware Generative Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18440–18449.
[45]
Jianwei Yang, Anitha Kannan, Dhruv Batra, and Devi Parikh. 2017. Lr-gan: Layered recursive generative adversarial networks for image generation. arXiv preprint arXiv:1703.01560 (2017).
[46]
Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2015. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3973–3981.
[47]
Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chandrasekhar. 2018. The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations.
[48]
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).
[49]
Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5802–5810.


Published In
    ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing
    December 2022
    506 pages
    ISBN:9781450398220
    DOI:10.1145/3571600
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 May 2023

    Author Tags

    1. Generative adversarial networks
    2. controllable image generation
    3. foreground-background disentanglement
    4. implicit neural representation
    5. unsupervised learning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

ICVGIP '22

    Acceptance Rates

    Overall Acceptance Rate 95 of 286 submissions, 33%
