DOI: 10.1145/3571600.3571646

Controllable Image Synthesis via Feature Mask Coupling using Implicit Neural Representation✱

Published: 12 May 2023

Abstract

Implicit neural representation (INR) has emerged as a powerful paradigm for 2D image representation. Recent works such as INR-GAN have successfully adopted INRs for 2D image synthesis, but they lack the explicit control over the generated images achieved by 3D-aware counterparts such as GIRAFFE. Our work investigates INRs for the task of controllable image synthesis. We propose a novel framework that allows the foreground, the background, and their shape and appearance to be manipulated in the latent space. To achieve effective control over these attributes, we introduce a novel feature mask coupling technique that leverages the foreground and background masks for mutual learning. Extensive quantitative and qualitative analysis shows that our model successfully disentangles the latent space and allows the shape and/or appearance of the foreground and background to be changed independently. We further demonstrate that our network requires less training time than other INR-based image synthesis methods.

References

[1]
Jonas Adler and Sebastian Lunz. 2018. Banach wasserstein gan. Advances in Neural Information Processing Systems 31 (2018).
[2]
Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, and Denis Korzhenkov. 2021. Image generators with conditionally-independent pixel synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14278–14287.
[3]
Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 1798–1828.
[4]
Rohan Chabra, Jan E Lenssen, Eddy Ilg, Tanner Schmidt, Julian Straub, Steven Lovegrove, and Richard Newcombe. 2020. Deep local shapes: Learning local sdf priors for detailed 3d reconstruction. In European Conference on Computer Vision. Springer, 608–625.
[5]
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Advances in neural information processing systems 29 (2016).
[6]
Yinbo Chen, Sifei Liu, and Xiaolong Wang. 2021. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8628–8638.
[7]
Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5939–5948.
[8]
Julian Chibane, Gerard Pons-Moll, et al. 2020. Neural unsigned distance fields for implicit function learning. Advances in Neural Information Processing Systems 33 (2020), 21638–21652.
[9]
Emily L Denton and Vighnesh Birodkar. 2017. Unsupervised learning of disentangled representations from video. Advances in Neural Information Processing Systems 30 (2017).
[10]
Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, et al. 2021. Cogview: Mastering text-to-image generation via transformers. Advances in Neural Information Processing Systems 34 (2021), 19822–19835.
[11]
Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T Freeman, and Thomas Funkhouser. 2019. Learning shape templates with structured implicit functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7154–7164.
[12]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014).
[13]
Sonam Gupta, Arti Keshari, and Sukhendu Das. 2021. G3AN++ exploring wide GANs with complementary feature learning for video generation. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing. 1–9.
[14]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
[15]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017).
[16]
Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2016. beta-vae: Learning basic visual concepts with a constrained variational framework. (2016).
[17]
Qiyang Hu, Attila Szabó, Tiziano Portenier, Paolo Favaro, and Matthias Zwicker. 2018. Disentangling factors of variation by mixing them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3399–3407.
[18]
Xun Huang and Serge Belongie. 2017. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE international conference on computer vision. 1501–1510.
[19]
Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, and Thomas Funkhouser. 2020. Local implicit grid representations for 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6001–6010.
[20]
Songyao Jiang, Zhiqiang Tao, and Yun Fu. 2019. Segmentation guided image-to-image translation with adversarial networks. In 14th IEEE International Conference on Automatic Face & Gesture Recognition.
[21]
Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems 34 (2021), 852–863.
[22]
Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 4401–4410.
[23]
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 8110–8119.
[24]
Arti Keshari, Sonam Gupta, and Sukhendu Das. 2021. V3GAN: Decomposing Background, Foreground and Motion for Video Generation. (2021).
[25]
Wonkwang Lee, Donggyun Kim, Seunghoon Hong, and Honglak Lee. 2020. High-fidelity synthesis with disentangled representation. In European Conference on Computer Vision. Springer, 157–174.
[26]
Yuheng Li, Krishna Kumar Singh, Yang Xue, and Yong Jae Lee. 2021. Partgan: Weakly-supervised part decomposition for image generation and segmentation. In British Machine Vision Conference (BMVC).
[27]
Meichen Liu, Xin Yan, Chenhui Wang, and Kejun Wang. 2021. Segmentation mask-guided person image generation. Applied Intelligence 51, 2 (2021), 1161–1176.
[28]
Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which training methods for GANs do actually converge?. In International conference on machine learning. PMLR, 3481–3490.
[29]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2020. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision. Springer, 405–421.
[30]
Thu H Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, and Niloy Mitra. 2020. Blockgan: Learning 3d object-aware scene representations from unlabelled images. Advances in Neural Information Processing Systems 33 (2020), 6767–6778.
[31]
Michael Niemeyer and Andreas Geiger. 2021. Giraffe: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11453–11464.
[32]
Augustus Odena, Christopher Olah, and Jonathon Shlens. 2017. Conditional image synthesis with auxiliary classifier gans. In International conference on machine learning. PMLR, 2642–2651.
[33]
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 165–174.
[34]
William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, and Antonio Torralba. 2020. The hessian penalty: A weak prior for unsupervised disentanglement. In European Conference on Computer Vision. Springer, 581–597.
[35]
Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. 2019. On the spectral bias of neural networks. In International Conference on Machine Learning. PMLR, 5301–5310.
[36]
Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. 2019. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2304–2314.
[37]
Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. Graf: Generative radiance fields for 3d-aware image synthesis. Advances in Neural Information Processing Systems 33 (2020), 20154–20166.
[38]
Krishna Kumar Singh, Utkarsh Ojha, and Yong Jae Lee. 2019. Finegan: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 6490–6499.
[39]
Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. Implicit neural representations with periodic activation functions. arXiv preprint arXiv:2006.09661 (2020).
[40]
Ivan Skorokhodov, Savva Ignatyev, and Mohamed Elhoseiny. 2021. Adversarial generation of continuous images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10753–10764.
[41]
Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26–31.
[42]
Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2016. Generating videos with scene dynamics. Advances in Neural Information Processing Systems 29 (2016), 613–621.
[43]
Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems. 802–810.
[44]
Yang Xue, Yuheng Li, Krishna Kumar Singh, and Yong Jae Lee. 2022. GIRAFFE HD: A High-Resolution 3D-aware Generative Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18440–18449.
[45]
Jianwei Yang, Anitha Kannan, Dhruv Batra, and Devi Parikh. 2017. Lr-gan: Layered recursive generative adversarial networks for image generation. arXiv preprint arXiv:1703.01560 (2017).
[46]
Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2015. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3973–3981.
[47]
Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chandrasekhar. 2018. The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations.
[48]
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).
[49]
Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5802–5810.


Published In
    ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing
    December 2022
    506 pages
    ISBN:9781450398220
    DOI:10.1145/3571600
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 May 2023

    Author Tags

    1. Generative adversarial networks
    2. controllable image generation
    3. foreground-background disentanglement
    4. implicit neural representation
    5. unsupervised learning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

ICVGIP '22

    Acceptance Rates

    Overall Acceptance Rate 95 of 286 submissions, 33%
