research-article

Bi-directional Generation between Attributes and Images

Authors:

Tengteng Zhang

CSAI '19: Proceedings of the 2019 3rd International Conference on Computer Science and Artificial Intelligence

Pages 176 - 180

https://doi.org/10.1145/3374587.3374640

Published: 04 March 2020

Abstract

This paper investigates the problem of generating images from visual attributes and vice versa. Despite extensive research on image recognition, bi-directional generation between attributes and images has rarely been explored, both because a good bidirectional generative model is hard to learn and because the two modalities have different structures. To address this problem, this paper proposes the bidirectional generative model (BGM), which is based on a variant of variational auto-encoders (VAEs). Attributes in the BGM are represented by attribute functions, which directly ground the meaning of attributes in visual representations and allow the BGM to generate images and attributes bi-directionally. The BGM is applied to the 3D Chairs dataset to verify its validity, where it achieves 85.2% accuracy on attribute inference and 81.7% accuracy on image reconstruction. The experimental results demonstrate the effectiveness of the BGM.
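For readers unfamiliar with the VAE family the abstract refers to, the following is a minimal sketch of the two ingredients of a standard VAE objective: the reparameterized latent sample and the closed-form KL regularizer. It is not the BGM itself; the abstract does not specify the variant or the attribute functions, so the function names, the diagonal-Gaussian posterior, and the squared-error reconstruction term are all assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    mu and log_var are lists of floats describing a diagonal Gaussian
    (an assumed posterior form, not taken from the paper).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def negative_elbo(x, x_recon, mu, log_var):
    """Reconstruction error plus KL term, the quantity a VAE minimizes.

    Squared error stands in for the reconstruction log-likelihood here;
    this is an illustrative choice, not the paper's loss.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)
```

In a full model an encoder would produce `mu` and `log_var` from an image (or from attributes), and a decoder would map the sampled `z` back to the other modality; how the BGM couples these two directions is described in the paper, not here.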

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision representations > Image representations


Published In

CSAI '19: Proceedings of the 2019 3rd International Conference on Computer Science and Artificial Intelligence

December 2019

370 pages

ISBN:9781450376273

DOI:10.1145/3374587

Copyright © 2019 ACM.

In-Cooperation

Shenzhen University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CSAI2019

CSAI2019: 2019 3rd International Conference on Computer Science and Artificial Intelligence

December 6 - 8, 2019

Normal, IL, USA
