VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis

Tathagat Verma, Abir De, Yateesh Agrawal, Vishwa Vinay, Soumen Chakrabarti
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:22168-22183, 2022.

Abstract

Scene graphs are powerful abstractions that capture relationships between objects in images by modeling objects as nodes and relationships as edges. Generation of realistic synthetic scene graphs has applications like scene synthesis and data augmentation for supervised learning. Existing graph generative models are predominantly targeted toward molecular graphs, leveraging the limited vocabulary of atoms and bonds and also the well-defined semantics of chemical compounds. In contrast, scene graphs have much larger object and relation vocabularies, and their semantics are latent. To address this challenge, we propose a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground truth scene graph distribution and distribution of the generated scene graphs. Our method views a scene graph as a collection of star graphs and encodes it into a latent representation of the underlying stars. The decoder generates scene graphs by learning to sample the component stars and edges between them. Our experiments show that our method is able to mimic the underlying scene graph generative process more accurately than several state-of-the-art baselines.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-verma22b, title = {{V}ar{S}cene: A Deep Generative Model for Realistic Scene Graph Synthesis}, author = {Verma, Tathagat and De, Abir and Agrawal, Yateesh and Vinay, Vishwa and Chakrabarti, Soumen}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {22168--22183}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/verma22b/verma22b.pdf}, url = {https://proceedings.mlr.press/v162/verma22b.html}, abstract = {Scene graphs are powerful abstractions that capture relationships between objects in images by modeling objects as nodes and relationships as edges. Generation of realistic synthetic scene graphs has applications like scene synthesis and data augmentation for supervised learning. Existing graph generative models are predominantly targeted toward molecular graphs, leveraging the limited vocabulary of atoms and bonds and also the well-defined semantics of chemical compounds. In contrast, scene graphs have much larger object and relation vocabularies, and their semantics are latent. To address this challenge, we propose a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground truth scene graph distribution and distribution of the generated scene graphs. Our method views a scene graph as a collection of star graphs and encodes it into a latent representation of the underlying stars. The decoder generates scene graphs by learning to sample the component stars and edges between them. Our experiments show that our method is able to mimic the underlying scene graph generative process more accurately than several state-of-the-art baselines.} }
Endnote
%0 Conference Paper %T VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis %A Tathagat Verma %A Abir De %A Yateesh Agrawal %A Vishwa Vinay %A Soumen Chakrabarti %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-verma22b %I PMLR %P 22168--22183 %U https://proceedings.mlr.press/v162/verma22b.html %V 162 %X Scene graphs are powerful abstractions that capture relationships between objects in images by modeling objects as nodes and relationships as edges. Generation of realistic synthetic scene graphs has applications like scene synthesis and data augmentation for supervised learning. Existing graph generative models are predominantly targeted toward molecular graphs, leveraging the limited vocabulary of atoms and bonds and also the well-defined semantics of chemical compounds. In contrast, scene graphs have much larger object and relation vocabularies, and their semantics are latent. To address this challenge, we propose a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground truth scene graph distribution and distribution of the generated scene graphs. Our method views a scene graph as a collection of star graphs and encodes it into a latent representation of the underlying stars. The decoder generates scene graphs by learning to sample the component stars and edges between them. Our experiments show that our method is able to mimic the underlying scene graph generative process more accurately than several state-of-the-art baselines.
APA
Verma, T., De, A., Agrawal, Y., Vinay, V. & Chakrabarti, S.. (2022). VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:22168-22183 Available from https://proceedings.mlr.press/v162/verma22b.html.

Related Material