Supervising Variational Autoencoder Latent Representations with Language

Thomas Lu, Aboli Marathe, Ada Martin
Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, PMLR 243:267-278, 2024.

Abstract

Supervising latent representations of data is of great interest for modern multi-modal generative machine learning. In this work, we propose two new methods to use text to condition the latent representations of a VAE, and evaluate them on a novel conditional image-generation benchmark task. We find that the applied methods can be used to generate highly accurate reconstructed images through language querying with minimal compute resources. Our methods are quantitatively successful at conforming to textually-supervised attributes of an image while keeping unsupervised attributes constant. At large, we present critical observations on disentanglement between supervised and unsupervised properties of images and identify common barriers to effective disentanglement.
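For readers less familiar with the setup, the sketch below shows one generic way a text embedding can condition a VAE's latent code: the embedding is concatenated with the encoder input and again with the sampled latent before decoding. This is an illustrative assumption for context only; the class name, dimensions, and architecture are hypothetical and do not represent the paper's two proposed methods.

```python
# Illustrative sketch of a text-conditioned VAE (assumed architecture, not the paper's).
import torch
import torch.nn as nn

class TextConditionedVAE(nn.Module):
    def __init__(self, image_dim=784, text_dim=64, latent_dim=16, hidden=256):
        super().__init__()
        # Encoder sees both the flattened image and a pre-computed text embedding.
        self.encoder = nn.Sequential(nn.Linear(image_dim + text_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        # Decoder reconstructs the image from the latent code plus the text embedding.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, image_dim), nn.Sigmoid())

    def forward(self, image, text_emb):
        h = self.encoder(torch.cat([image, text_emb], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        recon = self.decoder(torch.cat([z, text_emb], dim=-1))
        return recon, mu, logvar

def vae_loss(recon, image, mu, logvar):
    # Standard ELBO: reconstruction term plus KL divergence to the unit Gaussian prior.
    rec = nn.functional.binary_cross_entropy(recon, image, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Example usage with random tensors standing in for real data.
model = TextConditionedVAE()
img = torch.rand(8, 784)    # batch of flattened images in [0, 1]
txt = torch.randn(8, 64)    # batch of pre-computed text embeddings
recon, mu, logvar = model(img, txt)
vae_loss(recon, img, mu, logvar).backward()
```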

Cite this Paper


BibTeX
@InProceedings{pmlr-v243-lu24a,
  title     = {Supervising Variational Autoencoder Latent Representations with Language},
  author    = {Lu, Thomas and Marathe, Aboli and Martin, Ada},
  booktitle = {Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models},
  pages     = {267--278},
  year      = {2024},
  editor    = {Fumero, Marco and Rodolà, Emanuele and Domine, Clementine and Locatello, Francesco and Dziugaite, Karolina and Caron, Mathilde},
  volume    = {243},
  series    = {Proceedings of Machine Learning Research},
  month     = {15 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v243/lu24a/lu24a.pdf},
  url       = {https://proceedings.mlr.press/v243/lu24a.html},
  abstract  = {Supervising latent representations of data is of great interest for modern multi-modal generative machine learning. In this work, we propose two new methods to use text to condition the latent representations of a VAE, and evaluate them on a novel conditional image-generation benchmark task. We find that the applied methods can be used to generate highly accurate reconstructed images through language querying with minimal compute resources. Our methods are quantitatively successful at conforming to textually-supervised attributes of an image while keeping unsupervised attributes constant. At large, we present critical observations on disentanglement between supervised and unsupervised properties of images and identify common barriers to effective disentanglement.}
}
Endnote
%0 Conference Paper
%T Supervising Variational Autoencoder Latent Representations with Language
%A Thomas Lu
%A Aboli Marathe
%A Ada Martin
%B Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models
%C Proceedings of Machine Learning Research
%D 2024
%E Marco Fumero
%E Emanuele Rodolà
%E Clementine Domine
%E Francesco Locatello
%E Karolina Dziugaite
%E Mathilde Caron
%F pmlr-v243-lu24a
%I PMLR
%P 267--278
%U https://proceedings.mlr.press/v243/lu24a.html
%V 243
%X Supervising latent representations of data is of great interest for modern multi-modal generative machine learning. In this work, we propose two new methods to use text to condition the latent representations of a VAE, and evaluate them on a novel conditional image-generation benchmark task. We find that the applied methods can be used to generate highly accurate reconstructed images through language querying with minimal compute resources. Our methods are quantitatively successful at conforming to textually-supervised attributes of an image while keeping unsupervised attributes constant. At large, we present critical observations on disentanglement between supervised and unsupervised properties of images and identify common barriers to effective disentanglement.
APA
Lu, T., Marathe, A. & Martin, A. (2024). Supervising Variational Autoencoder Latent Representations with Language. Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, in Proceedings of Machine Learning Research 243:267-278. Available from https://proceedings.mlr.press/v243/lu24a.html.