Computer Science > Computer Vision and Pattern Recognition

arXiv:2409.03553 (cs)

[Submitted on 5 Sep 2024 (v1), last revised 2 Oct 2024 (this version, v3)]

Title: Organized Grouped Discrete Representation for Object-Centric Learning

Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen

Abstract: Object-Centric Learning (OCL) represents dense image or video pixels as sparse object features. Representative methods utilize discrete representation composed of Variational Autoencoder (VAE) template features to suppress pixel-level information redundancy and guide object-level feature aggregation. The most recent advancement, Grouped Discrete Representation (GDR), further decomposes these template features into attributes. However, its naive channel grouping as decomposition may erroneously group channels belonging to different attributes together and discretize them as sub-optimal template attributes, which loses information and harms expressivity. We propose Organized GDR (OGDR) to organize channels belonging to the same attributes together for correct decomposition from features into attributes. In unsupervised segmentation experiments, OGDR is fully superior to GDR in augmenting classical transformer-based OCL methods; it even improves state-of-the-art diffusion-based ones. Codebook PCA and representation similarity analyses show that, compared with GDR, our OGDR eliminates redundancy and preserves information better for guiding object representation learning. The source code is available in the supplementary material.
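
As a rough illustration of the channel-grouping problem the abstract describes, the toy sketch below quantizes a feature vector group by group against per-group codebooks, first with contiguous ("naive") channel groups and then after reordering the channels so that channels of the same attribute sit together. This is not the authors' implementation: the function names, the toy codebooks, and especially the fixed channel permutation used as the "organizing" step are assumptions made for illustration only, since the abstract does not state how OGDR organizes channels.

```python
from typing import List, Optional, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grouped_quantize(
    feature: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
    order: Optional[Sequence[int]] = None,
) -> Tuple[List[int], float]:
    """Quantize `feature` group by group; return (code indices, total squared error).

    Channels are split into len(codebooks) contiguous, equal-sized groups.
    `order` is a hypothetical stand-in for an organizing step: it reorders
    channels before grouping. With order=None the grouping is the naive one.
    """
    if order is not None:
        feature = [feature[i] for i in order]
    size = len(feature) // len(codebooks)
    indices: List[int] = []
    error = 0.0
    for k, codebook in enumerate(codebooks):
        chunk = feature[k * size:(k + 1) * size]
        # Pick the nearest template attribute in this group's codebook.
        best = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], chunk))
        indices.append(best)
        error += sq_dist(codebook[best], chunk)
    return indices, error


# Toy setup (assumed): channels 0 and 2 encode one attribute, channels 1 and 3
# another, and each attribute has two template values, (0, 0) and (1, 1).
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
feature = [1.0, 0.0, 1.0, 0.0]

# Naive grouping puts channels (0, 1) and (2, 3) together, mixing attributes.
naive_idx, naive_err = grouped_quantize(feature, codebooks)
# Reordering so that same-attribute channels are adjacent before grouping.
organized_idx, organized_err = grouped_quantize(feature, codebooks, order=[0, 2, 1, 3])

print("naive:", naive_idx, naive_err)
print("organized:", organized_idx, organized_err)
```

On this toy input the naive grouping leaves a total squared quantization error of 2.0, while the reordered grouping matches the templates exactly (error 0.0), which mirrors the information loss the abstract attributes to grouping channels of different attributes together.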

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.03553 [cs.CV]
	(or arXiv:2409.03553v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.03553

Submission history

From: Rongzhen Zhao
[v1] Thu, 5 Sep 2024 14:13:05 UTC (6,343 KB)
[v2] Tue, 10 Sep 2024 21:44:08 UTC (5,254 KB)
[v3] Wed, 2 Oct 2024 12:40:01 UTC (4,023 KB)
