Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.01726 (cs)

This paper has been withdrawn by Rongzhen Zhao

[Submitted on 1 Jul 2024 (v1), last revised 19 Dec 2024 (this version, v3)]

Title: Grouped Discrete Representation Guides Object-Centric Learning

Authors: Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen


Abstract: Similar to humans perceiving visual scenes as objects, Object-Centric Learning (OCL) can abstract dense images or videos into sparse object-level features. Transformer-based OCL handles complex textures well due to the decoding guidance of discrete representation, obtained by discretizing noisy features in image or video feature maps using template features from a codebook. However, treating features as minimal units overlooks their composing attributes, thus impeding model generalization; indexing features with natural numbers loses attribute-level commonalities and characteristics, thus diminishing heuristics for model convergence. We propose Grouped Discrete Representation (GDR) to address these issues by grouping features into attributes and indexing them with tuple numbers. In extensive experiments across different query initializations, dataset modalities, and model architectures, GDR consistently improves convergence and generalizability. Visualizations show that our method effectively captures attribute-level information in features. The source code will be available upon acceptance.
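
The contrast between natural-number and tuple indexing can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that a codebook factorizes into a fixed number of attribute groups of known sizes, so that a flat code index can be rewritten as one sub-index per group (a mixed-radix conversion). The names `scalar_to_tuple`, `tuple_to_scalar`, `shared_attributes`, and `GROUP_SIZES` are hypothetical.

```python
from math import prod


def scalar_to_tuple(index, group_sizes):
    """Rewrite a flat code index as one sub-index per attribute group."""
    if not 0 <= index < prod(group_sizes):
        raise ValueError("index out of range for this codebook")
    digits = []
    # Peel off the last group first, like reading digits of a mixed-radix number.
    for size in reversed(group_sizes):
        index, digit = divmod(index, size)
        digits.append(digit)
    return tuple(reversed(digits))


def tuple_to_scalar(digits, group_sizes):
    """Inverse mapping: collapse per-group sub-indexes into a flat index."""
    index = 0
    for digit, size in zip(digits, group_sizes):
        if not 0 <= digit < size:
            raise ValueError("sub-index out of range for its group")
        index = index * size + digit
    return index


def shared_attributes(a, b):
    """Return the group positions where two tuple indexes agree."""
    return [g for g, (x, y) in enumerate(zip(a, b)) if x == y]


# Hypothetical layout: 3 attribute groups with 4 values each, i.e. 64 codes.
GROUP_SIZES = (4, 4, 4)

a = scalar_to_tuple(27, GROUP_SIZES)
b = scalar_to_tuple(31, GROUP_SIZES)
print(a, b, shared_attributes(a, b))
```

As flat indexes, 27 and 31 are simply two different numbers. As tuples under the assumed layout, they visibly agree in two of three groups, which is the kind of attribute-level commonality the abstract says natural-number indexing loses.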

Comments:	This paper, along with arXiv:2409.03553, was merged into arXiv:2411.02299
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	I.4.6
Cite as:	arXiv:2407.01726 [cs.CV]
	(or arXiv:2407.01726v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.01726

Submission history

From: Rongzhen Zhao
[v1] Mon, 1 Jul 2024 19:00:40 UTC (6,844 KB)
[v2] Wed, 2 Oct 2024 11:49:31 UTC (3,091 KB)
[v3] Thu, 19 Dec 2024 19:01:41 UTC (1 KB) (withdrawn)
