Computer Science > Computer Vision and Pattern Recognition

arXiv:2401.02931 (cs)

[Submitted on 5 Jan 2024]

Title: SPFormer: Enhancing Vision Transformer with Superpixel Representation

Authors: Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

Abstract: In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content. This approach divides the image into irregular, semantically coherent regions, effectively capturing intricate details, and is applicable at both initial and intermediate feature levels.
SPFormer, trainable end-to-end, exhibits superior performance across various benchmarks. Notably, it achieves significant improvements on the challenging ImageNet benchmark: 1.4% over DeiT-T and 1.1% over DeiT-S. A standout feature of SPFormer is its inherent explainability. The superpixel structure offers a window into the model's internal processes, providing valuable insights that enhance the model's interpretability. This level of clarity significantly improves SPFormer's robustness, particularly in challenging scenarios such as image rotations and occlusions, demonstrating its adaptability and resilience.
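
The contrast the abstract draws between fixed patch partitioning and content-adaptive regions can be illustrated with a toy sketch. This is not SPFormer's method, which the abstract describes only as learned end-to-end. The sketch assumes a precomputed hard label map, and the names `grid_labels`, `pool_tokens`, and `superpixel_labels` are hypothetical. It only shows how averaging pixel features over irregular regions can keep an object's pixels in one token, where a fixed grid mixes object and background.

```python
from collections import defaultdict


def grid_labels(height, width, patch):
    """Assign each pixel to a fixed square patch (ViT-style partition)."""
    cols = (width + patch - 1) // patch
    return [[(y // patch) * cols + (x // patch) for x in range(width)]
            for y in range(height)]


def pool_tokens(features, labels):
    """Average per-pixel feature vectors within each region: one token per region."""
    sums = {}
    counts = defaultdict(int)
    for feat_row, label_row in zip(features, labels):
        for feat, lab in zip(feat_row, label_row):
            if lab not in sums:
                sums[lab] = [0.0] * len(feat)
            for i, value in enumerate(feat):
                sums[lab][i] += value
            counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


# Toy 4x4 single-channel "image": a bright L-shaped object on a dark background.
image = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
features = [[[v] for v in row] for row in image]

# Assumption: a hand-written, content-adaptive label map standing in for
# whatever region assignment a real model would produce.
superpixel_labels = [[int(v > 0.5) for v in row] for row in image]

patch_tokens = pool_tokens(features, grid_labels(4, 4, patch=2))
superpixel_tokens = pool_tokens(features, superpixel_labels)

print("patch tokens:", patch_tokens)
print("superpixel tokens:", superpixel_tokens)
```

In this toy case three of the four 2x2 patch tokens blend object and background pixels, while the two region tokens each stay pure. That is the intuition behind "semantically coherent regions", shown here under the simplifying assumption of hard, precomputed labels.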

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.02931 [cs.CV]
	(or arXiv:2401.02931v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.02931

Submission history

From: Jieru Mei
[v1] Fri, 5 Jan 2024 18:15:26 UTC (391 KB)
