Computer Science > Computer Vision and Pattern Recognition

arXiv:2212.08013 (cs)

[Submitted on 15 Dec 2022 (v1), last revised 23 Mar 2023 (this version, v2)]

Title: FlexiViT: One Model for All Patch Sizes

Authors: Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

Abstract: Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at this https URL
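
As a rough illustration of the speed/accuracy tradeoff described in the abstract, the following minimal sketch samples a patch size per training step and reports the resulting token sequence length. The image resolution, the list of candidate patch sizes, and the helper names `sample_patch_size` and `num_tokens` are assumptions made for this example, not details taken from the listing above. How the model's weights are adapted to each sampled size is outside the scope of this sketch.

```python
import random

# Assumed values for illustration only: a square 240x240 input and a set of
# patch sizes that each divide 240 evenly.
IMAGE_SIZE = 240
PATCH_SIZES = [8, 10, 12, 15, 16, 20, 24, 30, 40, 48]


def sample_patch_size(rng):
    """Pick one patch size uniformly at random (hypothetical helper)."""
    return rng.choice(PATCH_SIZES)


def num_tokens(image_size, patch_size):
    """Number of patches a square image is sliced into (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("patch size must divide the image size evenly")
    side = image_size // patch_size
    return side * side


def demo(steps=5, seed=0):
    """Return (step, patch_size, sequence_length) for a few simulated steps."""
    rng = random.Random(seed)
    rows = []
    for step in range(steps):
        patch = sample_patch_size(rng)
        rows.append((step, patch, num_tokens(IMAGE_SIZE, patch)))
    return rows


if __name__ == "__main__":
    for step, patch, tokens in demo():
        print(f"step {step}: patch size {patch} -> {tokens} tokens")
```

Under these assumed values, the smallest patch size (8) yields 900 tokens per image and the largest (48) yields 25, which is why smaller patches cost more compute.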

Comments:	Code and pre-trained models available at this https URL. All authors made significant technical contributions. CVPR 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2212.08013 [cs.CV]
	(or arXiv:2212.08013v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.08013

Submission history

From: Lucas Beyer
[v1] Thu, 15 Dec 2022 18:18:38 UTC (14,551 KB)
[v2] Thu, 23 Mar 2023 21:38:16 UTC (14,577 KB)