Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.08083 (cs)

[Submitted on 10 Jul 2024]

Title: MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Abstract: We propose a novel hybrid Mamba-Transformer backbone, denoted MambaVision, which is specifically tailored for vision applications. Our core contribution is a redesign of the Mamba formulation that enhances its capability for efficient modeling of visual features. In addition, we conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results demonstrate that equipping the Mamba architecture with several self-attention blocks at the final layers greatly improves its capacity to capture long-range spatial dependencies. Based on our findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria. For image classification on the ImageNet-1K dataset, MambaVision model variants achieve new state-of-the-art (SOTA) performance in terms of Top-1 accuracy and image throughput. In downstream tasks such as object detection, instance segmentation, and semantic segmentation on the MS COCO and ADE20K datasets, MambaVision outperforms comparably sized backbones. Code: this https URL.

Comments:	Tech. report
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.08083 [cs.CV]
	(or arXiv:2407.08083v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.08083

Submission history

From: Ali Hatamizadeh
[v1] Wed, 10 Jul 2024 23:02:45 UTC (2,066 KB)
