Computer Science > Computer Vision and Pattern Recognition

arXiv:2409.09808 (cs)

[Submitted on 15 Sep 2024 (v1), last revised 6 Oct 2024 (this version, v3)]

Title:Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

Authors:Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang

Abstract:Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on the Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suite of cross-layer strategies, rather than applying token fusion uniformly across all layers as existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V enhances the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. Together, these results demonstrate Famba-V as a promising efficiency enhancement technique for Vim models.
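
The abstract does not spell out the fusion procedure, so the following is only a minimal illustrative sketch of the general idea: merge the most similar tokens, and do so only at selected layers instead of at every layer. Everything in it is an assumption made for illustration and not the authors' implementation: the function names `fuse_tokens` and `run_layers`, the even/odd pairing of tokens, the use of cosine similarity, merging by averaging, and the choice of which layers fuse. The Vim layer itself is replaced by a no-op placeholder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse_tokens(tokens, r):
    """Hypothetical fusion step: merge up to r even-index tokens into their
    most similar odd-index token by averaging, keeping sequence order."""
    src = tokens[0::2]                       # candidates to be merged away
    ref = tokens[1::2]                       # tokens that absorb the merges
    dst = [list(t) for t in ref]
    if not dst or r <= 0:
        return [list(t) for t in tokens]

    # For every source token, find its best match among the odd-index tokens.
    matches = []
    for i, s in enumerate(src):
        j = max(range(len(ref)), key=lambda k: cosine(s, ref[k]))
        matches.append((cosine(s, ref[j]), i, j))
    matches.sort(reverse=True)               # most similar pairs first
    merged = {i: j for _, i, j in matches[:r]}

    # Fold each merged source token into its destination as a running mean.
    counts = [1] * len(dst)
    for i, j in merged.items():
        dst[j] = [(d * counts[j] + s) / (counts[j] + 1)
                  for d, s in zip(dst[j], src[i])]
        counts[j] += 1

    # Rebuild the sequence in its original order, skipping merged tokens.
    out = []
    for k in range(len(tokens)):
        if k % 2 == 1:
            out.append(dst[k // 2])
        elif k // 2 not in merged:
            out.append(list(src[k // 2]))
    return out


def run_layers(tokens, num_layers, fusion_layers, r):
    """Apply fusion only at the layers listed in fusion_layers.
    Returns the final tokens and the token count after each layer."""
    counts = []
    for layer in range(num_layers):
        # Placeholder: a real Vim block would transform the tokens here.
        if layer in fusion_layers:
            tokens = fuse_tokens(tokens, r)
        counts.append(len(tokens))
    return tokens, counts


tokens = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0], [1.0, 1.0], [-1.0, 1.0]]
_, uniform_counts = run_layers(tokens, 4, {0, 1, 2, 3}, 1)   # fuse at every layer
_, selective_counts = run_layers(tokens, 4, {2, 3}, 1)       # fuse at chosen layers only
print(uniform_counts, selective_counts)
```

Comparing the two count lists shows the trade-off the abstract refers to: fusing at every layer shrinks the sequence fastest, while fusing at only some layers keeps more tokens for longer at a higher compute cost.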

Comments:	Camera ready version of ECCV 2024 Workshop on Computational Aspects of Deep Learning (Best Paper Award)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.09808 [cs.CV]
	(or arXiv:2409.09808v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.09808

Submission history

From: Hui Shen
[v1] Sun, 15 Sep 2024 18:02:26 UTC (5,312 KB)
[v2] Tue, 1 Oct 2024 12:03:49 UTC (5,312 KB)
[v3] Sun, 6 Oct 2024 16:34:48 UTC (5,312 KB)
