Computer Science > Computer Vision and Pattern Recognition

arXiv:2406.19905 (cs)

[Submitted on 28 Jun 2024 (v1), last revised 5 Aug 2024 (this version, v2)]

Title: Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Authors: Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually employ a router to predict the routing of each token. However, the predictions are based solely on sample features and do not truly reveal the optimization directions of tokens. This may lead to severe optimization interference between different tokens assigned to an expert. To address this problem, this paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC). Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a specialized loss tailored to eliminate conflicts among tokens within each expert. Our method can serve as a plug-in for diverse Large Vision-Language Models, and extensive experimental results demonstrate its effectiveness. The code will be made publicly available.
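
The abstract says that token-level gradients are used to identify conflicting tokens within an expert, but it does not state the exact criterion. As a minimal sketch only, the example below assumes that a token counts as "conflicting" when the cosine similarity between its gradient and the mean gradient of all tokens routed to the same expert falls below a threshold. The function name `find_conflicting_tokens`, the `threshold` parameter, and the toy gradients are hypothetical and are not taken from the paper or its code. The specialized loss mentioned in the abstract is not shown.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def find_conflicting_tokens(token_grads, threshold=0.0):
    """Return indices of tokens whose gradient opposes the expert's mean gradient.

    ASSUMPTION: the conflict test (cosine similarity to the mean gradient
    below `threshold`) is an illustrative choice, not the paper's stated rule.
    `token_grads` is a list of per-token gradient vectors for ONE expert.
    """
    if not token_grads:
        return []
    dim = len(token_grads[0])
    count = len(token_grads)
    # Mean gradient over all tokens assigned to this expert.
    mean_grad = [sum(g[d] for g in token_grads) / count for d in range(dim)]
    return [i for i, g in enumerate(token_grads) if cosine(g, mean_grad) < threshold]


# Toy gradients for three tokens routed to the same expert (made-up values).
grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
conflicts = find_conflicting_tokens(grads)
print(conflicts)
```

In this toy case the third token's gradient points against the mean direction, so it is the only one flagged. In a real model the per-token gradients would come from an automatic differentiation framework rather than hand-written lists.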

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.19905 [cs.CV]
	(or arXiv:2406.19905v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.19905

Submission history

From: Xi Li
[v1] Fri, 28 Jun 2024 13:20:17 UTC (2,512 KB)
[v2] Mon, 5 Aug 2024 12:12:48 UTC (2,370 KB)
