Computer Science > Computation and Language

arXiv:2405.15198 (cs)

[Submitted on 24 May 2024 (v1), last revised 20 Sep 2024 (this version, v2)]

Title: RAEE: A Robust Retrieval-Augmented Early Exiting Framework for Efficient Inference

Authors: Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

Abstract: Deploying large language models for inference remains challenging because of their high computational overhead. Early exiting speeds up inference by adaptively reducing the number of layers executed per input. Existing methods typically train internal classifiers to decide whether to exit at intermediate layers. However, such classifier-based early exiting frameworks require significant effort to train the classifiers, yet at best achieve only comparable performance. To address these limitations, this paper proposes RAEE, a robust Retrieval-Augmented Early Exiting framework for efficient inference. First, this paper shows that the early exiting problem can be modeled as a distribution prediction problem, where the distribution is approximated using the exiting information of similar data. Then, this paper details the process of collecting exiting information to build the retrieval database. Finally, based on the pre-built retrieval database, RAEE uses the retrieved similar data's exiting information to guide the backbone model to exit at the layer predicted by the approximated distribution. Experimental results demonstrate that RAEE can significantly accelerate inference. More importantly, RAEE also achieves robust zero-shot performance on 8 downstream tasks.
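The retrieve-then-exit idea described in the abstract can be illustrated with a small sketch. Everything below is an assumption for illustration, not the authors' implementation: each database entry is assumed to store an input embedding plus the set of layers at which the backbone's intermediate prediction was correct (standing in for the "exiting information"); retrieval is a brute-force cosine-similarity top-k search; the retrieved entries are merged into a normalized distribution over layers; and the model exits at the most probable layer, breaking ties toward the earliest one. The names `ExitDatabase`, `exit_distribution`, `predict_exit_layer`, and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ExitDatabase:
    """Hypothetical retrieval database: embedding -> layers where an early exit was correct."""

    def __init__(self, num_layers: int) -> None:
        self.num_layers = num_layers
        self.entries: List[Tuple[List[float], Set[int]]] = []

    def add(self, embedding: Sequence[float], correct_layers: Sequence[int]) -> None:
        # Assumed "exiting information": 0-indexed layers whose intermediate
        # prediction matched the ground-truth label for this training example.
        layers = set(correct_layers)
        if any(layer < 0 or layer >= self.num_layers for layer in layers):
            raise ValueError("layer index out of range")
        self.entries.append((list(embedding), layers))

    def retrieve(self, query: Sequence[float], k: int) -> List[Tuple[List[float], Set[int]]]:
        # Brute-force top-k nearest neighbours by cosine similarity.
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return ranked[:k]

    def exit_distribution(self, query: Sequence[float], k: int) -> List[float]:
        # Approximate the query's exit-layer distribution from its neighbours:
        # each neighbour spreads one unit of mass uniformly over its correct layers.
        mass = [0.0] * self.num_layers
        for _, layers in self.retrieve(query, k):
            if not layers:
                continue  # no layer was correct; this neighbour contributes nothing
            share = 1.0 / len(layers)
            for layer in layers:
                mass[layer] += share
        total = sum(mass)
        if total == 0.0:
            # No usable evidence: fall back to running the full model.
            return [0.0] * (self.num_layers - 1) + [1.0]
        return [m / total for m in mass]

    def predict_exit_layer(self, query: Sequence[float], k: int = 3) -> int:
        # Exit at the most probable layer; ties go to the earliest (cheapest) layer.
        dist = self.exit_distribution(query, k)
        return dist.index(max(dist))


# Usage with toy 2-D embeddings and a hypothetical 4-layer backbone.
db = ExitDatabase(num_layers=4)
db.add([1.0, 0.0], [1, 2, 3])
db.add([0.9, 0.1], [1, 3])
db.add([0.0, 1.0], [3])
db.add([0.1, 0.9], [2, 3])

print(db.predict_exit_layer([1.0, 0.05], k=2))  # → 1 (tied with layer 3, earliest wins)
print(db.predict_exit_layer([0.0, 1.0], k=2))   # → 3
```

Breaking ties toward the earliest layer is one plausible choice that favors speed; a real system would also likely replace the linear `sorted` scan with an approximate nearest-neighbour index and derive embeddings from an encoder, details the abstract does not specify.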

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2405.15198 [cs.CL]
	(or arXiv:2405.15198v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.15198

Submission history

From: Lianming Huang
[v1] Fri, 24 May 2024 04:01:24 UTC (152 KB)
[v2] Fri, 20 Sep 2024 14:06:28 UTC (281 KB)
