
Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning

Chang Liu, Chongyang Tao, Jianxin Liang, Jiazhan Feng, Tao Shen, Quzhe Huang, Dongyan Zhao


Abstract
Pre-trained language models greatly improve the performance of various tasks but at the cost of high computational overhead. To facilitate practical applications, there are mainly two lines of research on accelerating model inference: model compression and dynamic computation (e.g., dynamic token pruning). Existing works either adopt these methods individually or simply apply dynamic computation approaches on top of a compressed small language model. We argue that this is sub-optimal: because the two approaches are designed separately, the compressed model may not be tailored for dynamic computation. To tackle this problem and make compressed small language models faster, we propose Length-Adaptive Distillation, a two-stage knowledge distillation framework that aims to produce a small language model customized for dynamic token pruning. In the general distillation stage, we enforce the student to mimic and reconstruct the teacher’s output based on the dynamically pruned representations. Then, in the task-specific distillation stage, the student is further adapted to token pruning while absorbing the task-specific knowledge. Experimental results on the GLUE benchmark demonstrate that our method makes the small language model more customized for dynamic token pruning and achieves a better speed-performance trade-off.
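To illustrate the general idea described in the abstract (distilling a student whose surviving token representations must still match the teacher after dynamic token pruning), here is a minimal, hypothetical sketch in PyTorch. The function names (`prune_tokens`, `length_adaptive_distill_loss`), the top-k importance heuristic, and the MSE objective are assumptions for illustration only, not the paper's actual algorithm or implementation.

```python
import torch
import torch.nn.functional as F

def prune_tokens(hidden_states, importance, keep_ratio=0.5):
    """Keep the top-k tokens per sequence according to an importance score.

    hidden_states: (batch, seq_len, dim); importance: (batch, seq_len).
    Returns the pruned hidden states and the indices of the kept tokens.
    (Assumed scoring scheme; the paper may score tokens differently.)
    """
    batch, seq_len, dim = hidden_states.shape
    k = max(1, int(seq_len * keep_ratio))
    # Sort the kept indices so tokens stay in their original order.
    keep_idx = importance.topk(k, dim=1).indices.sort(dim=1).values
    gathered = hidden_states.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
    return gathered, keep_idx

def length_adaptive_distill_loss(student_pruned, teacher_full, keep_idx, proj):
    """Push the student's surviving token representations (projected to the
    teacher's hidden width) toward the teacher's representations at the same
    positions. A stand-in for the mimic/reconstruction objective."""
    dim_t = teacher_full.size(-1)
    teacher_at_kept = teacher_full.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim_t))
    return F.mse_loss(proj(student_pruned), teacher_at_kept)

# Toy usage with random tensors standing in for real model outputs.
if __name__ == "__main__":
    batch, seq_len, d_student, d_teacher = 2, 16, 128, 256
    student_h = torch.randn(batch, seq_len, d_student)
    teacher_h = torch.randn(batch, seq_len, d_teacher)
    importance = torch.rand(batch, seq_len)          # e.g. attention-derived scores
    proj = torch.nn.Linear(d_student, d_teacher)     # maps student width to teacher width

    pruned, kept = prune_tokens(student_h, importance, keep_ratio=0.5)
    loss = length_adaptive_distill_loss(pruned, teacher_h, kept, proj)
    loss.backward()
    print(loss.item())
```

In this sketch, only the loss on pruned positions is shown; in the two-stage framework the same pruning-aware objective would be applied first with general-domain data and then alongside the task-specific loss.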
Anthology ID:
2023.findings-emnlp.294
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4452–4463
URL:
https://aclanthology.org/2023.findings-emnlp.294
DOI:
10.18653/v1/2023.findings-emnlp.294
Cite (ACL):
Chang Liu, Chongyang Tao, Jianxin Liang, Jiazhan Feng, Tao Shen, Quzhe Huang, and Dongyan Zhao. 2023. Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4452–4463, Singapore. Association for Computational Linguistics.
Cite (Informal):
Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning (Liu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.294.pdf