Knowledge Inheritance for Pre-trained Language Models

We introduce a novel pre-training framework named "knowledge inheritance" (KI), which combines both self-learning and teacher-guided learning, and we explore how knowledge distillation can serve as auxiliary supervision during pre-training.

We also provide the pre-training data we use (already processed in fairseq format) via Google Drive, covering five pre-training domains (WB, News, ...).
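A minimal PyTorch-style sketch of that combined objective follows. The function name, the fixed interpolation weight `alpha`, and the temperature-scaled KL formulation are illustrative assumptions, not the paper's exact implementation:

```python
# Sketch of KI-style pre-training: the student's own masked-LM loss
# (self-learning) is combined with a distillation loss from a smaller,
# already-trained teacher (teacher-guided learning). Names and the
# weighting scheme are assumptions for illustration.
import torch.nn.functional as F

def ki_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Interpolate self-learning (MLM) loss and teacher-guided (KD) loss.

    student_logits / teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len), with -100 at non-masked positions.
    """
    # Self-learning: standard masked-LM cross-entropy on masked positions.
    mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Teacher-guided: KL divergence to the teacher's softened distribution,
    # rescaled by temperature**2 as is conventional in distillation.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * mlm + alpha * kd
```

In practice one would typically decay `alpha` toward pure self-learning as the student overtakes the teacher; a fixed value keeps the sketch short.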
A related progressive-stacking approach starts by pre-training a small model with fewer Transformer layers, then iteratively expands the model by stacking the already-trained layers on top, as sketched below.
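A rough illustration of that expansion step (the function name and the depth-doubling strategy are assumptions, not tied to any specific codebase):

```python
# Progressive stacking sketch: grow a shallow trained encoder in depth by
# duplicating its already-trained layers on top before training continues.
import copy
import torch.nn as nn

def stack_layers(encoder_layers: nn.ModuleList) -> nn.ModuleList:
    """Double the depth by copying trained layers on top of themselves."""
    copied = [copy.deepcopy(layer) for layer in encoder_layers]
    return nn.ModuleList(list(encoder_layers) + copied)
```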
[Figure 1: (a) The validation PPL curves for pre-training M_L under the KI framework (BASE → LARGE) and the self-learning baseline (LARGE).]
Y. Qin, Y. Lin, J. Yi, J. Zhang, X. Han, Z. Zhang, Y. Su, Z. Liu, P. Li, M. Sun, and J. Zhou. Knowledge Inheritance for Pre-trained Language Models.