May 28, 2021 · We introduce a pre-training framework named "knowledge inheritance" (KI) and explore how knowledge distillation can serve as auxiliary supervision during pre-training ...
Oct 16, 2021 · Specifically, we introduce a novel pre-training framework named "knowledge inheritance" (KI), which combines both self-learning and teacher-guided learning to ...
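The snippets above describe KI as combining self-learning with teacher-guided learning during pre-training. Below is a minimal sketch of that idea, assuming a standard masked-LM cross-entropy term plus a KL-based distillation term toward a smaller, already-trained teacher; `ki_loss`, `alpha`, and `temperature` are illustrative names, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def ki_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Sketch: combine self-learning (MLM cross-entropy) with teacher-guided KD."""
    # Self-learning term: standard masked-LM cross-entropy over masked positions.
    self_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # positions that were not masked
    )
    # Teacher-guided term: KL divergence between temperature-softened distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Assumption of this sketch: alpha is decayed over training so the larger
    # student gradually relies on self-learning alone.
    return alpha * kd_loss + (1.0 - alpha) * self_loss
```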
They start by pre-training a small model with fewer Transformer layers, and then iteratively expand the model by stacking the already-trained layers on top.
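A short sketch of this progressive-stacking step follows, assuming the encoder is held as a plain list of Transformer layers and that "expanding" means duplicating the trained stack on top of itself before resuming pre-training; `stack_layers` and the layer sizes are hypothetical.

```python
import copy
import torch.nn as nn

def stack_layers(encoder_layers: nn.ModuleList) -> nn.ModuleList:
    """Double the depth by copying the trained layers on top of themselves."""
    copied = [copy.deepcopy(layer) for layer in encoder_layers]
    return nn.ModuleList(list(encoder_layers) + copied)

# Usage: pre-train a shallow model, stack, then continue pre-training the deeper one.
template = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
shallow = nn.ModuleList(copy.deepcopy(template) for _ in range(3))
# ... pre-train `shallow` here ...
deep = stack_layers(shallow)  # now 6 layers, initialised from the trained 3
```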
Figure 1: (a) The validation PPL curve for pre-training M_L under the KI framework (BASE → LARGE) and the self-learning baseline (LARGE).
Knowledge Inheritance for Pre-trained Language Models. Y. Qin, Y. Lin, J. Yi, J. Zhang, X. Han, Z. Zhang, Y. Su, Z. Liu, P. Li, M. Sun, and J. Zhou.