Abstract
With the wide popularity of Pre-trained Language Models (PLMs), improving their performance in the few-shot learning setting has become a hot research topic. FewCLUE is a new benchmark that evaluates the few-shot learning ability of PLMs on nine challenging Chinese language understanding tasks, posing significant challenges to the learning process of PLMs when very little training data is available. In this paper, we present our solution to the FewCLUE tasks, which combines large-scale knowledge-enhanced pre-training over massive texts and knowledge triples with a new few-shot learning algorithm for downstream tasks. Experimental results show that the resulting models achieve the best performance in both the limited and unlimited tracks of FewCLUE. Our solution is developed upon the PyTorch version of the EasyTransfer toolkit and will be released to the public.
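The abstract does not spell out the pre-training objective, so the following is only a schematic sketch, under the assumption of a KEPLER-style joint loss that combines masked language modeling over raw text with a TransE-style margin loss over knowledge triples. The function names (`transe_margin_loss`, `joint_loss`) and the hyperparameters `margin` and `alpha` are illustrative, not the paper's actual formulation.

```python
# Schematic sketch (an assumption, not the paper's exact objective):
# joint pre-training loss = text MLM loss + TransE-style triple loss.
import torch
import torch.nn.functional as F


def transe_margin_loss(h, r, t, h_neg, t_neg, margin=1.0):
    """Margin-based ranking loss on triple embeddings (TransE: h + r should be close to t)."""
    pos = (h + r - t).norm(p=2, dim=-1)          # distance for true triples
    neg = (h_neg + r - t_neg).norm(p=2, dim=-1)  # distance for corrupted triples
    return F.relu(margin + pos - neg).mean()


def joint_loss(mlm_loss, h, r, t, h_neg, t_neg, alpha=0.5):
    """Weighted sum of the text MLM loss and the knowledge-triple loss (alpha is a hypothetical weight)."""
    return mlm_loss + alpha * transe_margin_loss(h, r, t, h_neg, t_neg)


# Toy usage with random embeddings standing in for encoder outputs.
B, D = 4, 128
emb = lambda: torch.randn(B, D)
print(joint_loss(torch.tensor(2.3), emb(), emb(), emb(), emb(), emb()))
```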
Notes
- 2.
For the Chinese language, we can use multiple masked tokens to generate model outputs in the form of multiple Chinese characters. For simplicity, in the algorithm description, we assume there is only one masked token; a brief illustration of the multi-mask case is sketched below.
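The sketch below shows cloze-style prediction with two [MASK] tokens for two-character Chinese label words, in the spirit of PET-style prompting. It is not the paper's exact few-shot algorithm; the checkpoint ("bert-base-chinese"), the news-topic example, the template, and the label verbalizers are all illustrative assumptions.

```python
# Minimal sketch (not the authors' exact algorithm) of multi-mask cloze prediction
# for Chinese labels, using an off-the-shelf Hugging Face checkpoint for illustration.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

# Hypothetical news-topic example: each label verbalizer is two Chinese characters,
# so the template contains two [MASK] tokens.
text = "姚明宣布正式退役"  # "Yao Ming announces his official retirement"
template = f"{text}。这是一条[MASK][MASK]新闻。"  # "... This is a piece of [MASK][MASK] news."
verbalizers = {"体育": "sports", "财经": "finance", "科技": "technology"}

inputs = tokenizer(template, return_tensors="pt")
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    log_probs = model(**inputs).logits[0].log_softmax(dim=-1)

# Score each label word by summing the log-probabilities of its characters
# at the corresponding [MASK] positions, then pick the best-scoring label.
scores = {}
for label in verbalizers:
    char_ids = tokenizer.convert_tokens_to_ids(list(label))
    scores[label] = sum(log_probs[pos, cid].item() for pos, cid in zip(mask_positions, char_ids))

pred = max(scores, key=scores.get)
print(pred, verbalizers[pred])
```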
Cite this paper
Xu, Z. et al. (2021). When Few-Shot Learning Meets Large-Scale Knowledge-Enhanced Pre-training: Alibaba at FewCLUE. In: Wang, L., Feng, Y., Hong, Y., He, R. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2021. Lecture Notes in Computer Science, vol. 13029. Springer, Cham. https://doi.org/10.1007/978-3-030-88483-3_34