
Meta-learning for compressed language model: A multiple choice question answering study

Published: 28 May 2022

Abstract

Model compression is a promising approach for reducing the size of pretrained language models (PLMs) so that they can run on low-resource edge devices and applications. Unfortunately, compression typically comes at the cost of performance degradation, especially on low-resource downstream tasks such as multiple-choice question answering. To address this degradation, we propose an end-to-end Reptile (ETER) meta-learning approach that improves the performance of compressed PLMs on the low-resource multiple-choice question answering task. Specifically, ETER extends the traditional two-stage meta-learning pipeline into an end-to-end one by integrating the target fine-tuning stage into the meta-training stage. To strengthen generalization, ETER constructs meta-tasks at two levels, the instance level and the domain level, enriching task diversity. Moreover, ETER optimizes meta-learning under parameter constraints, reducing the parameter learning space. Experiments demonstrate that ETER significantly improves the performance of compressed PLMs and achieves large gains over the baselines on different datasets.
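For readers unfamiliar with Reptile, the sketch below illustrates a generic first-order Reptile meta-update in PyTorch: adapt the parameters to a sampled task with a few SGD steps, then move the meta-parameters a small step toward the adapted weights. This is only an illustration under assumptions, not the ETER implementation described in the paper; the `sample_task` callable, the cross-entropy loss, and the hyperparameter values are hypothetical, and ETER's end-to-end integration of target fine-tuning and its parameter constraints are not shown.

```python
# Minimal sketch of a Reptile-style first-order meta-update (Nichol et al., 2018),
# the algorithm family that ETER builds on. NOT the authors' implementation:
# the task sampler, loss, and hyperparameters are illustrative assumptions.
import torch


def reptile_meta_step(model, sample_task, inner_steps=5,
                      inner_lr=1e-3, meta_lr=0.1):
    """Run one Reptile meta-update on `model` in place.

    `sample_task()` is assumed to return an iterable of (inputs, labels)
    batches drawn from a single meta-task, e.g. one MCQA domain or an
    instance-level subset.
    """
    # Snapshot the current meta-parameters theta.
    theta = {name: p.clone() for name, p in model.state_dict().items()}

    # Inner loop: adapt the model to the sampled task with a few SGD steps.
    inner_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _, (inputs, labels) in zip(range(inner_steps), sample_task()):
        inner_opt.zero_grad()
        logits = model(inputs)          # e.g. choice logits from an MCQA head
        loss = loss_fn(logits, labels)
        loss.backward()
        inner_opt.step()

    # Outer (meta) update: move theta toward the adapted weights phi,
    # i.e. theta <- theta + meta_lr * (phi - theta).
    phi = model.state_dict()
    new_theta = {}
    for name, t in theta.items():
        if t.is_floating_point():
            new_theta[name] = t + meta_lr * (phi[name] - t)
        else:
            new_theta[name] = phi[name]  # keep non-float buffers as adapted
    model.load_state_dict(new_theta)
```

In ETER, as the abstract states, this meta-training is additionally interleaved with fine-tuning on the target task rather than treated as a separate preliminary stage.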

Cited By

  • (2024) An efficient confusing choices decoupling framework for multi-choice tasks over texts. Neural Computing and Applications 36(1), 259–271. https://doi.org/10.1007/s00521-023-08795-4. Online publication date: 1 Jan 2024.
  • (2023) GACaps-HTC: graph attention capsule network for hierarchical text classification. Applied Intelligence 53(17), 20577–20594. https://doi.org/10.1007/s10489-023-04585-6. Online publication date: 1 Sep 2023.

          Published In

          Neurocomputing, Volume 487, Issue C
          May 2022
          301 pages

          Publisher

          Elsevier Science Publishers B. V.

          Netherlands

          Publication History

          Published: 28 May 2022

          Author Tags

          1. End-to-end reptile
          2. Compressed pretrained-language-model
          3. Meta-learning
          4. Multiple-choice question answering

          Qualifiers

          • Research-article
