
Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging

Published: 09 April 2021

Editorial Notes

The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on May 30, 2021. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

Current deep-learning-based cross-lingual Part-of-Speech (POS) tagging methods are limited in their ability to learn quickly and generalize when data in the target language is scarce. In this paper, we integrate a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better POS tagging. Building on the Model-Agnostic Meta-Learning (MAML) framework, we propose Lingual-Agnostic Meta-Learning (LAML) for cross-lingual low-resource POS tagging. LAML models cross-lingual POS tagging as a meta-learning problem and learns to adapt to low-resource languages from multiple high-resource languages. In addition, unlike the original MAML, LAML separates the parameters into shared parameters for representation learning and parameters that are adapted to each language. We demonstrate that the proposed model outperforms multilingual, joint-learning-based approaches and enables us to train a competitive POS tagging system with only a fraction of the samples.
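The parameter split the abstract describes (shared representation parameters updated only in the outer loop, language-specific parameters adapted per task in the inner loop) can be sketched as a first-order MAML-style training loop. Everything below is a toy assumption for illustration, not the paper's architecture: a two-layer linear model with a synthetic regression "language" standing in for each POS-tagging task.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_HID, N_TAGS = 8, 6, 4   # toy sizes: feature dim, hidden dim, tag count

def forward(x, w_shared, w_head):
    """Shared linear encoder followed by a language-specific linear tag scorer."""
    return x @ w_shared @ w_head

def mse_loss(x, y, w_shared, w_head):
    return np.mean((forward(x, w_shared, w_head) - y) ** 2)

def grads(x, y, w_shared, w_head):
    """Analytic gradients of the MSE loss for the two-layer linear model."""
    h = x @ w_shared
    err = 2.0 * (h @ w_head - y) / y.size
    g_head = h.T @ err                      # d loss / d w_head
    g_shared = x.T @ (err @ w_head.T)       # d loss / d w_shared
    return g_shared, g_head

def make_language_task():
    """Synthetic 'language': a random feature-to-tag-score mapping."""
    w_true = rng.normal(size=(D_IN, N_TAGS))
    x = rng.normal(size=(16, D_IN))
    y = x @ w_true
    return (x[:8], y[:8]), (x[8:], y[8:])   # support / query split

# meta-parameters (initializations shared across all languages)
w_shared = rng.normal(scale=0.1, size=(D_IN, D_HID))
w_head = rng.normal(scale=0.1, size=(D_HID, N_TAGS))

INNER_LR, OUTER_LR, INNER_STEPS = 0.05, 0.01, 3

for step in range(200):
    (xs, ys), (xq, yq) = make_language_task()
    # inner loop: adapt ONLY the language-specific head on the support set;
    # the shared representation parameters stay fixed during adaptation
    head = w_head.copy()
    for _ in range(INNER_STEPS):
        _, g_head = grads(xs, ys, w_shared, head)
        head -= INNER_LR * g_head
    # outer loop: first-order meta-update of both parameter groups,
    # using the query-set loss evaluated at the adapted head
    g_shared, g_head = grads(xq, yq, w_shared, head)
    w_shared -= OUTER_LR * g_shared
    w_head -= OUTER_LR * g_head
```

This uses the first-order MAML approximation (the outer gradient is taken at the adapted parameters without backpropagating through the inner loop); the key point it illustrates is that only the head is touched during per-language adaptation, while the shared encoder is shaped solely by the meta-update.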

Supplementary Material

3447006-vor (3447006-vor.pdf)
Version of Record for "Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging" by Zhang et al., published in ICIT '20: Proceedings of the 2020 8th International Conference on Information Technology: IoT and Smart City.

References

[1]
J. Buys and J. A. Botha, “Cross-lingual morphological tagging for low-resource languages,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2016, pp. 1954–1964.
[2]
J. Guo, W. Che, D. Yarowsky, H. Wang, and T. Liu, “Cross-lingual dependency parsing based on distributed representations,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 2015, pp. 1234–1244. [Online]. Available: http://aclweb.org/anthology/P15-1119.
[3]
L. Duong, T. Cohn, K. Verspoor, S. Bird, and P. Cook, “What can we get from 1000 tokens? A case study of multilingual POS tagging for resource-poor languages,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014, pp. 886–897. [Online]. Available: http://aclweb.org/anthology/D14-1096.
[4]
M. Fang and T. Cohn, “Model transfer for tagging low-resource languages using a bilingual dictionary,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 2017, pp. 587–593. [Online]. Available: http://aclweb.org/anthology/P17-2093.
[5]
B. Plank and Ž. Agić, “Distant supervision from disparate sources for low-resource part-of-speech tagging,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018, pp. 614–620. [Online]. Available: http://aclweb.org/anthology/D18-1061.
[6]
Z. Yang, R. Salakhutdinov, and W. W. Cohen, “Transfer learning for sequence tagging with hierarchical recurrent networks,” ICLR, 2017.
[7]
J.-K. Kim, Y.-B. Kim, R. Sarikaya, and E. Fosler-Lussier, “Cross-lingual transfer learning for POS tagging without cross-lingual resources,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017, pp. 2832–2838. [Online]. Available: http://aclweb.org/anthology/D17-1302.
[8]
Y. Lin, S. Yang, V. Stoyanov, and H. Ji, “A multi-lingual multi-task architecture for low-resource sequence labeling,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2018, pp. 799–809. [Online]. Available: http://aclweb.org/anthology/P18-1074.
[9]
C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” ICML, 2017.
[10]
J. Nivre, Ž. Agić, M. J. Aranzabe, M. Asahara, A. Atutxa, M. Ballesteros, J. Bauer, K. Bengoetxea, R. A. Bhat, C. Bosco, et al., “Universal dependencies 1.2,” 2015.
[11]
L. Ratinov and D. Roth, “Design challenges and misconceptions in named entity recognition,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), 2009, pp. 147–155.
[12]
J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” ICML, 2001.
[13]
A. Passos, V. Kumar, and A. McCallum, “Lexicon infused phrase embeddings for named entity resolution,” CoNLL, 2014.
[14]
R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,” Journal of machine learning research, vol. 12, no. Aug, pp. 2493–2537, 2011.
[15]
G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” NAACL-HLT, 2016.
[16]
X. Ma and E. Hovy, “End-to-end sequence labeling via bi-directional lstm-cnns-crf,” ACL, 2016.
[17]
P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, “Squad: 100,000+ questions for machine comprehension of text,” EMNLP, 2016.
[18]
Y. Lin, S. Yang, V. Stoyanov, and H. Ji, “A multi-lingual multi-task architecture for low-resource sequence labeling,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 799–809.
[19]
J. Gu, Y. Wang, Y. Chen, K. Cho, and V. O. Li, “Meta-learning for low-resource neural machine translation,” EMNLP, 2018.
[20]
X. Jiang, M. Havaei, G. Chartrand, H. Chouaib, T. Vincent, A. Jesson, N. Chapados, and S. Matwin, “On the importance of attention in meta-learning for few-shot text classification,” arXiv preprint arXiv:1806.00852, 2018.
[21]
S. Ravi and H. Larochelle, “Optimization as a model for few-shot learning,” ICLR, 2017.
[22]
M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,” in Advances in Neural Information Processing Systems, 2016, pp. 3981–3989.
[23]
J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems, 2017, pp. 4077–4087.
[24]
O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, “Matching networks for one shot learning,” in Advances in Neural Information Processing Systems, 2016, pp. 3630–3638.
[25]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[26]
M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” NAACL-HLT, 2018.
[27]
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” NAACL-HLT, 2019.
[28]
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.
[29]
E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” in Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[30]
L. Collins, A. Mokhtari, and S. Shakkottai, “Task-robust model-agnostic meta-learning,” in Advances in Neural Information Processing Systems 33, 2020.
[31]
A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, “Meta-learning with implicit gradients,” in Advances in Neural Information Processing Systems, 2019, pp. 113–124.
[32]
C. Finn, A. Rajeswaran, S. Kakade, and S. Levine, “Online meta-learning,” ICML, 2019.
[33]
A. Antoniou, H. Edwards, and A. J. Storkey. How to train your MAML. In International Conference on Learning Representations, ICLR, 2019.
[34]
K. Lee, S. Maji, A. Ravichandran, and S. Soatto. Meta-learning with differentiable convex optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10657–10665, 2019.
[35]
J. Harrison et al., “Continuous meta-learning without tasks,” in Advances in Neural Information Processing Systems 33, 2020.
[36]
C. Simon et al., “On modulating the gradient for meta-learning,” in European Conference on Computer Vision (ECCV). Springer, 2020.



Published In

cover image ACM Other conferences
ICIT '20: Proceedings of the 2020 8th International Conference on Information Technology: IoT and Smart City
December 2020
266 pages
ISBN:9781450388559
DOI:10.1145/3446999

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Artificial Intelligence
  2. Natural language processing
  3. Part-of-Speech tagging
  4. cross-lingual low-resource POS tagging
  5. meta-learning

Qualifiers

  • Article
  • Research
  • Refereed limited

Conference

ICIT 2020
ICIT 2020: IoT and Smart City
December 25 - 27, 2020
Xi'an, China
