
Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging

Published: 09 April 2021

Editorial Notes

The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on May 30, 2021. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

Current deep-learning-based cross-lingual Part-of-Speech (POS) tagging methods are limited in their ability to learn quickly and generalize when data in the target language is scarce. In this paper, we integrate a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better POS tagging. Building on the Model-Agnostic Meta-Learning (MAML) framework, we propose Lingual-Agnostic Meta-Learning (LAML) for cross-lingual low-resource POS tagging. LAML models cross-lingual POS tagging as a meta-learning problem and learns to adapt to low-resource languages from multiple high-resource languages. In addition, unlike the original MAML, LAML separates the parameters into shared parameters for representation learning and parameters that are adapted to each language. We demonstrate that the proposed model outperforms multilingual, joint-learning-based approaches and enables us to train a competitive POS tagging system with only a fraction of the samples.
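The parameter split the abstract describes (shared representation parameters updated only in the outer loop, language-specific parameters adapted per task in the inner loop) can be sketched as a first-order MAML-style training loop. Everything below is a toy assumption for illustration, not the paper's architecture: a two-layer linear model with a synthetic regression "language" standing in for each POS-tagging task.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_HID, N_TAGS = 8, 6, 4   # toy sizes: feature dim, hidden dim, tag count

def forward(x, w_shared, w_head):
    """Shared linear encoder followed by a language-specific linear tag scorer."""
    return x @ w_shared @ w_head

def mse_loss(x, y, w_shared, w_head):
    return np.mean((forward(x, w_shared, w_head) - y) ** 2)

def grads(x, y, w_shared, w_head):
    """Analytic gradients of the MSE loss for the two-layer linear model."""
    h = x @ w_shared
    err = 2.0 * (h @ w_head - y) / y.size
    g_head = h.T @ err                      # d loss / d w_head
    g_shared = x.T @ (err @ w_head.T)       # d loss / d w_shared
    return g_shared, g_head

def make_language_task():
    """Synthetic 'language': a random feature-to-tag-score mapping."""
    w_true = rng.normal(size=(D_IN, N_TAGS))
    x = rng.normal(size=(16, D_IN))
    y = x @ w_true
    return (x[:8], y[:8]), (x[8:], y[8:])   # support / query split

# meta-parameters (initializations shared across all languages)
w_shared = rng.normal(scale=0.1, size=(D_IN, D_HID))
w_head = rng.normal(scale=0.1, size=(D_HID, N_TAGS))

INNER_LR, OUTER_LR, INNER_STEPS = 0.05, 0.01, 3

for step in range(200):
    (xs, ys), (xq, yq) = make_language_task()
    # inner loop: adapt ONLY the language-specific head on the support set;
    # the shared representation parameters stay fixed during adaptation
    head = w_head.copy()
    for _ in range(INNER_STEPS):
        _, g_head = grads(xs, ys, w_shared, head)
        head -= INNER_LR * g_head
    # outer loop: first-order meta-update of both parameter groups,
    # using the query-set loss evaluated at the adapted head
    g_shared, g_head = grads(xq, yq, w_shared, head)
    w_shared -= OUTER_LR * g_shared
    w_head -= OUTER_LR * g_head
```

This uses the first-order MAML approximation (the outer gradient is taken at the adapted parameters without backpropagating through the inner loop); the key point it illustrates is that only the head is touched during per-language adaptation, while the shared encoder is shaped solely by the meta-update.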

Supplementary Material

3447006-vor (3447006-vor.pdf)
Version of Record for "Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging" by Zhang et al., published in ICIT '20: Proceedings of the 2020 8th International Conference on Information Technology: IoT and Smart City.

References

[1]
J. Buys and J. A. Botha, “Cross-lingual morphological tagging for low-resource languages,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2016, pp. 1954–1964.
[2]
J. Guo, W. Che, D. Yarowsky, H. Wang, and T. Liu, “Cross-lingual dependency parsing based on distributed representations,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 2015, pp. 1234–1244. [Online]. Available: http://aclweb.org/anthology/P15-1119.
[3]
L. Duong, T. Cohn, K. Verspoor, S. Bird, and P. Cook, “What can we get from 1000 tokens? A case study of multilingual POS tagging for resource-poor languages,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014, pp. 886–897. [Online]. Available: http://aclweb.org/anthology/D14-1096.
[4]
M. Fang and T. Cohn, “Model transfer for tagging low-resource languages using a bilingual dictionary,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 2017, pp. 587–593. [Online]. Available: http://aclweb.org/anthology/P17-2093.
[5]
B. Plank and Ž. Agić, “Distant supervision from disparate sources for low-resource part-of-speech tagging,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018, pp. 614–620. [Online]. Available: http://aclweb.org/anthology/D18-1061.
[6]
Z. Yang, R. Salakhutdinov, and W. W. Cohen, “Transfer learning for sequence tagging with hierarchical recurrent networks,” ICLR, 2017.
[7]
J.-K. Kim, Y.-B. Kim, R. Sarikaya, and E. Fosler-Lussier, “Cross-lingual transfer learning for POS tagging without cross-lingual resources,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017, pp. 2832–2838. [Online]. Available: http://aclweb.org/anthology/D17-1302.
[8]
Y. Lin, S. Yang, V. Stoyanov, and H. Ji, “A multi-lingual multi-task architecture for low-resource sequence labeling,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2018, pp. 799–809. [Online]. Available: http://aclweb.org/anthology/P18-1074.
[9]
C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” ICML, 2017.
[10]
J. Nivre, Ž. Agić, M. J. Aranzabe, M. Asahara, A. Atutxa, M. Ballesteros, J. Bauer, K. Bengoetxea, R. A. Bhat, C. Bosco, et al., “Universal dependencies 1.2,” 2015.
[11]
L. Ratinov and D. Roth, “Design challenges and misconceptions in named entity recognition,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), 2009, pp. 147–155.
[12]
J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” ICML, 2001.
[13]
A. Passos, V. Kumar, and A. McCallum, “Lexicon infused phrase embeddings for named entity resolution,” CoNLL, 2014.
[14]
R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, “Natural language processing (almost) from scratch,” Journal of machine learning research, vol. 12, no. Aug, pp. 2493–2537, 2011.
[15]
G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” NAACL-HLT, 2016.
[16]
X. Ma and E. Hovy, “End-to-end sequence labeling via bi-directional lstm-cnns-crf,” ACL, 2016.
[17]
P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, “Squad: 100,000+ questions for machine comprehension of text,” EMNLP, 2016.
[18]
Y. Lin, S. Yang, V. Stoyanov, and H. Ji, “A multi-lingual multi-task architecture for low-resource sequence labeling,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 799–809.
[19]
J. Gu, Y. Wang, Y. Chen, K. Cho, and V. O. Li, “Meta-learning for low-resource neural machine translation,” EMNLP, 2018.
[20]
X. Jiang, M. Havaei, G. Chartrand, H. Chouaib, T. Vincent, A. Jesson, N. Chapados, and S. Matwin, “On the importance of attention in meta-learning for few-shot text classification,” arXiv preprint arXiv:1806.00852, 2018.
[21]
S. Ravi and H. Larochelle, “Optimization as a model for few-shot learning,” ICLR, 2017.
[22]
M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,” in Advances in Neural Information Processing Systems, 2016, pp. 3981–3989.
[23]
J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems, 2017, pp. 4077–4087.
[24]
O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, “Matching networks for one shot learning,” in Advances in Neural Information Processing Systems, 2016, pp. 3630–3638.
[25]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[26]
M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word representations,” NAACL-HLT, 2018.
[27]
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” NAACL-HLT, 2019.
[28]
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.
[29]
E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” in Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[30]
L. Collins, A. Mokhtari, and S. Shakkottai, “Task-robust model-agnostic meta-learning,” in Advances in Neural Information Processing Systems 33, 2020.
[31]
A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, “Meta-learning with implicit gradients,” in Advances in Neural Information Processing Systems, 2019, pp. 113–124.
[32]
C. Finn, A. Rajeswaran, S. Kakade, and S. Levine, “Online meta-learning,” ICML, 2019.
[33]
A. Antoniou, H. Edwards, and A. J. Storkey. How to train your MAML. In International Conference on Learning Representations, ICLR, 2019.
[34]
K. Lee, S. Maji, A. Ravichandran, and S. Soatto. Meta-learning with differentiable convex optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10657–10665, 2019.
[35]
J. Harrison et al., “Continuous meta-learning without tasks,” in Advances in Neural Information Processing Systems 33, 2020.
[36]
C. Simon et al., “On modulating the gradient for meta-learning,” in European Conference on Computer Vision (ECCV). Springer, 2020.



Published In

cover image ACM Other conferences
ICIT '20: Proceedings of the 2020 8th International Conference on Information Technology: IoT and Smart City
December 2020
266 pages
ISBN:9781450388559
DOI:10.1145/3446999

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Artificial Intelligence
  2. Natural language processing
  3. Part-of-Speech tagging
  4. cross-lingual low-resource POS tagging
  5. meta-learning

Qualifiers

  • Article
  • Research
  • Refereed limited

Conference

ICIT 2020
ICIT 2020: IoT and Smart City
December 25 - 27, 2020
Xi'an, China
