Building Powerful Dependency Parsers for Resource-Poor Languages

Junjie Yu, Wenliang Chen, Zhenghua Li & Min Zhang

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)


Abstract

In this paper, we present an approach to building dependency parsers for resource-poor languages without any annotated resources on the target side. Compared with previous studies, our approach requires fewer human-annotated resources. We first train a POS tagger and a parser on the source treebank and use them to parse the source sentences of the bilingual data. We then obtain auto-parsed sentences (with POS tags and dependencies) on the target side by projection. From the fully projected sentences, we train a base POS tagger and a base parser for the target language. However, most sentence pairs are not fully projected, which leaves many partially projected sentences. To make full use of them, we implement a learning algorithm that trains POS taggers on the partially projected sentences, which leads to better parsing performance. We further exploit a set of features extracted from large-scale monolingual data to help parsing. Finally, we evaluate the proposed approach on the Google Universal Treebank (v2.0, standard). The experimental results show that the proposed approach significantly improves parsing performance.
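The projection step described above can be illustrated with a small sketch. It assumes one-to-one word alignments between each source sentence and its target translation, and uses a simple direct-projection rule: an aligned source word passes its POS tag to its target word, and a source arc is copied only when both the head and the dependent are aligned (or the head is the root). The token format, the function names `project` and `is_fully_projected`, and the criterion for "fully projected" (every target token receives both a tag and a head) are hypothetical illustrations, not the authors' exact procedure.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: one source token = (word, POS tag, head).
# Positions and heads are 1-based; head 0 means the artificial ROOT.
SourceToken = Tuple[str, str, int]


def project(
    source: List[SourceToken],
    target_len: int,
    alignment: Dict[int, int],
) -> Tuple[List[Optional[str]], List[Optional[int]]]:
    """Project POS tags and dependency heads onto a target sentence.

    `alignment` maps source positions to target positions (both 1-based)
    and is assumed to be one-to-one. Target tokens that receive no tag
    or no head are left as None, i.e. they stay unannotated.
    """
    if len(set(alignment.values())) != len(alignment):
        raise ValueError("this sketch only handles one-to-one alignments")

    tags: List[Optional[str]] = [None] * target_len
    heads: List[Optional[int]] = [None] * target_len
    for s_pos, (_word, tag, s_head) in enumerate(source, start=1):
        t_pos = alignment.get(s_pos)
        if t_pos is None:
            continue  # unaligned source word: nothing to project
        tags[t_pos - 1] = tag
        if s_head == 0:
            heads[t_pos - 1] = 0  # source root becomes the target root
        elif s_head in alignment:
            # The arc survives only if the source head is aligned as well.
            heads[t_pos - 1] = alignment[s_head]
    return tags, heads


def is_fully_projected(
    tags: List[Optional[str]], heads: List[Optional[int]]
) -> bool:
    """A sentence is fully projected when every token got a tag and a head."""
    return all(t is not None for t in tags) and all(h is not None for h in heads)


# Toy example: "He reads books" projected onto a verb-final target order.
src = [("He", "PRON", 2), ("reads", "VERB", 0), ("books", "NOUN", 2)]
tags, heads = project(src, 3, {1: 1, 2: 3, 3: 2})
print(tags, heads, is_fully_projected(tags, heads))
# ['PRON', 'NOUN', 'VERB'] [3, 3, 0] True

# Same source, but target token 3 has no aligned source word.
tags, heads = project(src, 4, {1: 1, 2: 2, 3: 4})
print(tags, heads, is_fully_projected(tags, heads))
# ['PRON', 'VERB', None, 'NOUN'] [2, 0, None, 2] False
```

Under the one-to-one assumption, a sentence that passes `is_fully_projected` is also a well-formed tree, because every target head points to the image of an aligned source head, so the target structure mirrors part of the source tree. The `None` slots of a partially projected sentence are the positions that a learner trained on partial annotations would leave unconstrained.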


Notes

1. https://code.google.com/p/uni-dep-tb/
2. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl
3. https://dumps.wikimedia.org/
4. When using multiple source languages, they get an average accuracy of 82.18%.


Acknowledgement

This project is supported by the National Natural Science Foundation of China (Grant Nos. 61373095, 61572338, and 61502325). This work is also partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China
Junjie Yu, Wenliang Chen, Zhenghua Li & Min Zhang


Corresponding author

Correspondence to Wenliang Chen.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng


About this paper

Cite this paper

Yu, J., Chen, W., Li, Z., Zhang, M. (2016). Building Powerful Dependency Parsers for Resource-Poor Languages. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_3


DOI: https://doi.org/10.1007/978-3-319-50496-4_3
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
