Abstract
In this paper, we present an approach to building dependency parsers for resource-poor languages without any annotated resources on the target side. Compared with previous studies, our approach requires fewer human-annotated resources. We first train a POS tagger and a parser on the source treebank and use them to parse the source sentences of the bilingual data. We then obtain auto-parsed sentences (with POS tags and dependencies) on the target side through projection techniques. From the fully projected sentences, we train a base POS tagger and a base parser on the target side. However, most sentence pairs are not fully projected, which leaves a large number of partially projected sentences. To make full use of these partially projected sentences, we implement a learning algorithm to train POS taggers, which leads to better parsing performance. We further exploit a set of features from large-scale monolingual data to help parsing. Finally, we evaluate the proposed approach on the Google Universal Treebank (v2.0, standard). The experimental results show that the proposed approach can significantly improve parsing performance.
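The distinction between fully and partially projected sentences can be illustrated with a minimal sketch. It is not the procedure used in this paper: it assumes a one-to-one word alignment given as a source-to-target index map, and the names `project_annotations` and `is_fully_projected` are hypothetical. A target token keeps `None` for any tag or head that could not be projected.

```python
from typing import Dict, List, Optional, Tuple

ROOT = -1  # assumed convention: head index -1 marks the root token


def project_annotations(
    src_tags: List[str],
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> Tuple[List[Optional[str]], List[Optional[int]]]:
    """Copy POS tags and heads from source tokens to aligned target tokens.

    alignment maps a source token index to a target token index and is
    assumed to be one-to-one (a simplification for illustration).
    """
    tgt_tags: List[Optional[str]] = [None] * tgt_len
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s, t in alignment.items():
        tgt_tags[t] = src_tags[s]
        head = src_heads[s]
        if head == ROOT:
            tgt_heads[t] = ROOT
        elif head in alignment:
            # The head is projected only if the source head is aligned too.
            tgt_heads[t] = alignment[head]
    return tgt_tags, tgt_heads


def is_fully_projected(
    tags: List[Optional[str]], heads: List[Optional[int]]
) -> bool:
    """True when every target token received both a tag and a head."""
    return all(t is not None for t in tags) and all(h is not None for h in heads)


if __name__ == "__main__":
    src_tags = ["PRON", "VERB", "NOUN"]
    src_heads = [1, ROOT, 1]
    # The third source token is unaligned, so the target is only partially projected.
    tags, heads = project_annotations(src_tags, src_heads, {0: 0, 1: 1}, 3)
    print(tags, heads, is_fully_projected(tags, heads))
```

Under these assumptions, sentences for which `is_fully_projected` holds would form the training data for the base tagger and parser, while the remaining sentences carry only partial annotations.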
Notes
- 4. When using multiple source languages, they get an average accuracy of 82.18%.
Acknowledgement
This project is supported by the National Natural Science Foundation of China (Grant Nos. 61373095, 61572338, 61502325). This work is also partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Yu, J., Chen, W., Li, Z., Zhang, M. (2016). Building Powerful Dependency Parsers for Resource-Poor Languages. In: Lin, C.-Y., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_3
DOI: https://doi.org/10.1007/978-3-319-50496-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)