Building Powerful Dependency Parsers for Resource-Poor Languages

  • Conference paper
Natural Language Understanding and Intelligent Applications (ICCPOL 2016, NLPCC 2016)

Abstract

In this paper, we present an approach to building dependency parsers for resource-poor languages without any annotated resources on the target side. Compared with previous studies, our approach requires fewer human-annotated resources. We first train a POS tagger and a parser on the source treebank and use them to parse the source sentences of the bilingual data. We then obtain auto-parsed sentences (with POS tags and dependencies) on the target side through projection techniques. From the fully projected sentences, we train a base POS tagger and a base parser on the target side. However, most sentence pairs are not fully projected, which leaves us with many partially projected sentences. To make full use of them, we implement a learning algorithm that trains POS taggers on partially projected sentences, which leads to better parsing performance. We further exploit a set of features from large-scale monolingual data to help parsing. Finally, we evaluate the proposed approach on the Google Universal Treebank (v2.0, standard). The experimental results show that the proposed approach significantly improves parsing performance.
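The projection step can be illustrated with a minimal sketch. The abstract does not give the projection rules, so the code below assumes a simple scheme: one-to-one word alignments, with each aligned target token copying the POS tag of its source token and the (aligned) head of that token. The function names, the alignment format, and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

ROOT = -1  # head index used for the root token (an assumption of this sketch)


def project_annotations(
    src_tags: List[str],
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> Tuple[List[Optional[str]], List[Optional[int]]]:
    """Copy POS tags and heads from source tokens to aligned target tokens.

    alignment maps a source token index to a target token index and is
    assumed to be one-to-one. Unaligned target tokens keep None.
    """
    tgt_tags: List[Optional[str]] = [None] * tgt_len
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s, t in alignment.items():
        tgt_tags[t] = src_tags[s]
        head = src_heads[s]
        if head == ROOT:
            tgt_heads[t] = ROOT
        elif head in alignment:
            # The dependency is projected only if the head is aligned too.
            tgt_heads[t] = alignment[head]
    return tgt_tags, tgt_heads


def is_fully_projected(
    tags: List[Optional[str]], heads: List[Optional[int]]
) -> bool:
    """A sentence is fully projected when every token has a tag and a head."""
    return all(t is not None for t in tags) and all(h is not None for h in heads)


# Hypothetical three-token source sentence: "I like cats".
src_tags = ["PRON", "VERB", "NOUN"]
src_heads = [1, ROOT, 1]

# Every target token is aligned: the sentence is fully projected.
full = project_annotations(src_tags, src_heads, {0: 0, 1: 1, 2: 2}, tgt_len=3)

# The third target token is unaligned: the sentence is partially projected.
partial = project_annotations(src_tags, src_heads, {0: 0, 1: 1}, tgt_len=3)
```

Under this scheme, sentences for which `is_fully_projected` holds would feed the base tagger and parser, while the remaining sentences carry `None` for the unprojected tokens and are the ones a learner for partial annotations has to handle.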

Notes

  1. https://code.google.com/p/uni-dep-tb/.

  2. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl.

  3. https://dumps.wikimedia.org/.

  4. When using multiple source languages, they get an average accuracy of 82.18%.


Acknowledgement

This project is supported by the National Natural Science Foundation of China (Grant Nos. 61373095, 61572338, 61502325). This work is also partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Corresponding author

Correspondence to Wenliang Chen.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Yu, J., Chen, W., Li, Z., Zhang, M. (2016). Building Powerful Dependency Parsers for Resource-Poor Languages. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_3

  • DOI: https://doi.org/10.1007/978-3-319-50496-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50495-7

  • Online ISBN: 978-3-319-50496-4

  • eBook Packages: Computer Science, Computer Science (R0)
