
Partial parsing from bitext projections

Published: 19 June 2011

Abstract

Recent work has shown how a parallel corpus can be leveraged to build a syntactic parser for a target language by projecting automatic source parses onto the target sentences using word alignments. The projected target dependency parses are not always fully connected, which limits their usefulness for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm that does not need a fully connected parse and can learn from partial parses by utilizing the structural and syntactic information available in them. Our parser achieved statistically significant improvements over a baseline system that trains only on fully connected parses for Bulgarian, Spanish, and Hindi. It also gave a significant improvement over previously reported results for Bulgarian and set a benchmark for Hindi.
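
To make the idea of a partial projected parse concrete, here is a minimal sketch of head projection through word alignments. It is an illustration only, not the algorithm from the paper: the function names `project_heads` and `is_fully_connected`, the one-to-one alignment, and the use of -1 for the root are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def project_heads(source_heads: List[int],
                  alignment: Dict[int, int],
                  target_len: int) -> List[Optional[int]]:
    """Project source dependency heads onto target tokens.

    source_heads[i] is the head index of source token i (-1 marks the root).
    alignment maps a source index to a target index and is assumed to be
    one-to-one (a simplification made for this sketch).
    Target tokens whose head cannot be projected get None.
    """
    target_heads: List[Optional[int]] = [None] * target_len
    for src_child, src_head in enumerate(source_heads):
        if src_child not in alignment:
            continue  # unaligned source word: nothing to project
        tgt_child = alignment[src_child]
        if src_head == -1:
            target_heads[tgt_child] = -1  # the root stays the root
        elif src_head in alignment:
            target_heads[tgt_child] = alignment[src_head]
        # otherwise the source head is unaligned and the edge is dropped
    return target_heads


def is_fully_connected(heads: List[Optional[int]]) -> bool:
    """Simplified check: every token has a head and there is one root."""
    return all(h is not None for h in heads) and heads.count(-1) == 1


# Source sentence with three tokens; token 1 is the root.
source_heads = [1, -1, 1]

# Only two source words are aligned, so one target token gets no head.
partial = project_heads(source_heads, {0: 0, 1: 1}, target_len=3)

# All three source words are aligned (with reordering on the target side).
full = project_heads(source_heads, {0: 0, 1: 2, 2: 1}, target_len=3)
```

In this sketch `partial` leaves the third target token without a head, so a baseline that trains only on fully connected parses would discard the sentence, while a parser that learns from partial parses could still use the two projected attachments.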


Published In

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
June 2011
1696 pages
ISBN: 9781932432879

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 240 of 768 submissions, 31%
