DOI: 10.5555/2002736.2002870

Using derivation trees for treebank error detection

Published: 19 June 2011

Abstract

This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches, which used strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the results of applying this approach to the Penn Arabic Treebank and show how it leads to high precision in error detection.
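
As a rough illustration of the idea of comparing annotations only within structurally similar contexts, the following minimal sketch groups instances of a word sequence by a structural signature and flags groups that received more than one annotation. It is not the paper's algorithm: the `Instance` record, its `context` and `annotation` string signatures, and the function `find_inconsistencies` are all hypothetical stand-ins for whatever a derivation-tree representation would actually provide, and the sample data is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Instance(NamedTuple):
    """One occurrence of a word sequence in a treebank (hypothetical record)."""
    words: Tuple[str, ...]  # the word sequence being compared
    context: str            # assumed signature of the surrounding tree structure
    annotation: str         # assumed signature of the structure assigned to the words


def find_inconsistencies(
    instances: Iterable[Instance],
) -> Dict[Tuple[Tuple[str, ...], str], List[str]]:
    """Return (words, context) groups that carry more than one annotation."""
    groups = defaultdict(set)
    for inst in instances:
        # Key on the structural context as well as the words, so that
        # occurrences in different structural contexts are never compared.
        groups[(inst.words, inst.context)].add(inst.annotation)
    return {key: sorted(anns) for key, anns in groups.items() if len(anns) > 1}


# Toy data, invented purely for illustration.
sample = [
    Instance(("the", "report"), "NP-under-S", "(NP the report)"),
    Instance(("the", "report"), "NP-under-S", "(NP (NP the) report)"),
    Instance(("the", "report"), "NP-under-VP", "(NP the report)"),
]
flagged = find_inconsistencies(sample)
```

In this toy run only the two `NP-under-S` occurrences are compared with each other, and they are flagged because their annotations differ; the `NP-under-VP` occurrence sits in a different context and is left alone. A purely string-based grouping would have put all three occurrences in one comparison set.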

Published In

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
June 2011
765 pages
ISBN: 9781932432886

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 240 of 768 submissions, 31%
