Abstract
Treebanks play an important role in the development of natural language processing tools. Among other things, they provide crucial language-specific patterns that machine learning techniques exploit. Quality control is therefore extremely important in any treebanking project. Manual validation of the treebank is generally a necessary step in ensuring good annotation quality, but it requires a lot of human time and effort. In this paper, we present an automatic approach that helps detect potential errors in a treebank, using a dependency parser to find them. With this tool, validators can validate a treebank in less time and with less effort.
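The abstract does not spell out how the parser is used, so the following is only a minimal sketch of one plausible reading: a parser's output is compared against the manual annotation, and tokens where the two disagree are flagged for a human validator. The `Token` type, the `flag_potential_errors` function, and the `subj`/`obj`/`root` labels are all hypothetical names introduced for illustration; they are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    form: str   # surface word
    head: int   # 1-based index of the head token, 0 for the root
    label: str  # dependency label (illustrative label set)


def flag_potential_errors(
    gold: List[Token], parsed: List[Token]
) -> List[Tuple[int, str, str]]:
    """Return (position, kind, form) for tokens where annotation and parser disagree.

    Assumption: a disagreement is only a *potential* error; the parser
    may be the one that is wrong, so a validator still has to decide.
    """
    if len(gold) != len(parsed):
        raise ValueError("sentences must have the same number of tokens")
    flagged = []
    for position, (g, p) in enumerate(zip(gold, parsed), start=1):
        if g.head != p.head:
            flagged.append((position, "head", g.form))
        elif g.label != p.label:
            flagged.append((position, "label", g.form))
    return flagged


# Hypothetical annotated sentence with a label slip on the third token.
gold = [Token("Ram", 2, "subj"), Token("ate", 0, "root"), Token("mango", 2, "subj")]
# Hypothetical parser output for the same sentence.
parsed = [Token("Ram", 2, "subj"), Token("ate", 0, "root"), Token("mango", 2, "obj")]

print(flag_potential_errors(gold, parsed))
```

Under this reading, validators would inspect only the flagged tokens rather than every token in the treebank, which is where the saving in time and effort would come from.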
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agrawal, B., Agarwal, R., Husain, S., Sharma, D.M. (2013). An Automatic Approach to Treebank Error Detection Using a Dependency Parser. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_24
DOI: https://doi.org/10.1007/978-3-642-37247-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science (R0)