Abstract
The growing trend towards corpus-based linguistics has led researchers to annotate large quantities of text manually. The human effort involved in this task is often enormous and requires highly specialised, linguistically trained manpower. In our view, a different approach should be followed, one that frees this highly trained manpower for other, more rewarding and creative activities, in a constructive dialogue among the various kinds of expertise needed to overcome our ignorance about languages. As an experiment, we used tools and linguistic resources previously built for Contemporary Portuguese to partially automate the partial annotation of a Medieval Portuguese corpus. In this paper, we describe the tools used (a POS tagger, a lexical analyser and a partial parser) and demonstrate that the similarities between a language at two different time periods are sufficient for bootstrapping and acquiring lexical knowledge from the partially parsed, automatically annotated corpus.
Copyright information
© 2003 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Rocio, V., Alves, M.A., Lopes, J.G., Xavier, M.F., Vicente, G. (2003). Automated Creation of a Medieval Portuguese Partial Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_12
DOI: https://doi.org/10.1007/978-94-010-0201-1_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1335-5
Online ISBN: 978-94-010-0201-1
eBook Packages: Springer Book Archive