Automated Creation of a Medieval Portuguese Partial Treebank

Vitor Rocio⁴,
Mário Amado Alves⁴,
J. Gabriel Lopes⁴,
Maria Francisca Xavier⁵ &
…
Graça Vicente⁵

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 20))

Abstract

The growing trend towards corpus-based linguistics has led researchers to annotate large quantities of text manually. The human effort involved in this task is often enormous, and it requires highly specialised, linguistically trained manpower. In our view, a different approach should be followed, one that frees this highly trained manpower for more rewarding and creative activities, in a constructive dialogue among the various kinds of expertise needed to overcome our ignorance about languages. As an experiment, we used tools and linguistic resources previously built for Contemporary Portuguese to partially automate the partial annotation of a Medieval Portuguese corpus. In this paper, we describe the tools used (POS tagger, lexical analyser and partial parser) and demonstrate that the similarities between a language at two different time periods are sufficient for bootstrapping and acquiring lexical knowledge from the partially parsed, automatically annotated corpus.
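
The bootstrapping idea in the abstract can be illustrated with a small sketch. It is not the authors' system: the toy lexicon, the single context rule in `CONTEXT_GUESS`, the tag names and the two example sentences are all assumptions made for illustration. Words found in a contemporary lexicon are tagged directly, and words the lexicon does not cover receive a candidate tag from the tag of the word to their left.

```python
from collections import Counter, defaultdict

# Toy lexicon standing in for a Contemporary Portuguese resource (assumption).
LEXICON = {"o": "DET", "a": "DET", "deu": "V", "terra": "N"}

# Hypothetical context rule: tag of the preceding word -> guess for an unknown word.
CONTEXT_GUESS = {"DET": "N"}


def tag(tokens):
    """Tag words found in the lexicon; unknown words get None."""
    return [(t, LEXICON.get(t.lower())) for t in tokens]


def propose_entries(sentences):
    """Collect candidate tags for unknown words from their left context."""
    votes = defaultdict(Counter)
    for tokens in sentences:
        tagged = tag(tokens)
        for i, (word, pos) in enumerate(tagged):
            if pos is None and i > 0:
                guess = CONTEXT_GUESS.get(tagged[i - 1][1])
                if guess:
                    votes[word.lower()][guess] += 1
    # Keep the most frequent guess for each unknown word.
    return {w: c.most_common(1)[0][0] for w, c in votes.items()}


# Illustrative sentences with an older spelling ("rey") missing from the lexicon.
corpus = [["o", "rey", "deu", "a", "terra"], ["a", "terra", "do", "rey"]]
new_entries = propose_entries(corpus)
print(new_entries)
```

In this toy run only "rey" gains a proposed entry, because it follows a determiner once; "do" is left unresolved since no rule covers its context. A realistic setup would draw its context from the partial parser's output rather than from one neighbouring tag.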

Author information

Authors and Affiliations

CENTRIA, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal
Vitor Rocio, Mário Amado Alves & J. Gabriel Lopes
Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa, Portugal
Maria Francisca Xavier & Graça Vicente

Editor information

Editors and Affiliations

Université Paris 7, Paris, France
Anne Abeillé

About this chapter

Cite this chapter

Rocio, V., Alves, M.A., Lopes, J.G., Xavier, M.F., Vicente, G. (2003). Automated Creation of a Medieval Portuguese Partial Treebank. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_12

DOI: https://doi.org/10.1007/978-94-010-0201-1_12
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1335-5
Online ISBN: 978-94-010-0201-1
eBook Packages: Springer Book Archive
