Abstract
This paper presents a methodology for building DicPRO, an electronic dictionary of European Portuguese anthroponyms. Because names play an important role in structuring the information in texts, such a dictionary is a useful resource for computational processing. The dictionary has been enriched with morphosyntactic and semantic information. It was then used in the specific task of capitalizing anthroponyms and other proper names in a corpus that was automatically produced by a broadcast news speech recognition system and then manually corrected. The output of this system provides no orthographic clues such as capitalized words or punctuation, so restoring capitalization aims to make the output of such systems more readable. The paper shows that combining lexical, contextual (positional) and statistical information yields better results in this task than relying on any one of these strategies alone.
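The abstract does not say how the three kinds of evidence are combined. The following is a purely illustrative sketch, not the authors' method, built on stated assumptions: a toy set of name forms stands in for DicPRO, capitalization ratios are learned from a cased reference corpus, and a hypothetical contextual rule treats a dictionary entry that directly follows a capitalized word as part of the same name. The function names `train_capitalization_stats` and `capitalize`, the 0.5 threshold, and the sample data are all invented for illustration. The word "rosa" (both a common noun, "rose", and a given name) shows why a single cue is not enough.

```python
from collections import Counter


def train_capitalization_stats(reference_tokens):
    """Statistical cue: fraction of times each (lowercased) word appears capitalized
    in a cased reference corpus. Hypothetical helper for illustration only."""
    capitalized = Counter()
    total = Counter()
    for token in reference_tokens:
        key = token.lower()
        total[key] += 1
        if token[:1].isupper():
            capitalized[key] += 1
    return {word: capitalized[word] / total[word] for word in total}


def capitalize(tokens, lexicon, stats, threshold=0.5):
    """Restore capitalization in lowercase, unpunctuated ASR-style tokens.

    lexicon: set of lowercase name forms (a toy stand-in for a name dictionary).
    stats: output of train_capitalization_stats.
    The combination rule below is an assumption, not the published method.
    """
    result = []
    prev_capitalized = False
    for tok in tokens:
        ratio = stats.get(tok)  # None if the word never occurred in the reference corpus
        if tok in lexicon:
            # Lexical cue says "possible name". Capitalize if the previous word was
            # capitalized (contextual cue), if there is no statistical evidence,
            # or if the statistics agree.
            is_name = prev_capitalized or ratio is None or ratio >= threshold
        else:
            # Not in the lexicon: rely on the statistical cue alone.
            is_name = ratio is not None and ratio >= threshold
        result.append(tok.capitalize() if is_name else tok)
        prev_capitalized = is_name
    return result


if __name__ == "__main__":
    lexicon = {"maria", "rosa", "silva", "joão"}
    reference = ["a", "rosa", "é", "uma", "flor", "a", "rosa", "murchou",
                 "Maria", "Silva", "chegou", "a", "Lisboa"]
    stats = train_capitalization_stats(reference)
    asr_output = ["maria", "rosa", "visitou", "lisboa", "e", "viu", "uma", "rosa"]
    print(capitalize(asr_output, lexicon, stats))
    # ['Maria', 'Rosa', 'visitou', 'Lisboa', 'e', 'viu', 'uma', 'rosa']
```

Here statistics alone would leave the given name "Rosa" in lowercase, and the lexicon alone would wrongly capitalize the flower "rosa"; only the combination of the contextual cue with the other two handles both occurrences. "Lisboa" is capitalized purely on statistical evidence because the toy lexicon contains only anthroponyms.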
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baptista, J., Batista, F., Mamede, N. (2006). Building a Dictionary of Anthroponyms. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_3
DOI: https://doi.org/10.1007/11751984_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science (R0)