Building a Dictionary of Anthroponyms

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3960)

Abstract

This paper presents a methodology for building an electronic dictionary of anthroponyms of European Portuguese (DicPRO), a useful resource for computational processing given the importance of names in structuring information in texts. The dictionary has been enriched with morphosyntactic and semantic information. It was then used in the specific task of capitalizing anthroponyms and other proper names in a corpus automatically produced by a broadcast news speech recognition system and manually corrected. The output of this system offers no clues such as capitalized words or punctuation. The task is intended to make the output of such a system more readable. The paper shows that combining lexical, contextual (positional) and statistical information yields better results in this task than using any one of these strategies alone.
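As a rough illustration of what combining the three kinds of evidence might look like, the following minimal Python sketch capitalizes tokens in lowercase, unpunctuated text. It is not the authors' system: the tiny lexicon, the list of title cues, the ambiguity set, the decision rule, and the function names `train_cap_ratio` and `capitalize` are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical toy resources; none of these come from DicPRO.
NAME_LEXICON = {"maria", "joão", "silva", "costa"}  # lexical evidence
AMBIGUOUS = {"costa"}  # entries that are also common nouns ("coast")
TITLE_CUES = {"senhor", "doutor", "presidente", "ministro"}  # positional evidence


def train_cap_ratio(cased_sentences):
    """Estimate, per word, the fraction of occurrences that are capitalized."""
    upper, total = Counter(), Counter()
    for sentence in cased_sentences:
        # Skip the first token: sentence-initial capitals are uninformative.
        for token in sentence.split()[1:]:
            key = token.lower()
            total[key] += 1
            if token[0].isupper():
                upper[key] += 1
    return {word: upper[word] / total[word] for word in total}


def capitalize(tokens, cap_ratio, threshold=0.5):
    """Capitalize lowercase tokens using lexical, positional and statistical cues."""
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        prev_is_name = bool(out) and out[-1][0].isupper()
        in_lexicon = tok in NAME_LEXICON
        unambiguous = in_lexicon and tok not in AMBIGUOUS
        context = prev in TITLE_CUES or prev_is_name  # positional cue
        stat = cap_ratio.get(tok, 0.0) >= threshold   # statistical cue
        # Assumed rule: an ambiguous lexicon entry needs a second source of evidence.
        if unambiguous or (in_lexicon and (context or stat)):
            out.append(tok.capitalize())
        else:
            out.append(tok)
    return out


ratios = train_cap_ratio([
    "O ministro Costa falou",
    "Ela foi à costa ontem",
    "A costa é bonita",
])
print(capitalize("o ministro costa visitou a costa com maria".split(), ratios))
```

In this toy run the first "costa" is capitalized because it follows a title cue, while the second is left alone because neither the context nor the corpus statistics support a name reading. A lexicon-only strategy would have to treat both occurrences the same way.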

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baptista, J., Batista, F., Mamede, N. (2006). Building a Dictionary of Anthroponyms. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_3

  • DOI: https://doi.org/10.1007/11751984_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34045-4

  • Online ISBN: 978-3-540-34046-1

  • eBook Packages: Computer Science, Computer Science (R0)
