DOI: 10.5555/2051115.2051158

Automatically enriching a thesaurus with information from dictionaries

Published: 10 October 2011

Abstract

Since information in broad-coverage knowledge bases, such as thesauri, is usually incomplete, merging information from different sources is a good way to increase coverage. We propose a method for enriching a thesaurus with information acquired automatically from dictionaries: pairs of synonyms are assigned to candidate synsets, and the pairs whose elements are not in the thesaurus are clustered to identify new synsets. This method was used to enrich a Brazilian Portuguese thesaurus with synonyms from a European Portuguese dictionary, and resulted in a larger and broader thesaurus with new words and new concepts. The assignments and the resulting synsets were manually evaluated, yielding correctness rates higher than 71% and 85%, respectively.
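The two steps named in the abstract (assigning synonym pairs to candidate synsets, then clustering the remaining pairs) can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure or clustering algorithm the authors use, so the choices here are assumptions made only for illustration: the function name `enrich` is hypothetical, candidate synsets are scored by Jaccard overlap with the pair's neighbourhood in the dictionary synonym graph, and new synsets are taken to be the connected components of the leftover pairs.

```python
from collections import defaultdict


def enrich(thesaurus, pairs):
    """Illustrative sketch, not the paper's exact method.

    thesaurus: list of sets of words (one set per synset).
    pairs: iterable of (word, word) synonym pairs from a dictionary.
    Returns (assignments, new_synsets), where assignments maps a pair
    to the index of the synset it was assigned to.
    """
    pairs = list(pairs)

    # Synonym graph induced by the dictionary pairs.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    known = set().union(*thesaurus)  # every word already in the thesaurus
    assignments = {}
    leftover = []  # pairs with neither word in the thesaurus

    for a, b in pairs:
        if a not in known and b not in known:
            leftover.append((a, b))
            continue
        # Candidate synsets contain at least one word of the pair.
        candidates = [i for i, s in enumerate(thesaurus) if a in s or b in s]
        # ASSUMPTION: score by Jaccard overlap between the synset and
        # the pair plus its dictionary neighbours.
        context = neighbours[a] | neighbours[b] | {a, b}
        assignments[(a, b)] = max(
            candidates,
            key=lambda i: len(thesaurus[i] & context) / len(thesaurus[i] | context),
        )

    # ASSUMPTION: cluster leftover pairs as connected components.
    graph = defaultdict(set)
    for a, b in leftover:
        graph[a].add(b)
        graph[b].add(a)

    new_synsets, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(graph[word] - component)
        seen |= component
        new_synsets.append(component)

    return assignments, new_synsets
```

Calling `enrich` with a list of synsets and a list of dictionary pairs returns the chosen synset index for each assignable pair and the list of newly formed synsets. A real system would likely use a stricter clustering criterion than connected components, which can merge distinct senses through a single ambiguous word.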


Cited By

  • (2012) Integrating lexical-semantic knowledge to build a public lexical ontology for Portuguese. Proceedings of the 17th International Conference on Applications of Natural Language Processing and Information Systems, pp. 210-215. DOI: 10.1007/978-3-642-31178-9_23. Online publication date: 26-Jun-2012

Published In

EPIA'11: Proceedings of the 15th Portuguese Conference on Progress in Artificial Intelligence
October 2011
703 pages
ISBN: 9783642247682
Editors: Luis Antunes, H. Sofia Pinto

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. clustering
  2. lexico-semantic knowledge
  3. synonymy
  4. thesaurus

Qualifiers

  • Article
