Abstract
This paper presents an approach to semantic disambiguation for Spanish based on machine learning and semantic classes. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which linguistic information can be learned automatically. In particular, all-words sense-annotated corpora such as SemCor do not contain enough examples of many senses to train a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, is used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology.
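The effect of replacing senses with semantic classes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the annotations, the sense identifiers, and the sense-to-class mapping are invented toy data in the style of WordNet lexicographer files, not material taken from Cast3LB or Spanish WordNet, and the helper names `examples_per_label` and `polysemy` are hypothetical. The sketch only shows how grouping senses into classes raises the number of training examples per label and can lower the number of labels a lemma has to be disambiguated between.

```python
from collections import Counter, defaultdict

# Toy, invented annotations as (lemma, sense_id) pairs. Not taken from Cast3LB.
ANNOTATIONS = [
    ("banco", "banco#1"),
    ("banco", "banco#2"),
    ("banco", "banco#3"),
    ("orilla", "orilla#1"),
    ("caja", "caja#1"),
    ("caja", "caja#2"),
]

# Hypothetical sense-to-class mapping (labels styled after lexicographer files).
SENSE_TO_CLASS = {
    "banco#1": "noun.group",
    "banco#2": "noun.artifact",
    "banco#3": "noun.group",
    "orilla#1": "noun.object",
    "caja#1": "noun.artifact",
    "caja#2": "noun.group",
}


def examples_per_label(annotations, mapping=None):
    """Count training examples per label (sense, or class if a mapping is given)."""
    return Counter(mapping[s] if mapping else s for _, s in annotations)


def polysemy(annotations, mapping=None):
    """Number of distinct labels observed for each lemma."""
    labels_by_lemma = defaultdict(set)
    for lemma, sense in annotations:
        labels_by_lemma[lemma].add(mapping[sense] if mapping else sense)
    return {lemma: len(labels) for lemma, labels in labels_by_lemma.items()}


sense_counts = examples_per_label(ANNOTATIONS)
class_counts = examples_per_label(ANNOTATIONS, SENSE_TO_CLASS)
sense_polysemy = polysemy(ANNOTATIONS)
class_polysemy = polysemy(ANNOTATIONS, SENSE_TO_CLASS)

if __name__ == "__main__":
    print("examples per sense:", dict(sense_counts))
    print("examples per class:", dict(class_counts))
    print("polysemy by sense: ", sense_polysemy)
    print("polysemy by class: ", class_polysemy)
```

In this toy data every sense has a single example, whereas each class pools the examples of all senses mapped to it, and the lemma `banco` drops from three candidate labels to two. The actual gains on a real corpus depend on the class inventory chosen.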
This paper has been supported by the Spanish Government under projects CESS-ECE (HUM2004-21127-E) and R2D2 (TIC2003-07158-C04-01).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Izquierdo-Beviá, R., Moreno-Monteagudo, L., Navarro, B., Suárez, A. (2006). Spanish All-Words Semantic Class Disambiguation Using Cast3LB Corpus. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_84
DOI: https://doi.org/10.1007/11925231_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer Science, Computer Science (R0)