Abstract
The research project reported in this paper aims to extract linguistic information automatically from contexts in the Russian National Corpus (RNC) and to use it in building a comprehensive lexicographic resource, the Index of Russian lexical constructions. The proposed approach involves automatic context classification for word sense disambiguation (WSD) and construction identification (CxI). The context processing procedure takes into account the following types of contextual information represented in the RNC multilevel annotation: lexical (lemma) tags (lex), morphological (grammatical) tags (gr), semantic (taxonomy) tags (sem), and combinations of these tag types. Multiple WSD and CxI experiments are performed on representative RNC context samples for nouns. In each series of experiments we analyze (1) different context markers of the meanings of target words and (2) constructions that include context markers and target words.
This work was supported by the Russian Foundation for Basic Research (grant No. 10-06-00586) and the RAS Presidium program of basic research “Corpus Linguistics”.
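The idea of classifying contexts by their lex, gr, and sem tags can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name a classifier, so a small naive Bayes model with add-one smoothing is swapped in; the tag values, the transliterated lemmas, the window size, and the names `context_features` and `NaiveBayesWSD` are hypothetical and do not reproduce the RNC tagset or the authors' procedure.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, target_idx, layers=("lex", "gr", "sem"), window=2):
    """Collect layer-prefixed tags of the tokens within `window` positions of the target."""
    feats = []
    lo = max(0, target_idx - window)
    hi = min(len(tokens), target_idx + window + 1)
    for i in range(lo, hi):
        if i == target_idx:
            continue  # the target word itself is not a context marker
        for layer in layers:
            value = tokens[i].get(layer)
            if value:
                feats.append(f"{layer}={value}")
    return feats


class NaiveBayesWSD:
    """Toy naive Bayes sense classifier (an assumed stand-in, not the paper's method)."""

    def __init__(self):
        self.sense_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for feats, sense in samples:
            self.sense_counts[sense] += 1
            self.feat_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # Add-one smoothing over the feature vocabulary seen in training.
            denom = sum(self.feat_counts[sense].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                score += math.log((self.feat_counts[sense][f] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


# Invented toy contexts for the ambiguous noun "luk" ('onion' vs. 'bow').
onion_ctx = [{"lex": "rezat", "gr": "V", "sem": "t:impact"}, {"lex": "luk", "gr": "S"}]
bow_ctx = [
    {"lex": "streljat", "gr": "V", "sem": "t:move"},
    {"lex": "iz", "gr": "PR"},
    {"lex": "luk", "gr": "S"},
]
model = NaiveBayesWSD().fit([
    (context_features(onion_ctx, 1), "onion"),
    (context_features(bow_ctx, 2), "bow"),
])

# An unseen context that shares only its gr and sem tags with the 'onion' sample.
test_ctx = [{"lex": "zharit", "gr": "V", "sem": "t:impact"}, {"lex": "luk", "gr": "S"}]
predicted = model.predict(context_features(test_ctx, 1))
```

Passing a different `layers` tuple, for example `("sem",)` or `("lex", "gr")`, restricts the features to one annotation layer or to a combination of layers, which mirrors the kind of comparison across tag types that the abstract describes.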
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lyashevskaya, O., Mitrofanova, O., Grachkova, M., Romanov, S., Shimorina, A., Shurygina, A. (2011). Automatic Word Sense Disambiguation and Construction Identification Based on Corpus Multilevel Annotation. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_11
DOI: https://doi.org/10.1007/978-3-642-23538-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)