Abstract
The ultimate goal of our work is to extend a syntactic valence dictionary of Polish verbs with semantic information about verb arguments, expressed as wordnet semantic categories of words. To assign lists of appropriate semantic categories of the corresponding nouns to the syntactic slots of dictionary entries, we need a treebank in which all nouns are annotated with such categories, since both syntactic (i.e., argument structure) and semantic information is required.
Here we address Word Sense Disambiguation (WSD). To solve this task for our specific application, we adapt the EM selection algorithm, originally developed for the extraction of syntactic valence frames.
The paper presents the whole data processing pipeline, with the main focus on the WSD task. Three versions of the EM selection algorithm are presented: the original one and two modifications of it. Finally, the algorithms are evaluated and compared.
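The abstract does not spell out the EM selection algorithm, so the following is only a minimal sketch of a generic EM-style selection scheme, not the chapter's actual procedure or either of its modifications. It assumes that every noun occurrence comes with a non-empty set of candidate semantic categories, that category probabilities start out uniform, and that the most probable candidate is finally selected for each occurrence. The function name `em_selection`, the `iterations` parameter, and the category labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def em_selection(alternatives, iterations=50):
    """Pick one label per occurrence with an EM-style reestimation loop.

    alternatives: list of non-empty sets of candidate labels, one set per
    occurrence (an assumed input format, not taken from the chapter).
    """
    labels = set().union(*alternatives)
    # Start from a uniform distribution over all candidate labels.
    prob = {label: 1.0 / len(labels) for label in labels}
    for _ in range(iterations):
        expected = defaultdict(float)
        for candidates in alternatives:
            total = sum(prob[c] for c in candidates)
            if total == 0.0:
                continue  # guard against numerical underflow
            for c in candidates:
                # E-step: share of this occurrence attributed to candidate c.
                expected[c] += prob[c] / total
        # M-step: new probabilities are the averaged expected shares.
        prob = {label: expected[label] / len(alternatives) for label in labels}
    # Selection: keep the most probable candidate for each occurrence
    # (sorting makes tie-breaking deterministic).
    return [max(sorted(c), key=lambda x: prob[x]) for c in alternatives]


# Hypothetical candidate categories for four noun occurrences.
occurrences = [
    {"person", "artifact"},
    {"person"},
    {"person", "animal"},
    {"artifact", "animal"},
]
selected = em_selection(occurrences)
print(selected[:3])
```

In this toy input the unambiguous second occurrence pulls probability mass towards "person", which then wins in the first and third occurrences as well. The fourth occurrence is a symmetric tie that the sketch resolves only by alphabetical order.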
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hajnicz, E. (2009). Semantic Annotation of Verb Arguments in Shallow Parsed Polish Sentences by Means of the EM Selection Algorithm. In: Marciniak, M., Mykowiecka, A. (eds) Aspects of Natural Language Processing. Lecture Notes in Computer Science, vol 5070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04735-0_9
DOI: https://doi.org/10.1007/978-3-642-04735-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04734-3
Online ISBN: 978-3-642-04735-0
eBook Packages: Computer Science (R0)