Semantic Annotation of Verb Arguments in Shallow Parsed Polish Sentences by Means of the EM Selection Algorithm

  • Chapter
Aspects of Natural Language Processing

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5070)

Abstract

The ultimate goal of our work is to extend a syntactic valence dictionary of Polish verbs by adding some semantic information to verb arguments. This information consists of wordnet semantic categories of words. In order to provide syntactic slots of dictionary entries with lists of appropriate semantic categories of corresponding nouns, we need a treebank with all nouns semantically annotated with such categories, as both syntactic (i.e., argument structure) and semantic information is required.

Our aim here is Word Sense Disambiguation (WSD). To solve this task for our specific application, we adapt the EM selection algorithm, which was originally developed for the extraction of syntactic valence frames.

The paper presents the whole data-processing pipeline, with the main focus on the WSD task. Three versions of the EM selection algorithm are presented: the original one and two modifications of it. Finally, the algorithms are evaluated and compared.
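The abstract does not state the update rule, so the following is only a minimal sketch of a generic EM-style selection loop, not the chapter's algorithm or either of its modifications. It assumes that each noun occurrence comes with a non-empty set of candidate semantic categories and that a single global category distribution is re-estimated from them. The names `em_select` and `candidate_sets` and the category labels are hypothetical.

```python
from collections import defaultdict


def em_select(candidate_sets, iterations=20):
    """Pick one category per observation with a simple EM-style loop.

    candidate_sets: list of non-empty tuples of candidate category labels
    (hypothetical input format, one tuple per ambiguous noun occurrence).
    """
    categories = sorted({c for cands in candidate_sets for c in cands})
    # Start from a uniform distribution over all categories seen in the data.
    prob = {c: 1.0 / len(categories) for c in categories}

    for _ in range(iterations):
        expected = defaultdict(float)
        # E-step: split each observation among its candidates,
        # proportionally to the current category probabilities.
        for cands in candidate_sets:
            total = sum(prob[c] for c in cands)
            for c in cands:
                expected[c] += prob[c] / total
        # M-step: the new probability is the average expected count.
        n = len(candidate_sets)
        prob = {c: expected[c] / n for c in categories}

    # Selection: keep the most probable candidate for each observation
    # (ties are broken alphabetically because candidates are sorted first).
    selected = [max(sorted(cands), key=lambda c: prob[c]) for cands in candidate_sets]
    return selected, prob


# Toy data: five occurrences, two of them ambiguous between two categories.
observations = [
    ("food",),
    ("food", "artifact"),
    ("artifact", "food"),
    ("artifact",),
    ("food",),
]
selected, prob = em_select(observations)
```

In this toy run the unambiguous occurrences pull the ambiguous ones toward the more frequent category. A real setting would presumably condition the probabilities on the verb and the syntactic slot, which this sketch leaves out.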

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hajnicz, E. (2009). Semantic Annotation of Verb Arguments in Shallow Parsed Polish Sentences by Means of the EM Selection Algorithm. In: Marciniak, M., Mykowiecka, A. (eds) Aspects of Natural Language Processing. Lecture Notes in Computer Science, vol 5070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04735-0_9

  • DOI: https://doi.org/10.1007/978-3-642-04735-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04734-3

  • Online ISBN: 978-3-642-04735-0

  • eBook Packages: Computer Science, Computer Science (R0)