Aligning GermaNet Senses with Wiktionary Sense Definitions

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

Sense definitions are a crucial component of wordnets and enhance their usability for a wide variety of NLP applications. Many wordnets for languages other than English, including the German wordnet GermaNet, lack comprehensive coverage of such definitions. The purpose of this paper is to automatically align sense descriptions from the web-based dictionary Wiktionary with lexical units in GermaNet in order to extend GermaNet with sense descriptions. An alignment algorithm based on word overlaps is developed, and different setups of the algorithm are compared. The best setup yields an accuracy of 93.8 % and an F1-score of 84.3, which confirms the viability of the proposed method for automatically enriching GermaNet. This best result crucially relies on coordinated relations, a novel concept for calculating sense alignment.
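The abstract describes the alignment only as "based on word overlaps". As a rough illustration, the following is a minimal Lesk-style overlap sketch in Python (in the spirit of [12] and [1]); it is not the authors' algorithm or any of its setups, and it does not model coordinated relations. It assumes that each GermaNet sense is represented by a bag of words, that a pair of senses is aligned when the overlap reaches a hypothetical `threshold`, and that the heuristic of note 9 applies to a lemma with exactly one sense in each resource (our reading of the note). Stemming (note 7) is omitted. All function names, sense IDs, and example data are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split into word tokens (no stemming in this sketch)."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(gn_words: set[str], wkt_definition: str) -> int:
    """Number of distinct words shared by a GermaNet sense and a Wiktionary definition."""
    return len({w.lower() for w in gn_words} & tokenize(wkt_definition))


def align(gn_senses: dict[str, set[str]],
          wkt_senses: dict[str, str],
          threshold: int = 1,
          monosemy_heuristic: bool = True) -> list[tuple[str, str]]:
    """Align the senses of one lemma across both resources (hypothetical API).

    gn_senses:  GermaNet sense ID -> bag of words describing the sense
    wkt_senses: Wiktionary sense ID -> definition text
    """
    # Assumed reading of note 9: if the lemma has exactly one sense in each
    # resource, the pair gets an overlap score of at least 1.
    single_sense = len(gn_senses) == 1 and len(wkt_senses) == 1
    pairs = []
    for gn_id, words in gn_senses.items():
        for wkt_id, definition in wkt_senses.items():
            score = overlap(words, definition)
            if monosemy_heuristic and single_sense:
                score = max(score, 1)
            if score >= threshold:
                pairs.append((gn_id, wkt_id))
    return pairs


# Hypothetical example data for the lemma "Bank".
gn_bank = {
    "gn_bank_1": {"Geldinstitut", "Kreditinstitut", "Sparkasse"},
    "gn_bank_2": {"Sitzgelegenheit", "Sitzmöbel", "Parkbank"},
}
wkt_bank = {
    "wkt_bank_1": "Sitzgelegenheit für mehrere Personen",
    "wkt_bank_2": "Kreditinstitut, das Geld verwaltet",
}
print(align(gn_bank, wkt_bank))
# [('gn_bank_1', 'wkt_bank_2'), ('gn_bank_2', 'wkt_bank_1')]
```

With `monosemy_heuristic=True`, a single-sense pair is aligned even when its overlap is zero. This is exactly the source of the false positives that note 9 mentions.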

The present paper substantially extends the research described earlier in [9] and presents the results of a detailed evaluation of the automatic GermaNet-Wiktionary alignment.
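The evaluation reports accuracy and F1-score, and notes 11 and 14 also refer to precision, recall, and the share of mappings that need human correction. The following small Python sketch shows how these measures relate. It assumes that accuracy is computed over all candidate sense pairs, so correctly rejected pairs count as true negatives. The counts are hypothetical. They are chosen only to show how one parameter setting can have higher accuracy but lower precision than another (the trade-off described in note 14), and they are not results from the paper. The paper reports F1 on a 0 to 100 scale, whereas the code returns fractions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts over candidate (GermaNet sense, Wiktionary sense) pairs."""
    tp: int  # aligned, and the senses really match
    fp: int  # aligned, but the senses do not match
    fn: int  # not aligned, although the senses match
    tn: int  # not aligned, and the senses do not match

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


def decisions_per_error(accuracy: float) -> float:
    """On average, one in this many decisions is wrong and needs correction."""
    return 1.0 / (1.0 - accuracy)


# Hypothetical counts for two parameter settings on the same 1000 pairs
# (100 true matches, 900 non-matches).
lenient = Confusion(tp=80, fp=20, fn=20, tn=880)
strict = Confusion(tp=60, fp=5, fn=40, tn=895)

for name, c in [("lenient", lenient), ("strict", strict)]:
    print(f"{name}: P={c.precision():.3f} R={c.recall():.3f} "
          f"F1={c.f1():.3f} Acc={c.accuracy():.3f}")

# Note 11: accuracy above 93 % means roughly one error in every 14 decisions.
print(f"{decisions_per_error(0.93):.1f}")
```

Here the lenient setting has the higher accuracy and the strict setting has the higher precision. Accuracy counts every correct decision equally, whereas precision only considers the pairs that were actually aligned.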

Notes

  1. The reason for this lack of sense definitions is entirely pragmatic: the inclusion of descriptions requires considerable human resources, which are often not available.

  2. That is, for GermaNet release 6.0, April 2011.

  3. Note that the terms "mapping" and "alignment" are used interchangeably throughout this paper.

  4. See http://www.wiktionary.org.

  5. http://www.ukp.tu-darmstadt.de/software/jwktl

  6. Gurevych [5] used a similar technique, constructing pseudo-glosses from all related words, to compute the semantic relatedness of any two words in GermaNet.

  7. In experiments comparing stemming with lemmatization, stemming yielded better results. All experiments described below therefore use stemming (Snowball stemmer [19]) as a preprocessing step.

  8. Here, "directly connected" means that the path length between two words is exactly one, regardless of the type of relation (lexical or conceptual).

  9. Assigning an arbitrary overlap score of at least 1 to words that occur exactly once in both resources results in a positive mapping of these two senses. This in turn produces a false positive whenever those senses do not match (see the example of Angeln in Sect. 3 above). Such cases are rare, however, so the heuristic works well in practice (see the evaluation section below).

  10. The numbers in Table 2 do not add up to exactly 20 997 because some words have more than one part of speech.

  11. More precisely, the accuracies for setups A to E are all above 93 %, so human correction is needed for only about one in 14 mappings.

  12. "Lower" is meant in a relative sense, i.e., compared to the results of the other setups for nouns. Setup E for nouns does not perform worse than setup E for adjectives and verbs.

  13. http://www.sfs.uni-tuebingen.de/GermaNet/wiktionary.shtml

  14. The only comparable work on the same language and resource pair is that of Matuschek and Gurevych [14]. Compared with ours, their results are 4.2 % higher in recall, 9.9 % higher in precision, and 2.7 points higher in F1-score, but 8.8 % lower in accuracy. For each measure, the best-scoring setup is used in this comparison. Our accuracy is higher and our precision lower than theirs because the parameters were tuned toward different goals: we aimed at high accuracy.

References

  1. Banerjee, S., Pedersen, T.: Extended gloss overlaps as a measure of semantic relatedness. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI’03, pp. 805–810. Morgan Kaufmann Publishers Inc., San Francisco (2003)

  2. Eisenberg, P.: Das Wort - Grundriss der Deutschen Grammatik, 3rd edn. Verlag J. B. Melzer, Stuttgart/Weimar, Germany (2006)

  3. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

  4. Fernando, S., Stevenson, M.: Mapping WordNet synsets to Wikipedia articles. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC’12, Istanbul, Turkey, pp. 590–596 (2012)

  5. Gurevych, I.: Using the structure of a conceptual network in computing semantic relatedness. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS (LNAI), vol. 3651, pp. 767–778. Springer, Heidelberg (2005)

  6. Hamp, B., Feldweg, H.: GermaNet - a lexical-semantic net for German. In: Proceedings of ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, Madrid (1997)

  7. Henrich, V., Hinrichs, E.: GernEdiT – the GermaNet editing tool. In: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC’10, Valletta, Malta, pp. 2228–2235 (2010)

  8. Henrich, V., Hinrichs, E., Suttner, K.: Automatically linking GermaNet to Wikipedia for harvesting corpus examples for GermaNet senses. J. Lang. Technol. Comput. Linguist. (JLCL) 27(1), 1–19 (2012)

  9. Henrich, V., Hinrichs, E., Vodolazova, T.: Semi-automatic extension of GermaNet with sense definitions from Wiktionary. In: Proceedings of the 5th Language and Technology Conference, LTC’11, Poznań, Poland, pp. 126–130 (2011)

  10. Henrich, V., Hinrichs, E., Vodolazova, T.: WebCAGe – a web-harvested corpus annotated with GermaNet senses. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’12, pp. 387–396, Avignon, France (2012)

  11. Kwong, O.Y.: Aligning WordNet with additional lexical resources. In: Proceedings of the COLING-ACL’98 Workshop on ‘Usage of WordNet in Natural Language Processing Systems’, Montreal, QC, Canada, pp. 73–79 (1998)

  12. Lesk, M.: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In: Proceedings of the 5th Annual International Conference on Systems Documentation, SIGDOC ’86, pp. 24–26. ACM, New York (1986)

  13. Litkowski, K.C.: Towards a meaning-full comparison of lexical resources. In: Proceedings of the ACL Special Interest Group on the Lexicon Workshop on Standardizing Lexical Resources, College Park, MD, USA, pp. 30–37 (1999)

  14. Matuschek, M., Gurevych, I.: Dijkstra-WSA: a graph-based approach to word sense alignment. Trans. Assoc. Comput. Linguist. (TACL) 1, 151–164 (2013)

  15. Meyer, C.M., Gurevych, I.: What psycholinguists know about chemistry: aligning Wiktionary and WordNet for increased domain coverage. In: Proceedings of the 5th International Joint Conference on Natural Language Processing, IJCNLP ’11, pp. 883–892 (2011)

  16. Niemann, E., Gurevych, I.: The people’s web meets linguistic knowledge: automatic sense alignment of Wikipedia and WordNet. In: Proceedings of the 9th International Conference on Computational Semantics, IWCS ’11, pp. 205–214, Association for Computational Linguistics, Stroudsburg (2011)

  17. Niles, I., Pease, A.: Linking Lexicons and ontologies: mapping WordNet to the suggested upper merged ontology. In: Proceedings of the IEEE International Conference on Information and Knowledge Engineering, IKE’03, pp. 412–416, Las Vegas, Nevada (2003)

  18. Ponzetto, S.P., Navigli, R.: Knowledge-rich word sense disambiguation rivaling supervised systems. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pp. 1522–1531. Association for Computational Linguistics, Stroudsburg (2010)

  19. Porter, M.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

  20. Ruiz-Casado, M., Alfonseca, E., Castells, P.: Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds.) AWIC 2005. LNCS (LNAI), vol. 3528, pp. 380–386. Springer, Heidelberg (2005)

  21. Zesch, T.: What’s the difference? - comparing expert-built and collaboratively-built lexical semantic resources. In: FLaReNet Forum 2010, Barcelona, Spain (2010)

Acknowledgments

The research reported in this paper was jointly funded by the SFB 833 grant of the DFG and by the CLARIN-D grant of the BMBF. We would like to thank Reinhild Barkey, Sarah Schulz, and Johannes Wahle for their help with the evaluation reported in Sect. 5.

Author information

Correspondence to Verena Henrich.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Henrich, V., Hinrichs, E., Vodolazova, T. (2014). Aligning GermaNet Senses with Wiktionary Sense Definitions. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_27

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science; Computer Science (R0)
