
DOI: 10.1007/978-3-540-85287-2_30
Article

Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus

Published: 25 August 2008

Abstract

This paper describes the disambiguation of Farsi homographs in unrestricted text using a thesaurus and a corpus. The proposed method is based on [1], with two differences. First, collocational information is used to avoid collecting spurious contexts caused by polysemous words in thesaurus categories. Second, every word in the test-data context contributes to the calculation of the conceptual classes' scores, even words that did not appear in the collected contexts. Using a Farsi corpus and a Farsi thesaurus, the method correctly disambiguated 91.46% of the instances of 15 Farsi homographs. It was compared with three supervised corpus-based methods: Naïve Bayes, Exemplar-based, and Decision List. Unlike the supervised methods, it needs no training data, and it performs well on the disambiguation of uncommon words. In addition, the method can be used to remove some kinds of morphological ambiguity.
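The category-scoring step inherited from [1] can be illustrated with a short Python sketch. It assumes Yarowsky-style salience, summing log(Pr(w|Cat) / Pr(w)) over the context words, and it omits the category prior Pr(Cat), i.e. it treats priors as uniform. The add-alpha smoothing, the `alpha=0.5` default, the function names `train`, `score` and `disambiguate`, and the toy English data are hypothetical choices made here, not details taken from the paper. Smoothing over a vocabulary that includes the test words is simply one way to let words absent from the collected contexts still contribute a term, as the abstract describes. The collocational filtering step is not modeled; the input contexts are assumed to be already filtered.

```python
import math
from collections import Counter


def train(category_contexts):
    """Count context words per thesaurus category and overall.

    Hypothetical input format: a dict mapping a category label to a list
    of contexts, each a list of tokens collected around members of that
    category (assumed already filtered for spurious contexts).
    """
    cat_counts = {
        cat: Counter(w for ctx in ctxs for w in ctx)
        for cat, ctxs in category_contexts.items()
    }
    global_counts = Counter()
    for counts in cat_counts.values():
        global_counts.update(counts)
    return cat_counts, global_counts


def score(context, cat_counts, global_counts, alpha=0.5):
    """Sum log(Pr(w|Cat) / Pr(w)) over every word of the test context.

    Assumption: add-alpha smoothing over a vocabulary that includes the
    test words, so unseen words still add a (smoothed) term.
    """
    vocab_size = len(set(global_counts) | set(context))
    total = sum(global_counts.values())
    scores = {}
    for cat, counts in cat_counts.items():
        cat_total = sum(counts.values())
        s = 0.0
        for w in context:
            p_w_given_cat = (counts[w] + alpha) / (cat_total + alpha * vocab_size)
            p_w = (global_counts[w] + alpha) / (total + alpha * vocab_size)
            s += math.log(p_w_given_cat / p_w)
        scores[cat] = s
    return scores


def disambiguate(context, senses, cat_counts, global_counts):
    """Return the highest-scoring category among those linked to the homograph's senses."""
    scores = score(context, cat_counts, global_counts)
    return max(senses, key=lambda cat: scores[cat])


# Toy data invented for illustration; not from the paper's Farsi corpus.
contexts = {
    "MONEY": [["deposit", "loan", "interest"], ["account", "deposit", "cash"]],
    "WATER": [["river", "shore", "fish"], ["river", "water", "boat"]],
}
cat_counts, global_counts = train(contexts)
print(disambiguate(["deposit", "cash", "unseen"], ["MONEY", "WATER"], cat_counts, global_counts))
# → MONEY
```

Here "unseen" never occurs in the collected contexts, yet it still adds a smoothed term to each category's score rather than being skipped.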

References

[1]
Yarowsky, D.: Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In: 14th International Conference on Computational Linguistics (COLING), Nantes, pp. 454-460 (1992)
[2]
Ide, N., Veronis, J.: Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics 24(1), 1-40 (1998)
[3]
Escudero, G., Marquez, L., Rigau, G.: Naïve Bayes and Exemplar-Based Approaches to Word Sense Disambiguation Revisited. In: 14th European Conference on Artificial Intelligence, ECAI, Berlin, Germany (2000)
[4]
Gaustad, T.: Linguistic Knowledge and Word Sense Disambiguation. PhD dissertation, University of Groningen (2004)
[5]
Gale, W., Church, K., Yarowsky, D.: A method for disambiguating word senses in a large corpus. Computers and the Humanities 26, 415-439 (1992)
[6]
Bijankhan, M.: Farsi text corpus, Research Center of Intelligent Signal Processing of Iran (RCISP), http://www.rcisp.com
[7]
Fararooy, J.: Thesaurus and electronic transfer of Persian language content. In: 2nd Workshop on Persian Language and Computer, Tehran, Iran (2004)
[8]
Fararooy, J.: Thesaurus of Persian Words and Phrases (1999)
[9]
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
[10]
Gale, W., Church, K., Yarowsky, D.: Estimating upper and lower bounds on the performance of word-sense disambiguation programs. In: 30th Annual Meeting of the Association for Computational Linguistics, Newark, pp. 249-256 (1992)
[11]
Ng, H.T.: Exemplar-Based Word Sense Disambiguation: Some Recent Improvements. In: 2nd Conference on Empirical Methods in Natural Language Processing, EMNLP 1997 (1997)
[12]
Yarowsky, D.: Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In: 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces (1994)
[13]
Ng, H.T., Lee, H.B.: Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-based Approach. In: 34th Annual Meeting of the Association for Computational Linguistics, pp. 40-47. Association for Computational Linguistics, Somerset, N.J. (1996)

Cited By

  • (2011) Towards automatic acquisition of a fully sense tagged corpus for Persian. In: Proceedings of the 19th International Conference on Foundations of Intelligent Systems, pp. 449-455. DOI: 10.5555/2029759.2029817. Online publication date: 28 June 2011
  • (2011) Cross-lingual word sense disambiguation for languages with scarce resources. In: Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence, pp. 347-358. DOI: 10.5555/2018192.2018234. Online publication date: 25 May 2011

      Published In

      Guide Proceedings
      GoTAL '08: Proceedings of the 6th international conference on Advances in Natural Language Processing
      August 2008
      509 pages
      ISBN:9783540852865
      Editors: Bengt Nordström, Aarne Ranta

      Publisher

      Springer-Verlag

      Berlin, Heidelberg


      Author Tags

      1. Corpus
      2. Farsi Language
      3. Thesaurus
      4. Word Sense Disambiguation

      Qualifiers

      • Article

