Abstract
In this research, we introduce a new approach to Arabic word sense disambiguation that uses Wikipedia as the lexical resource. The nearest sense for an ambiguous word is selected by representing the word's context and the senses retrieved from Wikipedia in a Vector Space Model (VSM) and measuring the cosine similarity between them. Three experiments were conducted to evaluate the proposed approach. Two use the first sentence retrieved from Wikipedia for each sense but differ in their VSM representations, while the third uses the first paragraph retrieved for each sense. The experiments show that using the first paragraph is better than using the first sentence, and that TF-IDF is better than abstract frequency in the VSM. The proposed approach was also tested on English words, where it gives better results using the first sentence retrieved from Wikipedia for each sense.
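The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the two English glosses are all assumptions made for illustration. The glosses are hand-written stand-ins, not text retrieved from Wikipedia, and real Arabic input would need proper normalization and tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real
    # preprocessing (Arabic text would need normalization and stemming).
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that a
        # term occurring in every document still gets a nonzero weight.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context, sense_glosses):
    """Pick the sense whose gloss is most similar to the context."""
    labels = list(sense_glosses)
    vectors = tf_idf_vectors([context] + [sense_glosses[l] for l in labels])
    context_vec, sense_vecs = vectors[0], vectors[1:]
    scores = {l: cosine(context_vec, vec) for l, vec in zip(labels, sense_vecs)}
    return max(scores, key=scores.get), scores


# Hypothetical example data: hand-written glosses, not Wikipedia text.
context = "she deposited money at her bank account"
senses = {
    "bank (finance)": "a bank is a financial institution that accepts "
                      "deposits of money and makes loans",
    "bank (river)": "a bank is land alongside a river or stream",
}
best_sense, scores = disambiguate(context, senses)
print(best_sense, scores)
```

Replacing the TF-IDF weight with the plain term count `tf` gives a frequency-only representation, which is one way to compare the two kinds of VSM weighting the abstract mentions.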
Cite this article
Alian, M., Awajan, A. & Al-Kouz, A. Word sense disambiguation for Arabic text using Wikipedia and Vector Space Model. Int J Speech Technol 19, 857–867 (2016). https://doi.org/10.1007/s10772-016-9376-y