Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks

Authors:

Chafik Aloulou,

Lamia Hadrich Belguith

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 19, Issue 6

Article No.: 86, Pages 1 - 16

https://doi.org/10.1145/3410569

Published: 15 October 2020

Abstract

How can we determine the meaning of a word from the context in which it appears? This question is one of the central problems of Natural Language Processing (NLP), where it is known as Word Sense Disambiguation (WSD). Word vectors extracted from neural network models have been applied successfully to the WSD problem. Accordingly, this article presents a novel method that disambiguates Arabic words according to both their context in a source text and the era in which they appeared. Over the past few decades, many researchers have grappled with Arabic Word Sense Disambiguation.

The Arabic language can be divided into three major historical periods: old Arabic, middle-age Arabic, and contemporary Arabic. Most research to date has concentrated on contemporary Arabic. The main aim of our work is to disambiguate Arabic words according to the historical period in which they appeared. To perform this task, we propose a method that uses contextualized word embeddings to capture the syntactic and semantic information of a word by taking its contextual uses into account. The key idea is to convert both the senses and the contextual uses of an ambiguous word into vectors, then determine which of the target word's possible senses is closest to the given context.
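The final step described above, choosing the sense whose vector is closest to the context vector, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which distance measure is used, so cosine similarity is an assumption here, and the function names (`cosine`, `closest_sense`), the sense labels, and the three-dimensional toy vectors are all hypothetical. In practice the vectors would come from a contextualized embedding model rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_sense(context_vec, sense_vecs):
    """Return the label of the sense vector most similar to the context vector."""
    return max(sense_vecs, key=lambda label: cosine(context_vec, sense_vecs[label]))


# Hypothetical toy data: one vector per candidate sense of an ambiguous word,
# and one vector for the context in which the word occurs.
senses = {
    "sense_old": [0.9, 0.1, 0.0],
    "sense_contemporary": [0.1, 0.8, 0.3],
}
context = [0.2, 0.7, 0.4]

best = closest_sense(context, senses)
print(best)
```

With these toy vectors the context is far more similar to the second sense than to the first, so that label is selected. Any other similarity or distance measure could be swapped into `closest_sense` without changing the overall shape of the procedure.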


Index Terms

Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Lexical semantics


Information & Contributors

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 19, Issue 6

November 2020

277 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3426881

Editor:
Imed Zitouni
Google, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2020

Accepted: 01 July 2020

Revised: 01 May 2020

Received: 01 February 2020

Published in TALLIP Volume 19, Issue 6


Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 9
Total Downloads: 130
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 2

Reflects downloads up to 03 Mar 2025

Citations

Cited By

Djaidri A, Aliane H, Azzoune H (2025). Exploring the Impact of Stop Words and Particles on Arabic Word Sense Disambiguation. Arabic Language Processing: From Theory to Practice, pp. 30-40. Online publication date: 2-Feb-2025.
https://doi.org/10.1007/978-3-031-80438-0_3

Almanea M (2024). Deep Learning in Written Arabic Linguistic Studies: A Comprehensive Survey. IEEE Access 12, pp. 172196-172233. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3488553

Djaidri A, Aliane H, Azzoune H (2023). The Contribution of Selected Linguistic Markers for Unsupervised Arabic Verb Sense Disambiguation. ACM Transactions on Asian and Low-Resource Language Information Processing 22:8, pp. 1-23. Online publication date: 24-Aug-2023.
https://dl.acm.org/doi/10.1145/3605777

Belhadi A, Djenouri Y, Srivastava G, Lin J (2023). Fast and Accurate Framework for Ontology Matching in Web of Things. ACM Transactions on Asian and Low-Resource Language Information Processing 22:5, pp. 1-19. Online publication date: 17-Jan-2023.
https://dl.acm.org/doi/10.1145/3578708

Laatar R, Rhayem A, Aloulou C, Belguith L (2022). Towards a Historical Ontology for Arabic Language: Investigation and Future Directions. Intelligent Systems Design and Applications, pp. 1078-1087. Online publication date: 27-Mar-2022.
https://doi.org/10.1007/978-3-030-96308-8_100

El-Razzaz M, Fakhr M, Maghraby F (2021). Arabic Gloss WSD Using BERT. Applied Sciences 11:6, article 2567. Online publication date: 13-Mar-2021.
https://doi.org/10.3390/app11062567

Hou K (2021). Principal Component Analysis and Prediction of Students’ Physical Health Standard Test Results Based on Recurrent Convolution Neural Network. Wireless Communications & Mobile Computing 2021. Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.1155/2021/2438656

Neji S, Chenaina T, Shoeb A, Ben Ayed L (2021). HIR: A Hybrid IR Ranking Model. 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1717-1722. Online publication date: Jul-2021.
https://doi.org/10.1109/COMPSAC51774.2021.00256

Neji S, Chenaina T, Shoeb A, Ayed L (2021). SemApp: A Semantic Approach to Enhance Information Retrieval. Computational Science and Its Applications – ICCSA 2021, pp. 62-78. Online publication date: 11-Sep-2021.
https://doi.org/10.1007/978-3-030-86970-0_6
