Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks

Published: 15 October 2020

Abstract

How can we determine the meaning of a word from the context in which it appears? This question, known as Word Sense Disambiguation (WSD), is one of the central and most difficult problems in Natural Language Processing (NLP). Word vectors extracted from neural network models have been applied successfully to the WSD problem. Accordingly, this article presents a novel method for disambiguating Arabic words according to both their contextual appearance in a source text and the era in which they emerged. Over the past few decades, many researchers have grappled with Arabic Word Sense Disambiguation.
The Arabic language can be divided into three major historical periods: old Arabic, middle-age Arabic, and contemporary Arabic. Most existing research has focused on contemporary Arabic. The main aim of our work is to disambiguate Arabic words according to the historical period in which they appeared. To perform this task, we propose a method that uses contextualized word embeddings to better capture the syntactic and semantic information of a word by taking its contextual uses into account. The key idea is to convert both the senses and the contextual uses of an ambiguous word into vectors, and then to determine which of the possible senses of the target word is closest to the given context.
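
The final step described in the abstract, choosing the sense whose vector is closest to the context vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a distance measure, so cosine similarity is assumed here, and the sense labels, period tags, and 3-dimensional vectors are invented placeholders standing in for embeddings that a contextualized model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def pick_sense(context_vec, sense_vecs):
    """Return (label, score) for the sense vector closest to the context."""
    scores = {label: cosine(context_vec, vec) for label, vec in sense_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: each candidate sense is keyed by
# (sense label, historical period). Real vectors would come from a
# contextualized embedding model; these numbers are made up.
sense_vecs = {
    ("sense_A", "old"): [0.9, 0.1, 0.0],
    ("sense_B", "contemporary"): [0.1, 0.8, 0.3],
}
context_vec = [0.2, 0.7, 0.4]  # placeholder embedding of the target word in context

best_label, best_score = pick_sense(context_vec, sense_vecs)
print(best_label, round(best_score, 3))
```

With these toy values the context vector points in nearly the same direction as the second sense, so that sense, together with its period tag, is selected. Keying each sense by period is one simple way to make the chosen sense carry its historical era, under the assumptions stated above.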

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 19, Issue 6
    November 2020
    277 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3426881

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 October 2020
    Accepted: 01 July 2020
    Revised: 01 May 2020
    Received: 01 February 2020
    Published in TALLIP Volume 19, Issue 6

    Author Tags

    1. Natural language processing
    2. Contemporary Arabic
    3. Contextualized word embeddings
    4. Historical dictionary
    5. Middle-age Arabic
    6. Old Arabic
    7. Recurrent neural networks
    8. Word sense disambiguation

    Qualifiers

    • Research-article
    • Research
    • Refereed
