Abstract
The main challenge of paraphrase detection is identifying the semantic relationship between a suspect text document and a source text document. The combination of Natural Language Processing (NLP) and deep learning approaches is now widely applied in text analysis, including text classification, machine translation, and text similarity detection. In this context, we propose a deep learning based method for detecting Arabic paraphrase, composed of the following phases. First, a preprocessing phase extracts the relevant information from the text document. Then, the word2vec algorithm generates word vector representations, which are subsequently combined to produce sentence vector representations. Finally, a Convolutional Neural Network (CNN) is used to improve the ability to capture statistical regularities in the context of sentences, which facilitates measuring the similarity between the representations of source and suspicious sentences. The evaluation of the proposed approach gave promising results in terms of precision.
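The step from word vectors to a sentence-level similarity score can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-written placeholders rather than trained word2vec embeddings, the English tokens stand in for preprocessed Arabic tokens, averaging is only one possible way to combine word vectors (the abstract does not say which combination is used), and the CNN stage is omitted entirely.

```python
import math

# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
TOY_VECTORS = {
    "student": [0.9, 0.1, 0.0],
    "pupil":   [0.8, 0.2, 0.1],
    "reads":   [0.1, 0.9, 0.0],
    "studies": [0.2, 0.8, 0.1],
    "rain":    [0.0, 0.1, 0.9],
    "falls":   [0.1, 0.0, 0.8],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of the known tokens (an assumed combination rule)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of the sentence has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


source = sentence_vector(["student", "reads"], TOY_VECTORS)
suspect = sentence_vector(["pupil", "studies"], TOY_VECTORS)
unrelated = sentence_vector(["rain", "falls"], TOY_VECTORS)

paraphrase_score = cosine_similarity(source, suspect)
unrelated_score = cosine_similarity(source, unrelated)
print(paraphrase_score > unrelated_score)
```

With these placeholder vectors the paraphrased pair scores higher than the unrelated pair. In the approach described above, a CNN would sit between the sentence representation and the similarity measurement; this sketch skips that stage and compares the averaged vectors directly.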
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mahmoud, A., Zrigui, A., Zrigui, M. (2018). A Text Semantic Similarity Approach for Arabic Paraphrase Detection. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_25
DOI: https://doi.org/10.1007/978-3-319-77116-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science (R0)