Arabic text classification using Polynomial Networks

Published: 01 October 2015

Abstract

This paper presents a statistical learning-based Arabic text classification system built on Polynomial Neural Networks. Polynomial Networks have recently been applied to English text classification, but they have not previously been used for Arabic text classification. We investigate their performance in classifying Arabic texts. Experiments are conducted on the Al-Jazeera News dataset, which is widely used in Arabic text classification. We chose this dataset to enable direct comparison of the Polynomial Networks classifier with other well-known classifiers evaluated on it in the Arabic text classification literature. The experimental results show that the Polynomial Networks classifier is competitive with state-of-the-art algorithms for Arabic text classification.
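
The abstract does not give the classifier's details, so the following is only a minimal sketch of the general polynomial-network idea: expand each document's term vector into polynomial terms, then fit one linear output per class by least squares. The degree (2), the three-term vocabulary, the toy counts, and the function names `expand_degree2`, `train`, and `predict` are all illustrative assumptions, not the authors' implementation or data.

```python
from itertools import combinations_with_replacement

import numpy as np


def expand_degree2(X):
    """Map each row x to [1, x_i, x_i * x_j for i <= j] (degree-2 basis)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    pairs = combinations_with_replacement(range(n_features), 2)
    cols = [np.ones(n_samples)]
    cols += [X[:, i] for i in range(n_features)]
    cols += [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack(cols)


def train(X, labels, n_classes):
    """Fit one weight vector per class by least squares on one-hot targets."""
    P = expand_degree2(X)
    T = np.eye(n_classes)[np.asarray(labels)]
    W, *_ = np.linalg.lstsq(P, T, rcond=None)
    return W


def predict(W, X):
    """Assign each document to the class with the highest polynomial score."""
    return np.argmax(expand_degree2(X) @ W, axis=1)


# Hypothetical term counts for a 3-term vocabulary; two made-up classes.
X_train = [[3, 0, 1], [2, 0, 0], [4, 1, 0],
           [0, 3, 1], [0, 2, 0], [1, 4, 0]]
labels = [0, 0, 0, 1, 1, 1]

W = train(X_train, labels, n_classes=2)
print(predict(W, X_train))
```

On real corpora the vocabulary is far larger than three terms, so the number of degree-2 terms grows quadratically with it; some form of feature selection before the expansion would normally be needed to keep the problem tractable.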

Published In

Journal of King Saud University - Computer and Information Sciences, Volume 27, Issue 4
October 2015
116 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Arabic document categorization
  2. Arabic text classification
  3. Polynomial Networks

Qualifiers

  • Research-article

Cited By

  • (2024) Abusive and Hate speech Classification in Arabic Text Using Pre-trained Language Models and Data Augmentation. ACM Transactions on Asian and Low-Resource Language Information Processing, 23:11 (1-28). DOI: 10.1145/3679049. Online publication date: 21-Nov-2024
  • (2023) Analysis of Cursive Text Recognition Systems: A Systematic Literature Review. ACM Transactions on Asian and Low-Resource Language Information Processing, 22:7 (1-30). DOI: 10.1145/3592600. Online publication date: 13-Apr-2023
  • (2022) Arabic text classification: the need for multi-labeling systems. Neural Computing and Applications, 34:2 (1135-1159). DOI: 10.1007/s00521-021-06390-z. Online publication date: 1-Jan-2022
  • (2020) Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 19:5 (1-16). DOI: 10.1145/3390092. Online publication date: 1-Jul-2020
