Abstract
Text classification has attracted growing interest among NLP researchers owing to the abundance of text on online platforms and its use in a wide range of Web 2.0 applications. Recently, text classification in resource-constrained languages has drawn considerable attention because of the sharp increase in digital resources. This paper presents a CNN-based text classification model for Bengali, a low-resource language. The goal of Bengali text classification is to assign a text to one of a set of predefined categories based on its semantic and syntactic content. The proposed system comprises four key modules: embedding model generation, text-to-feature representation, training, and testing. The classifier was trained and validated on 39,079 and 6,000 texts, respectively. Experimental evaluation on 9,779 test texts yields an accuracy of 96.85%, which indicates superior performance compared with existing techniques.
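As a rough illustration of the text-to-feature and classification stages, the pure-Python sketch below shows the forward pass of a generic 1-D convolutional text classifier; it is not necessarily the authors' architecture. Each token is mapped to a dense vector, a bank of convolution filters slides over the token sequence, max-over-time pooling produces a fixed-size feature vector, and a softmax layer scores the categories. The tiny embedding table (standing in for pretrained FastText vectors), the dimensions, the filter width, the random weights, and the category names are all hypothetical. Training by backpropagation is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy embedding table standing in for pretrained FastText vectors.
EMB_DIM = 4
random.seed(0)
VOCAB = ["খেলা", "দল", "গোল", "বাজার", "দাম", "টাকা"]
EMBEDDINGS: Dict[str, Vector] = {
    w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in VOCAB
}


def text_to_matrix(tokens: Sequence[str], max_len: int) -> List[Vector]:
    """Map tokens to vectors, truncating or zero-padding to max_len.
    Unknown words get a zero vector here; real FastText could instead
    build a vector from the word's character n-grams."""
    zero = [0.0] * EMB_DIM
    rows = [EMBEDDINGS.get(t, zero) for t in tokens[:max_len]]
    return rows + [zero] * (max_len - len(rows))


def conv1d_maxpool(x: List[Vector], filters: List[List[Vector]],
                   bias: List[float]) -> List[float]:
    """Slide each (width x EMB_DIM) filter over x, apply ReLU,
    then keep the maximum activation over all positions."""
    features = []
    for f, b in zip(filters, bias):
        k = len(f)
        best = 0.0  # ReLU outputs are >= 0, so 0 is a safe starting max
        for start in range(len(x) - k + 1):
            s = b
            for i in range(k):
                s += sum(fw * xv for fw, xv in zip(f[i], x[start + i]))
            best = max(best, max(0.0, s))
        features.append(best)
    return features


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(tokens, filters, conv_bias, dense_w, dense_b, labels,
             max_len: int = 8) -> Tuple[str, List[float]]:
    x = text_to_matrix(tokens, max_len)
    h = conv1d_maxpool(x, filters, conv_bias)
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(dense_w, dense_b)]
    probs = softmax(logits)
    return labels[probs.index(max(probs))], probs


# Hypothetical model configuration with untrained random weights.
NUM_FILTERS, WIDTH = 3, 2
LABELS = ["sports", "economics"]
filters = [[[random.uniform(-0.5, 0.5) for _ in range(EMB_DIM)]
            for _ in range(WIDTH)] for _ in range(NUM_FILTERS)]
conv_bias = [0.0] * NUM_FILTERS
dense_w = [[random.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)]
           for _ in LABELS]
dense_b = [0.0] * len(LABELS)

label, probs = classify(["দল", "গোল", "খেলা"], filters, conv_bias,
                        dense_w, dense_b, LABELS)
print(label, [round(p, 3) for p in probs])
```

Because the weights are untrained, the predicted label is arbitrary. In a trained system, the embedding table would come from the generated embedding model and the filter and dense weights would be learned on the training set, with the validation set used for model selection.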
Acknowledgement
This work was supported by the University Grants Commission of Bangladesh.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hossain, M.R., Hoque, M.M., Sarker, I.H. (2021). Text Classification Using Convolution Neural Networks with FastText Embedding. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_11
DOI: https://doi.org/10.1007/978-3-030-73050-5_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73049-9
Online ISBN: 978-3-030-73050-5
eBook Packages: Intelligent Technologies and Robotics (R0)