Text Classification Using Convolution Neural Networks with FastText Embedding

  • Conference paper
Hybrid Intelligent Systems (HIS 2020)

Abstract

Text classification is attracting growing interest among NLP researchers because of the tremendous volume of text available on online platforms and its emergence in various Web 2.0 applications. Recently, text classification in resource-constrained languages has drawn much attention due to the sharp increase in digital resources. This paper presents a CNN-based text classification model for Bengali, a low-resource language. The goal of Bengali text classification is to assign a text to one of a set of pre-defined categories based on its semantic and syntactic meaning. The proposed system comprises four key modules: embedding model generation, text-to-feature representation, training, and testing. The classification system was trained and validated on 39,079 and 6,000 text documents, respectively. Experimental evaluation on 9,779 test documents shows an accuracy of 96.85%, which indicates superior performance compared to existing techniques.
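
The first two modules named in the abstract (embedding model generation and text-to-feature representation) can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding size, bucket count, text length, random (untrained) vectors, and the use of `zlib.crc32` as a stand-in for FastText's own hash function are all assumptions made for illustration, and the helper names are hypothetical. The sketch only shows the general FastText idea of building a word vector from its character n-grams, then stacking word vectors into a fixed-size matrix that a CNN could take as input.

```python
import random
import zlib

EMBED_DIM = 8        # assumed; the abstract does not state the embedding size
NUM_BUCKETS = 1000   # assumed number of hash buckets for character n-grams
MAX_LEN = 6          # assumed fixed text length, in tokens


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of the word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(wrapped) - n + 1)]
    return grams + [wrapped]  # the whole word is added as one more unit


def make_table(seed=0):
    """Random bucket vectors; a trained model would learn these instead."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
            for _ in range(NUM_BUCKETS)]


def word_vector(word, table):
    """Average the bucket vectors of the word's n-grams (crc32 is a stand-in hash)."""
    rows = [table[zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS]
            for g in char_ngrams(word)]
    return [sum(col) / len(rows) for col in zip(*rows)]


def text_to_features(text, table):
    """Map a text to a MAX_LEN x EMBED_DIM matrix, truncating or zero-padding."""
    tokens = text.split()[:MAX_LEN]
    matrix = [word_vector(t, table) for t in tokens]
    matrix += [[0.0] * EMBED_DIM for _ in range(MAX_LEN - len(matrix))]
    return matrix


table = make_table()
features = text_to_features("বাংলা ভাষা সুন্দর", table)  # three Bengali tokens
```

Because every word vector is assembled from subword pieces, a word never seen during embedding training still receives a vector, which is one common reason for choosing subword embeddings in a morphologically rich, low-resource language.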

Acknowledgement

This work was supported by the University Grants Commission of Bangladesh.

Author information

Corresponding author

Correspondence to Mohammed Moshiul Hoque.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hossain, M.R., Hoque, M.M., Sarker, I.H. (2021). Text Classification Using Convolution Neural Networks with FastText Embedding. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_11