research-article

Bengali paper classification using ensemble machine learning algorithms

Published: 01 January 2022

Abstract

Text classification is one of the most challenging problems in natural language processing (NLP). Language models are at the heart of NLP. The ability to represent texts as numbers has given rise to many NLP tasks, for example, text categorisation, translation, and summarisation. Unfortunately, NLP for Bengali texts has not yet reached the state-of-the-art level of other languages such as English, mostly due to the scarcity of resources and the complexity of Bengali grammar. Consequently, little work has been done in this field. In this paper, we study one word embedding method, Word2vec based on continuous bag of words (CBOW), in combination with several ensemble machine learning algorithms: Adaptive Boosting (AdaBoost), Light Gradient Boosting Machine (LightGBM), XGBoost, and random forest classifiers (RFC). The model is trained on a large corpus of Bengali newspapers containing 99,283,949 words and 8,284,804 sentences in 392,772 documents. In our experiment, the Word2vec CBOW model with the XGBoost algorithm performed considerably better than the other models and achieved 92.24% accuracy.
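As a rough illustration of the pipeline the abstract describes, the sketch below shows two pieces in plain Python: how CBOW frames its training data (context words paired with the target word they surround), and one common way to turn per-word vectors into a fixed-length document feature vector that an ensemble classifier could consume. The abstract does not say how word vectors are combined into document features, so the averaging step is an assumption. The function names, the window size, and the toy two-dimensional embedding values are all hypothetical, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def cbow_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, target word) pairs, the training framing used by CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        left = list(tokens[max(0, i - window):i])
        right = list(tokens[i + 1:i + 1 + window])
        context = left + right
        if context:  # a one-word sentence has no context to learn from
            pairs.append((context, target))
    return pairs


def document_vector(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of known words (assumed pooling); zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Toy, hand-made 2-dimensional embeddings (hypothetical values, not trained).
toy_embeddings = {
    "খেলা": [1.0, 0.0],   # "game"
    "দল": [0.8, 0.2],     # "team"
    "বাজার": [0.0, 1.0],  # "market"
}

sentence = ["দল", "খেলা", "জিতেছে"]  # "জিতেছে" is out of vocabulary here
pairs = cbow_pairs(sentence, window=1)
features = document_vector(sentence, toy_embeddings, dim=2)
```

In a full system, the embeddings would come from a trained Word2vec model and `features` would be one row of the matrix passed to a classifier such as XGBoost or a random forest; those steps are omitted here.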

Published In

International Journal of Knowledge Engineering and Soft Data Paradigms, Volume 7, Issue 2
2022
61 pages
ISSN:1755-3210
EISSN:1755-3229
DOI:10.1504/ijkesdp.2022.7.issue-2

Publisher

Inderscience Publishers

Geneva 15, Switzerland

Author Tags

  1. NLP
  2. natural language processing
  3. categorisation
  4. document classification
  5. decision tree classifier

Qualifiers

  • Research-article
