PMM: A Model for Bangla Parts-of-Speech Tagging Using Sentence Map

  • Conference paper
Information, Communication and Computing Technology (ICICCT 2020)

Abstract

Part-of-speech (POS) tagging is a prerequisite for almost all Natural Language Processing (NLP) tasks, such as grammar checking, machine translation, summarization, sentiment analysis, information retrieval, and speech processing. Because little successful research on computational linguistics exists for the Bangla language, such technology is still in demand. Existing work on Bangla POS tagging requires large training data sets and is not applicable to all language styles. In this research, we propose the Prediction Maximization Model (PMM) for Bangla POS tagging. PMM combines learning from statistical data with rule-based analysis: a Hidden Markov Model (HMM) is applied together with tag mapping and scoring to maximize accuracy while using relatively little statistical training data. PMM achieved 95.6% accuracy, which is high compared with two existing POS taggers that claim similar accuracy but rely on much larger training data sets. In our experiment, we used around 14K unique tokens as training data for PMM and for the two existing systems, and PMM performed best.
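
The abstract names a Hidden Markov Model as the statistical core of PMM but does not spell out the tag mapping or scoring steps. The following is only a minimal sketch of a generic bigram HMM tagger decoded with the Viterbi algorithm, not the authors' PMM. The toy corpus, the tag names (PRON, NOUN, VERB), the add-one smoothing, and the function names `train_hmm`, `log_prob`, and `viterbi` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, for illustration only (not the paper's data).
TRAIN = [
    [("আমি", "PRON"), ("ভাত", "NOUN"), ("খাই", "VERB")],
    [("সে", "PRON"), ("বই", "NOUN"), ("লেখে", "VERB")],
    [("আমি", "PRON"), ("বই", "NOUN"), ("কিনি", "VERB")],
]

START = "<s>"  # pseudo-tag marking the start of a sentence


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    vocab = {word for sent in sentences for word, _ in sent}
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    # Each column maps tag -> (best log score, back-pointer to previous tag).
    columns = [{
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


trans, emit, tags, vocab = train_hmm(TRAIN)
print(viterbi(["সে", "ভাত", "খাই"], trans, emit, tags, vocab))
```

The final call tags a sentence whose exact word combination does not occur in the toy corpus; the transition counts alone are enough for the decoder to recover the PRON, NOUN, VERB order. A rule-based layer such as the one the abstract describes would sit on top of, or alongside, scores like these.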

Author information

Corresponding authors

Correspondence to Prosanta Kumar Chaki, Biman Barua, Md Mozammel Hossain Sazal or Shikha Anirban.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chaki, P.K., Barua, B., Sazal, M.M.H., Anirban, S. (2020). PMM: A Model for Bangla Parts-of-Speech Tagging Using Sentence Map. In: Badica, C., Liatsis, P., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2020. Communications in Computer and Information Science, vol 1170. Springer, Singapore. https://doi.org/10.1007/978-981-15-9671-1_15

  • DOI: https://doi.org/10.1007/978-981-15-9671-1_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-9670-4

  • Online ISBN: 978-981-15-9671-1

  • eBook Packages: Computer Science, Computer Science (R0)
