PMM: A Model for Bangla Parts-of-Speech Tagging Using Sentence Map

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1170)

Included in the following conference series:

International Conference on Information, Communication and Computing Technology

Abstract

Part-of-speech (POS) tagging is a prerequisite for almost all natural language processing (NLP) tasks, such as grammar checking, machine translation, summarization, sentiment analysis, information retrieval, and speech processing. Because there has been little successful research on computational linguistics for the Bangla language, such technology remains in demand. Existing work on Bangla POS tagging requires large training data sets and is not applicable to all language styles. In this research, we propose the Prediction Maximization Model (PMM) for Bangla POS tagging. We use statistical data for learning together with rule-based analysis. In PMM, a Hidden Markov Model (HMM) is applied with tag mapping and scoring to maximize accuracy while using relatively little statistical training data. PMM achieved 95.6% accuracy, which is relatively high compared with two existing POS taggers that claim the nearest accuracy but rely on much larger training data sets. In our experiment, we used around 14K unique tokens as training data for PMM and the two existing systems, and PMM performed best.
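To make the HMM component concrete, the following is a minimal sketch of a plain bigram HMM tagger with Viterbi decoding. It is not the PMM itself: the abstract does not describe the tag mapping, scoring, or rule-based steps, so none of them appear here. The toy corpus of transliterated Bangla words, the tag names, the add-one smoothing, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (transliterated Bangla), NOT data from the paper.
TOY_CORPUS = [
    [("ami", "PRON"), ("bhat", "NOUN"), ("khai", "VERB")],
    [("tumi", "PRON"), ("boi", "NOUN"), ("poro", "VERB")],
    [("ami", "PRON"), ("boi", "NOUN"), ("pori", "VERB")],
]

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

model = train_hmm(TOY_CORPUS)
print(viterbi("tumi bhat khai".split(), *model))
```

On this toy corpus the unseen sentence "tumi bhat khai" is tagged from transition and emission counts alone. A system like the one the abstract describes would layer its own mapping, scoring, and rules on top of such a baseline.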

Author information

Authors and Affiliations

Daffodil International University, Dhaka, Bangladesh
Prosanta Kumar Chaki, Md Mozammel Hossain Sazal & Shikha Anirban
BGMEA University of Fashion and Technology, Dhaka, Bangladesh
Biman Barua

Corresponding authors

Correspondence to Prosanta Kumar Chaki, Biman Barua, Md Mozammel Hossain Sazal or Shikha Anirban.

Editor information

Editors and Affiliations

Computer and Information Technology, University of Craiova, Craiova, Romania
Costin Badica
Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
Panos Liatsis
Department of IT, Jagan Institute of Management Studies, Delhi, India
Latika Kharb
Department of IT, Jagan Institute of Management Studies, Delhi, India
Deepak Chahal

About this paper

Cite this paper

Chaki, P.K., Barua, B., Sazal, M.M.H., Anirban, S. (2020). PMM: A Model for Bangla Parts-of-Speech Tagging Using Sentence Map. In: Badica, C., Liatsis, P., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2020. Communications in Computer and Information Science, vol 1170. Springer, Singapore. https://doi.org/10.1007/978-981-15-9671-1_15

DOI: https://doi.org/10.1007/978-981-15-9671-1_15
Published: 05 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9670-4
Online ISBN: 978-981-15-9671-1
eBook Packages: Computer Science, Computer Science (R0)
