Abstract
Part-of-speech (POS) tagging is required for almost all kinds of Natural Language Processing (NLP) tasks, such as grammar checking, machine translation, summarization, sentiment analysis, information retrieval, and speech processing. Because little successful research on computational linguistics exists for the Bangla language, demand for such technology remains. Existing work on Bangla POS tagging requires large training data sets and is not applicable to all language styles. In this research, we propose the Prediction Maximization Model (PMM) for Bangla POS tagging. The model learns from statistical data and also applies rule-based analysis. Within PMM, a Hidden Markov Model (HMM) is applied together with tag mapping and scoring to maximize accuracy while using relatively little statistical training data. PMM achieved 95.6% accuracy, which is high relative to two existing POS taggers that report comparable accuracy but rely on much larger training data sets. In our experiment, we trained PMM and the two existing systems on around 14K unique tokens, and PMM performed best.
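For readers unfamiliar with HMM-based tagging, the following is a minimal sketch of Viterbi decoding over a first-order HMM. It is a generic illustration, not the PMM itself: the abstract does not specify the tag mapping or scoring steps, so they are not modeled here. The function name `viterbi`, the three-tag set, the romanized toy tokens, and all probabilities are assumptions invented for the example.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for tokens under a first-order HMM."""
    if not tokens:
        return []

    def lp(table, key):
        # Log-probability with a small floor for unseen events (assumed smoothing).
        return math.log(table.get(key, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p[t], tokens[0]), None) for t in tags}]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p], t), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t], tokens[i]), prev)
        best.append(column)

    # Follow back-pointers from the best final tag to recover the path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Toy model: every value below is hypothetical, not taken from the paper.
TAGS = ["PRON", "NOUN", "VERB"]
START_P = {"PRON": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "PRON": {"PRON": 0.1, "NOUN": 0.6, "VERB": 0.3},
    "NOUN": {"PRON": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"PRON": 0.3, "NOUN": 0.4, "VERB": 0.3},
}
EMIT_P = {
    "PRON": {"ami": 0.9},
    "NOUN": {"bhat": 0.8},
    "VERB": {"khai": 0.9},
}

print(viterbi(["ami", "bhat", "khai"], TAGS, START_P, TRANS_P, EMIT_P))
```

In a real tagger the start, transition, and emission tables would be estimated from tagged training data rather than written by hand, and unseen words would need a more careful treatment than the fixed floor used here.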
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chaki, P.K., Barua, B., Sazal, M.M.H., Anirban, S. (2020). PMM: A Model for Bangla Parts-of-Speech Tagging Using Sentence Map. In: Badica, C., Liatsis, P., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2020. Communications in Computer and Information Science, vol 1170. Springer, Singapore. https://doi.org/10.1007/978-981-15-9671-1_15
DOI: https://doi.org/10.1007/978-981-15-9671-1_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9670-4
Online ISBN: 978-981-15-9671-1
eBook Packages: Computer Science (R0)