Abstract
This work addresses word sense disambiguation in Bengali using the Naïve Bayes probabilistic model. The task was carried out in two phases. In the first phase, the algorithm was tested on a data set drawn from the Bengali corpus developed under the TDIL (Technology Development for Indian Languages) project of the Govt. of India, and its accuracy was nearly 80 %. Besides performing the disambiguation, the algorithm inserted each sense-evaluated sentence into the learning set of the corresponding sense, so that it could take part in later executions. In the second phase, after a small manipulation of the learning sets, a new input data set was tested with the same algorithm, and the accuracy improved to around 83 %. The results were verified against a standard Bengali dictionary.
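The idea of a learning set that grows after each disambiguation can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `NaiveBayesWSD`, the whitespace tokenizer, the add-one (Laplace) smoothing, and the English toy sentences are all assumptions made for illustration, and the chapter's actual preprocessing and learning-set manipulation are not described in the abstract.

```python
import math
from collections import Counter


class NaiveBayesWSD:
    """Toy Naive Bayes sense classifier whose learning sets grow over time.

    Hypothetical sketch: names, tokenization, and smoothing are assumptions.
    """

    def __init__(self):
        # sense label -> list of tokenized sentences (the "learning set")
        self.learning_sets = {}

    def add_example(self, sense, sentence):
        """Add a sense-tagged sentence to the learning set of that sense."""
        self.learning_sets.setdefault(sense, []).append(sentence.lower().split())

    def _log_score(self, sense, tokens, vocab_size, total_sentences):
        sentences = self.learning_sets[sense]
        counts = Counter(tok for sent in sentences for tok in sent)
        total_tokens = sum(counts.values())
        # log prior: share of learning sentences that belong to this sense
        score = math.log(len(sentences) / total_sentences)
        # log likelihood of each context word, with add-one smoothing (assumed)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
        return score

    def disambiguate(self, sentence, update=True):
        """Return the most probable sense; optionally feed the sentence back."""
        tokens = sentence.lower().split()
        vocab = {
            tok
            for sents in self.learning_sets.values()
            for sent in sents
            for tok in sent
        }
        total = sum(len(sents) for sents in self.learning_sets.values())
        best = max(
            self.learning_sets,
            key=lambda s: self._log_score(s, tokens, len(vocab), total),
        )
        if update:
            # auto-update: the evaluated sentence joins the predicted sense's set
            self.learning_sets[best].append(tokens)
        return best


if __name__ == "__main__":
    wsd = NaiveBayesWSD()
    wsd.add_example("finance", "he deposited money in the bank")
    wsd.add_example("finance", "the bank approved the loan")
    wsd.add_example("river", "they sat on the bank of the river")
    wsd.add_example("river", "the river bank was muddy")
    print(wsd.disambiguate("she took a loan from the bank"))
    print(len(wsd.learning_sets["finance"]))
```

In this sketch a wrongly predicted sense would also be fed back into the learning sets, so a real system would likely need the kind of verification step the abstract mentions (checking results against a dictionary) before trusting the updated sets.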
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Pal, A.R., Saha, D. (2016). Word Sense Disambiguation in Bengali: An Auto-updated Learning Set Increases the Accuracy of the Result. In: Satapathy, S., Mandal, J., Udgata, S., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 435. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2757-1_42
DOI: https://doi.org/10.1007/978-81-322-2757-1_42
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2756-4
Online ISBN: 978-81-322-2757-1
eBook Packages: Engineering, Engineering (R0)