Word Sense Disambiguation in Bengali: An Auto-updated Learning Set Increases the Accuracy of the Result

Alok Ranjan Pal &
Diganta Saha

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 435)

Abstract

This work performs word sense disambiguation in Bengali using the Naïve Bayes probabilistic model. The task was carried out in two phases. In the first phase, the algorithm was tested on a dataset drawn from the Bengali corpus developed under the TDIL (Technology Development for the Indian Languages) project of the Government of India, and its accuracy was nearly 80%. Besides being disambiguated, the sense-evaluated sentences were inserted into the corresponding learning sets so that they could take part in later executions. In the second phase, after a small manipulation of the learning sets, a new input dataset was tested with the same algorithm, which this time reached a higher accuracy of around 83%. The results were verified against a standard Bengali dictionary.
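
The idea of feeding disambiguated sentences back into the learning sets can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the smoothing, or the "small manipulation" applied between phases, so the bag-of-words features, the add-one smoothing, the class name `NaiveBayesWSD`, and the English toy sentences below are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Toy Naive Bayes sense classifier with a self-updating learning set.

    Hypothetical illustration only; names and design are assumptions.
    """

    def __init__(self):
        # learning_sets[sense] holds the tokenized sentences known for that sense.
        self.learning_sets = defaultdict(list)

    def add_example(self, sense, tokens):
        self.learning_sets[sense].append(list(tokens))

    def _log_scores(self, tokens):
        if not self.learning_sets:
            raise ValueError("no learning sets available")
        vocab = {w for sents in self.learning_sets.values() for s in sents for w in s}
        total_sents = sum(len(sents) for sents in self.learning_sets.values())
        scores = {}
        for sense, sents in self.learning_sets.items():
            counts = Counter(w for s in sents for w in s)
            n_words = sum(counts.values())
            # Log prior: share of learning sentences that belong to this sense.
            score = math.log(len(sents) / total_sents)
            for w in tokens:
                # Add-one smoothing (assumed); the extra +1 in the denominator
                # reserves probability mass for words never seen in training.
                score += math.log((counts[w] + 1) / (n_words + len(vocab) + 1))
            scores[sense] = score
        return scores

    def disambiguate(self, tokens, update=True):
        scores = self._log_scores(tokens)
        best = max(scores, key=scores.get)
        if update:
            # Auto-update step: the newly labelled sentence joins the
            # learning set of the predicted sense for later executions.
            self.add_example(best, tokens)
        return best


# Toy usage with two hypothetical senses of the English word "bank".
clf = NaiveBayesWSD()
clf.add_example("finance", ["deposit", "money", "bank", "loan"])
clf.add_example("river", ["river", "bank", "water", "boat"])
sense = clf.disambiguate(["bank", "loan", "interest"])
```

In this sketch every prediction is added to the learning set unchecked, so a wrong prediction would also be reinforced in later runs. The abstract reports that the results were verified with a dictionary, and a check of that kind could be applied before the update step.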

Author information

Authors and Affiliations

Department of Computer Science and Engineering, College of Engineering and Management, Kolaghat, India
Alok Ranjan Pal
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Diganta Saha

Corresponding author

Correspondence to Alok Ranjan Pal.

Editor information

Editors and Affiliations

Department of Computer Science Engineering, Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, India
Suresh Chandra Satapathy
Kalyani University, Nadia, West Bengal, India
Jyotsna Kumar Mandal
University of Hyderabad, Hyderabad, India
Siba K. Udgata
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja

About this paper

Cite this paper

Pal, A.R., Saha, D. (2016). Word Sense Disambiguation in Bengali: An Auto-updated Learning Set Increases the Accuracy of the Result. In: Satapathy, S., Mandal, J., Udgata, S., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 435. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2757-1_42

DOI: https://doi.org/10.1007/978-81-322-2757-1_42
Published: 04 February 2016
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2756-4
Online ISBN: 978-81-322-2757-1
eBook Packages: Engineering, Engineering (R0)