Word Sense Disambiguation in Bengali: An Auto-updated Learning Set Increases the Accuracy of the Result

  • Conference paper
Information Systems Design and Intelligent Applications

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 435))

Abstract

This work addresses word sense disambiguation in Bengali using the Naïve Bayes probabilistic model. The task was carried out in two phases. In the first phase, the algorithm was tested on a dataset drawn from the Bengali corpus developed under the TDIL (Technology Development for the Indian Languages) project of the Govt. of India. In this first execution, the accuracy was nearly 80%. In addition to performing the disambiguation task, the sense-evaluated sentences were inserted into the corresponding learning sets so that they would take part in subsequent executions. In the second phase, after a small manipulation of the learning sets, a new input dataset was tested with the same algorithm, and this second execution produced a better result, around 83%. The results were verified with the help of a standard Bengali dictionary.
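
The two-phase procedure described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes a bag-of-words Naïve Bayes classifier with add-one (Laplace) smoothing, and the class name `NaiveBayesWSD`, its method names, the sense labels, and the toy English context words are all hypothetical. The sketch also feeds every predicted sense straight back into the learning set, whereas the abstract mentions a small manipulation of the learning sets whose details are not given here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Hypothetical bag-of-words Naive Bayes sense classifier (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (assumed add-one)
        self.sense_counts = Counter()            # number of learning sentences per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()                       # all context words seen so far

    def add_example(self, context_words, sense):
        """Insert one sense-tagged sentence into the learning set."""
        self.sense_counts[sense] += 1
        self.word_counts[sense].update(context_words)
        self.vocab.update(context_words)

    def predict(self, context_words):
        """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, float("-inf")
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for word in context_words:
                score += math.log((self.word_counts[sense][word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

    def disambiguate_and_update(self, context_words):
        """Predict a sense, then add the sentence to that sense's learning set."""
        sense = self.predict(context_words)
        self.add_example(context_words, sense)
        return sense


# Toy learning set (hypothetical data, one ambiguous word with two senses).
model = NaiveBayesWSD()
model.add_example(["river", "water", "shore"], "bank/river")
model.add_example(["money", "loan", "account"], "bank/finance")

# First execution: disambiguate a new sentence and auto-update the learning set.
first_sense = model.disambiguate_and_update(["water", "boat"])

# Second execution: "boat" is now part of the learning set for the predicted sense.
second_sense = model.predict(["boat"])
```

Because each disambiguated sentence is added to the learning set, context words that were unseen in the first execution (here, "boat") carry evidence in later ones. The same mechanism also propagates any wrong prediction, which is one reason a verification or filtering step before insertion may be desirable.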

Author information

Corresponding author

Correspondence to Alok Ranjan Pal.

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Pal, A.R., Saha, D. (2016). Word Sense Disambiguation in Bengali: An Auto-updated Learning Set Increases the Accuracy of the Result. In: Satapathy, S., Mandal, J., Udgata, S., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 435. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2757-1_42

  • DOI: https://doi.org/10.1007/978-81-322-2757-1_42

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2756-4

  • Online ISBN: 978-81-322-2757-1

  • eBook Packages: Engineering, Engineering (R0)
