Abstract
Word sense disambiguation (WSD) is the process of identifying the appropriate meaning of a polysemous word in a given context. The Bengali language contains a large number of polysemous words. Recently, WSD in Bengali text has attracted researchers in linguistics because of its many applications, viz. machine translation, opinion polarity identification, question-answering systems, etc. This paper proposes the lexeme connexion measure (LCM) of the cohesive lexical ambiguity revealing factor (CLARF), which decides the sense of a Bengali polysemous word. All polysemous words are treated as target words, and context windows of three sizes, viz. five, seven, and ten, are considered around these target words. The paper introduces a lexeme harmony measure that heuristically quantifies the syntactic belonging of a collection of lexemes in Bengali text. The proposed methodology extracts a feature vector based on CLARF, which depends on the frame lexeme harmony (FLH), sense lexeme harmony (SLH), polysemy singularity coherence (PSC), polysemy distribution factor (PDF), and relative polysemy singularity coherence (RPSC) of a lexeme. For sense recognition, the technique applies a max-rule to the integrated LCM scores of each lexeme across the testing and training cases. Because it considers both the lexical and the semantic relationships between words, the proposed algorithm overcomes a drawback of existing Bengali WSD approaches. Its performance has been evaluated on a dataset of 100 polysemous words, each with three or four senses. Various evaluation metrics have been used to analyse the results, which indicate the robustness of the proposed algorithm.
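The two generic steps named in the abstract, taking a context window around a target word and choosing a sense by a max-rule, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the convention that the target word counts toward the window size, and above all the toy `overlap_score`, which is a simple lexeme-overlap count standing in for the paper's LCM/CLARF scoring (the abstract does not define those formulas). The English tokens are likewise only a stand-in for Bengali text.

```python
from typing import Dict, List


def context_window(tokens: List[str], target_index: int, size: int) -> List[str]:
    """Return up to `size` tokens around the target word.

    Assumption: the target itself counts toward the window size, and for
    even sizes the extra token is taken from the right-hand side.
    """
    left = (size - 1) // 2
    right = size - 1 - left
    start = max(0, target_index - left)
    end = min(len(tokens), target_index + right + 1)
    return tokens[start:end]


def overlap_score(window: List[str], sense_lexemes: List[str]) -> float:
    """Hypothetical placeholder score: number of shared lexemes.

    This is NOT the LCM/CLARF measure from the paper.
    """
    return float(len(set(window) & set(sense_lexemes)))


def max_rule(scores: Dict[str, float]) -> str:
    """Select the sense label whose score is largest."""
    return max(scores, key=scores.get)


def disambiguate(tokens: List[str], target_index: int,
                 sense_lexicon: Dict[str, List[str]], size: int = 5) -> str:
    """Score every candidate sense against the window and apply the max-rule."""
    window = context_window(tokens, target_index, size)
    scores = {sense: overlap_score(window, lexemes)
              for sense, lexemes in sense_lexicon.items()}
    return max_rule(scores)


if __name__ == "__main__":
    tokens = "he sat on the bank of the river".split()
    lexicon = {
        "river_side": ["river", "water", "of"],
        "finance": ["money", "loan"],
    }
    for size in (5, 7, 10):  # the three window sizes mentioned in the abstract
        print(size, disambiguate(tokens, 4, lexicon, size))
```

In this sketch only the scoring function would need to change to plug in a different measure; the windowing and the max-rule selection are independent of how the per-sense scores are computed.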
Data Availability
The datasets generated and/or analysed during the current study are available in the “Kaggle” repository, https://www.kaggle.com/dsv/3985193 with DOI: 10.34740/KAGGLE/DSV/3985193.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Das Dawn, D., Khan, A., Shaikh, S.H. et al. Lexeme connexion measure of cohesive lexical ambiguity revealing factor: a robust approach for word sense disambiguation of Bengali text. Multimed Tools Appl 83, 12939–12983 (2024). https://doi.org/10.1007/s11042-023-14676-8
DOI: https://doi.org/10.1007/s11042-023-14676-8