Abstract
Social media platforms (SMPs) are now widely used as an accessible medium for expressing personal opinions. Euphemism, a linguistic strategy in which the underlying feeling of expressive content is veiled by mild language, has long been used on SMPs to soften harshness or to discuss sensitive topics [1]. Identifying the masked content [2] in euphemisms is challenging because euphemisms are indirect by nature. This study proposes a mechanism for identifying domain-specific euphemisms using clustering techniques. The pattern categorisation feature is built from domain-specific lexical features combined with frequency-based features: the hybrid feature extraction algorithms use frequency-based uni-gram and bi-gram features in conjunction with a lexicon to identify the most suitable match. The dimension reduction phase addresses sparsity and identifies the most significant words for each sample, so that samples can be grouped into domains using centroid-based and density-based clustering techniques. DBSCAN, run with an epsilon value of 2.5 and a minimum of 6 points, identifies 7 distinct clusters. The Silhouette score is used to select the optimal value of k for the K-means algorithm. The resulting clusters are examined manually. We also compare our model against the FLUTE dataset, with an epsilon value of 0.2 and a minimum of 5 points for DBSCAN, and obtain a validation score of 0.55. The DBSCAN clustering algorithm generates distinct clusters that extend beyond the domains under investigation.
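The density-based clustering step and the Silhouette score mentioned in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the two-dimensional `sample` points, the Euclidean distance, and the choice to leave noise points out of the Silhouette average are all assumptions made for illustration. Only the defaults `eps=2.5` and `min_pts=6` come from the abstract.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps=2.5, min_pts=6):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over non-noise points."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to any other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups of six points and one outlier.
group = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.5, 0.0)]
sample = group + [(x + 10, y + 10) for x, y in group] + [(30.0, 30.0)]
labels = dbscan(sample)
print(labels)
print(round(silhouette(sample, labels), 3))
```

The same `silhouette` function can score any labelling, so it could equally be applied to K-means results for several candidate values of k, keeping the k with the highest score.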
Data availability
The domain-specific euphemism-bearing key-phrases for English used in this experiment are listed in Annexure A.
Notes
Word length three and above.
References
Danescu-Niculescu-Mizil C, Sudhof M, Jurafsky D, Leskovec J, Potts C (2013) A computational approach to politeness with application to social factors, 250–259 (ACL)
Magu R, Luo J (2018) Determining code words in euphemistic hate speech using word embedding networks. In: Fišer D et al (eds) Proceedings of the 2nd Workshop on Abusive Language Online, 93–100
Khanday AMUD, Khan QR, Rabani ST (2021) Identifying propaganda from online social networks during covid-19 using machine learning techniques. Int J InformTech 13:115–122
Zaid M, Batool F, Khan A, Mangla S H (2018) Euphemistic expressions: A challenge to l2 learners. International Journal on Studies in English Language and Literature 6
Samoškaitė L (2011) 21st century political euphemisms: semantic and structural study. Master’s thesis, Department Of English Philology, Vytautas Magnus University
Felt C, Riloff E (2020) Recognizing euphemisms and dysphemisms using sentiment analysis. In: Klebanov BB et al (eds) Proceedings of the Second Workshop on Figurative Language Processing, 136–145
Zentner M, Grandjean D, Scherer KR (2008) Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8:494
Russell JA (1980) A circumplex model of affect. J personality social psy 39:1161
Plutchik R (2001) The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist 89:344–350
Scherer KR, Shuman V, Fontaine JJR, Soriano C (2013) The GRID meets the Wheel: Assessing emotional feeling via self-report. In: Fontaine JJR, Scherer KR, Soriano C (eds) Components of Emotional Meaning: A sourcebook, 281–298 (Oxford University Press)
Kumar P, Vardhan M (2022) Pwebsa: Twitter sentiment analysis by combining plutchik wheel of emotion and word embedding. International Journal of Information Technology 1–9
Esuli A, Sebastiani F (2006) Sentiwordnet: A publicly available lexical resource for opinion mining. In: Calzolari N et al (eds) Proceedings of the Fifth International Conference on Language Resources and Evaluation, 417–422 (ELRA)
Strapparava C, Valitutti A (2004) Wordnet affect: an affective extension of wordnet. In: Lino MT, Xavier MF, Ferreira F, Costa R, Silva R (eds) Proceedings of the Fourth International Conference on Language Resources and Evaluation, 1083–1086 (ELRA)
Cambria E, Olsher D, Rajagopal D (2014) Senticnet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In: Stracuzzi D, Gunning D (eds) Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Vol. 28, 1515–1521 (AAAI Press)
Shutova E (2010) Automatic metaphor interpretation as a paraphrasing task, 1029–1037
Pfaff KL, Gibbs RW Jr, Johnson MD (1997) Metaphor in using and understanding euphemism and dysphemism. Applied Psycholinguistics 18:59–83
Shutova E, Sun L, Korhonen A (2010) Metaphor identification using verb and noun clustering, 1002–1010
Rababah HA (2014) The translatability and use of x-phemism expressions (x-phemization): euphemisms, dysphemisms and orthophemisms in the medical discourse. Stud literature lang 9:229
Crespo-Fernández E (2018) Euphemism as a discursive strategy in us local and state politics. J Lang Polit 17:789–811
Li-Na Z (2015) Euphemism in modern american english. Sino-US English Teach 12:265–270
Kaplan D (1999) Explorations in the theory of meaning as use
Maran E et al (2020) Spirituality and practice of the euphemism in the workplace: perceptions of a nursing team. Revista Brasileira de Enfermagem 73
Hojati A (2012) A study of euphemisms in the context of english-speaking media. Int J Linguistics 4:552
Ryabova M (2013) Euphemisms and media framing. European Scientific Journal 9
Sadullaeva N, Mamatova F, Sayfullaeva R (2020) Classification of euphemism and its formation in the uzbek language. J Crit Rev 7:426–430
Jamet D (2018) The neological functions of disease euphemisms in english and french: Verbal hygiene or speech pathology? Lexis. Journal in English Lexicology
Niraula NB, Dulal S, Koirala D (2022) Linguistic taboos and euphemisms in nepali. ACM Trans Asian Low-Resource Lang Inform Proces 21:1–26
Elisabeth D, Budi I, Ibrohim M O (2020) Hate code detection in indonesian tweets using machine learning approach: a dataset and preliminary study, 1–6 (IEEE)
Thelen M, Riloff E (2002) A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Hajic J, Matsumoto Y (eds) Proceedings of the conference on empirical methods in natural language processing, 214–221
Roget PM (2020) Roget’s Thesaurus. Good Press
Takuro H, Yuichi S, Tahara Y, Ohsuga A (2020) Codewords detection in microblogs focusing on differences in word use between two corpora, 103–108 (IEEE)
Dwivedi V, Ghosh S (2023) Semantic relations classification in hindi compound nouns using embeddings. International Journal of Information Technology 1–6
Keh S S et al (2022) Eureka: Euphemism recognition enhanced through KNN-based methods and augmentation, 111–117 (ACL)
Yang H et al (2017) How to learn klingon without a dictionary: Detection and measurement of black keywords used by the underground economy, 751–769 (IEEE)
Yuan K, Lu H, Liao X, Wang X (2018) Reading thieves’ cant: Automatically identifying and understanding dark jargons from cybercrime marketplaces, 1027–1041. USENIX Association, Baltimore, MD
Zhu W et al (2021) Self-supervised euphemism detection and identification for content moderation, 229–246 (IEEE)
Wiriyathammabhum P (2023) Tedb system description to a shared task on euphemism detection 2022. arXiv preprint arXiv:2301.06602
Sharaff A, Jain M, Modugula G (2022) Feature based cluster ranking approach for single document summarization. Int J Inform Techn 14:2057–2065
Riaz S, Fatima M, Kamran M, Nisar MW (2019) Opinion mining on large scale data using sentiment analysis and k-means clustering. Cluster Computing 22:7149–7164
Ma B, Yuan H, Wu Y (2017) Exploring performance of clustering methods on document sentiment analysis. J Inform Sci 43:54–74
Nhlabano V, Lutu P (2018) Impact of text pre-processing on the performance of sentiment analysis models for social media data. In: Madhav N, Asare SD, Macharia P, Dwarika J (eds) International Conference on Advances in Big Data, Computing and Data Communication Systems, 1–6
Jianqiang Z, Xiaolin G (2017) Comparison research on text pre-processing methods on twitter sentiment analysis. IEEE Access 5:2870–2879
Rout JK et al (2018) A model for sentiment and emotion analysis of unstructured social media text. Electronic Commerce Res 18:181–199
Kalra V, Kashyap I, Kaur H (2022) Generation of domain-specific vocabulary set and classification of documents: weight-inclusion approach. International Journal of Information Technology 1–11
Baccianella S, Esuli A, Sebastiani F (2010) Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Calzolari N et al (eds) Proceedings of the Seventh International Conference on Language Resources and Evaluation, 2200–2204
Bradley MM, Lang PJ (1999) Affective norms for english words (ANEW): Instruction manual and affective ratings. Tech. Rep., University of Florida
Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora
Mitchell M, Aguilar J, Wilson T, Van Durme B (2013) Open domain targeted sentiment. In: Yarowsky D, Baldwin T, Korhonen A, Livescu K, Bethard S (eds) Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1643–1654 (Seattle, Washington, USA)
Li L, Goh T-T, Jin D (2020) How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Comput Appl 32:4387–4415
Saharia N (2017) Phone-based identification of language in code-mixed social network data. J Statist Manag Syst 20:565–574
Naeem S, Wumaier A (2018) Study and implementing k-mean clustering algorithm on english text and techniques to find the optimal value of k. Int. J. Comput. Appl 182:7–14
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J comput appl math 20:53–65
Chakrabarty T, Saakyan A, Ghosh D, Muresan S (2022) Flute: Figurative language understanding through textual explanations, 7139–7159
Ethics declarations
Conflict of interest
The authors declare that there are no competing interests.
Ethical approval
Not applicable. No human or animal subjects were involved in this experiment.
Euphemism bearing domain-specific key-phrases
The domain-specific euphemism-bearing key-phrases for English used in this experiment are listed below.
Domain | Keyword |
---|---|
Disability | mentally challenged, special needs, physically challenged, differently abled, visually challenged, verbally challenged, chronically challenged, financially challenged, aesthetically challenged, vertically challenged, sartorially challenged, intellectually challenged, hearing impaired, horizontally challenged, electronically challenged |
Physical appearance | heavyset, portly, Full-figured, extra pounds, husky, big, heavy, curvy, fluffy, zaftig, plus sized, Thick-boned, extra large, plump, rubenesque, stump |
Profession | sanitation engineer, automobile engineer, administrative assistant, domestic engineer, technologist, correctional facility, cemetery operative, call girl, business girl, comfort girl, security officer, domestic manager, laid off, discharged, dismissed, made redundant, furloughed, pink slip, outplaced, riffed, bought out, released, unassigned, cut ties, uninstalled, separated, services no longer required, early retired, eased out, force resignation, stepped down, position eliminated, given the package, released from talent pool, declined to extend, assignment expired, helped her exit, one person layoff, managed out, career transition, career change opportunity, contract not renewed, end of trial period, involuntary separation, freed up for future, relieved of duties, taking it for team, promoted to customer, retail workers, universities, exterminating engineer, emporium, salon, parlor, beautician |
Interrogation | may, could, would, In a way, to some extent |
Politics | the deprived, man of modest means, less well off, under privileged, economically disadvantaged, substandard housing, illegal aliens, culturally deprived environment, undocumented workers, temporary negative cash, under performing assets, economic downturn, economic slowdown, undeveloped countries, underdeveloped countries, depressed neighborhood, developing countries, third world, fourth world |
Education | slow student, underachiever, learning difficulties, special needs, bend the truth, tell a white lie, color the truth, economical with truth, dissemble, unreliable, peer homework, comparing answers, collaborating, harvesting answer, non versatile, misspoke, exceptional child, detained, leadership qualities |
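As a rough illustration of how a key-phrase lexicon like the one above might be combined with frequency-based uni-gram and bi-gram features, the sketch below counts n-grams and per-domain lexicon hits for a single text. The function names, the tokenisation rule, the example sentence, and the small `LEXICON` excerpt are assumptions for illustration; the article does not specify its exact feature extraction code.

```python
import re
from collections import Counter

# A few entries copied from the table above; the full lexicon is larger.
LEXICON = {
    "Disability": ["differently abled", "special needs", "hearing impaired"],
    "Profession": ["laid off", "pink slip", "made redundant"],
    "Politics": ["economic downturn", "undocumented workers"],
}


def tokenize(text):
    """Lower-case the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Frequency of every run of n consecutive tokens."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def lexicon_hits(text, lexicon=LEXICON):
    """Per domain, the number of lexicon key-phrases that occur in the text."""
    # Padding with spaces makes the substring test match whole words only.
    normalised = " " + " ".join(tokenize(text)) + " "
    return {
        domain: sum(f" {phrase} " in normalised for phrase in phrases)
        for domain, phrases in lexicon.items()
    }


def features(text):
    """Hybrid features: uni-gram counts, bi-gram counts, and lexicon hits."""
    tokens = tokenize(text)
    return {
        "unigrams": ngram_counts(tokens, 1),
        "bigrams": ngram_counts(tokens, 2),
        "lexicon": lexicon_hits(text),
    }


example = features("He was laid off after the economic downturn.")
print(example["lexicon"])
```

Because this tokeniser splits on non-letters, hyphenated entries such as "Full-figured" would need to be normalised the same way before matching.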
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Devi, M.D., Saharia, N. Identification of domain-specific euphemistic tweets using clustering. Int. j. inf. tecnol. 16, 21–31 (2024). https://doi.org/10.1007/s41870-023-01595-y
DOI: https://doi.org/10.1007/s41870-023-01595-y