Identification of domain-specific euphemistic tweets using clustering

Abstract

Social media platforms (SMPs) are now widely used as a readily accessible medium for expressing personal opinions. Euphemism, a linguistic strategy in which mild language veils the underlying feeling of an expression, has long been practised on SMPs to soften harshness or to discuss sensitive topics [1]. Identifying the masked content [2] in euphemisms is challenging because concealment is inherent to them. This study proposes a mechanism for detecting domain-specific euphemisms using clustering techniques. The pattern categorisation feature is built from domain-specific lexical features combined with frequency-based features. To identify the most suitable match, the hybrid feature extraction algorithm incorporates frequency-based uni-gram and bi-gram features in conjunction with a lexicon. The dimension reduction phase addresses sparsity and identifies the most significant words for each sample, so that samples can be grouped into domains using centroid-based and density-based clustering techniques. DBSCAN is run with an epsilon value of 2.5 and a minimum of 6 points, which yields 7 distinct clusters. The Silhouette score is used to select the optimal value of k for the K-means algorithm. The resulting clusters are examined manually. We validate our model against the FLUTE dataset, using an epsilon value of 0.2 and a minimum of 5 points for DBSCAN, and obtain a validation score of 0.55. The DBSCAN clustering algorithm generates distinct clusters that extend beyond the scope of the inquiry domain.
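The density-based step can be illustrated with a minimal pure-Python sketch of DBSCAN. This is not the authors' implementation: the Euclidean distance, the function name `dbscan`, and the toy two-dimensional points are assumptions made for illustration, and only the parameter values (epsilon 2.5, minimum points 6) are taken from the abstract.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Hypothetical toy data: two dense groups of six points and one outlier.
group = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (1, 0.5)]
points = group + [(x + 10, y + 10) for x, y in group] + [(50, 50)]
labels = dbscan(points, eps=2.5, min_pts=6)
print(labels)
```

On this toy input the two groups receive cluster ids 0 and 1 and the isolated point is labelled as noise. In the setting of the abstract, the points would instead be the reduced feature vectors of tweets.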

Data availability

The domain-specific euphemism-bearing key-phrases for the English language used in this experiment are given in Annexure A.

Notes

Word length three and above.

References

Danescu-Niculescu-Mizil C, Sudhof M, Jurafsky D, Leskovec J, Potts C (2013) A computational approach to politeness with application to social factors, 250–259 (ACL)
Magu R, Luo J (2018) Determining code words in euphemistic hate speech using word embedding networks. In: Fišer D et al (eds) Proceedings of the 2nd Workshop on Abusive Language Online, 93–100
Khanday AMUD, Khan QR, Rabani ST (2021) Identifying propaganda from online social networks during covid-19 using machine learning techniques. Int J InformTech 13:115–122
Zaid M, Batool F, Khan A, Mangla SH (2018) Euphemistic expressions: A challenge to l2 learners. International Journal on Studies in English Language and Literature 6
Samoškaitė L (2011) 21st century political euphemisms: semantic and structural study. Master’s thesis, Department Of English Philology, Vytautas Magnus University
Felt C, Riloff E (2020) Recognizing euphemisms and dysphemisms using sentiment analysis. In: Klebanov BB et al (eds) Proceedings of the Second Workshop on Figurative Language Processing, 136–145
Zentner M, Grandjean D, Scherer KR (2008) Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8:494
Russell JA (1980) A circumplex model of affect. J personality social psy 39:1161
Plutchik R (2001) The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist 89:344–350
Scherer KR, Shuman V, Fontaine JJR, Soriano C (2013) The GRID meets the Wheel: Assessing emotional feeling via self-report. In: Fontaine JJR, Scherer KR, Soriano C (eds) Components of Emotional Meaning: A sourcebook, 281–298 (Oxford University Press)
Kumar P, Vardhan M (2022) Pwebsa: Twitter sentiment analysis by combining plutchik wheel of emotion and word embedding. International Journal of Information Technology 1–9
Esuli A, Sebastiani F (2006) Sentiwordnet: A publicly available lexical resource for opinion mining. In: Calzolari N et al (eds) Proceedings of the Fifth International Conference on Language Resources and Evaluation, 417–422 (ELRA)
Strapparava C, Valitutti A (2004) Wordnet affect: an affective extension of wordnet. In: Lino MT, Xavier MF, Ferreira F, Costa R, Silva R (eds) Proceedings of the Fourth International Conference on Language Resources and Evaluation, 1083–1086 (ELRA)
Cambria E, Olsher D, Rajagopal D (2014) Senticnet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In: Stracuzzi D, Gunning D (eds) Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Vol. 28, 1515–1521 (AAAI Press)
Shutova E (2010) Automatic metaphor interpretation as a paraphrasing task, 1029–1037
Pfaff KL, Gibbs RW Jr, Johnson MD (1997) Metaphor in using and understanding euphemism and dysphemism. Applied Psycholinguistics 18:59–83
Shutova E, Sun L, Korhonen A (2010) Metaphor identification using verb and noun clustering, 1002–1010
Rababah HA (2014) The translatability and use of x-phemism expressions (x-phemization): euphemisms, dysphemisms and orthophemisms in the medical discourse. Stud literature lang 9:229
Crespo-Fernández E (2018) Euphemism as a discursive strategy in us local and state politics. J Lang Polit 17:789–811
Li-Na Z (2015) Euphemism in modern american english. Sino-US English Teach 12:265–270
Kaplan D (1999) Explorations in the theory of meaning as use
Maran E et al (2020) Spirituality and practice of the euphemism in the workplace: perceptions of a nursing team. Revista Brasileira de Enfermagem 73
Hojati A (2012) A study of euphemisms in the context of english-speaking media. Int J Linguistics 4:552
Ryabova M (2013) Euphemisms and media framing. European Scientific Journal 9
Sadullaeva N, Mamatova F, Sayfullaeva R (2020) Classification of euphemism and its formation in the uzbek language. J Crit Rev 7:426–430
Jamet D (2018) The neological functions of disease euphemisms in english and french: Verbal hygiene or speech pathology? Lexis. Journal in English Lexicology
Niraula NB, Dulal S, Koirala D (2022) Linguistic taboos and euphemisms in nepali. ACM Trans Asian Low-Resource Lang Inform Proces 21:1–26
Elisabeth D, Budi I, Ibrohim MO (2020) Hate code detection in indonesian tweets using machine learning approach: a dataset and preliminary study, 1–6 (IEEE)
Thelen M, Riloff E (2002) A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Hajic J, Matsumoto Y (eds) Proceedings of the conference on empirical methods in natural language processing, 214–221
Roget PM (2020) Roget’s Thesaurus. Good Press
Takuro H, Yuichi S, Tahara Y, Ohsuga A (2020) Codewords detection in microblogs focusing on differences in word use between two corpora, 103–108 (IEEE)
Dwivedi V, Ghosh S (2023) Semantic relations classification in hindi compound nouns using embeddings. International Journal of Information Technology 1–6
Keh SS et al (2022) Eureka: Euphemism recognition enhanced through KNN-based methods and augmentation, 111–117 (ACL)
Yang H et al (2017) How to learn klingon without a dictionary: Detection and measurement of black keywords used by the underground economy, 751–769 (IEEE)
Yuan K, Lu H, Liao X, Wang X (2018) Reading thieves’ cant: Automatically identifying and understanding dark jargons from cybercrime marketplaces, 1027–1041. USENIX Association, Baltimore, MD
Zhu W et al (2021) Self-supervised euphemism detection and identification for content moderation, 229–246 (IEEE)
Wiriyathammabhum P (2023) Tedb system description to a shared task on euphemism detection 2022. arXiv preprint arXiv:2301.06602
Sharaff A, Jain M, Modugula G (2022) Feature based cluster ranking approach for single document summarization. Int J Inform Techn 14:2057–2065
Riaz S, Fatima M, Kamran M, Nisar MW (2019) Opinion mining on large scale data using sentiment analysis and k-means clustering. Cluster Computing 22:7149–7164
Ma B, Yuan H, Wu Y (2017) Exploring performance of clustering methods on document sentiment analysis. J Inform Sci 43:54–74
Nhlabano V, Lutu P (2018) Impact of text pre-processing on the performance of sentiment analysis models for social media data. In: Madhav N, Asare SD, Macharia P, Dwarika J (eds) International Conference on Advances in Big Data, Computing and Data Communication Systems, 1–6
Jianqiang Z, Xiaolin G (2017) Comparison research on text pre-processing methods on twitter sentiment analysis. IEEE Access 5:2870–2879
Rout JK et al (2018) A model for sentiment and emotion analysis of unstructured social media text. Electronic Commerce Res 18:181–199
Kalra V, Kashyap I, Kaur H (2022) Generation of domain-specific vocabulary set and classification of documents: weight-inclusion approach. International Journal of Information Technology 1–11
Baccianella S, Esuli A, Sebastiani F (2010) Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Calzolari N et al (eds) Proceedings of the Seventh International Conference on Language Resources and Evaluation, 2200–2204
Bradley MM, Lang PJ (1999) Affective norms for english words (ANEW): Instruction manual and affective ratings. Tech. Rep., University of Florida
Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora
Mitchell M, Aguilar J, Wilson T, Van Durme B (2013) Open domain targeted sentiment. In: Yarowsky D, Baldwin T, Korhonen A, Livescu K, Bethard S (eds) Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1643–1654 (Seattle, Washington, USA)
Li L, Goh T-T, Jin D (2020) How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Comput Appl 32:4387–4415
Saharia N (2017) Phone-based identification of language in code-mixed social network data. J Statist Manag Syst 20:565–574
Naeem S, Wumaier A (2018) Study and implementing k-mean clustering algorithm on english text and techniques to find the optimal value of k. Int. J. Comput. Appl 182:7–14
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J comput appl math 20:53–65
Chakrabarty T, Saakyan A, Ghosh D, Muresan S (2022) Flute: Figurative language understanding through textual explanations, 7139–7159

Author information

Authors and Affiliations

Data Engineering Lab, Department of Computer Science & Engineering, IIIT Senapati, Imphal, Manipur, 795002, India
Maibam Debina Devi & Navanath Saharia

Corresponding author

Correspondence to Navanath Saharia.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests.

Ethical approval

Not applicable. No human or animal subjects were involved in this experiment.

Euphemism bearing domain-specific key-phrases

The domain-specific euphemism-bearing key-phrases for the English language used in this experiment are listed below.

Domain	Keyword
Disability	mentally challenged, special needs, physically challenged, differently abled, visually challenged, verbally challenged, chronically challenged, financially challenged, aesthetically challenged, vertically challenged, sartorially challenged, intellectually challenged, hearing impaired, horizontally challenged, electronically challenged
Physical appearance	heavyset, portly, Full-figured, extra pounds, husky, big, heavy, curvy, fluffy, zaftig, plus sized, Thick-boned, extra large, plump, rubenesque, stump
Profession	sanitation engineer, automobile engineer, administrative assistant, domestic engineer, technologist, correctional facility, cemetery operative, call girl, business girl, comfort girl, security officer, domestic manager, laid off, discharged, dismissed, made redundant, furloughed, pink slip, outplaced, riffed, bought out, released, unassigned, cut ties, uninstalled, separated, services no longer required, early retired, eased out, force resignation, stepped down, position eliminated, given the package, released from talent pool, declined to extend, assignment expired, helped her exit, one person layoff, managed out, career transition, career change opportunity, contract not renewed, end of trial period, involuntary separation, freed up for future, relieved of duties, taking it for team, promoted to customer, retail workers, universities, exterminating engineer, emporium, salon, parlor, beautician
Interrogation	may, could, would, In a way, to some extent
Politics	the deprived, man of modest means, less well off, under privileged, economically disadvantaged, substandard housing, illegal aliens, culturally deprived environment, undocumented workers, temporary negative cash, under performing assets, economic downturn, economic slowdown, undeveloped countries, underdeveloped countries, depressed neighborhood, developing countries, third world, fourth world
Education	slow student, underachiever, learning difficulties, special needs, bend the truth, tell a white lie, color the truth, economical with truth, dissemble, unreliable, peer homework, comparing answers, collaborating, harvesting answer, non versatile, misspoke, exceptional child, detained, lead ship qualities
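As a rough illustration of how a key-phrase lexicon such as the table above could be combined with uni-gram and bi-gram frequency counts, the following sketch tallies lexicon hits per domain for one tweet. It is a sketch under stated assumptions rather than the authors' feature extractor: `LEXICON` is a small hypothetical excerpt of the table, and the names `tokens`, `ngrams`, and `domain_hits` are introduced here for illustration only. Key-phrases longer than two words would need higher-order n-grams, which are omitted.

```python
import re
from collections import Counter

# Hypothetical excerpt of the key-phrase table; not the full lexicon.
LEXICON = {
    "Disability": ["special needs", "differently abled", "hearing impaired"],
    "Profession": ["laid off", "pink slip", "made redundant"],
    "Politics": ["economic downturn", "undocumented workers", "third world"],
}


def tokens(text):
    # Lowercase the text and keep alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


def ngrams(toks, n):
    # Contiguous n-grams, joined with single spaces.
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def domain_hits(text):
    """Count uni-gram and bi-gram lexicon matches per domain."""
    toks = tokens(text)
    counts = Counter(ngrams(toks, 1) + ngrams(toks, 2))
    hits = {}
    for domain, phrases in LEXICON.items():
        total = sum(counts[p] for p in phrases)
        if total:
            hits[domain] = total
    return hits


tweet = "He got a pink slip after the economic downturn"
print(domain_hits(tweet))
```

For the example tweet, one hit is counted for Profession ("pink slip") and one for Politics ("economic downturn"). Such per-domain counts are one plausible way to form the lexical part of a hybrid feature vector before dimension reduction.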

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Devi, M.D., Saharia, N. Identification of domain-specific euphemistic tweets using clustering. Int. j. inf. tecnol. 16, 21–31 (2024). https://doi.org/10.1007/s41870-023-01595-y

Received: 25 July 2023
Accepted: 25 October 2023
Published: 22 November 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s41870-023-01595-y
