
Identification of domain-specific euphemistic tweets using clustering

  • Original Research
International Journal of Information Technology

Abstract

Social media platforms (SMPs) are nowadays a readily accessible and comprehensive medium for expressing personal opinions. Euphemism, a linguistic strategy in which the underlying sentiment of a message is veiled by mild language, has long been used on SMPs to soften harsh content or to discuss sensitive topics [1]. Identifying the masked content [2] of euphemisms is challenging because of their inherent nature. This study proposes a mechanism for detecting domain-specific euphemisms using clustering techniques. Pattern-categorisation features are built from domain-specific lexical features combined with frequency-based features: to find the most suitable match, the hybrid feature extraction algorithm combines frequency-based uni-gram and bi-gram features with a lexicon. A dimension reduction phase then addresses feature sparsity and identifies the most significant words in each sample, so that samples can be grouped into domains using centroid-based (K-means) and density-based (DBSCAN) clustering. DBSCAN with an epsilon value of 2.5 and a minimum of 6 points yields 7 distinct clusters, and the optimal value of k for K-means is chosen using the Silhouette score. The resulting clusters are examined manually. We also evaluate the model on the FLUTE dataset [53], using DBSCAN with an epsilon value of 0.2 and a minimum of 5 points, and obtain a validation score of 0.55. DBSCAN also produces distinct clusters that extend beyond the scope of the domains under inquiry.
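The clustering stage summarised above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the helper names (`dbscan`, `kmeans`, `silhouette`, `choose_k`), the deterministic farthest-first K-means initialisation, and the toy 2-D points standing in for dimension-reduced tweet vectors are all assumptions. Only the DBSCAN settings (epsilon 2.5, minimum 6 points) and the use of the Silhouette score to select k come from the text.

```python
import math


def dbscan(points, eps, min_pts):
    """Density-based clustering; returns one label per point, with -1 for noise.

    A point's neighbourhood includes the point itself, so min_pts counts it
    (the same convention scikit-learn uses for min_samples).
    """
    n = len(points)
    labels = [None] * n

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point, so keep growing
                queue.extend(neighbours)
    return labels


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) (Rousseeuw, 1987)."""
    def mean_dist(i, cluster):
        d = [math.dist(points[i], points[j])
             for j in range(len(points)) if j != i and labels[j] == cluster]
        return sum(d) / len(d)

    scores = []
    for i, own in enumerate(labels):
        others = set(labels) - {own}
        if not others or labels.count(own) == 1:
            scores.append(0.0)  # singletons and the single-cluster case score 0
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, c) for c in others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Pick the k whose K-means partition has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


# Hypothetical 2-D stand-ins for dimension-reduced tweet vectors.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (1.5, 0.5), (0.5, 1.5)]
group_a = list(offsets)
group_b = [(x + 10, y + 10) for x, y in offsets]
outlier = [(5, -8)]

print(dbscan(group_a + group_b + outlier, eps=2.5, min_pts=6))
# → [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, -1]
print(choose_k(group_a + group_b, range(2, 5)))
# → 2
```

DBSCAN needs no cluster count and leaves sparse points as noise, whereas K-means needs k up front; scoring each candidate k by mean silhouette, as sketched here, is one common way to supply it.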


Data availability

The domain-specific, euphemism-bearing key-phrases for the English language used in this experiment are listed in Annexure A.

Notes

  1. Word length three and above.

References

  1. Danescu-Niculescu-Mizil C, Sudhof M, Jurafsky D, Leskovec J, Potts C (2013) A computational approach to politeness with application to social factors, pp 250–259 (ACL)

  2. Magu R, Luo J (2018) Determining code words in euphemistic hate speech using word embedding networks. In: Fišer D et al (eds) Proceedings of the 2nd Workshop on Abusive Language Online, pp 93–100

  3. Khanday AMUD, Khan QR, Rabani ST (2021) Identifying propaganda from online social networks during COVID-19 using machine learning techniques. Int J Inf Technol 13:115–122

  4. Zaid M, Batool F, Khan A, Mangla SH (2018) Euphemistic expressions: a challenge to L2 learners. International Journal on Studies in English Language and Literature 6

  5. Samoškaitė L (2011) 21st century political euphemisms: semantic and structural study. Master’s thesis, Department of English Philology, Vytautas Magnus University

  6. Felt C, Riloff E (2020) Recognizing euphemisms and dysphemisms using sentiment analysis. In: Klebanov BB et al (eds) Proceedings of the Second Workshop on Figurative Language Processing, pp 136–145

  7. Zentner M, Grandjean D, Scherer KR (2008) Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8:494

  8. Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39:1161

  9. Plutchik R (2001) The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist 89:344–350

  10. Scherer KR, Shuman V, Fontaine JJR, Soriano C (2013) The GRID meets the Wheel: assessing emotional feeling via self-report. In: Fontaine JJR, Scherer KR, Soriano C (eds) Components of emotional meaning: a sourcebook, pp 281–298 (Oxford University Press)

  11. Kumar P, Vardhan M (2022) PWEBSA: Twitter sentiment analysis by combining Plutchik wheel of emotion and word embedding. Int J Inf Technol 1–9

  12. Esuli A, Sebastiani F (2006) SentiWordNet: a publicly available lexical resource for opinion mining. In: Calzolari N et al (eds) Proceedings of the Fifth International Conference on Language Resources and Evaluation, pp 417–422 (ELRA)

  13. Strapparava C, Valitutti A (2004) WordNet Affect: an affective extension of WordNet. In: Lino MT, Xavier MF, Ferreira F, Costa R, Silva R (eds) Proceedings of the Fourth International Conference on Language Resources and Evaluation, pp 1083–1086 (ELRA)

  14. Cambria E, Olsher D, Rajagopal D (2014) SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In: Stracuzzi D, Gunning D (eds) Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, vol 28, pp 1515–1521 (AAAI Press)

  15. Shutova E (2010) Automatic metaphor interpretation as a paraphrasing task, pp 1029–1037

  16. Pfaff KL, Gibbs RW Jr, Johnson MD (1997) Metaphor in using and understanding euphemism and dysphemism. Applied Psycholinguistics 18:59–83

  17. Shutova E, Sun L, Korhonen A (2010) Metaphor identification using verb and noun clustering, pp 1002–1010

  18. Rababah HA (2014) The translatability and use of x-phemism expressions (x-phemization): euphemisms, dysphemisms and orthophemisms in the medical discourse. Stud Lit Lang 9:229

  19. Crespo-Fernández E (2018) Euphemism as a discursive strategy in US local and state politics. J Lang Polit 17:789–811

  20. Li-Na Z (2015) Euphemism in modern American English. Sino-US English Teach 12:265–270

  21. Kaplan D (1999) Explorations in the theory of meaning as use

  22. Maran E et al (2020) Spirituality and practice of the euphemism in the workplace: perceptions of a nursing team. Revista Brasileira de Enfermagem 73

  23. Hojati A (2012) A study of euphemisms in the context of English-speaking media. Int J Linguistics 4:552

  24. Ryabova M (2013) Euphemisms and media framing. European Scientific Journal 9

  25. Sadullaeva N, Mamatova F, Sayfullaeva R (2020) Classification of euphemism and its formation in the Uzbek language. J Crit Rev 7:426–430

  26. Jamet D (2018) The neological functions of disease euphemisms in English and French: verbal hygiene or speech pathology? Lexis. Journal in English Lexicology

  27. Niraula NB, Dulal S, Koirala D (2022) Linguistic taboos and euphemisms in Nepali. ACM Trans Asian Low-Resour Lang Inf Process 21:1–26

  28. Elisabeth D, Budi I, Ibrohim MO (2020) Hate code detection in Indonesian tweets using machine learning approach: a dataset and preliminary study, pp 1–6 (IEEE)

  29. Thelen M, Riloff E (2002) A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Hajic J, Matsumoto Y (eds) Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp 214–221

  30. Roget PM (2020) Roget’s Thesaurus. Good Press

  31. Takuro H, Yuichi S, Tahara Y, Ohsuga A (2020) Codewords detection in microblogs focusing on differences in word use between two corpora, pp 103–108 (IEEE)

  32. Dwivedi V, Ghosh S (2023) Semantic relations classification in Hindi compound nouns using embeddings. Int J Inf Technol 1–6

  33. Keh SS et al (2022) EUREKA: euphemism recognition enhanced through KNN-based methods and augmentation, pp 111–117 (ACL)

  34. Yang H et al (2017) How to learn Klingon without a dictionary: detection and measurement of black keywords used by the underground economy, pp 751–769 (IEEE)

  35. Yuan K, Lu H, Liao X, Wang X (2018) Reading thieves’ cant: automatically identifying and understanding dark jargons from cybercrime marketplaces, pp 1027–1041 (USENIX Association, Baltimore, MD)

  36. Zhu W et al (2021) Self-supervised euphemism detection and identification for content moderation, pp 229–246 (IEEE)

  37. Wiriyathammabhum P (2023) TEDB system description to a shared task on euphemism detection 2022. arXiv preprint arXiv:2301.06602

  38. Sharaff A, Jain M, Modugula G (2022) Feature based cluster ranking approach for single document summarization. Int J Inf Technol 14:2057–2065

  39. Riaz S, Fatima M, Kamran M, Nisar MW (2019) Opinion mining on large scale data using sentiment analysis and k-means clustering. Cluster Computing 22:7149–7164

  40. Ma B, Yuan H, Wu Y (2017) Exploring performance of clustering methods on document sentiment analysis. J Inf Sci 43:54–74

  41. Nhlabano V, Lutu P (2018) Impact of text pre-processing on the performance of sentiment analysis models for social media data. In: Madhav N, Asare SD, Macharia P, Dwarika J (eds) International Conference on Advances in Big Data, Computing and Data Communication Systems, pp 1–6

  42. Jianqiang Z, Xiaolin G (2017) Comparison research on text pre-processing methods on Twitter sentiment analysis. IEEE Access 5:2870–2879

  43. Rout JK et al (2018) A model for sentiment and emotion analysis of unstructured social media text. Electron Commer Res 18:181–199

  44. Kalra V, Kashyap I, Kaur H (2022) Generation of domain-specific vocabulary set and classification of documents: weight-inclusion approach. Int J Inf Technol 1–11

  45. Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Calzolari N et al (eds) Proceedings of the Seventh International Conference on Language Resources and Evaluation, pp 2200–2204

  46. Bradley MM, Lang PJ (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. Tech. Rep., University of Florida

  47. Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora

  48. Mitchell M, Aguilar J, Wilson T, Van Durme B (2013) Open domain targeted sentiment. In: Yarowsky D, Baldwin T, Korhonen A, Livescu K, Bethard S (eds) Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp 1643–1654 (Seattle, Washington, USA)

  49. Li L, Goh T-T, Jin D (2020) How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Comput Appl 32:4387–4415

  50. Saharia N (2017) Phone-based identification of language in code-mixed social network data. J Statist Manag Syst 20:565–574

  51. Naeem S, Wumaier A (2018) Study and implementing k-mean clustering algorithm on English text and techniques to find the optimal value of k. Int J Comput Appl 182:7–14

  52. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

  53. Chakrabarty T, Saakyan A, Ghosh D, Muresan S (2022) FLUTE: figurative language understanding through textual explanations, pp 7139–7159

Author information

Corresponding author

Correspondence to Navanath Saharia.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable. This experiment did not involve human or animal subjects.

Euphemism-bearing domain-specific key-phrases

The domain-specific, euphemism-bearing key-phrases for the English language used in this experiment are listed below.

Disability: mentally challenged, special needs, physically challenged, differently abled, visually challenged, verbally challenged, chronically challenged, financially challenged, aesthetically challenged, vertically challenged, sartorially challenged, intellectually challenged, hearing impaired, horizontally challenged, electronically challenged

Physical appearance: heavyset, portly, full-figured, extra pounds, husky, big, heavy, curvy, fluffy, zaftig, plus sized, thick-boned, extra large, plump, rubenesque, stump

Profession: sanitation engineer, automobile engineer, administrative assistant, domestic engineer, technologist, correctional facility, cemetery operative, call girl, business girl, comfort girl, security officer, domestic manager, laid off, discharged, dismissed, made redundant, furloughed, pink slip, outplaced, riffed, bought out, released, unassigned, cut ties, uninstalled, separated, services no longer required, early retired, eased out, force resignation, stepped down, position eliminated, given the package, released from talent pool, declined to extend, assignment expired, helped her exit, one person layoff, managed out, career transition, career change opportunity, contract not renewed, end of trial period, involuntary separation, freed up for future, relieved of duties, taking it for team, promoted to customer, retail workers, universities, exterminating engineer, emporium, salon, parlor, beautician

Interrogation: may, could, would, in a way, to some extent

Politics: the deprived, man of modest means, less well off, under privileged, economically disadvantaged, substandard housing, illegal aliens, culturally deprived environment, undocumented workers, temporary negative cash, under performing assets, economic downturn, economic slowdown, undeveloped countries, underdeveloped countries, depressed neighborhood, developing countries, third world, fourth world

Education: slow student, underachiever, learning difficulties, special needs, bend the truth, tell a white lie, color the truth, economical with truth, dissemble, unreliable, peer homework, comparing answers, collaborating, harvesting answer, non versatile, misspoke, exceptional child, detained, leadership qualities
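The hybrid features described in the abstract pair uni-gram and bi-gram frequencies with matches against a lexicon like the one above. The sketch below is a plain-Python illustration under stated assumptions: `LEXICON` holds only a handful of phrases copied from the table, and the tokeniser, the whole-phrase matching rule, the helper names (`tokens`, `ngram_counts`, `lexicon_hits`, `hybrid_features`, `matched_domains`) and the feature layout (n-gram counts followed by per-domain hit counts) are all hypothetical, not the authors' exact hybrid feature extraction algorithm.

```python
import re
from collections import Counter

# A few key-phrases copied from the annexure table (an excerpt, not the full lexicon).
LEXICON = {
    "Disability": ["differently abled", "hearing impaired", "special needs"],
    "Physical appearance": ["plus sized", "extra pounds", "curvy"],
    "Profession": ["laid off", "pink slip", "position eliminated"],
    "Politics": ["economic downturn", "undocumented workers", "developing countries"],
    "Education": ["slow student", "learning difficulties", "special needs"],
}


def tokens(text):
    """Lower-case alphabetic tokens; hyphens and punctuation act as separators."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(text):
    """Frequency of every uni-gram and bi-gram in the text."""
    toks = tokens(text)
    return Counter(toks + [" ".join(pair) for pair in zip(toks, toks[1:])])


def lexicon_hits(text):
    """Number of whole-phrase key-phrase matches per domain."""
    normalised = " ".join(tokens(text))
    return {
        domain: sum(
            len(re.findall(r"\b" + re.escape(" ".join(tokens(p))) + r"\b", normalised))
            for p in phrases
        )
        for domain, phrases in LEXICON.items()
    }


def hybrid_features(text, vocab):
    """n-gram frequencies over a fixed vocabulary, followed by per-domain lexicon counts."""
    counts = ngram_counts(text)
    hits = lexicon_hits(text)
    return [counts[g] for g in vocab] + [hits[d] for d in LEXICON]


def matched_domains(text):
    """Domains with at least one key-phrase match, sorted alphabetically."""
    return sorted(d for d, n in lexicon_hits(text).items() if n > 0)


tweet = "Got the pink slip today: laid off, position eliminated."
print(matched_domains(tweet))                                      # → ['Profession']
print(hybrid_features(tweet, ["laid off", "pink slip", "today"]))  # → [1, 1, 1, 0, 0, 3, 0, 0]
print(matched_domains("A slow student with special needs"))        # → ['Disability', 'Education']
```

In this sketch a single tweet can hit several domains at once, because "special needs" is listed under both Disability and Education.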

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Devi, M.D., Saharia, N. Identification of domain-specific euphemistic tweets using clustering. Int. j. inf. tecnol. 16, 21–31 (2024). https://doi.org/10.1007/s41870-023-01595-y



  • DOI: https://doi.org/10.1007/s41870-023-01595-y
