DOI: 10.1145/3578503.3583619

A longitudinal study of the top 1% toxic Twitter profiles

Published: 30 April 2023

Abstract

Toxicity is endemic to online social networks (OSNs), including Twitter. It follows a Pareto-like distribution in which most of the toxicity is generated by a very small number of profiles, so analyzing and characterizing these “toxic profiles” is critical. Prior research has largely characterized toxicity on the platform through sporadic, event-centric toxic content (i.e., tweets). Instead, we approach the problem from a profile-centric point of view. We study 143K Twitter profiles and focus on the behavior of the top 1% producers of toxic content on Twitter, based on the toxicity scores of their tweets obtained from the Perspective API. With a total of 293M tweets spanning 16 years of activity, this longitudinal data allows us to reconstruct the timelines of all profiles involved. We use these timelines to compare the behavior of the most toxic Twitter profiles against the rest of the Twitter population. We study the posting patterns of highly toxic accounts, including how frequent and prolific their tweeting is, the nature of their hashtags and URLs, their profile metadata, and their Botometer scores. We find that the highly toxic profiles post coherent and well-articulated content; their tweets keep to a narrow theme, with lower diversity in hashtags, URLs, and domains; they are thematically similar to one another; and they show a high likelihood of bot-like behavior, with high fake-follower scores suggesting progenitors who intend to influence. Our work contributes insight into the top 1% toxic profiles on Twitter and establishes that a profile-centric approach to investigating toxicity on Twitter is beneficial. Identifying the most toxic profiles can aid in reporting and suspending such profiles, making Twitter a better place for discussion. Finally, we contribute to the research community a large-scale, longitudinal dataset annotated with six types of toxicity scores.
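
For readers who want to see how the scoring step described above could look in practice, the following sketch scores individual tweets with the Perspective API and ranks profiles by their mean toxicity to pick out a top 1%. This is a minimal illustration, not the authors' pipeline: the endpoint, request body, and TOXICITY attribute follow the public Perspective API documentation, while the helper names, the input format, and the choice of mean-score aggregation are assumptions made for the example (the paper annotates tweets with six toxicity attributes, only one of which is requested here).

    # Minimal sketch (not the authors' pipeline): score tweets with the
    # Perspective API and rank profiles by mean toxicity. The endpoint and
    # request shape follow the public Perspective API docs; helper names,
    # input format, and the aggregation choice are illustrative assumptions.
    import requests
    from statistics import mean

    API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder, supply your own key
    URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=" + API_KEY)

    def toxicity_score(text: str) -> float:
        """Return the Perspective TOXICITY summary score (0..1) for one tweet."""
        body = {
            "comment": {"text": text},
            "languages": ["en"],
            "requestedAttributes": {"TOXICITY": {}},
        }
        resp = requests.post(URL, json=body, timeout=10)
        resp.raise_for_status()
        return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    def top_one_percent(tweets_by_profile: dict[str, list[str]]) -> list[str]:
        """Rank profiles by mean per-tweet toxicity and return the top 1%."""
        profile_score = {
            profile: mean(toxicity_score(t) for t in tweets)
            for profile, tweets in tweets_by_profile.items()
        }
        ranked = sorted(profile_score, key=profile_score.get, reverse=True)
        k = max(1, len(ranked) // 100)  # top 1% of profiles, at least one
        return ranked[:k]

At the scale reported in the abstract (293M tweets), per-request calls like this would of course need batching, caching, and rate-limit handling, which the sketch omits for brevity.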

Information & Contributors

Information

Published In

WebSci '23: Proceedings of the 15th ACM Web Science Conference 2023
April 2023
373 pages
ISBN: 9798400700897
DOI: 10.1145/3578503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2023

Author Tags

  1. Perspective score
  2. Twitter
  3. longitudinal
  4. measurement
  5. profile
  6. toxicity

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • MQ University Cyber Security Hub

Conference

WebSci '23: 15th ACM Web Science Conference 2023
April 30 - May 1, 2023
Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%

Article Metrics

  • Downloads (Last 12 months): 61
  • Downloads (Last 6 weeks): 3

Reflects downloads up to 22 Nov 2024

Cited By
  • (2024) Beyond phase-in: assessing impacts on disinformation of the EU Digital Services Act. AI and Ethics. https://doi.org/10.1007/s43681-024-00467-w. Online publication date: 11-Apr-2024.
  • (2024) The medium is the message: toxicity declines in structured vs unstructured online deliberations. World Wide Web 27(3). https://doi.org/10.1007/s11280-024-01269-0. Online publication date: 8-May-2024.
  • (2023) Exploring the Distinctive Tweeting Patterns of Toxic Twitter Users. 2023 IEEE International Conference on Big Data (BigData), 3624-3633. https://doi.org/10.1109/BigData59044.2023.10386402. Online publication date: 15-Dec-2023.
  • (2023) Theory and practice of agenda setting: understanding media, bot, and public agendas in the South Korean presidential election. Asian Journal of Communication 34(1), 24-56. https://doi.org/10.1080/01292986.2023.2261112. Online publication date: 4-Dec-2023.
