research-article

Automated Detection of Doxing on Twitter

Authors:

Anna Squicciarini,

Shomir Wilson

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2

Article No.: 276, Pages 1 - 24

https://doi.org/10.1145/3555167

Published: 11 November 2022

Abstract

Doxing refers to the practice of disclosing sensitive personal information about a person without their consent. This form of cyberbullying is an unpleasant and sometimes dangerous phenomenon for online social networks. Although prior work exists on automated identification of other types of cyberbullying, a need exists for methods capable of detecting doxing on Twitter specifically. We propose and evaluate a set of approaches for automatically detecting second- and third-party disclosures on Twitter of sensitive private information, a subset of which constitutes doxing. We summarize our findings of common intentions behind doxing episodes and compare nine different approaches for automated detection based on string-matching and one-hot encoded heuristics, as well as word and contextualized string embedding representations of tweets. We identify an approach providing 96.86% accuracy and 97.37% recall using contextualized string embeddings and conclude by discussing the practicality of our proposed methods.
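As a rough illustration of the kind of string-matching heuristics and evaluation metrics the abstract mentions, the sketch below maps a tweet to a binary indicator vector (one slot per pattern) and computes accuracy and recall from labels. It is a minimal sketch under stated assumptions, not the authors' pipeline: the regular expressions, the names `PATTERNS`, `heuristic_features`, and `accuracy_and_recall`, and the choice of four pattern types are all hypothetical, since the abstract does not specify the actual heuristics.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual
# heuristics are not given in the abstract.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}
FEATURE_NAMES = list(PATTERNS)


def heuristic_features(tweet):
    """Return a binary vector with one slot per pattern (1 = pattern found)."""
    return [1 if PATTERNS[name].search(tweet) else 0 for name in FEATURE_NAMES]


def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for binary labels (1 = disclosure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Recall is undefined without positive examples; report 0.0 in that case.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall
```

A vector like this could be fed to any standard classifier. Recall matters here because a missed disclosure (a false negative) is usually the costlier error for a moderation tool.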

Index Terms

Automated Detection of Doxing on Twitter
1. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections
    2. Social aspects of security and privacy

Published In

Proceedings of the ACM on Human-Computer Interaction Volume 6, Issue CSCW2

CSCW

November 2022

8205 pages

EISSN:2573-0142

DOI:10.1145/3571154

Editor:
Jeff Nichols
Google

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

Datey I, Zytko D (2024). "Just Like, Risking Your Life Here": Participatory Design of User Interactions with Risk Detection AI to Prevent Online-to-Offline Harm Through Dating Apps. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-41). Online publication date: 8-Nov-2024. https://dl.acm.org/doi/10.1145/3686906

Azumah S, Adewopo V, Elsayed Z, Elsayed N, Ozer M (2024). A Secure Open-Source Intelligence Framework For Cyberbullying Investigation. 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC) (1-8). Online publication date: 7-Feb-2024. https://doi.org/10.1109/ICAIC60265.2024.10433832

Wang C, Tang H, Zhu H, Zheng J, Jiang C (2024). Behavioral authentication for security and safety. Security and Safety 3 (2024003). Online publication date: 30-Apr-2024. https://doi.org/10.1051/sands/2024003

Karimi Y, Squicciarini A, Forster P (2024). A longitudinal dataset and analysis of Twitter ISIS users and propaganda. Social Network Analysis and Mining 14:1. Online publication date: 3-Jan-2024. https://doi.org/10.1007/s13278-023-01177-7

Zheng W, Walquist E, Datey I, Zhou X, Berishaj K, Mcdonald M, Parkhill M, Zhu D, Zytko D (2023). Towards Trauma-Informed Data Donation of Sexual Experience in Online Dating to Improve Sexual Risk Detection AI. Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-3). Online publication date: 29-Oct-2023. https://dl.acm.org/doi/10.1145/3586182.3616689

Löbner S, Tesfay W, Bracamonte V, Nakamura T (2023). Systematizing the State of Knowledge in Detecting Privacy Sensitive Information in Unstructured Texts using Machine Learning. 2023 20th Annual International Conference on Privacy, Security and Trust (PST) (1-7). Online publication date: 21-Aug-2023. https://doi.org/10.1109/PST58708.2023.10320187

Azumah S, Elsayed N, ElSayed Z, Ozer M (2023). Cyberbullying in text content detection: an analytical review. International Journal of Computers and Applications 45:9 (579-586). Online publication date: 14-Sep-2023. https://doi.org/10.1080/1206212X.2023.2256048
