DOI: 10.1109/SP.2014.21
Article

Doppelgänger Finder: Taking Stylometry to the Underground

Published: 18 May 2014

Abstract

Stylometry is a method for identifying the anonymous authors of texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and about the size and scope of the cybercrime underworld. Previous analyses have relied primarily on limited structured metadata and painstaking manual analysis. The key challenge is to automate this process, since the labor-intensive manual approach clearly does not scale. We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is a standard supervised stylometry problem, made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, the goal is to feed a forum into an analysis engine and have it output possible doppelgangers, that is, users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to data sets created by artificially splitting authors into multiple identities. For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users. We demonstrate the utility of our approach with a case study in which we apply our technique to the Carders forum and validate the results through manual analysis, enabling the discovery of previously undetected doppelganger accounts.
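
To make the first scenario concrete, the following is a minimal sketch of supervised authorship attribution: an unknown text is assigned to whichever suspect's known writing it most resembles. It is not the method used in the paper, which the abstract does not describe. The feature choice (character trigram counts), the cosine similarity measure, the function names, and the toy messages are all assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(unknown: str, suspects: dict[str, str]) -> str:
    """Return the suspect whose known text is most similar to the unknown text."""
    target = trigram_profile(unknown)
    return max(suspects, key=lambda name: cosine(target, trigram_profile(suspects[name])))


# Hypothetical toy data: one known writing sample per suspect.
suspects = {
    "alice": "selling fresh dumps!!! pm me 4 price, escrow ok!!!",
    "bob": "Good evening. I am offering hosting services; please contact me by private message.",
}
print(attribute("fresh dumps 4 sale!!! pm me, escrow ok", suspects))
```

A real system would need far more text per author, a richer feature set, and a trained classifier; this sketch only shows the shape of the problem, in which a closed set of suspects is ranked against one unknown sample.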

Information

Published In

SP '14: Proceedings of the 2014 IEEE Symposium on Security and Privacy
May 2014
694 pages
ISBN: 9781479946860

Publisher

IEEE Computer Society

United States

Author Tag

  1. Stylometry, cybercrime, underground forum

Qualifiers

  • Article

Cited By

  • (2024) A multiview clustering framework for detecting deceptive reviews. Journal of Computer Security 32:1 (31-52). DOI: 10.3233/JCS-220001. Online publication date: 2-Feb-2024
  • (2024) Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets. ACM Computing Surveys 56:8 (1-36). DOI: 10.1145/3653973. Online publication date: 26-Mar-2024
  • (2024) The Art of Cybercrime Community Research. ACM Computing Surveys 56:6 (1-26). DOI: 10.1145/3639362. Online publication date: 10-Jan-2024
  • (2022) There is a fine Line between Personalization and Surveillance: Semantic User Interest Tracing via Entity-level Analytics. Proceedings of the 14th ACM Web Science Conference 2022 (22-33). DOI: 10.1145/3501247.3531592. Online publication date: 26-Jun-2022
  • (2021) Large-scale and Robust Code Authorship Identification with Deep Feature Learning. ACM Transactions on Privacy and Security 24:4 (1-35). DOI: 10.1145/3461666. Online publication date: 19-Jul-2021
  • (2020) The Limitations of Stylometry for Detecting Machine-Generated Fake News. Computational Linguistics 46:2 (499-510). DOI: 10.1162/coli_a_00380. Online publication date: 1-Jun-2020
  • (2020) eDarkFind: Unsupervised Multi-view Learning for Sybil Account Detection. Proceedings of The Web Conference 2020 (1955-1965). DOI: 10.1145/3366423.3380263. Online publication date: 20-Apr-2020
  • (2019) Academic Plagiarism Detection. ACM Computing Surveys 52:6 (1-42). DOI: 10.1145/3345317. Online publication date: 16-Oct-2019
  • (2019) Text Analysis in Adversarial Settings. ACM Computing Surveys 52:3 (1-36). DOI: 10.1145/3310331. Online publication date: 18-Jun-2019
  • (2019) Hi Doppelgänger: Towards Detecting Manipulation in News Comments. Companion Proceedings of The 2019 World Wide Web Conference (197-205). DOI: 10.1145/3308560.3316496. Online publication date: 13-May-2019
