DOI: 10.1145/3038912.3052600

Tools for Automated Analysis of Cybercriminal Markets

Published: 03 April 2017

Abstract

Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets make manual exploration of these forums unscalable. In this work, we propose an automated, top-down approach for analyzing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew, and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices.
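
As a rough illustration of the two-stage idea described in the abstract (first flag transaction-related posts, then pull out prices), the following Python sketch uses hand-written keyword cues and a regular expression. It is not the authors' method, which relies on trained natural language processing and machine learning models. The cue lists, the `Price` type, and the function names are all hypothetical, and product extraction is left out because it would need a learned model.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical keyword cues; a real system would learn such signals
# from labeled forum posts rather than hard-code them.
SELL_CUES = ("selling", "for sale", "wts")
BUY_CUES = ("buying", "looking for", "wtb")

# Matches amounts such as "$25", "$1,200.50", "25 usd", or "0.5 btc".
PRICE_RE = re.compile(
    r"\$\s?(\d[\d,]*(?:\.\d+)?)|(\d[\d,]*(?:\.\d+)?)\s?(usd|btc)\b",
    re.IGNORECASE,
)


@dataclass
class Price:
    amount: float
    currency: str


def classify_post(text: str) -> str:
    """Label a post as 'sell', 'buy', or 'other' using substring cues."""
    lowered = text.lower()
    if any(cue in lowered for cue in SELL_CUES):
        return "sell"
    if any(cue in lowered for cue in BUY_CUES):
        return "buy"
    return "other"


def extract_prices(text: str) -> List[Price]:
    """Return every price-like amount found in the post text."""
    prices = []
    for match in PRICE_RE.finditer(text):
        if match.group(1) is not None:
            # Dollar-sign form: currency is implied to be USD.
            amount = float(match.group(1).replace(",", ""))
            prices.append(Price(amount, "USD"))
        else:
            # Suffix form: currency code follows the number.
            amount = float(match.group(2).replace(",", ""))
            prices.append(Price(amount, match.group(3).upper()))
    return prices
```

Calling `classify_post` and then `extract_prices` on the same post text mirrors the top-down order in the abstract. Substring cues of this kind are brittle on slang-heavy forum text, which is one reason a learned classifier is the more plausible choice at scale.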

Published In

ACM Other conferences
WWW '17: Proceedings of the 26th International Conference on World Wide Web
April 2017
1678 pages
ISBN: 9781450349130

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. cybercrime
  2. machine learning/nlp
  3. measurement

Qualifiers

  • Research-article

Conference

WWW '17
Sponsor:
  • IW3C2

Acceptance Rates

WWW '17 Paper Acceptance Rate 164 of 966 submissions, 17%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 201
  • Downloads (Last 6 weeks): 36
Reflects downloads up to 22 Sep 2024

Citations

Cited By

  • (2024) The Art of Cybercrime Community Research. ACM Computing Surveys. DOI: 10.1145/3639362. Online publication date: 10-Jan-2024
  • (2024) Heterogeneous Domain Adaptation for Multistream Classification on Cyber Threat Data. IEEE Transactions on Dependable and Secure Computing 21:1 (1-11). DOI: 10.1109/TDSC.2022.3181682. Online publication date: Jan-2024
  • (2024) Tor-Quest (The Onion Router Crawler). 2024 Ninth International Conference on Science Technology Engineering and Mathematics (ICONSTEM) (1-5). DOI: 10.1109/ICONSTEM60960.2024.10568905. Online publication date: 4-Apr-2024
  • (2023) Know your cybercriminal. Proceedings of the 32nd USENIX Conference on Security Symposium (553-570). DOI: 10.5555/3620237.3620269. Online publication date: 9-Aug-2023
  • (2023) C2Store: C2 Server Profiles at Your Fingertips. Proceedings of the ACM on Networking 1:CoNEXT3 (1-21). DOI: 10.1145/3629132. Online publication date: 28-Nov-2023
  • (2023) The Role of Machine Learning in Cybersecurity. Digital Threats: Research and Practice 4:1 (1-38). DOI: 10.1145/3545574. Online publication date: 7-Mar-2023
  • (2023) Autism Disclosures and Cybercrime Discourse on a Large Underground Forum. 2023 APWG Symposium on Electronic Crime Research (eCrime) (1-14). DOI: 10.1109/eCrime61234.2023.10485504. Online publication date: 15-Nov-2023
  • (2023) Hacker's Paradise: Analysing Music in a Cybercrime Forum. 2023 APWG Symposium on Electronic Crime Research (eCrime) (1-14). DOI: 10.1109/eCrime61234.2023.10485503. Online publication date: 15-Nov-2023
  • (2023) A Graph-Based Stratified Sampling Methodology for the Analysis of (Underground) Forums. IEEE Transactions on Information Forensics and Security 18 (5473-5483). DOI: 10.1109/TIFS.2023.3304424. Online publication date: 2023
  • (2023) Syntactic Enhanced Euphemisms Identification Based on Graph Convolution Networks and Dependency Parsing. 2023 8th International Conference on Data Science in Cyberspace (DSC) (172-180). DOI: 10.1109/DSC59305.2023.00034. Online publication date: 18-Aug-2023
