Tools for Automated Analysis of Cybercriminal Markets

Authors:

Rebecca S. Portnoff,

Jonathan K. Kummerfeld,

Taylor Berg-Kirkpatrick,

Kirill Levchenko,

Vern Paxson

WWW '17: Proceedings of the 26th International Conference on World Wide Web

Pages 657 - 666

https://doi.org/10.1145/3038912.3052600

Published: 03 April 2017

Abstract

Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets make manual exploration of these forums unscalable. In this work, we propose an automated, top-down approach for analyzing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew, and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices.
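
To make the two-stage idea in the abstract concrete (first decide whether a post concerns a transaction, then pull out prices), here is a minimal rule-based sketch. It is not the authors' method: the paper uses natural language processing and machine learning, whereas this toy stand-in uses keyword cues and a regular expression. The cue lists, the price pattern, and the names `classify_post`, `extract_prices`, and `summarize` are all assumptions made for illustration, and product extraction is left out entirely.

```python
import re
from typing import List, NamedTuple

# Hypothetical cue phrases; a learned classifier would replace these.
SELL_CUES = ("selling", "wts", "for sale")
BUY_CUES = ("buying", "wtb", "looking for")

# Illustrative price pattern: "$25", "$1,200.50", "30 usd", "30 dollars".
PRICE_RE = re.compile(
    r"\$\s?(\d[\d,]*(?:\.\d+)?)|(\d[\d,]*(?:\.\d+)?)\s?(?:usd|dollars)\b",
    re.IGNORECASE,
)


class PostSummary(NamedTuple):
    category: str        # "sell", "buy", or "other"
    prices: List[float]  # prices found in transaction-related posts


def classify_post(text: str) -> str:
    """Stage 1: label a post by the first matching cue list."""
    lowered = text.lower()
    if any(cue in lowered for cue in SELL_CUES):
        return "sell"
    if any(cue in lowered for cue in BUY_CUES):
        return "buy"
    return "other"


def extract_prices(text: str) -> List[float]:
    """Stage 2: collect every amount matched by the price pattern."""
    prices = []
    for match in PRICE_RE.finditer(text):
        raw = match.group(1) or match.group(2)
        prices.append(float(raw.replace(",", "")))
    return prices


def summarize(text: str) -> PostSummary:
    """Run price extraction only on posts labeled as transactions."""
    category = classify_post(text)
    prices = extract_prices(text) if category != "other" else []
    return PostSummary(category, prices)
```

Calling `summarize` on a post's text returns its coarse category together with any prices found. Substring cues of this kind misfire easily, which is one reason a trained model is the more plausible choice at the scale the abstract describes.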

Index Terms

Tools for Automated Analysis of Cybercriminal Markets
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
2. Social and professional topics
  1. Computing / technology policy
    1. Computer crime

Information & Contributors

Information

Published In

WWW '17: Proceedings of the 26th International Conference on World Wide Web

April 2017

1678 pages

ISBN: 9781450349130

General Chairs: Rick Barrett (W3Events), Rick Cummings (Murdoch University)

Program Chairs: Eugene Agichtein (Emory University), Evgeniy Gabrilovich (Google Research)

Copyright © 2017. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Research-article

Funding Sources

Office of Naval Research
National Science Foundation
Center for Long-Term Cybersecurity

Conference

WWW '17

Sponsor:

IW3C2

WWW '17: 26th International World Wide Web Conference

April 3 - 7, 2017

Perth, Australia

Acceptance Rates

WWW '17 Paper Acceptance Rate: 164 of 966 submissions, 17%

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 54
Total Downloads: 1,194
Downloads (Last 12 months): 201
Downloads (Last 6 weeks): 36

Reflects downloads up to 22 Sep 2024

Citations

Cited By

Hughes J, Pastrana S, Hutchings A, Afroz S, Samtani S, Li W, Marin E. (2024). The Art of Cybercrime Community Research. ACM Computing Surveys. Online publication date: 10-Jan-2024.
https://dl.acm.org/doi/10.1145/3639362
Li Y, Gao Y, Ayoade G, Khan L, Singhal A, Thuraisingham B. (2024). Heterogeneous Domain Adaptation for Multistream Classification on Cyber Threat Data. IEEE Transactions on Dependable and Secure Computing 21:1 (1-11). Online publication date: Jan-2024.
https://doi.org/10.1109/TDSC.2022.3181682
M S, G M, V P, S P, K S. (2024). Tor-Quest (The Onion Router Crawler). 2024 Ninth International Conference on Science Technology Engineering and Mathematics (ICONSTEM) (1-5). Online publication date: 4-Apr-2024.
https://doi.org/10.1109/ICONSTEM60960.2024.10568905
Campobasso M, Allodi L, Calandrino J, Troncoso C. (2023). Know your cybercriminal. Proceedings of the 32nd USENIX Conference on Security Symposium (553-570). Online publication date: 9-Aug-2023.
https://dl.acm.org/doi/10.5555/3620237.3620269
Jain V, Alam S, Krishnamurthy S, Faloutsos M. (2023). C2Store: C2 Server Profiles at Your Fingertips. Proceedings of the ACM on Networking 1:CoNEXT3 (1-21). Online publication date: 28-Nov-2023.
https://dl.acm.org/doi/10.1145/3629132
Apruzzese G, Laskov P, Montes de Oca E, Mallouli W, Brdalo Rapa L, Grammatopoulos A, Di Franco F. (2023). The Role of Machine Learning in Cybersecurity. Digital Threats: Research and Practice 4:1 (1-38). Online publication date: 7-Mar-2023.
https://dl.acm.org/doi/10.1145/3545574
Man J, Siu G, Hutchings A. (2023). Autism Disclosures and Cybercrime Discourse on a Large Underground Forum. 2023 APWG Symposium on Electronic Crime Research (eCrime) (1-14). Online publication date: 15-Nov-2023.
https://doi.org/10.1109/eCrime61234.2023.10485504
Talas A, Hutchings A. (2023). Hacker's Paradise: Analysing Music in a Cybercrime Forum. 2023 APWG Symposium on Electronic Crime Research (eCrime) (1-14). Online publication date: 15-Nov-2023.
https://doi.org/10.1109/eCrime61234.2023.10485503
Di Tizio G, Siu G, Hutchings A, Massacci F. (2023). A Graph-Based Stratified Sampling Methodology for the Analysis of (Underground) Forums. IEEE Transactions on Information Forensics and Security 18 (5473-5483). Online publication date: 2023.
https://doi.org/10.1109/TIFS.2023.3304424
Huang X, Zhao J, Shi J, Sun Y, Wang X, Shen L. (2023). Syntactic Enhanced Euphemisms Identification Based on Graph Convolution Networks and Dependency Parsing. 2023 8th International Conference on Data Science in Cyberspace (DSC) (172-180). Online publication date: 18-Aug-2023.
https://doi.org/10.1109/DSC59305.2023.00034
