research-article

Voting for authorship attribution applied to dark web data

Authors: Britta Sennewald, Rainer Herpers, Marco Hülsmann, Kenneth B. Kent

CASCON '20: Proceedings of the 30th Annual International Conference on Computer Science and Software Engineering

Pages 217 - 226

Published: 10 November 2020

Abstract

This paper investigates authorship attribution (AA) across multiple Dark Web forums and asks whether AA is possible beyond the boundaries of a single forum. AA can be a curse for users who try to protect their anonymity and, at the same time, a blessing for law enforcement groups that try to track users. The analysis shows that analyzing all features together with a single classifier yields worse results than classifying the features separately and computing the final result with a voting mechanism: the voting approach achieves an F1-score that is up to 44% higher. In addition, the true author of a post is among the three most likely candidates in at least 94% of cases. These results show that AA can threaten the anonymity of Dark Web users across the boundaries of different forums.
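The abstract does not state which voting rule or which feature groups the authors use, so the following is only a minimal sketch under stated assumptions: each feature-specific classifier is assumed to output a probability per candidate author, the votes are combined by simple averaging (soft voting), and the feature-group and author names are hypothetical. It also shows how a "true author within the top three candidates" check could be computed from the combined ranking.

```python
from collections import defaultdict


def soft_vote(per_feature_probs):
    """Average per-author probabilities over all feature-specific classifiers.

    per_feature_probs maps a feature-group name to a dict of
    {author: probability}. Averaging is an assumed voting rule,
    not necessarily the one used in the paper.
    """
    totals = defaultdict(float)
    for probs in per_feature_probs.values():
        for author, p in probs.items():
            totals[author] += p
    n = len(per_feature_probs)
    averaged = {author: total / n for author, total in totals.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


def in_top_k(ranking, true_author, k=3):
    """Return True if true_author is among the k best-ranked candidates."""
    return true_author in [author for author, _ in ranking[:k]]


# Hypothetical classifier outputs for a single forum post.
votes = {
    "char_ngrams": {"alice": 0.4, "bob": 0.3, "carol": 0.2, "dave": 0.1},
    "function_words": {"alice": 0.1, "bob": 0.5, "carol": 0.3, "dave": 0.1},
    "punctuation": {"alice": 0.4, "bob": 0.3, "carol": 0.2, "dave": 0.1},
}

ranking = soft_vote(votes)
for author, score in ranking:
    print(f"{author}: {score:.3f}")
print("carol in top 3:", in_top_k(ranking, "carol"))
```

In this toy input no single classifier decides the outcome alone: "alice" is the top candidate for two feature groups, yet "bob" ranks first after averaging because the third classifier favours him strongly. Majority (hard) voting or weighted voting would be drop-in alternatives to the averaging step.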


Index Terms

Voting for authorship attribution applied to dark web data
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Security and privacy


Published In

CASCON '20: Proceedings of the 30th Annual International Conference on Computer Science and Software Engineering

November 2020, 297 pages

Editors: Lily Shaddick (IBM Canada Ltd.), Guy-Vincent Jourdan (University of Ottawa), Vio Onut (IBM Canada Ltd.), Tinny Ng (IBM Canada Ltd.)

Sponsors: IBM Centre for Advanced Studies (CAS), IBM Canada

Publisher: IBM Corp., United States

Acceptance Rates: Overall acceptance rate 24 of 90 submissions, 27%
