DOI: 10.5555/3432601.3432630

Voting for authorship attribution applied to dark web data

Published: 10 November 2020

Abstract

This paper studies authorship attribution (AA) across multiple Dark Web forums and asks whether AA is possible beyond the boundaries of a single forum. AA can be a curse for users who try to protect their anonymity and, at the same time, a blessing for law enforcement groups that try to track users. The analysis shows that classifying all features together with a single classifier performs worse than classifying the features separately and combining the results through a voting mechanism. The voting approach achieves an F1-score that is up to 44% higher than the single-classifier approach. In addition, the true author of a post is among the three most likely candidates in at least 94% of cases. These results show that AA can threaten the anonymity of Dark Web users across the boundaries of different forums.
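The voting idea and the top-three evaluation can be illustrated with a minimal sketch. The abstract does not state which voting rule or which feature groups were used, so the soft voting (probability averaging) shown here, the feature-group labels in the comments, and the author names are all assumptions made for illustration only.

```python
from collections import defaultdict


def soft_vote(per_classifier_probs):
    """Average the per-author probabilities produced by several classifiers.

    Assumption: soft voting. The paper's actual voting rule is not given
    in the abstract.
    """
    totals = defaultdict(float)
    for probs in per_classifier_probs:
        for author, p in probs.items():
            totals[author] += p
    n = len(per_classifier_probs)
    return {author: total / n for author, total in totals.items()}


def top_k(scores, k=3):
    """Return the k candidate authors with the highest combined score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical outputs of three classifiers, each trained on a different
# feature group (labels and numbers are illustrative, not from the paper).
votes = [
    {"alice": 0.50, "bob": 0.30, "carol": 0.15, "dave": 0.05},  # e.g. lexical
    {"alice": 0.20, "bob": 0.45, "carol": 0.25, "dave": 0.10},  # e.g. syntactic
    {"alice": 0.40, "bob": 0.20, "carol": 0.10, "dave": 0.30},  # e.g. topical
]

combined = soft_vote(votes)
ranking = top_k(combined, k=3)
print(ranking)
```

A post counts as a top-three hit when its true author appears in `ranking`. Averaging lets a classifier that is confident on one feature group outweigh weaker, disagreeing ones, which is one plausible reason a vote can beat a single classifier trained on all features at once.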

Published In

CASCON '20: Proceedings of the 30th Annual International Conference on Computer Science and Software Engineering
November 2020
297 pages

Sponsors

  • IBM Centre for Advanced Studies (CAS)
  • IBM Canada

Publisher

IBM Corp.

United States

Author Tags

  1. authorship attribution
  2. dark web
  3. machine learning
  4. natural language processing
  5. voting

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%
