research-article

Gossip: Automatically Identifying Malicious Domains from Mailing List Discussions

Authors:

Luca Invernizzi,

Christopher Kruegel,

Giovanni Vigna

ASIA CCS '17: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security

Pages 494 - 505

https://doi.org/10.1145/3052973.3053017

Published: 02 April 2017

Abstract

Domain names play a critical role in cybercrime because they identify hosts that serve malicious content (such as malware, Trojan binaries, or malicious scripts), operate as command-and-control servers, or carry out some other role in the malicious network infrastructure. To defend against Internet attacks and scams, operators widely use blacklists to detect and block malicious domain names and IP addresses. Existing blacklists are typically generated by crawling suspicious domains, analyzing malware manually or automatically, and collecting information from honeypots and intrusion detection systems. Unfortunately, such blacklists are difficult to maintain and are often slow to respond to new attacks. Security experts also set up and join mailing lists to discuss and share threat intelligence, which offers a better chance of identifying emerging malicious activity. In this paper, we present Gossip, a novel approach that automatically detects malicious domains by analyzing discussions in technical mailing lists (particularly on security-related topics) with natural language processing and machine learning techniques. We identify a set of effective features, extracted from email threads, the users participating in the discussions, and content keywords, that let us infer malicious domains from mailing lists without crawling the suspect websites. Our results show that Gossip achieves high detection accuracy. Moreover, our system often detects malicious domains days or weeks earlier than existing public blacklists.
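The abstract names three feature families (email threads, participating users, and content keywords) without defining them. As a rough illustration only, the sketch below aggregates one simple feature from each family for every domain mentioned in a set of mailing-list messages. The keyword list, the `Message` record, the domain regex, and the handling of "defanged" spellings such as `evil[.]example` are all assumptions made for this example, not details of Gossip itself, and the sketch leaves out the NLP and machine learning stages entirely.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical keyword list for illustration; not Gossip's actual vocabulary.
SECURITY_KEYWORDS = {"malware", "phishing", "trojan", "botnet", "c2", "exploit", "spam"}

# Loose hostname pattern (assumption): dot-separated labels ending in an
# alphabetic TLD, also accepting the "defanged" separator "[.]".
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+(?:\.|\[\.\]))+[a-z]{2,})\b", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9]+")


@dataclass
class Message:
    """Hypothetical minimal record for one mailing-list post."""
    thread_id: str
    sender: str
    body: str


@dataclass
class DomainStats:
    threads: set = field(default_factory=set)  # thread IDs mentioning the domain
    users: set = field(default_factory=set)    # distinct senders mentioning it
    keyword_hits: int = 0                      # security keywords in those posts


def normalize(domain: str) -> str:
    """Lowercase, re-fang "[.]" to ".", and drop a leading "www."."""
    d = domain.lower().replace("[.]", ".")
    return d[4:] if d.startswith("www.") else d


def extract_features(messages):
    """Map each mentioned domain to (num_threads, num_users, keyword_hits)."""
    stats = defaultdict(DomainStats)
    for msg in messages:
        hits = sum(1 for w in WORD_RE.findall(msg.body.lower()) if w in SECURITY_KEYWORDS)
        # Count each domain once per message, however often or however it is spelled.
        for domain in {normalize(raw) for raw in DOMAIN_RE.findall(msg.body)}:
            s = stats[domain]
            s.threads.add(msg.thread_id)
            s.users.add(msg.sender)
            s.keyword_hits += hits
    return {d: (len(s.threads), len(s.users), s.keyword_hits) for d, s in stats.items()}


if __name__ == "__main__":
    msgs = [
        Message("t1", "alice", "Seeing phishing from evil[.]example, looks like a trojan dropper."),
        Message("t1", "bob", "Confirmed: www.evil.example serves malware."),
        Message("t2", "carol", "Docs moved to docs.python.org today."),
    ]
    print(extract_features(msgs))
    # prints {'evil.example': (1, 2, 3), 'docs.python.org': (1, 1, 0)}
```

Both spellings of the first domain collapse to one key, discussed by two users in one thread alongside three security keywords. A benign domain such as `docs.python.org` also gets a feature vector, so a real system would still need a trained classifier to separate the two.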

Index Terms

1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Information

Published In

ASIA CCS '17: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security

April 2017

952 pages

ISBN:9781450349444

DOI:10.1145/3052973

General Chairs: Ramesh Karri (New York University, New York, USA), Ozgur Sinanoglu (New York University Abu Dhabi, UAE)

Program Chairs: Ahmad-Reza Sadeghi (Technische Universität Darmstadt, Germany), Xun Yi (RMIT University, Australia)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 April 2017

Qualifiers

Research-article

Conference

ASIA CCS '17

Sponsor:

SIGSAC

ASIA CCS '17: ACM Asia Conference on Computer and Communications Security

April 2 - 6, 2017

Abu Dhabi, United Arab Emirates

Acceptance Rates

ASIA CCS '17 paper acceptance rate: 67 of 359 submissions (19%)

Overall acceptance rate: 418 of 2,322 submissions (18%)

Bibliometrics & Citations

Total citations: 11
Total downloads: 543
Downloads (last 12 months): 22
Downloads (last 6 weeks): 2

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Kim J, Kim J, Wi S, Kim Y, Son S, Bulusu N, Aryafar E, Balasubramanian A, Song J (2022). HearMeOut. Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services, pp. 422-435. Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3498361.3538939

Zhu E, Chen Z, Cui J, Zhong H (2022). MOE/RF: A Novel Phishing Detection Model Based on Revised Multiobjective Evolution Optimization Algorithm and Random Forest. IEEE Transactions on Network and Service Management, 19(4), pp. 4461-4478. Online publication date: Dec-2022.
https://doi.org/10.1109/TNSM.2022.3162885

Apoorva K, Sangeetha S (2022). Analysis of uniform resource locator using boosting algorithms for forensic purpose. Computer Communications, 190(C), pp. 69-77. Online publication date: 1-Jun-2022.
https://dl.acm.org/doi/10.1016/j.comcom.2022.04.002

Shin H, Shim W, Kim S, Lee S, Kang Y, Hwang Y (2021). #Twiti: Social Listening for Threat Intelligence. Proceedings of the Web Conference 2021, pp. 92-104. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3449797

Zeng V, Baki S, Aassal A, Verma R, De Moraes L, Das A, Verma R, Khan L, Mohan C (2020). Diverse Datasets and a Customizable Benchmarking Framework for Phishing. Proceedings of the Sixth International Workshop on Security and Privacy Analytics, pp. 35-41. Online publication date: 16-Mar-2020.
https://dl.acm.org/doi/10.1145/3375708.3380313

El Aassal A, Baki S, Das A, Verma R (2020). An In-Depth Benchmarking and Evaluation of Phishing Detection Research for Security Needs. IEEE Access, 8, pp. 22170-22192. Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2969780

Yan G, Li Q, Guo D, Li B (2019). AULD: Large Scale Suspicious DNS Activities Detection via Unsupervised Learning in Advanced Persistent Threats. Sensors, 19(14), 3180. Online publication date: 19-Jul-2019.
https://doi.org/10.3390/s19143180

Long T, Gao J, Yang M, Hu Y, Yin B (2019). Locality Preserving Projection via Deep Neural Network. 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. Online publication date: Jul-2019.
https://doi.org/10.1109/IJCNN.2019.8852218

Long Z, Tan L, Zhou S, He C, Liu X (2019). Collecting Indicators of Compromise from Unstructured Text of Cybersecurity Articles using Neural-Based Sequence Labelling. 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. Online publication date: Jul-2019.
https://doi.org/10.1109/IJCNN.2019.8852142

Zhu E, Chen Y, Ye C, Li X, Liu F (2019). OFS-NN: An Effective Phishing Websites Detection Model Based on Optimal Feature Selection and Neural Network. IEEE Access, 7, pp. 73271-73284. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2920655
