DOI: 10.1145/3469096.3475060
short-paper

Trustworthiness of spam email addresses using machine learning

Published: 16 August 2021

Abstract

Cybercriminals have increasingly used spam email to send scams, phishing, malware and other frauds to organisations and people. They design sophisticated, contextualised emails that look trustworthy to users, and the sender address is an essential part of that deception. Although cybersecurity agencies and companies develop products and organise courses to help people recognise suspicious email patterns, spam attacks have not yet been fully prevented.
This work presents a proof-of-concept methodology that gives users more meaningful information about trustworthiness to help them detect these harmful emails. For the first time in the literature, we present a dataset of email addresses manually labelled into two classes, low and high quality. Moreover, we extracted 18 handcrafted features based on social engineering techniques and natural language properties. We evaluated four popular machine learning classifiers and obtained the best performance with Naive Bayes: an accuracy of 88.17% and an F1-score of 0.808. Additionally, we applied the InterpretML framework to identify the most relevant properties, with a view to eventually implementing an automatic system that informs users about the trustworthiness of email addresses.
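
To make the idea of handcrafted address features and the two reported metrics concrete, the following is a minimal Python sketch. The abstract does not list the 18 features, so the six features below (`local_length`, `digit_ratio`, and so on) are hypothetical stand-ins chosen for illustration, and the function names are assumptions rather than the authors' code. The sketch also assumes each input contains an `@` separator.

```python
import math
import re


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(address: str) -> dict:
    """Illustrative features for one email address (NOT the paper's 18 features)."""
    # Split on the last "@" so the domain is everything after it.
    local, _, domain = address.lower().rpartition("@")
    return {
        "local_length": len(local),
        "digit_ratio": sum(c.isdigit() for c in local) / len(local) if local else 0.0,
        "special_chars": len(re.findall(r"[._+\-]", local)),
        "local_entropy": shannon_entropy(local),
        "domain_levels": domain.count(".") + 1 if domain else 0,
        "domain_has_digit": any(c.isdigit() for c in domain),
    }


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1-score for binary labels, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

In a pipeline of this kind, the feature dictionaries would be turned into numeric vectors and passed to a classifier such as Naive Bayes, and `accuracy_and_f1` would then be applied to the held-out predictions. Which class counts as positive for the reported F1-score is not stated in the abstract, so the `positive` parameter here is an assumption.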

Cited By

  • (2024) Bootstrap Deep Metric for Seed Expansion in Attributed Networks. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1629-1638. DOI: 10.1145/3626772.3657687. Online publication date: 10-Jul-2024
  • (2023) Artificial Intelligence Methods in Email Marketing—A Survey. Dependable Computer Systems and Networks, 85-94. DOI: 10.1007/978-3-031-37720-4_8. Online publication date: 11-Aug-2023

Published In

DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering
August 2021
178 pages
ISBN:9781450385961
DOI:10.1145/3469096
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification algorithm
  2. cybersecurity
  3. feature extraction
  4. machine learning
  5. spam email detection

Qualifiers

  • Short-paper

Conference

DocEng '21: ACM Symposium on Document Engineering 2021
August 24 - 27, 2021
Limerick, Ireland

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Article Metrics

  • Downloads (Last 12 months): 43
  • Downloads (Last 6 weeks): 10
Reflects downloads up to 16 Nov 2024.
