short-paper

Trustworthiness of spam email addresses using machine learning

Authors:

Francisco Jáñez-Martino, Rocío Alaiz-Rodríguez, Víctor González-Castro, Eduardo Fidalgo

DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering

Article No.: 17, Pages 1 - 4

https://doi.org/10.1145/3469096.3475060

Published: 16 August 2021

Abstract

Cybercriminals increasingly use spam email to deliver scams, phishing, malware and other frauds to organisations and individuals. They design sophisticated, contextualised emails that look trustworthy to users, and the sender address is an essential part of that disguise. Although cybersecurity agencies and companies develop products and organise courses to help people recognise suspicious email patterns, spam attacks have not yet been fully prevented.

This work presents a proof-of-concept methodology that gives users more meaningful information about the trustworthiness of sender addresses, helping them detect these harmful emails. For the first time in the literature, we present an email address dataset manually labelled into two classes, low and high quality. Moreover, we extracted 18 handcrafted features based on social engineering techniques and natural language properties. We evaluated four popular machine learning classifiers and obtained the best performance with Naive Bayes, i.e., an accuracy of 88.17% and an F1-score of 0.808. Additionally, we applied the InterpretML framework to identify the most relevant properties, with the eventual aim of implementing an automatic system that reports the trustworthiness of email addresses.
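As an illustration of the pipeline the abstract outlines, the following minimal sketch turns an email address into a few numeric features and computes the two reported metrics, accuracy and F1-score. The abstract does not list the 18 handcrafted features, so every feature name here (`local_length`, `digit_ratio`, and so on) is a hypothetical stand-in rather than the authors' feature set, and no classifier is trained; the helper functions only show how such scores are derived from predictions.

```python
import re
from typing import Dict, List


def extract_features(address: str) -> Dict[str, float]:
    """Map one email address to a small numeric feature dictionary.

    Assumption: these five features are illustrative only; they are NOT
    the 18 features used in the paper, which the abstract does not list.
    """
    # Split on the last "@"; an address without "@" yields an empty local part.
    local, _, domain = address.lower().rpartition("@")
    digits = sum(ch.isdigit() for ch in local)
    return {
        "local_length": float(len(local)),
        "digit_ratio": digits / len(local) if local else 0.0,
        "special_chars": float(len(re.findall(r"[._+-]", local))),
        "subdomain_count": float(max(domain.count(".") - 1, 0)),
        "domain_has_digit": float(any(ch.isdigit() for ch in domain)),
    }


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the manual labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical address and labels, used only to exercise the helpers.
    print(extract_features("john.smith99@mail.example.com"))
    labels = [1, 0, 1, 1, 0]
    predictions = [1, 0, 0, 1, 1]
    print(accuracy(labels, predictions), f1_score(labels, predictions))
```

In a fuller experiment, the feature dictionaries would be stacked into a matrix and passed to a classifier such as Naive Bayes, and the predicted labels would then be scored with the two metric functions above.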

Cited By

Liang C, Wang Y, Chen Q, Feng X, Wang L, Li M, Zhang H, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y. (2024). Bootstrap Deep Metric for Seed Expansion in Attributed Networks. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1629-1638. DOI: 10.1145/3626772.3657687. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657687

Jach A. (2023). Artificial Intelligence Methods in Email Marketing—A Survey. Dependable Computer Systems and Networks, 85-94. DOI: 10.1007/978-3-031-37720-4_8. Online publication date: 11-Aug-2023.
https://doi.org/10.1007/978-3-031-37720-4_8

Published In

DocEng '21: Proceedings of the 21st ACM Symposium on Document Engineering

August 2021

178 pages

ISBN:9781450385961

DOI:10.1145/3469096

General Chairs: Patrick Healy (University of Limerick, Ireland), Mihai Bilauca (University of Limerick, Ireland)

Program Chair: Alexandra Bonnici (University of Malta, Malta)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

DocEng '21

Sponsor:

SIGWEB

DocEng '21: ACM Symposium on Document Engineering 2021

August 24 - 27, 2021

Limerick, Ireland

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Bibliometrics

Total Citations: 2

Total Downloads: 166

Downloads (Last 12 months): 43

Downloads (Last 6 weeks): 10

Reflects downloads up to 16 Nov 2024
