research-article

Identifying spam e-mail based-on statistical header features and sender behavior

Authors:

Ismail M. Khater,

Mahdi Washaha

CUBE '12: Proceedings of the CUBE International Information Technology Conference

Pages 771 - 778

https://doi.org/10.1145/2381716.2381863

Published: 03 September 2012

Abstract

Email spam filtering remains a sophisticated and challenging problem as long as spammers continue to develop new methods and techniques for their campaigns to defeat and confuse the filtering process. Moreover, relying on email header information imposes additional challenges in classifying emails because header information can be easily spoofed by spammers. In recent years, spam has also become a major problem at social, economic, political, and organizational levels because it decreases employee productivity and causes traffic congestion in networks. In this paper, we present powerful and useful email header features obtained by utilizing the header session messages, based on publicly available datasets. We then apply several machine learning-based classifiers to the extracted header features and demonstrate the power of these features in separating spam from ham messages by evaluating and comparing classifier performance. In the experiment stage, we apply the following classifiers: Random Forest (RF), C4.5 Decision Tree (J48), Voting Feature Intervals (VFI), Random Tree (RT), REPTree (REPT), Bayesian Network (BN), and Naïve Bayes (NB). The experimental results show that the RF classifier has the best performance, with accuracy, precision, recall, and F-measure of 99.27%, 99.40%, 99.50%, and 99.50%, respectively, when all the mentioned features are used, including the trust feature.
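
The workflow the abstract describes (extract features from message headers, classify, then report accuracy, precision, recall, and F-measure) can be sketched roughly as follows. This is only an illustration using the Python standard library: the feature names in `header_features` are hypothetical stand-ins, not the feature set proposed in the paper, and the classification step itself is omitted. `evaluate` shows the standard definitions of the four reported metrics, treating spam as the positive class.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def header_features(raw_message: str) -> dict:
    """Extract a few illustrative header features.

    The feature names are hypothetical examples, not the paper's features.
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {
        # Number of relay hops recorded in Received headers
        "num_received_hops": len(received),
        # Whether the message carries a Message-ID header at all
        "has_message_id": msg.get("Message-ID") is not None,
        # Reply-To present and pointing somewhere other than From
        "reply_to_differs": bool(reply_addr)
        and reply_addr.lower() != from_addr.lower(),
        # Total addresses listed in To and Cc
        "num_recipients": len(recipients),
    }


def evaluate(y_true, y_pred, positive="spam") -> dict:
    """Compute accuracy, precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up sample message used only to exercise the functions above
RAW = (
    "Received: from a.example by b.example; Mon, 3 Sep 2012 10:00:00 +0000\n"
    "Received: from c.example by a.example; Mon, 3 Sep 2012 09:59:00 +0000\n"
    "From: Alice <alice@example.com>\n"
    "Reply-To: offers@other.example\n"
    "To: bob@example.org, carol@example.org\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(header_features(RAW))
    print(evaluate(["spam", "spam", "ham", "ham"], ["spam", "ham", "ham", "ham"]))
```

In practice the feature dictionaries would be turned into vectors and passed to a classifier; the abstract names classifiers such as Random Forest and J48, which are not reproduced in this sketch.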

Index Terms

Identifying spam e-mail based-on statistical header features and sender behavior
1. Applied computing
  1. Electronic commerce
    1. Secure online transactions
2. Security and privacy
  1. Human and societal aspects of security and privacy
  2. Software and application security
    1. Domain-specific security and privacy architectures

Information & Contributors

Information

Published In

CUBE '12: Proceedings of the CUBE International Information Technology Conference

September 2012

879 pages

ISBN:9781450311854

DOI:10.1145/2381716

General Chair: Vidyasagar Potdar, Curtin University, Australia

Program Chair: Debajyoti Mukhopadhyay, Maharashtra Institute of Technology, India

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

CUOT: Curtin University of Technology

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CUBE '12

Sponsor:

CUOT

CUBE '12: CUBE International IT Conference & Exhibition

September 3 - 5, 2012

Pune, India

Bibliometrics

Total Citations: 20

Total Downloads: 414

Downloads (Last 12 months): 11

Downloads (Last 6 weeks): 0

Reflects downloads up to 17 Nov 2024

Cited By

Zhao P, Jin S (2024). Fewshing: A Few-Shot Learning Approach to Phishing Email Detection. 2024 IEEE 4th International Conference on Software Engineering and Artificial Intelligence (SEAI), pp. 371-375. Online publication date: 21-Jun-2024.
https://doi.org/10.1109/SEAI62072.2024.10674290
Papageorgiou G, Economou P, Bersimis S (2024). A method for optimizing text preprocessing and text classification using multiple cycles of learning with an application on shipbrokers emails. Journal of Applied Statistics, 51:13, pp. 2592-2626. Online publication date: 30-Jan-2024.
https://doi.org/10.1080/02664763.2024.2307535
Muralidharan T, Nissim N (2023). Improving malicious email detection through novel designated deep-learning architectures utilizing entire email. Neural Networks, 157:C, pp. 257-279. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.1016/j.neunet.2022.09.002
Ul Haq I, Black P, Gondal I, Kamruzzaman J, Watters P, Kayes A (2022). Spam Email Categorization with NLP and Using Federated Deep Learning. Advanced Data Mining and Applications, pp. 15-27. Online publication date: 24-Nov-2022.
https://doi.org/10.1007/978-3-031-22137-8_2
Bouarara H, Hamou R, Amine A (2020). New Bio Inspired Techniques in the Filtering of Spam. Robotic Systems, pp. 693-726. Online publication date: 2020.
https://doi.org/10.4018/978-1-7998-1754-3.ch037
Bayram U, Pestian J, Santel D, Minai A (2019). What’s in a Word? Detecting Partisan Affiliation from Word Use in Congressional Speeches. 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. Online publication date: Jul-2019.
https://doi.org/10.1109/IJCNN.2019.8851739
Zhang J, Li W, Gong L, Gu Z, Wu J (2019). Targeted Malicious Email Detection Using Hypervisor-Based Dynamic Analysis and Ensemble Learning. 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1-6. Online publication date: Dec-2019.
https://doi.org/10.1109/GLOBECOM38437.2019.9014069
Zhang S, Wang W, Tang S, Jin S, Jiang T (2019). Localizing Backscatters by a Single Robot with Zero Start-Up Cost. 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1-6. Online publication date: Dec-2019.
https://doi.org/10.1109/GLOBECOM38437.2019.9013768
Dan K, Kitagawa N, Sakuraba S, Yamai N (2019). Spam Domain Detection Method Using Active DNS Data and E-Mail Reception Log. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), pp. 896-899. Online publication date: Jul-2019.
https://doi.org/10.1109/COMPSAC.2019.00133
Krause T, Uetz R, Kretschmann T (2019). Recognizing Email Spam from Meta Data Only. 2019 IEEE Conference on Communications and Network Security (CNS), pp. 178-186. Online publication date: Jun-2019.
https://doi.org/10.1109/CNS.2019.8802827
