Research article, DOI: 10.1145/2381716.2381863

Identifying spam e-mail based-on statistical header features and sender behavior

Published: 03 September 2012

Abstract

Email spam filtering remains a sophisticated and challenging problem as long as spammers continue to develop new methods and techniques for their campaigns to defeat and confuse the spam filtering process. Moreover, using email header information poses additional challenges in classifying emails because header information can be easily spoofed by spammers. In recent years, spam has also become a major problem at social, economic, political, and organizational levels because it decreases employee productivity and causes network traffic congestion. In this paper, we present powerful and useful email header features derived from header session messages in publicly available datasets. We then apply several machine learning-based classifiers to the extracted header features and evaluate and compare their performance to show the power of these features in separating spam from ham messages. In the experimental stage, we apply the following classifiers: Random Forest (RF), C4.5 Decision Tree (J48), Voting Feature Intervals (VFI), Random Tree (RT), REPTree (REPT), Bayesian Network (BN), and Naïve Bayes (NB). The experimental results show that the RF classifier has the best performance, with an accuracy, precision, recall, and F-measure of 99.27%, 99.40%, 99.50%, and 99.50%, respectively, when all of the mentioned features are used, including the trust feature.
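The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard confusion-matrix definitions and treats spam as the positive class; the abstract states neither, so both are assumptions. The function name and the counts in the usage note are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    Assumption: spam is the positive class, so
      tp = spam classified as spam, fp = ham classified as spam,
      tn = ham classified as ham,   fn = spam classified as ham.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure here is the balanced F1 score: the harmonic mean of
    # precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in classification_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```

With these made-up counts, accuracy is 170/200 and precision is 90/100. In a spam setting, precision penalizes ham wrongly flagged as spam, while recall penalizes spam that slips through, which is why the two are usually reported together.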

Published In

CUBE '12: Proceedings of the CUBE International Information Technology Conference
September 2012
879 pages
ISBN:9781450311854
DOI:10.1145/2381716
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • CUOT: Curtin University of Technology

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. ham
  3. machine learning
  4. spam
  5. spam filtering
