research-article

Enhancing scalability in anomaly-based email spam filtering

Authors:

Carlos Laorden,

Xabier Ugarte-Pedrero,

Pablo G. Bringas

CEAS '11: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference

Pages 13 - 22

https://doi.org/10.1145/2030376.2030378

Published: 01 September 2011

Abstract

Spam has become an important computer security problem because it serves as a channel for spreading threats such as computer viruses, worms and phishing. Currently, more than 85% of received emails are spam. Traditional approaches to combating these messages, including simple techniques such as sender blacklisting or email signatures, are no longer fully reliable. Many solutions use machine-learning approaches trained on statistical representations of the terms that usually appear in emails. However, these methods require a time-consuming training step with labelled data. When labelled training instances are scarce, filtering systems improve more slowly, which gives spammers an advantage. In previous work, we presented the first spam filtering method based on anomaly detection, which reduces the need to label spam messages and employs only the representation of legitimate emails. We showed that this method achieved high spam detection rates while maintaining a low false positive rate and reducing the effort of labelling spam. In this paper, we enhance that system by applying a data reduction algorithm to the labelled dataset: it finds similarities among legitimate emails and groups them into consistent clusters, which reduces the number of comparisons needed. We show that this improvement drastically reduces processing time while keeping detection and false positive rates stable.
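The abstract combines two ideas: flagging an email as spam when it is anomalous with respect to legitimate emails alone, and cutting the number of comparisons by clustering the legitimate set. The following is a minimal Python sketch of both, under stated assumptions: emails are whitespace-tokenised term-frequency vectors, dissimilarity is cosine distance, and the data reduction step is a simple greedy "leader" clustering. These choices, the `radius` and `threshold` values, the toy emails, and every function name (`tf_vector`, `cosine_distance`, `centroid`, `leader_clusters`, `is_spam`) are hypothetical illustrations, not the paper's representation, distance measure or clustering algorithm.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector over lower-cased whitespace tokens (a simplified VSM)."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; 1.0 means the two vectors share no terms."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors: list[Counter]) -> Counter:
    """Component-wise mean of a group of term vectors."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def leader_clusters(vectors: list[Counter], radius: float) -> list[Counter]:
    """Greedy single-pass clustering (hypothetical stand-in for the data reduction step).

    Each vector joins the first cluster whose centroid lies within `radius`;
    otherwise it starts a new cluster. Only the centroids are returned.
    """
    members: list[list[Counter]] = []
    centroids: list[Counter] = []
    for v in vectors:
        for i, c in enumerate(centroids):
            if cosine_distance(v, c) <= radius:
                members[i].append(v)
                centroids[i] = centroid(members[i])
                break
        else:
            members.append([v])
            centroids.append(centroid([v]))
    return centroids


def is_spam(email: Counter, references: list[Counter], threshold: float) -> bool:
    """Anomaly rule: spam if the email is farther than `threshold` from every reference."""
    return min(cosine_distance(email, r) for r in references) > threshold


# Toy legitimate corpus (invented for illustration).
legit = [
    "meeting moved to friday afternoon",
    "agenda for friday meeting attached",
    "project report draft attached for review",
    "please review the project report draft",
]
legit_vecs = [tf_vector(t) for t in legit]
centroids = leader_clusters(legit_vecs, radius=0.7)

for text in ["friday meeting moved again", "cheap pills buy now limited offer"]:
    v = tf_vector(text)
    # Full comparison (4 legitimate emails) vs. reduced comparison (2 centroids).
    print(text, "->", is_spam(v, legit_vecs, 0.7), is_spam(v, centroids, 0.7))
```

Here the four legitimate emails collapse into two centroids, so each incoming email needs two distance computations instead of four, and for these two examples both settings give the same verdict. In this sketch, `radius` trades compression against how faithfully a centroid represents its members; a larger radius yields fewer comparisons but may blur distinct kinds of legitimate mail.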

Cited By

Chanti S and Chithralekha T (2019). Classification of Anti-phishing Solutions. SN Computer Science, 1:1. Online publication date: 16-Jul-2019.
https://doi.org/10.1007/s42979-019-0011-2

Index Terms

Enhancing scalability in anomaly-based email spam filtering
1. Applied computing
  1. Electronic commerce
    1. Secure online transactions
2. Security and privacy
  1. Human and societal aspects of security and privacy
  2. Software and application security
    1. Domain-specific security and privacy architectures

Information

Published In

ACM Other conferences

CEAS '11: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference

September 2011

230 pages

ISBN:9781450307888

DOI:10.1145/2030376

General Chair:
Vidyasagar Potdar
Curtin University, Australia

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2011

Qualifiers

Research-article

Conference

CEAS '11

CEAS '11: The 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference

September 1 - 2, 2011

Perth, Australia

Bibliometrics

Total Citations: 1
Total Downloads: 247
Downloads (last 12 months): 6
Downloads (last 6 weeks): 1

Reflects downloads up to 10 Nov 2024.
