DOI: 10.5555/1323276.1323285
Article

Exploiting redundancy in natural language to penetrate Bayesian spam filters

Published: 06 August 2007

Abstract

Today's attacks against Bayesian spam filters attempt to keep the content of spam mails visible to humans but obscured from filters. A common technique is to fool filters by appending additional words to a spam mail. Because these words rarely appear in spam, filters are inclined to classify the mail as legitimate.
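The word-appending attack can be illustrated with the naive-Bayes combining rule popularized by Paul Graham. This is a simplified sketch under stated assumptions: the token probabilities are invented, unseen tokens are treated as neutral (0.5), and every token is combined, whereas real filters typically use refined combining rules and may score only the most significant tokens. It shows how appending a few words with low spam probability can pull the combined score below 0.5.

```python
import math

# Hypothetical per-token spam probabilities, as a trained filter might store them.
SPAM_PROBS = {
    "cheap": 0.90, "viagra": 0.99, "pills": 0.95,
    "meeting": 0.05, "agenda": 0.03, "quarterly": 0.02,
}
NEUTRAL = 0.5  # assumed probability for tokens the filter has never seen


def combined_spam_probability(probs):
    """Naive-Bayes combination P = prod(p) / (prod(p) + prod(1 - p)),
    evaluated in log space to avoid underflow on long messages."""
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    d = log_spam - log_ham
    # Numerically stable logistic function of d (equivalent to the ratio above).
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def score(tokens):
    """Look up each token's probability and combine them."""
    return combined_spam_probability([SPAM_PROBS.get(t, NEUTRAL) for t in tokens])


spam = ["cheap", "viagra", "pills"]
padded = spam + ["meeting", "agenda", "quarterly"]  # appended innocuous words
print(f"{score(spam):.4f}")    # close to 1
print(f"{score(padded):.4f}")  # about 0.36
```

With these made-up numbers, three appended words are enough to flip the verdict, because each contributes a factor far below 0.5 to the spam product.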
The idea we present in this paper leverages the fact that natural language typically contains synonyms: distinct words that express the same or similar concepts. Such words often have significantly different spam probabilities. Thus, an attacker might be able to penetrate Bayesian filters by replacing suspicious words with innocuous terms of the same meaning. A precondition for the success of such an attack is that the Bayesian spam filters of different users assign similar spam probabilities to similar tokens. We first examine whether this precondition is met; afterwards, we measure the effectiveness of an automated substitution attack by creating a test set of spam messages and testing it against SpamAssassin, DSPAM, and Gmail.
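A toy version of the substitution step follows. It is a minimal sketch, not the authors' implementation: the synonym table and the spam probabilities are hypothetical, and a real attack would draw candidates from a thesaurus and need word-sense disambiguation so that replacements keep the intended meaning. For each token it keeps whichever candidate (the word itself or one of its synonyms) the filter rates least spammy.

```python
# Hypothetical per-token spam probabilities learned by some filter.
SPAM_PROBS = {
    "cheap": 0.92, "inexpensive": 0.35, "affordable": 0.55,
    "medication": 0.90, "medicine": 0.40,
    "buy": 0.80, "purchase": 0.45,
}
# Hypothetical synonym table standing in for a real thesaurus lookup.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable"],
    "medication": ["medicine"],
    "buy": ["purchase"],
}
NEUTRAL = 0.5  # assumed probability for tokens the filter has never seen


def spam_prob(token):
    """Spam probability the (hypothetical) filter assigns to a token."""
    return SPAM_PROBS.get(token, NEUTRAL)


def substitute(tokens):
    """Replace each token by its least spammy candidate (itself or a synonym)."""
    out = []
    for tok in tokens:
        candidates = [tok] + SYNONYMS.get(tok, [])
        out.append(min(candidates, key=spam_prob))
    return out


print(substitute(["buy", "cheap", "medication", "now"]))
# ['purchase', 'inexpensive', 'medicine', 'now']
```

Because the choice is driven by one filter's probabilities, the rewritten message only evades other users' filters if those filters rate the same words similarly, which is the precondition the abstract identifies.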

Published In

WOOT '07: Proceedings of the first USENIX workshop on Offensive Technologies
August 2007
78 pages

Sponsors

  • USENIX Association

Publisher

USENIX Association

United States

Cited By

  • (2018) Evasion-Robust Classification on Binary Domains. ACM Transactions on Knowledge Discovery from Data 12:4, 1-32. DOI: 10.1145/3186282. Online publication date: 8-Jun-2018.
  • (2011) Removing web spam links from search engine results. Journal in Computer Virology 7:1, 51-62. DOI: 10.1007/s11416-009-0132-6. Online publication date: 1-Feb-2011.
  • (2009) Comment spam injection made easy. Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference, 1202-1206. DOI: 10.5555/1700527.1700820. Online publication date: 11-Jan-2009.
  • (2009) All your contacts are belong to us. Proceedings of the 18th international conference on World wide web, 551-560. DOI: 10.1145/1526709.1526784. Online publication date: 20-Apr-2009.
  • (2008) Measurement and classification of humans and bots in internet chat. Proceedings of the 17th conference on Security symposium, 155-169. DOI: 10.5555/1496711.1496722. Online publication date: 28-Jul-2008.
  • (2008) Exploiting machine learning to subvert your spam filter. Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, 1-9. DOI: 10.5555/1387709.1387716. Online publication date: 15-Apr-2008.
