DOI: 10.1145/2030376.2030392
research-article

A study of feature subset evaluators and feature subset searching methods for phishing classification

Published: 01 September 2011

Abstract

Phishing is a semantic attack that exploits the naivety of users of electronic services (e.g., e-banking). A number of solutions have been proposed to minimize the impact of phishing attacks. The most accurate publicly known phishing email classifiers use machine learning techniques. Previous work on phishing email classification via machine learning has primarily focused on improving classification accuracy by adding novel features, ensembles, or classification algorithms. This study takes a different path by building on previously proposed features. Its primary focus is to improve the classification accuracy of phishing email classifiers by evaluating various feature selection methods to find an effective subset of previously proposed features. The selected feature subset resulted in a classification model with an F1 score of 99.396% using 21 heuristic features and a single classifier.
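
As an illustration of the two ideas the abstract relies on, the following is a minimal sketch, not the authors' implementation. It shows how an F1 score is computed from confusion-matrix counts and how a greedy forward search can pick a feature subset given some subset evaluator. The feature names, the toy merit function, and the helper names `f1_score`, `forward_select`, and `toy_merit` are all assumptions made for this example; in a wrapper-style setup the evaluator would typically be a cross-validated classifier score instead.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def forward_select(
    features: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward search over feature subsets.

    Starting from the empty set, repeatedly add the single feature that
    gives the highest evaluate(subset); stop when no addition improves it.
    """
    selected: FrozenSet[str] = frozenset()
    best = evaluate(selected)
    remaining = set(features)
    while remaining:
        # Sorting makes tie-breaking deterministic.
        score, feat = max(
            (evaluate(selected | {f}), f) for f in sorted(remaining)
        )
        if score <= best:
            break
        selected, best = selected | {feat}, score
        remaining.discard(feat)
    return selected, best


# Hypothetical feature names and a toy merit function (assumptions):
# two features carry signal, and every feature costs a small penalty.
SIGNAL = {"has_ip_url": 0.5, "num_links": 0.3}


def toy_merit(subset: FrozenSet[str]) -> float:
    return sum(SIGNAL.get(f, 0.0) for f in subset) - 0.05 * len(subset)


candidates = ["has_ip_url", "num_links", "html_email", "subject_len"]
chosen, merit = forward_select(candidates, toy_merit)
print(sorted(chosen), round(merit, 3))
print(round(f1_score(tp=99, fp=1, fn=1), 3))
```

Greedy forward search is only one of several possible search strategies; it is used here because it is short to state. The evaluator is passed in as a function so that a filter-style merit or a wrapper-style classifier score can be swapped in without changing the search.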

Published In

CEAS '11: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference
September 2011
230 pages
ISBN: 9781450307888
DOI: 10.1145/2030376

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature subset selection
  2. machine learning
  3. phishing e-mail classification

Qualifiers

  • Research-article

Conference

CEAS '11
