research-article

A study of feature subset evaluators and feature subset searching methods for phishing classification

Authors:

Mahmoud Khonji,

Youssef Iraqi

CEAS '11: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference

Pages 135 - 144

https://doi.org/10.1145/2030376.2030392

Published: 01 September 2011

Abstract

Phishing is a semantic attack that aims to take advantage of the naivety of users of electronic services (e.g. e-banking). A number of solutions have been proposed to minimize the impact of phishing attacks. The most accurate publicly known email phishing classifiers use machine learning techniques. Previous work on phishing email classification via machine learning has primarily focused on improving classification accuracy by studying the addition of novel features, ensembles, or classification algorithms. This study follows a different path by building on previously proposed features. Its primary focus is to improve the classification accuracy of phishing email classifiers by evaluating various feature selection methods in order to find an effective subset of those features. The feature subset selected in this study yielded a classification model with an f₁ score of 99.396% using 21 heuristic features and a single classifier.
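The abstract does not say which subset evaluators or search methods were compared, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a wrapper-style evaluator that scores a candidate subset by the f₁ score of a toy rule ("flag an email if any selected binary feature fires") and a greedy forward search over subsets. The data set `X`, the labels `y`, and the names `f1_score`, `evaluate`, and `forward_search` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 6 emails, 3 binary heuristic features each.
# Label 1 = phishing, 0 = legitimate. Feature 2 is deliberately noisy.
X = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]


def evaluate(subset: Sequence[int]) -> float:
    """Subset evaluator (assumed): f1 of the rule 'any selected feature fires'."""
    preds = [1 if any(row[f] for f in subset) else 0 for row in X]
    return f1_score(y, preds)


def forward_search(n_features: int,
                   evaluator: Callable[[Sequence[int]], float]) -> List[int]:
    """Subset search (assumed): greedily add the feature that most improves the score."""
    selected: List[int] = []
    best = evaluator(selected)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Ties are broken in favour of the lowest feature index.
        score, feature = max(
            ((evaluator(selected + [f]), f) for f in candidates),
            key=lambda pair: (pair[0], -pair[1]),
        )
        if score <= best:
            break  # no candidate improves the current subset
        selected.append(feature)
        best = score
    return selected


selected = forward_search(3, evaluate)
print(selected, evaluate(selected))
```

On this toy data the search keeps features 0 and 1 and rejects the noisy feature 2, since adding it lowers the f₁ score. A real study would score subsets on held-out data (for example with cross-validation) rather than on the training set used here.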

Published In

CEAS '11: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference

September 2011

230 pages

ISBN: 9781450307888

DOI: 10.1145/2030376

General Chair:
Vidyasagar Potdar
Curtin University, Australia

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CEAS '11

CEAS '11: The 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference

September 1 - 2, 2011

Perth, Australia
