
Heuristic nonlinear regression strategy for detecting phishing websites

Authors: Mehdi Babagoli, Mohammad Pourmahmood Aghababa, Vahid Solouk

Soft Computing - A Fusion of Foundations, Methodologies and Applications, Volume 23, Issue 12

Pages 4315-4327

https://doi.org/10.1007/s00500-018-3084-2

Published: 01 June 2019

Abstract

In this paper, we propose a phishing website detection method that combines a meta-heuristic-based nonlinear regression algorithm with a feature selection approach. To validate the proposed method, we used a dataset of 11,055 phishing and legitimate webpages and selected 20 features to be extracted from these websites. Two feature selection methods, decision tree and wrapper, were used to select the best feature subset; the wrapper method achieved a detection accuracy as high as 96.32%. After feature selection, two classifiers were implemented to detect fraudulent websites: a nonlinear regression model whose parameters are obtained with the harmony search (HS) meta-heuristic, and a support vector machine (SVM). The proposed HS algorithm uses a dynamic pitch adjustment rate to generate new harmonies. The HS-based nonlinear regression achieved accuracy rates of 94.13% and 92.80% on the training and test sets, respectively. The study therefore finds that HS-based nonlinear regression performs better than SVM.
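The abstract describes classification by nonlinear regression whose parameters are found with harmony search using a dynamic pitch adjustment rate. The paper's exact model form, cost function and HS settings are not given here, so the Python sketch below is illustrative only and rests on stated assumptions: a hypothetical quadratic-in-features model (`predict`) thresholded at zero for labels in {-1, +1}, mean squared error as the HS cost, a PAR that rises linearly from `par_min` to `par_max` (a common dynamic-PAR choice in improved HS variants; the paper's schedule may differ), and a fixed bandwidth `bw`. All function names, parameter values and the toy data are hypothetical.

```python
import itertools
import random


def predict(params, x):
    """Hypothetical nonlinear regression model (assumed form, not the paper's):
    y_hat = w0 + sum_i a_i * x_i + sum_i b_i * x_i**2."""
    n = len(x)
    w0, lin, quad = params[0], params[1:1 + n], params[1 + n:1 + 2 * n]
    return (w0
            + sum(a * xi for a, xi in zip(lin, x))
            + sum(b * xi * xi for b, xi in zip(quad, x)))


def classify(params, x):
    # Threshold the regression output at zero to get a class label in {-1, +1}.
    return 1 if predict(params, x) >= 0 else -1


def mse(params, X, y):
    # Mean squared error between regression outputs and the +/-1 labels (HS cost).
    return sum((predict(params, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def accuracy(params, X, y):
    # Fraction of samples whose thresholded prediction matches the label.
    return sum(classify(params, x) == t for x, t in zip(X, y)) / len(X)


def harmony_search(cost, dim, lower, upper, hms=20, hmcr=0.9,
                   par_min=0.1, par_max=0.9, bw=0.05, iterations=3000, seed=0):
    """Minimise `cost` over [lower, upper]**dim with harmony search.
    The pitch adjustment rate (PAR) rises linearly from par_min to par_max
    (an assumed dynamic-PAR schedule)."""
    rng = random.Random(seed)
    # Harmony memory: hms random candidate parameter vectors and their costs.
    memory = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(hms)]
    costs = [cost(h) for h in memory]
    for it in range(iterations):
        par = par_min + (par_max - par_min) * it / max(1, iterations - 1)
        new = []
        for j in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse component j of a stored harmony.
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: small random step scaled by bw.
                    value += bw * (upper - lower) * rng.uniform(-1.0, 1.0)
            else:
                # Random selection from the full search range.
                value = rng.uniform(lower, upper)
            new.append(min(upper, max(lower, value)))  # keep within bounds
        new_cost = cost(new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            # Replace the worst harmony if the new one is better.
            memory[worst], costs[worst] = new, new_cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


if __name__ == "__main__":
    # Toy data (hypothetical): 3 features in {-1, 0, 1}; label +1 iff x[0] >= 0.
    X = [list(p) for p in itertools.product((-1, 0, 1), repeat=3)]
    y = [1 if x[0] >= 0 else -1 for x in X]
    dim = 1 + 2 * len(X[0])  # bias + linear + quadratic coefficients
    params, best_cost = harmony_search(lambda p: mse(p, X, y), dim, -2.0, 2.0)
    print(f"MSE={best_cost:.4f} accuracy={accuracy(params, X, y):.3f}")
```

Under this assumed model, the 20 selected features would give `dim = 1 + 2 * 20 = 41` parameters. Reporting training and test accuracy separately, as the abstract does, would mean running `harmony_search` on a training split only and calling `accuracy` on both splits.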



Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Heuristic function construction
  2. Machine learning
    1. Machine learning approaches
2. Security and privacy

Index terms have been assigned to the content through auto-classification.



Information

Published In


Soft Computing - A Fusion of Foundations, Methodologies and Applications Volume 23, Issue 12

June 2019

662 pages

ISSN:1432-7643

EISSN:1433-7479


Copyright © 2019 Springer-Verlag GmbH Germany, part of Springer Nature.

Publisher

Springer-Verlag

Berlin, Heidelberg



Qualifiers

Article


Bibliometrics & Citations

Article Metrics

Total Citations: 15
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 16 Nov 2024

Cited By

R R, Shukla A, Karthikeyan J, Banerjee K (2024) Robust Framework for Malevolent URL Detection using Hybrid Supervised Learning. Procedia Computer Science 230:C, pp 241-247. Online publication date: 12-Apr-2024.
https://dl.acm.org/doi/10.1016/j.procs.2023.12.079

Biswas B, Mukhopadhyay A, Kumar A, Delen D (2024) A hybrid framework using explainable AI (XAI) in cyber-risk management for defence and recovery against phishing attacks. Decision Support Systems 177:C. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1016/j.dss.2023.114102

Zhu E, Cheng K, Zhang Z, Wang H (2024) PDHF. Computers and Security 136:C. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1016/j.cose.2023.103561

Apruzzese G, Laskov P, Montes de Oca E, Mallouli W, Brdalo Rapa L, Grammatopoulos A, Di Franco F (2023) The Role of Machine Learning in Cybersecurity. Digital Threats: Research and Practice 4:1, pp 1-38. Online publication date: 7-Mar-2023.
https://dl.acm.org/doi/10.1145/3545574

Safi A, Singh S (2023) A systematic literature review on phishing website detection techniques. Journal of King Saud University - Computer and Information Sciences 35:2, pp 590-611. Online publication date: 1-Feb-2023.
https://dl.acm.org/doi/10.1016/j.jksuci.2023.01.004

Bahaghighat M, Ghasemi M, Ozen F (2023) A high-accuracy phishing website detection method based on machine learning. Journal of Information Security and Applications 77:C. Online publication date: 1-Sep-2023.
https://dl.acm.org/doi/10.1016/j.jisa.2023.103553

Alsenani T, Ayon S, Yousuf S, Anik F, Chowdhury M (2023) Intelligent feature selection model based on particle swarm optimization to detect phishing websites. Multimedia Tools and Applications 82:29, pp 44943-44975. Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1007/s11042-023-15399-6

Bhattacharya M, Roy S, Chattopadhyay S, Das A, Shetty S (2023) A comprehensive survey on online social networks security and privacy issues. Security and Privacy 6:1. DOI 10.1002/spy2.275. Online publication date: 16-Jan-2023.

Minocha S, Singh B (2022) A novel phishing detection system using binary modified equilibrium optimizer for feature selection. Computers and Electrical Engineering 98:C. Online publication date: 1-Mar-2022.
https://dl.acm.org/doi/10.1016/j.compeleceng.2022.107689

Sabahno M, Safara F (2022) ISHO: improved spotted hyena optimization algorithm for phishing website detection. Multimedia Tools and Applications 81:24, pp 34677-34696. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1007/s11042-021-10678-6
