Highly accurate phishing URL detection based on machine learning

Abstract

Phishing is a persistent and major threat on the internet that is growing steadily and dangerously. It is a type of cyber-attack in which a phisher mimics a legitimate web page to harvest a victim’s sensitive information, such as usernames, emails, passwords, and bank or credit card details. To prevent such attacks, several phishing detection techniques have been proposed, including AI-based, third-party, heuristic, and content-based approaches. However, these approaches suffer from a number of limitations that need to be addressed in order to detect phishing URLs. Firstly, the feature sets extracted in past work are extensive and take a considerable amount of time to extract. Secondly, several approaches select important features using statistical methods, while others propose their own features. Although both methods have been implemented successfully in various approaches, they can produce incorrect results when not supported by domain knowledge. Thirdly, most of the literature has used pre-classified and smaller datasets, which fail to reflect the efficiency and precision achievable on large, real-world datasets. Fourthly, previously proposed approaches lack advanced evaluation measures. Hence, in this paper, an effective machine learning framework is proposed that predicts phishing URLs without visiting the webpage or using any third-party services. The proposed technique is URL-based and uses the full URL, protocol scheme, hostname, path area of the URL, an entropy feature, suspicious words, and brand name matching using the TF-IDF technique to classify phishing URLs. The experiments are carried out on six different datasets using eight different machine learning classifiers, among which Random Forest achieved significantly higher accuracy than the other classifiers on all the datasets. The proposed framework, with only 30 features, achieved accuracies of 96.25% and 94.65% on the Kaggle datasets. The comparative results show that the proposed model achieved accuracies of 92.2%, 91.63%, 94.80%, and 96.85% on benchmark datasets, which is higher than the existing approaches.
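To make the idea of URL-only features concrete, the following is a minimal sketch of how a few of the feature families named in the abstract (protocol scheme, hostname, path, entropy, suspicious words) might be computed without visiting the page. It is not the authors' implementation: the function names, the specific features, and the `SUSPICIOUS_WORDS` list are assumptions made for illustration, and the paper's 30 features and TF-IDF brand name matching are not reproduced here.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical word list for illustration only; the paper's actual list is not given here.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution of text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(url: str) -> dict:
    """Compute a few lexical features from the URL string alone (no network access)."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "hostname_length": len(hostname),
        "hostname_dots": hostname.count("."),
        "path_length": len(parsed.path),
        "url_entropy": shannon_entropy(url),
        # Number of distinct suspicious words that appear anywhere in the URL.
        "suspicious_word_count": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    print(extract_features("http://secure-login.example.com/account/verify"))
```

Feature dictionaries of this kind could then be turned into vectors and passed to a classifier such as a random forest; the choice of library and hyperparameters is left open here because the abstract does not specify them.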

Author information

Authors and Affiliations

Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology University, Islamabad, Pakistan
Sajjad Jalil & Muhammad Usman
Western Michigan University, Kalamazoo, USA
Alvis Fong

Corresponding author

Correspondence to Alvis Fong.

About this article

Cite this article

Jalil, S., Usman, M. & Fong, A. Highly accurate phishing URL detection based on machine learning. J Ambient Intell Human Comput 14, 9233–9251 (2023). https://doi.org/10.1007/s12652-022-04426-3

Received: 06 August 2021
Accepted: 14 September 2022
Published: 08 October 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12652-022-04426-3
