
VORTEX: Visual phishing detectiOns aRe Through EXplanations

Published: 06 May 2024

Abstract

Phishing attacks reached a record high in 2022, as reported by the Anti-Phishing Working Group, continuing an upward trend that accelerated during the pandemic. Attackers employ increasingly sophisticated tools to deceive unaware users into divulging confidential information. Recently, the research community has turned to using screenshots of legitimate and malicious websites to identify the brands that attackers aim to impersonate. In Computer Vision, convolutional neural networks (CNNs) have been employed to analyze the visual rendering of websites and address the phishing detection problem. However, the development of these new models has raised the need to understand their inner workings and the rationale behind each prediction. Answering the question, “How is this website attempting to steal the identity of a well-known brand?” becomes crucial when protecting end users from such threats. In cybersecurity, explainable AI (XAI) is an emerging approach that aims to answer such questions. In this article, we propose VORTEX, a phishing website detection solution capable of explaining how a screenshot attempts to impersonate a specific brand. We conduct an extensive analysis of XAI methods for the phishing detection problem and demonstrate that VORTEX provides meaningful explanations of its detection results. Additionally, we evaluate the robustness of our model against adversarial example attacks: we adapt these attacks to the VORTEX architecture and evaluate their efficacy across multiple models and datasets. Our results show that VORTEX achieves higher accuracy than previous models and learns semantically meaningful patterns that yield actionable explanations about phishing websites. Finally, VORTEX demonstrates an acceptable level of robustness against adversarial example attacks.
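To make the idea of visual explanations concrete, the sketch below shows how a class activation mapping method such as Grad-CAM can highlight the screenshot regions that drive a CNN's brand prediction. This is not the VORTEX architecture itself: the torchvision ResNet-50 backbone, the preprocessing pipeline, the grad_cam helper, and the target_class index are illustrative assumptions used only to demonstrate the general explanation mechanism described in the abstract.

```python
# Minimal Grad-CAM sketch on a generic CNN screenshot classifier.
# Assumptions: ResNet-50 stands in for the detector; the class index
# and screenshot path are placeholders, not part of the paper.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["feat"] = output        # feature maps of the last conv stage

def bwd_hook(_, __, grad_output):
    gradients["feat"] = grad_output[0]  # gradients w.r.t. those feature maps

# Hook torchvision ResNet's last convolutional stage (layer4).
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def grad_cam(screenshot_path: str, target_class: int) -> torch.Tensor:
    """Return a [1, 1, 224, 224] heatmap for the regions driving target_class."""
    x = preprocess(Image.open(screenshot_path).convert("RGB")).unsqueeze(0)
    score = model(x)[0, target_class]
    model.zero_grad()
    score.backward()
    # One weight per channel: global-average-pool the gradients,
    # then ReLU the weighted sum of the feature maps.
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))
    # Upsample to input resolution and normalize to [0, 1] for overlay.
    cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Example (hypothetical file and class index):
# heatmap = grad_cam("suspect_page.png", target_class=0)
```

In a phishing setting, a bright region of the resulting heatmap over a logo or a login form is the kind of evidence an analyst could use to judge which brand a page is trying to impersonate; the paper's contribution is to evaluate which XAI methods produce such evidence reliably for this task.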



Published In

ACM Transactions on Internet Technology, Volume 24, Issue 2
May 2024, 96 pages
EISSN: 1557-6051
DOI: 10.1145/3613553
Editor: Ling Liu

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 06 May 2024
Online AM: 28 March 2024
Accepted: 23 March 2024
Revised: 16 January 2024
Received: 06 October 2023
Published in TOIT Volume 24, Issue 2

Author Tags

  1. Phishing
  2. explainable AI
  3. Computer Vision

Qualifiers

  • Research-article

