Utilizing Large Language Models with Human Feedback Integration for Generating Dedicated Warning for Phishing Emails

Authors: Quan Hong Nguyen, Xingliang Yuan, Carsten Rudolph

SecTL '24: Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems

Pages 35 - 46

https://doi.org/10.1145/3665451.3665531

Published: 23 July 2024

Abstract

With the rise of digital communication, phishing has emerged as the predominant cybercrime. Automated detection systems encounter challenges such as user trust issues and false positives, while human-centric solutions are resource-intensive and struggle with sophisticated attacks. Despite this threat, research on empowering users with automatic anti-phishing systems remains limited. This paper introduces a human-centric framework that utilizes Large Language Models (LLMs) to extract phishing indicators and generate meaningful warnings.

Recognizing that certain information is unique to users, our system integrates user insights into anti-phishing measures. Preliminary results demonstrate the promise of LLM-driven approaches in crafting meaningful warnings, highlighting the synergy between human insight and machine intelligence in combating phishing. Our framework achieves over 80% effectiveness in identifying phishing semantics with no false positives or negatives, indicating high precision. This research represents a significant advancement in phishing defense, offering a nuanced and effective email security approach.


Published In

SecTL '24: Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems

July 2024

69 pages

ISBN: 9798400706912

DOI: 10.1145/3665451

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

ASIA CCS '24

Sponsor:

SIGSAC

ASIA CCS '24: ACM Asia Conference on Computer and Communications Security

July 2 - 20, 2024

Singapore, Singapore
