DOI: 10.1145/3485447.3512234
research-article

Link: Black-Box Detection of Cross-Site Scripting Vulnerabilities Using Reinforcement Learning

Published: 25 April 2022

Abstract

Black-box web scanners have been a prevalent means of performing penetration testing to find reflected cross-site scripting (XSS) vulnerabilities. Unfortunately, off-the-shelf black-box web scanners suffer from unscalable testing as well as false negatives that stem from a testing strategy that employs fixed attack payloads, thus disregarding the exploitation of contexts to trigger vulnerabilities. To this end, we propose a novel method of adapting attack payloads to a target reflected XSS vulnerability using reinforcement learning (RL). We present Link, a general RL framework whose states, actions, and reward function are designed to find reflected XSS vulnerabilities in a black-box and fully automatic manner. Link finds 45, 213, and 60 vulnerabilities with no false positives in the Firing-Range, OWASP, and WAVSEP benchmarks, respectively, outperforming state-of-the-art web scanners in terms of finding vulnerabilities and ending testing campaigns earlier. Link also finds 43 vulnerabilities in 12 real-world applications, demonstrating the promising efficacy of using RL in finding reflected XSS vulnerabilities.


    Published In

    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN: 9781450390965
    DOI: 10.1145/3485447
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-site scripting
    2. penetration testing
    3. reinforcement learning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Research Foundation of Korea (NRF) Grant

    Conference

    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
