DOI: 10.1145/3524842.3528482
short-paper

Vul4J: a dataset of reproducible Java vulnerabilities geared towards the study of program repair techniques

Published: 17 October 2022

Abstract

In this work we present Vul4J, a Java vulnerability dataset in which each vulnerability is associated with a patch and, most importantly, with a Proof of Vulnerability (PoV) test case. We analyzed 1803 fix commits from 912 real-world vulnerabilities in the Project KB knowledge base to extract the reproducible vulnerabilities, i.e., vulnerabilities that can be triggered by one or more PoV test cases. To this end, we ran the test suite of each application on both the vulnerable and the secure version to identify the corresponding PoVs. If no PoV test case was found, we wrote one ourselves. As a result, Vul4J includes 79 reproducible vulnerabilities from 51 open-source projects, spanning 25 different Common Weakness Enumeration (CWE) types. To the best of our knowledge, this is the first dataset of its kind created for Java. In particular, it targets the study of Automated Program Repair (APR) tools, where PoVs are often necessary to identify plausible patches. We made our dataset and related tools publicly available on GitHub.
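The PoV identification step described in the abstract can be illustrated with a minimal sketch. It assumes that a PoV is a test that fails on the vulnerable version and passes on the secure one; the abstract only says that the test suite was run on both versions, so this criterion, along with the `PovFinder` class, the `Outcome` enum, and the test names, is a hypothetical simplification and not the authors' tooling.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Vul4J: picks PoV candidates from two test runs.
class PovFinder {
    /** Simplified outcome of one test in one version of the project. */
    enum Outcome { PASS, FAIL }

    /**
     * Returns the tests that fail on the vulnerable version and pass on the
     * patched version (the assumed PoV criterion), sorted by name.
     */
    static Set<String> findPovs(Map<String, Outcome> vulnerable,
                                Map<String, Outcome> patched) {
        Set<String> povs = new TreeSet<>();
        for (Map.Entry<String, Outcome> entry : vulnerable.entrySet()) {
            // A test missing from the patched run yields null and is skipped.
            Outcome after = patched.get(entry.getKey());
            if (entry.getValue() == Outcome.FAIL && after == Outcome.PASS) {
                povs.add(entry.getKey());
            }
        }
        return povs;
    }

    public static void main(String[] args) {
        // Made-up test names and outcomes, for illustration only.
        Map<String, Outcome> vulnerable = Map.of(
                "testParsesValidInput", Outcome.PASS,
                "testRejectsPathTraversal", Outcome.FAIL,
                "testFlakyNetwork", Outcome.FAIL);
        Map<String, Outcome> patched = Map.of(
                "testParsesValidInput", Outcome.PASS,
                "testRejectsPathTraversal", Outcome.PASS,
                "testFlakyNetwork", Outcome.FAIL);

        System.out.println(findPovs(vulnerable, patched));
        // prints [testRejectsPathTraversal]
    }
}
```

A test that fails in both versions, such as `testFlakyNetwork` here, is excluded because its failure is not explained by the fix. When the result is empty, the abstract states that the authors wrote a PoV test by hand.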

    Published In

    MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
    May 2022
    815 pages
    ISBN: 9781450393034
    DOI: 10.1145/3524842

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. java
    2. program repair
    3. vulnerability

    Qualifiers

    • Short-paper

    Funding Sources

    • European Union

    Conference

    MSR '22

    Cited By

    • (2024) A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback. Proceedings of the 1st ACM International Conference on AI-Powered Software, 103-111. DOI: 10.1145/3664646.3664770. Online publication date: 10-Jul-2024.
    • (2024) Reality Check: Assessing GPT-4 in Fixing Real-World Software Vulnerabilities. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 252-261. DOI: 10.1145/3661167.3661207. Online publication date: 18-Jun-2024.
    • (2024) PPT4J: Patch Presence Test for Java Binaries. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-12. DOI: 10.1145/3597503.3639231. Online publication date: 20-May-2024.
    • (2024) Out of Sight, Out of Mind: Better Automatic Vulnerability Repair by Broadening Input Ranges and Sources. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. DOI: 10.1145/3597503.3639222. Online publication date: 20-May-2024.
    • (2024) On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools. Empirical Software Engineering 29:5. DOI: 10.1007/s10664-024-10506-z. Online publication date: 3-Aug-2024.
    • (2023) How Effective Are Neural Networks for Fixing Security Vulnerabilities. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1282-1294. DOI: 10.1145/3597926.3598135. Online publication date: 12-Jul-2023.
    • (2023) BUGSC++: A Highly Usable Real World Defect Benchmark for C/C++. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2034-2037. DOI: 10.1109/ASE56229.2023.00208. Online publication date: 11-Sep-2023.
    • (2023) PreciseBugCollector: Extensible, Executable and Precise Bug-Fix Collection: Solution for Challenge 8: Automating Precise Data Collection for Code Snippets with Bugs, Fixes, Locations, and Types. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 1899-1910. DOI: 10.1109/ASE56229.2023.00163. Online publication date: 11-Sep-2023.
    • (2023) Compsuite: A Dataset of Java Library Upgrade Incompatibility Issues. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2098-2101. DOI: 10.1109/ASE56229.2023.00127. Online publication date: 11-Sep-2023.
    • (2023) APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities. Empirical Software Engineering 29:1. DOI: 10.1007/s10664-023-10415-7. Online publication date: 6-Dec-2023.
