DOI: 10.1145/3379597.3387501
short-paper

A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries

Published: 18 September 2020

Abstract

We present Big-Vul, a large C/C++ code vulnerability dataset collected from open-source GitHub projects. We crawled the public Common Vulnerabilities and Exposures (CVE) database and the source code repositories linked to CVE entries. Specifically, we collected descriptive information about each vulnerability from the CVE database, e.g., CVE IDs, CVE severity scores, and CVE summaries. Using the CVE information and the GitHub repository links published with it, we downloaded all of the code repositories and extracted the vulnerability-related code changes. In total, Big-Vul contains 3,754 code vulnerabilities spanning 91 different vulnerability types, extracted from 348 GitHub projects. All information is stored in CSV format, and the code changes are linked to the CVE descriptive information. Big-Vul can therefore be used for various research topics, e.g., detecting and fixing vulnerabilities and analyzing vulnerability-related code changes. Big-Vul is publicly available on GitHub.
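
The linking step described in the abstract, joining code changes to CVE descriptive information stored as CSV, can be illustrated with a minimal sketch. This is not the authors' pipeline or the published Big-Vul schema: the column names (`cve_id`, `score`, `summary`, `project`, `commit_id`), the helper functions, and the rows themselves are hypothetical placeholders chosen only to show the join.

```python
import csv
import io
from collections import defaultdict

# Synthetic placeholder data. Column names and values are assumptions made
# for illustration; they are not taken from the Big-Vul CSV files.
CVE_CSV = """cve_id,score,summary
CVE-0000-0001,7.5,Placeholder summary for a buffer overflow
CVE-0000-0002,5.0,Placeholder summary for an integer overflow
"""

CHANGES_CSV = """cve_id,project,commit_id,lines_added,lines_deleted
CVE-0000-0001,example/proj-a,abc123,4,2
CVE-0000-0001,example/proj-a,def456,1,1
CVE-0000-0002,example/proj-b,0a1b2c,3,0
CVE-0000-0009,example/proj-c,ffffff,2,2
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_changes_to_cves(cve_rows, change_rows):
    """Attach CVE score and summary to each code change with a known CVE ID."""
    cves = {row["cve_id"]: row for row in cve_rows}
    linked = defaultdict(list)
    for change in change_rows:
        cve = cves.get(change["cve_id"])
        if cve is None:
            # No CVE record for this change; skip it rather than guess.
            continue
        linked[change["cve_id"]].append(
            {**change, "score": float(cve["score"]), "summary": cve["summary"]}
        )
    return dict(linked)


linked = link_changes_to_cves(read_rows(CVE_CSV), read_rows(CHANGES_CSV))
for cve_id, changes in linked.items():
    print(cve_id, len(changes), "change(s), score", changes[0]["score"])
```

Keying the CVE records by ID keeps the join to a single pass over the code changes, and changes whose CVE ID has no matching record are dropped so that every linked entry carries both a code change and its description.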

    Published In

    MSR '20: Proceedings of the 17th International Conference on Mining Software Repositories
    June 2020
    675 pages
    ISBN:9781450375177
    DOI:10.1145/3379597

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. C/C++ Code
    2. Code Changes
    3. Common Vulnerabilities and Exposures

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    MSR '20