Abstract
The risk of exploiting information security vulnerabilities should not be determined solely by a single metric, such as the Common Vulnerability Scoring System (CVSS) score. Doing so disregards both the global threat landscape and the vulnerable asset. Therefore, in addition to using traditional Vulnerability Management (VM) tools, analysts and researchers must manually curate datasets containing threat intelligence and context-specific information about security flaws. However, this activity is non-trivial and error-prone. To aid this endeavor, we developed a fully automated tool that gathers data about the intrinsic characteristics of vulnerabilities available in the National Vulnerability Database (NVD) and augments it with information collected from multiple security feeds and social networks. Altogether, we collected data on more than 200,000 vulnerabilities, which can be used for various research topics, such as analyzing the risk of exploiting security flaws and predicting vulnerability severity. In this paper, we present a detailed description of the methodology used to create our dataset and of its attributes. Additionally, we perform an exploratory analysis of the data gathered and present an illustrative example of how analysts could use it. The CVEjoin dataset and the scripts used for its construction are publicly available on GitHub.
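The core idea of augmenting NVD records with threat intelligence can be illustrated with a minimal sketch. It is not the CVEjoin implementation: the field names (`cve_id`, `cvss_base_score`, `epss`, `exploit_count`), the placeholder CVE identifiers, and all values are hypothetical and chosen only to show a join keyed on the CVE identifier.

```python
from collections import defaultdict

# Hypothetical, hand-written records for illustration only.
# They are NOT taken from the CVEjoin dataset or from any real feed.
nvd_records = [
    {"cve_id": "CVE-0000-0001", "cvss_base_score": 9.8},
    {"cve_id": "CVE-0000-0002", "cvss_base_score": 5.3},
]

# Assumed shape of an exploit-probability feed: CVE id -> score in [0, 1].
epss_feed = {"CVE-0000-0001": 0.92}

# Assumed shape of an exploit archive: one entry per known exploit.
exploit_feed = [
    {"cve_id": "CVE-0000-0001", "source": "archive-a"},
    {"cve_id": "CVE-0000-0001", "source": "archive-b"},
]


def join_by_cve(nvd, epss, exploits):
    """Left-join threat-intelligence attributes onto NVD records by CVE id."""
    exploit_count = defaultdict(int)
    for entry in exploits:
        exploit_count[entry["cve_id"]] += 1

    joined = []
    for record in nvd:
        cve = record["cve_id"]
        row = dict(record)  # copy so the input records are not mutated
        row["epss"] = epss.get(cve)  # None when the feed has no entry
        row["exploit_count"] = exploit_count.get(cve, 0)
        joined.append(row)
    return joined


dataset = join_by_cve(nvd_records, epss_feed, exploit_feed)
for row in dataset:
    print(row)
```

A left join keeps every NVD record even when no feed mentions it, so a missing threat signal stays visible as `None` or `0` instead of silently dropping the vulnerability.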
Notes
1. National Vulnerability Database website: https://nvd.nist.gov/.
2. The MITRE Corporation website: https://cwe.mitre.org.
3. The Open Web Application Security Project website: https://owasp.org.
4. An archive of vulnerable software and exploits: https://www.exploit-db.com/.
5. Metric for estimating the probability of a vulnerability being exploited: https://www.first.org/epss/.
6. Microsoft security advisory: https://msrc.microsoft.com/update-guide/en-us.
7. Adobe security advisory: https://helpx.adobe.com/security.html.
8. Intel security advisory: https://www.intel.com/content/www/us/en/security-center/default.html.
9. Python package for working with URLs: https://docs.python.org/3/library/urllib.html.
10. Python library for scraping information from web pages: https://pypi.org/project/beautifulsoup4/.
11. Code developed to create the dataset: https://github.com/rodrigoparente/cvejoin-security-dataset.
12. Python package for data analysis and manipulation: https://pypi.org/project/pandas/.
13. News about the Log4j vulnerability and how it was exploited: https://blog.qualys.com/vulnerabilities-threat-research/2021/12/10/apache-log4j2-zero-day-exploited-in-the-wild-log4shell.
14.
15. Vulnerability affecting SUSE OS: https://nvd.nist.gov/vuln/detail/CVE-2020-8025.
16. Vulnerability affecting a help desk tool: https://nvd.nist.gov/vuln/detail/CVE-2020-15849.
Acknowledgment
The authors would like to thank CAPES for the financial support.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
da Ponte, F.R.P., Rodrigues, E.B., Mattos, C.L.C. (2023). CVEjoin: An Information Security Vulnerability and Threat Intelligence Dataset. In: Barolli, L. (eds) Advanced Information Networking and Applications. AINA 2023. Lecture Notes in Networks and Systems, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-031-29056-5_34
DOI: https://doi.org/10.1007/978-3-031-29056-5_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29055-8
Online ISBN: 978-3-031-29056-5
eBook Packages: Intelligent Technologies and Robotics (R0)