
Aspect-level Information Discrepancies across Heterogeneous Vulnerability Reports: Severity, Types and Detection Methods

Published: 22 December 2023 Publication History

Abstract

Vulnerable third-party libraries pose significant threats to the software applications that reuse them. At industrial scales of reuse, manual analysis of third-party library vulnerabilities is easily overwhelmed by the sheer number of vulnerabilities continually collected from diverse sources for thousands of reused libraries. Our study of four large-scale, actively maintained vulnerability databases (NVD, IBM X-Force, ExploitDB, and Openwall) reveals widespread information discrepancies between reports of the same vulnerability from heterogeneous sources, across seven vulnerability aspects: product, version, component, vulnerability type, root cause, attack vector, and impact. Integrating and cross-validating multi-source vulnerability information would be beneficial, but it requires automatic aspect extraction and aspect discrepancy detection. In this work, we experiment with a wide range of NLP methods to extract named entities (e.g., product) and free-form phrases (e.g., root cause) from textual vulnerability reports and to detect semantically different aspect mentions between reports. Our experiments confirm the feasibility of applying NLP methods to automate aspect-level vulnerability analysis and identify the need for domain customization of general NLP methods. Based on our findings, we propose a discrepancy-aware, aspect-level vulnerability knowledge graph (KG) and a KG-based web portal that integrates diverse key-aspect information from heterogeneous vulnerability databases. Our user study demonstrates the usefulness of the web portal. Our study opens the door to new types of vulnerability integration and management, such as vulnerability portraits of a product and explainable prediction of silent vulnerabilities.
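To make the idea of aspect-level discrepancy detection concrete, the following is a minimal Python sketch, not the method evaluated in the article. It assumes the seven aspects have already been extracted into plain dictionaries, and it swaps in a simple token-overlap (Jaccard) score where the article uses NLP-based semantic comparison. The function names, the example reports, the product name "Foo Server", and the 0.5 threshold are all hypothetical.

```python
import re

# The seven vulnerability aspects named in the abstract.
ASPECTS = ("product", "version", "component", "vulnerability_type",
           "root_cause", "attack_vector", "impact")


def tokens(text):
    """Lowercase a mention and split it into a set of tokens (dots kept for versions)."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a crude stand-in for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find_discrepancies(report_a, report_b, threshold=0.5):
    """Return {aspect: (mention_a, mention_b, similarity)} for aspects that both
    reports mention but whose similarity falls below the (assumed) threshold."""
    found = {}
    for aspect in ASPECTS:
        a, b = report_a.get(aspect), report_b.get(aspect)
        if a is None or b is None:
            continue  # an aspect missing from one report is not compared here
        sim = jaccard(a, b)
        if sim < threshold:
            found[aspect] = (a, b, sim)
    return found


# Hypothetical aspect mentions for the same vulnerability from two sources.
nvd = {"product": "Foo Server", "version": "1.2.3",
       "impact": "execute arbitrary code"}
xforce = {"product": "Foo Server", "version": "1.2.4",
          "impact": "cause a denial of service"}

discrepancies = find_discrepancies(nvd, xforce)
for aspect, (a, b, sim) in discrepancies.items():
    print(f"{aspect}: {a!r} vs {b!r} (similarity {sim:.2f})")
```

Token overlap is only a placeholder: it would wrongly flag paraphrases such as "remote attacker" versus "network-based adversary", which is the kind of case where the semantic comparison and domain customization described in the abstract become necessary.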



Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2
February 2024
947 pages
EISSN: 1557-7392
DOI: 10.1145/3618077
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 22 December 2023
Online AM: 16 October 2023
Accepted: 22 August 2023
Revised: 28 May 2023
Received: 02 April 2022


Author Tags

1. Vulnerability key aspect
2. Information discrepancy
3. Heterogeneous vulnerability reports

Qualifiers

• Research-article
