DOI: 10.1145/3597503.3623336
research-article
Open access

LibvDiff: Library Version Difference Guided OSS Version Identification in Binaries

Published: 06 February 2024

Abstract

Open-source software (OSS) is widely used to speed up software development, which inevitably exposes downstream software to the vulnerabilities it contains. Precisely identifying the version of an OSS component not only facilitates the detection of vulnerabilities associated with it but also enables timely alerts when 1-day vulnerabilities are disclosed. However, current version identification methods rely heavily on version strings or constant features, which may be absent from compiled OSS binaries or may not distinguish versions that differ only in function code. As a result, these methods often identify the version of an OSS binary imprecisely.
To this end, we propose LibvDiff, a novel approach for identifying OSS versions in binaries. It detects subtle differences between versions through precise symbol information and function-level code changes, using binary code similarity detection. To improve efficiency, LibvDiff introduces a candidate version filter based on a novel version coordinate system, which quantifies the gaps between versions and rapidly identifies potential versions. To speed up code similarity detection, LibvDiff uses a function call-based anchor path filter that minimizes the number of functions compared in the target binary. We evaluate LibvDiff through comprehensive experiments under various compilation settings on two datasets, one with version strings and one without. LibvDiff achieves 94.5% and 78.7% precision on the two datasets, outperforming state-of-the-art works (both academic methods and industry tools) by an average of 54.2% and 160.3%, respectively. By identifying and analyzing OSS binaries in real-world firmware images, we make several interesting findings: for example, developers differ significantly in how they update different OSS, and different vendors may use identical OSS binaries.
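
The abstract does not define the version coordinate system, so the following is only an illustrative sketch of how a symbol-based candidate version filter could work, not the authors' method. It assumes that each known version is described by its set of exported function symbols, that a version's "coordinate" is the pair (symbols added, symbols removed) relative to a baseline version, and that the gap between two versions is the Manhattan distance between their coordinates. The names `VERSION_SYMBOLS`, `coordinate`, and `candidate_versions`, as well as the sample data, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical per-version symbol sets (illustrative data, not from the paper).
VERSION_SYMBOLS: Dict[str, FrozenSet[str]] = {
    "1.0": frozenset({"init", "parse", "free"}),
    "1.1": frozenset({"init", "parse", "free", "parse_ex"}),
    "1.2": frozenset({"init", "parse_ex", "free", "reset"}),
}


def coordinate(symbols: FrozenSet[str], baseline: FrozenSet[str]) -> Tuple[int, int]:
    """Assumed coordinate: (symbols added, symbols removed) relative to the baseline."""
    return (len(symbols - baseline), len(baseline - symbols))


def candidate_versions(
    target_symbols: FrozenSet[str],
    version_symbols: Dict[str, FrozenSet[str]],
    baseline_version: str,
    tolerance: int = 0,
) -> List[str]:
    """Return the versions whose coordinate is closest to the target's.

    Versions within `tolerance` of the smallest gap are kept, so that a more
    expensive function-level comparison only has to consider a few candidates.
    """
    baseline = version_symbols[baseline_version]
    tx, ty = coordinate(target_symbols, baseline)
    gaps: Dict[str, int] = {}
    for version, symbols in version_symbols.items():
        vx, vy = coordinate(symbols, baseline)
        # Manhattan distance between coordinates (an assumption of this sketch).
        gaps[version] = abs(tx - vx) + abs(ty - vy)
    best = min(gaps.values())
    return sorted(v for v, gap in gaps.items() if gap <= best + tolerance)


if __name__ == "__main__":
    # Symbols recovered from a hypothetical target binary.
    target = frozenset({"init", "parse", "free", "parse_ex"})
    print(candidate_versions(target, VERSION_SYMBOLS, "1.0"))  # ['1.1']
```

In this sketch the filter is deliberately cheap: it only counts symbols, so versions that differ purely in function bodies receive the same coordinate and must still be separated by a later function-level similarity step, which is the case the abstract highlights.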

    Published In

    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN:9798400702174
    DOI:10.1145/3597503
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    In-Cooperation

    • Faculty of Engineering of University of Porto

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. open-source software
    2. version identification
    3. vulnerability detection
    4. firmware analysis

    Conference

    ICSE '24

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%
