RTFM! Automatic Assumption Discovery and Verification Derivation from Library Document for API Misuse Detection

Authors: Tao Lv, Yi Yang, Luyi Xing

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security

Pages 1837 - 1852

https://doi.org/10.1145/3372297.3423360

Published: 02 November 2020

Abstract

To use library APIs, a developer is expected to follow the library's guidance and respect certain constraints, which we call integration assumptions (IAs). Violating these assumptions can have serious consequences, introducing security-critical flaws such as use-after-free, NULL-dereference, and authentication errors. Checking a program for compliance with IAs takes significant manual effort and needs to be automated. A promising direction is to recover IAs automatically from a library document using Natural Language Processing (NLP) and then use code analysis to verify that the ways a program uses the APIs are consistent with them. However, a practical solution along this line needs to overcome several key challenges, particularly discovering IAs in loosely formatted documents and interpreting their informal descriptions to identify complicated constraints (e.g., data-/control-flow relations between different APIs).
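As a concrete illustration of what checking an IA can mean, the following is a minimal sketch and not the paper's method. It assumes a hypothetical library whose document states that every `lib_open` call must eventually be paired with a `lib_close` call, and it checks that assumption against a flat, already-extracted list of call names rather than real program code. The names `PairingAssumption`, `find_violations`, `lib_open`, `lib_query`, and `lib_close` are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairingAssumption:
    """Hypothetical IA: each `acquire` call must be followed by a `release` call."""
    acquire: str
    release: str


def find_violations(ia, call_trace):
    """Return the trace indices of `acquire` calls never matched by a later `release`."""
    unmatched = []  # indices of acquire calls still waiting for a release
    for index, name in enumerate(call_trace):
        if name == ia.acquire:
            unmatched.append(index)
        elif name == ia.release and unmatched:
            unmatched.pop()  # the most recent open acquire is now released
    return unmatched


# Assumed API names; the second lib_open is never closed.
ia = PairingAssumption(acquire="lib_open", release="lib_close")
trace = ["lib_open", "lib_query", "lib_close", "lib_open", "lib_query"]
violations = find_violations(ia, trace)
print(violations)
```

A real checker would have to reason over data and control flow in the program, which is what the abstract identifies as the hard part; the linear trace here only conveys the shape of the problem.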

In this paper, we present a new technique for automated assumption discovery and verification derivation from library documents. Our approach, called Advance, utilizes a suite of innovations to address those challenges. More specifically, we leverage the observation that IAs tend to express a strong sentiment when emphasizing the importance of a constraint, particularly a security-critical one, and utilize a new sentiment analysis model to accurately recover them from loosely formatted documents. These IAs are further processed through an embedding model to resolve hidden references to APIs and parameters and thereby identify the information-flow relations expected to be followed. Then our approach runs frequent subtree mining to discover the grammatical units in IA sentences that tend to indicate certain categories of constraints that could have security implications. These components are mapped to verification code snippets organized in line with the IA sentence's grammatical structure, and the snippets can be assembled into verification code executed through CodeQL to discover misuses inside a program. We implemented this design and evaluated it on 5 popular libraries (OpenSSL, SQLite, libpcap, libdbus and libxml2) and 39 real-world applications. Our analysis discovered 193 API misuses, including 139 flaws never reported before.
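To make the first step of this pipeline more tangible, here is a rough sketch of picking out strongly worded sentences from a library document. It deliberately substitutes a fixed cue-word list for the sentiment analysis model the abstract describes, so it should be read as a simplified stand-in and not as Advance itself. The cue list, the function names, and the sample text about `lib_open` and `lib_close` are assumptions made up for this example.

```python
import re

# Assumed cue phrases signalling emphasis; a keyword stand-in, not a trained model.
STRONG_CUES = ("must", "never", "should not", "do not", "required", "responsible for")

CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in STRONG_CUES) + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_assumptions(text):
    """Return the sentences that contain at least one strong cue phrase."""
    return [s for s in split_sentences(text) if CUE_PATTERN.search(s)]


# Hypothetical documentation excerpt for an invented library.
doc = (
    "lib_open() returns a handle. "
    "The caller must release the handle with lib_close(). "
    "Handles are numbered from zero. "
    "Never use a handle after it has been closed."
)
candidates = candidate_assumptions(doc)
for sentence in candidates:
    print(sentence)
```

A keyword filter like this would miss emphatic sentences that use none of the listed cues and would flag incidental uses of them, which is presumably part of why the authors turn to a learned model instead.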


