DOI: 10.1145/3338504.3357341
research-article

Unacceptable Behavior: Robust PDF Malware Detection Using Abstract Interpretation

Published: 15 November 2019

Abstract

The popularity of the PDF format and the rich JavaScript environment that PDF viewers offer make PDF documents an attractive attack vector for malware developers. PDF documents present a serious threat to the security of organizations because most users are unsuspecting of them and thus likely to open documents from untrusted sources.
State-of-the-art approaches use machine learning to learn features that characterize PDF malware, which makes them subject to adversarial attacks that mimic the structure of benign documents. In this paper, we instead propose to detect malicious code inside a PDF by statically reasoning about its possible behavior using abstract interpretation. A comparison with state-of-the-art PDF malware detection tools shows that our conservative abstract interpretation approach achieves similar accuracy, is more resilient to evasion attacks, and provides interpretable reports.
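
To give a feel for what "statically reasoning about possible behavior" can mean, here is a minimal sketch of abstract interpretation over a toy statement language. It is not the paper's analyzer: the tuple-based program encoding, the set-of-strings abstract domain, the `SENSITIVE` deny-list, and the function names `join`, `eval_expr`, and `analyze` are all assumptions made for illustration. The sketch never runs the program; it analyzes both branches of every conditional and joins the results, so a sensitive call that is assembled from string pieces on only one path is still reported.

```python
from itertools import product

TOP = None            # abstract value meaning "could be any string"
SENSITIVE = {"eval"}  # hypothetical deny-list, chosen for illustration only


def join(a, b):
    """Least upper bound of two abstract values (sets of strings, or TOP)."""
    return TOP if a is TOP or b is TOP else a | b


def eval_expr(expr, env):
    """Over-approximate the set of strings an expression may evaluate to."""
    kind = expr[0]
    if kind == "const":
        return frozenset({expr[1]})
    if kind == "var":
        return env.get(expr[1], TOP)  # undefined variable: assume anything
    if kind == "concat":
        left, right = eval_expr(expr[1], env), eval_expr(expr[2], env)
        if left is TOP or right is TOP:
            return TOP
        return frozenset(l + r for l, r in product(left, right))
    return TOP  # any other expression is treated as unknown


def analyze(block, env):
    """Return (abstract environment, findings) after analyzing a block."""
    env = dict(env)
    findings = []
    for stmt in block:
        if stmt[0] == "assign":
            env[stmt[1]] = eval_expr(stmt[2], env)
        elif stmt[0] == "call":
            target = eval_expr(stmt[1], env)
            if target is TOP:
                findings.append("call to unknown target")
            else:
                findings.extend(f"may call {n}" for n in sorted(target & SENSITIVE))
        elif stmt[0] == "if":
            # The branch condition is ignored: both branches are analyzed
            # and their environments are joined (a conservative choice).
            env_t, found_t = analyze(stmt[1], env)
            env_e, found_e = analyze(stmt[2], env)
            findings += found_t + found_e
            env = {v: join(env_t.get(v, TOP), env_e.get(v, TOP))
                   for v in set(env_t) | set(env_e)}
    return env, findings


# The name "eval" is built from two pieces on one branch only.
program = [
    ("assign", "f", ("const", "ev")),
    ("if",
     [("assign", "g", ("concat", ("var", "f"), ("const", "al")))],
     [("assign", "g", ("const", "print"))]),
    ("call", ("var", "g")),
]
_, report = analyze(program, {})
print(report)
```

The conservative join is the design choice that matters here: it may raise false alarms, but an attacker cannot hide a call simply by placing it behind a condition the analyzer cannot evaluate, and each finding names the behavior that triggered it.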

Cited By

  • (2023) The Role of Program Analysis in Security Vulnerability Detection: Then and Now. Computers & Security. 10.1016/j.cose.2023.103463 (103463). Online publication date: Sep-2023
  • (2021) Arms Race in Adversarial Malware Detection: A Survey. ACM Computing Surveys 55:1 (1-35). 10.1145/3484491. Online publication date: 23-Nov-2021
  • (2020) Optimizing Away JavaScript Obfuscation. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM) (215-220). 10.1109/SCAM51674.2020.00029. Online publication date: Sep-2020

    Published In
    PLAS'19: Proceedings of the 14th ACM SIGSAC Workshop on Programming Languages and Analysis for Security
    November 2019
    78 pages
    ISBN:9781450368360
    DOI:10.1145/3338504
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. abstract interpretation
    2. javascript
    3. malware
    4. pdf

    Qualifiers

    • Research-article

    Conference

    CCS '19

    Acceptance Rates

    Overall Acceptance Rate 43 of 77 submissions, 56%
