research-article

Unacceptable Behavior: Robust PDF Malware Detection Using Abstract Interpretation

Authors: Alexander Jordan, François Gauthier, Behnaz Hassanshahi, David Zhao

PLAS'19: Proceedings of the 14th ACM SIGSAC Workshop on Programming Languages and Analysis for Security

Pages 19-30

https://doi.org/10.1145/3338504.3357341

Published: 15 November 2019

Abstract

The popularity of the PDF format and the rich JavaScript environment that PDF viewers offer make PDF documents an attractive attack vector for malware developers. PDF documents present a serious threat to the security of organizations because most users do not regard them as suspicious and are therefore likely to open documents from untrusted sources.

State-of-the-art approaches use machine learning to learn features that characterize PDF malware, which makes them subject to adversarial attacks that mimic the structure of benign documents. In this paper, we instead propose to detect malicious code inside a PDF by statically reasoning about its possible behavior using abstract interpretation. A comparison with state-of-the-art PDF malware detection tools shows that our conservative abstract interpretation approach achieves similar accuracy, is more resilient to evasion attacks, and provides interpretable reports.
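The abstract describes detecting malicious code by conservatively reasoning, before execution, about what a script could possibly do. As a rough illustration only (this is not the authors' analysis; their abstract domains and detection policy are not described here), the Python sketch below runs a toy abstract interpreter over a hypothetical mini-language. Each string variable is abstracted as a small set of possible values or as "top" (any string), both branches of a conditional are explored and their results joined, and a call is flagged if its target may be on a deny-list or cannot be bounded at all. The tuple-based IR, the `MAX_SET` bound, and the `UNACCEPTABLE` set are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Abstract string: a finite set of possible concrete values, or None for
# "top" (any string). Hypothetical domain chosen for illustration.
AbsStr = Optional[FrozenSet[str]]
Env = Dict[str, AbsStr]

MAX_SET = 8  # assumed bound; larger candidate sets are widened to top
UNACCEPTABLE = frozenset({"eval", "unescape"})  # hypothetical deny-list


def bound(s: FrozenSet[str]) -> AbsStr:
    """Widen to top if the set of candidates grows too large."""
    return None if len(s) > MAX_SET else s


def join(a: AbsStr, b: AbsStr) -> AbsStr:
    """Least upper bound: union of candidates; top absorbs everything."""
    if a is None or b is None:
        return None
    return bound(a | b)


def concat(a: AbsStr, b: AbsStr) -> AbsStr:
    """Abstract string concatenation: all pairwise concatenations."""
    if a is None or b is None:
        return None
    return bound(frozenset(x + y for x in a for y in b))


def eval_expr(e: Tuple, env: Env) -> AbsStr:
    kind = e[0]
    if kind == "const":
        return frozenset({e[1]})
    if kind == "var":
        return env.get(e[1])  # unassigned variables are conservatively top
    if kind == "concat":
        return concat(eval_expr(e[1], env), eval_expr(e[2], env))
    if kind == "input":
        return None  # data from outside the script: any string
    raise ValueError(f"unknown expression {kind!r}")


def join_env(a: Env, b: Env) -> Env:
    # A variable missing on one path is treated as top on that path.
    return {k: join(a.get(k), b.get(k)) for k in a.keys() | b.keys()}


def run(stmts: List[Tuple], env: Env, alarms: List[str]) -> Env:
    for s in stmts:
        if s[0] == "assign":
            env = {**env, s[1]: eval_expr(s[2], env)}
        elif s[0] == "if":
            # The condition is ignored: either branch may execute.
            env = join_env(run(s[1], env, alarms), run(s[2], env, alarms))
        elif s[0] == "call":
            target = eval_expr(s[1], env)
            if target is None:
                alarms.append("call target unknown (top)")
            else:
                alarms.extend(f"may call {n}" for n in sorted(target & UNACCEPTABLE))
        else:
            raise ValueError(f"unknown statement {s[0]!r}")
    return env


def analyze(program: List[Tuple]) -> List[str]:
    alarms: List[str] = []
    run(program, {}, alarms)
    return alarms


if __name__ == "__main__":
    # "ev" + "al" split across a branch still resolves to a possible eval call.
    obfuscated = [
        ("assign", "a", ("const", "ev")),
        ("if", [("assign", "b", ("const", "al"))],
               [("assign", "b", ("const", "x"))]),
        ("call", ("concat", ("var", "a"), ("var", "b"))),
    ]
    print(analyze(obfuscated))  # ['may call eval']
    print(analyze([("call", ("const", "app.alert"))]))  # []
    print(analyze([("call", ("input",))]))  # ['call target unknown (top)']
```

Treating an unbounded call target as an alarm is what makes the sketch conservative: concatenation that hides `eval` still produces an alarm, at the cost of possible false positives on benign scripts whose call targets cannot be resolved. Loops would additionally require fixpoint iteration with widening, which this sketch omits.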

Cited By

Cifuentes C, Gauthier F, Hassanshahi B, Krishnan P, McCall D. (2023). The Role of Program Analysis in Security Vulnerability Detection: Then and Now. Computers & Security, 103463. Online publication date: Sep 2023.
https://doi.org/10.1016/j.cose.2023.103463

Li D, Li Q, Ye Y, Xu S. (2021). Arms Race in Adversarial Malware Detection: A Survey. ACM Computing Surveys 55:1, 1-35. Online publication date: 23 Nov 2021.
https://dl.acm.org/doi/10.1145/3484491

Herrera A. (2020). Optimizing Away JavaScript Obfuscation. 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM), 215-220. Online publication date: Sep 2020.
https://doi.org/10.1109/SCAM51674.2020.00029

Index Terms

Unacceptable Behavior: Robust PDF Malware Detection Using Abstract Interpretation
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Information & Contributors

Information

Published In

PLAS'19: Proceedings of the 14th ACM SIGSAC Workshop on Programming Languages and Analysis for Security

November 2019

78 pages

ISBN:9781450368360

DOI:10.1145/3338504

Program Chairs:
Piotr Mardziel, Carnegie Mellon University, USA
Niki Vazou, IMDEA Software Institute, Spain

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS '19

Sponsor:

SIGSAC

CCS '19: 2019 ACM SIGSAC Conference on Computer and Communications Security

November 15, 2019

London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 43 of 77 submissions, 56%

Bibliometrics & Citations

Article Metrics

Total Citations: 3
Total Downloads: 287
Downloads (last 12 months): 29
Downloads (last 6 weeks): 3

Reflects downloads up to 09 Nov 2024.