ProvG-Searcher: A Graph Representation Learning Approach for Efficient Provenance Graph Search

Authors:

Enes Altinisik,

Hüsrev Taha Sencar

CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

Pages 2247 - 2261

https://doi.org/10.1145/3576915.3623187

Published: 21 November 2023

Abstract

We present ProvG-Searcher, a novel approach for detecting known APT behaviors within system security logs. Our approach leverages provenance graphs, a comprehensive graph representation of event logs, to capture and depict data provenance relations by mapping system entities as nodes and their interactions as edges. We formulate the task of searching provenance graphs as a subgraph matching problem and employ a graph representation learning method. The central component of our search methodology involves embedding of subgraphs in a vector space where subgraph relationships can be directly evaluated. We achieve this through the use of order embeddings that simplify subgraph matching to straightforward comparisons between a query and precomputed subgraph representations. To address challenges posed by the size and complexity of provenance graphs, we propose a graph partitioning scheme and a behavior-preserving graph reduction method. Overall, our technique offers significant computational efficiency, allowing most of the search computation to be performed offline while incorporating a lightweight comparison step during query execution. Experimental results on standard datasets demonstrate that ProvG-Searcher achieves superior performance, with an accuracy exceeding 99% in detecting query behaviors and a false positive rate of approximately 0.02%, outperforming other approaches.
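
As a rough illustration of the order-embedding comparison mentioned in the abstract, the sketch below tests subgraph containment by comparing embedding vectors coordinate by coordinate. It assumes the usual order-embedding convention that a query is contained in a stored subgraph when no coordinate of the query vector exceeds the corresponding coordinate of the stored vector. The function names, the threshold, and the toy vectors are hypothetical and are not taken from the paper; in practice the vectors would be produced offline by a trained graph neural network.

```python
from typing import Dict, List, Sequence


def order_violation(z_query: Sequence[float], z_target: Sequence[float]) -> float:
    """Sum of squared amounts by which the query exceeds the target, per coordinate.

    A value of zero means the query embedding lies entirely "below" the
    target embedding, which is read here as "query is a subgraph of target".
    """
    if len(z_query) != len(z_target):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, q - t) ** 2 for q, t in zip(z_query, z_target))


def search(
    z_query: Sequence[float],
    index: Dict[str, Sequence[float]],
    threshold: float = 1e-6,  # assumed tolerance, chosen only for illustration
) -> List[str]:
    """Return the names of precomputed subgraphs whose violation is within the threshold."""
    return [
        name
        for name, z_target in index.items()
        if order_violation(z_query, z_target) <= threshold
    ]


# Toy "offline" index of precomputed subgraph embeddings (made-up values).
index = {
    "subgraph_a": [0.9, 0.8, 0.7],
    "subgraph_b": [0.2, 0.9, 0.1],
    "subgraph_c": [0.5, 0.5, 0.5],
}

# Toy query embedding (made-up values).
query = [0.4, 0.5, 0.3]

matches = search(query, index)
print(matches)
```

In this toy setup the query-time work is a single pass of element-wise comparisons over the stored vectors, which reflects the abstract's point that most of the computation can be done offline.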

Index Terms

ProvG-Searcher: A Graph Representation Learning Approach for Efficient Provenance Graph Search
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Intrusion detection systems
  2. Systems security

Information & Contributors

Information

Published In

CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

November 2023

3722 pages

ISBN: 9798400700507

DOI: 10.1145/3576915

General Chairs: Weizhi Meng (Technical University of Denmark), Christian D. Jensen (Technical University of Denmark)

Program Chairs: Cas Cremers (CISPA Helmholtz Center for Information Security), Engin Kirda (Khoury College of Computer Sciences)

Copyright © 2023 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS '23

Sponsor:

SIGSAC

CCS '23: ACM SIGSAC Conference on Computer and Communications Security

November 26 - 30, 2023

Copenhagen, Denmark

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%
