DOI: 10.1145/3576915.3623187
research-article

ProvG-Searcher: A Graph Representation Learning Approach for Efficient Provenance Graph Search

Published: 21 November 2023

Abstract

We present ProvG-Searcher, a novel approach for detecting known APT behaviors within system security logs. Our approach leverages provenance graphs, a comprehensive graph representation of event logs that captures data provenance relations by mapping system entities to nodes and their interactions to edges. We formulate the task of searching provenance graphs as a subgraph matching problem and employ a graph representation learning method. The central component of our search methodology is an embedding of subgraphs into a vector space in which subgraph relationships can be evaluated directly. We achieve this with order embeddings, which reduce subgraph matching to simple comparisons between a query and precomputed subgraph representations. To address the challenges posed by the size and complexity of provenance graphs, we propose a graph partitioning scheme and a behavior-preserving graph reduction method. Overall, our technique is computationally efficient: most of the search computation is performed offline, and query execution requires only a lightweight comparison step. Experimental results on standard datasets show that ProvG-Searcher detects query behaviors with an accuracy exceeding 99% and a false positive rate of approximately 0.02%, outperforming other approaches.
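
The query-time comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes the common order-embedding convention that a query is contained in a target subgraph when every coordinate of the query embedding is less than or equal to the corresponding coordinate of the target embedding, and it scores violations with a squared-hinge penalty. The function names, the threshold, and the three-dimensional toy vectors are hypothetical; in a real system the embeddings would be produced by a trained graph neural network, which is not shown here.

```python
from typing import Dict, List, Sequence


def order_violation(query: Sequence[float], target: Sequence[float]) -> float:
    """Squared-hinge order penalty: sum over i of max(0, q_i - t_i)^2.

    A value of 0 means the query embedding lies coordinate-wise below the
    target embedding, which this sketch takes to indicate "query is a
    subgraph of target" (an assumed convention).
    """
    if len(query) != len(target):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, q - t) ** 2 for q, t in zip(query, target))


def search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    threshold: float = 0.0,
) -> List[str]:
    """Return names of precomputed subgraphs whose penalty is within threshold."""
    return [
        name
        for name, embedding in index.items()
        if order_violation(query, embedding) <= threshold
    ]


# Hypothetical precomputed (offline) embeddings of graph partitions.
index = {
    "partition_a": [0.9, 0.7, 0.4],
    "partition_b": [0.2, 0.8, 0.1],
}

# Hypothetical embedding of a query behavior graph.
query = [0.5, 0.6, 0.3]

matches = search(query, index)
print(matches)
```

Because the index is computed offline, each query costs only one pass of element-wise comparisons per stored embedding. A nonzero threshold would trade some false positives for tolerance to embedding noise.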

      Information

      Published In

      CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security
      November 2023
      3722 pages
      ISBN: 9798400700507
      DOI: 10.1145/3576915
      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. APT behaviors
      2. graph entailment
      3. graph neural networks
      4. graph reduction
      5. order embeddings
      6. provenance graph
      7. security logs
      8. subgraph matching
      9. threat hunting

      Qualifiers

      • Research-article

      Conference

      CCS '23

      Acceptance Rates

      Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 658
      • Downloads (Last 6 weeks): 66
      Reflects downloads up to 25 Nov 2024

      Cited By

      • (2024) ProcSAGE: an efficient host threat detection method based on graph representation learning. Cybersecurity 7:1. DOI: 10.1186/s42400-024-00240-w. Online publication date: 25-Aug-2024
      • (2024) IPMES: A Tool for Incremental TTP Detection Over the System Audit Event Stream. 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 265-273. DOI: 10.1109/DSN58291.2024.00036. Online publication date: 24-Jun-2024
      • (2024) SNIPER: Detect Complex Attacks Accurately from Traffic. Information Security Practice and Experience, 205-221. DOI: 10.1007/978-981-97-9053-1_12. Online publication date: 25-Oct-2024
      • (2024) BehaMiner: System Behavior Mining for Audit Log Based on Graph Learning. Wireless Artificial Intelligent Computing Systems and Applications, 333-346. DOI: 10.1007/978-3-031-71464-1_28. Online publication date: 13-Nov-2024
