DOI: 10.1145/3391800.3398175
short-paper

Xanthus: Push-button Orchestration of Host Provenance Data Collection

Published: 23 June 2020

Abstract

Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publicly available audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus' simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments.
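To make the idea of a self-describing archive concrete, the following is a minimal Python sketch and not Xanthus' actual interface or file format. It assumes a hypothetical `ExperimentConfig` record (base image name, tracing framework name, post-launch script names) and packs it, together with checksummed trace files, into a gzip-compressed tar file. The names `ExperimentConfig`, `build_archive`, `verify_archive`, and the `manifest.json` layout are all illustrative assumptions. Unlike the archive described in the abstract, this sketch records only the name of the VM image rather than the VM itself.

```python
import hashlib
import io
import json
import tarfile
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentConfig:
    """Hypothetical description of one trace-collection run."""
    base_image: str                      # name of the base VM image
    tracing_framework: str               # tracing framework configured in the VM
    post_launch_scripts: list = field(default_factory=list)


def _add_bytes(archive, name, payload):
    """Add an in-memory byte string to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))


def build_archive(config, traces, out_path):
    """Write the config, trace files, and their SHA-256 digests to out_path.

    `traces` maps a file name to its raw bytes. Returns the manifest dict.
    """
    manifest = {
        "config": asdict(config),
        "traces": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(traces.items())
        },
    }
    with tarfile.open(out_path, "w:gz") as archive:
        manifest_bytes = json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8")
        _add_bytes(archive, "manifest.json", manifest_bytes)
        for name, data in sorted(traces.items()):
            _add_bytes(archive, "traces/" + name, data)
    return manifest


def verify_archive(path):
    """Return True if every trace file matches the digest in the manifest."""
    with tarfile.open(path, "r:gz") as archive:
        raw = archive.extractfile("manifest.json").read()
        manifest = json.loads(raw.decode("utf-8"))
        for name, digest in manifest["traces"].items():
            data = archive.extractfile("traces/" + name).read()
            if hashlib.sha256(data).hexdigest() != digest:
                return False
    return True
```

Storing the configuration and the digests next to the data means a later reader can check what was run and whether the traces are intact without consulting any external notes, which is the property the abstract attributes to a self-describing archive.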


Cited By

  • (2023) Paradise: Real-Time, Generalized, and Distributed Provenance-Based Intrusion Detection. IEEE Transactions on Dependable and Secure Computing 20:2 (1624-1640). DOI: 10.1109/TDSC.2022.3160879. Online publication date: 1-Mar-2023
  • (2023) TeSec: Accurate Server-side Attack Investigation for Web Applications. 2023 IEEE Symposium on Security and Privacy (SP) (2799-2816). DOI: 10.1109/SP46215.2023.10179402. Online publication date: May-2023


Published In

P-RECS '20: Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems
June 2020
38 pages
ISBN:9781450379779
DOI:10.1145/3391800
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. IDS
  2. anomaly detection
  3. audit log
  4. computer security
  5. data provenance
  6. data replicability
  7. intrusion detection systems
  8. penetration testing

Qualifiers

  • Short-paper

Conference

HPDC '20
Acceptance Rates

Overall Acceptance Rate 22 of 106 submissions, 21%
