Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2939672.2939712acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs

Published: 13 August 2016 Publication History

Abstract

We focus on the problem of detecting anomalous run-time behavior of distributed applications from their execution logs. Specifically we mine templates and template sequences from logs to form a control flow graph (cfg) spanning distributed components. This cfg represents the baseline healthy system state and is used to flag deviations from the expected behavior of runtime logs. The novelty in our work stems from the new techniques employed to: (1) overcome the instrumentation requirements or application specific assumptions made in prior log mining approaches, (2) improve the accuracy of mined templates and the cfg in the presence of long parameters and high amount of interleaving respectively, and (3) improve by orders of magnitude the scalability of the cfg mining process in terms of volume of log data that can be processed per day. We evaluate our approach using (a) synthetic log traces and (b) multiple real-world log datasets collected at different layers of application stack. Results demonstrate that our template mining, cfg mining, and anomaly detection algorithms have high accuracy. The distributed implementation of our pipeline is highly scalable and has more than 500 GB/day of log data processing capability even on a 10 low-end VM based (Spark + Hadoop) cluster. We also demonstrate the efficacy of our end-to-end system using a case study with the Openstack VM provisioning system.

Supplementary Material

MP4 File (kdd2016_nandi_graph_mining_01-acm.mp4)

References

[1]
B. Abrahao, F. ChieriChetti, R. Kleinberg, and A. Panconesi. Trace complexity of network inference. In KDD, 2013.
[2]
Appdynamics. www.appdynamics.com.
[3]
I. Beschastnikh, Y. Brun, M. D. Ernst, A. Krishnamurthy, and T. E. Anderson. Mining temporal invariants from partially ordered logs. In SLAML, 2011.
[4]
D. Bosnacki, W. Lightenberg, M. Odenbrett, A. Wijs, and P. Hilbers. Parallel algorithms for transitive reduction of weighted graphs. In Math Maced, 2010.
[5]
E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. In TKDE, 2001.
[6]
W. V. der Aalst, T. Weijters, and L. Maruster. Workflow mining: Discovering process models from event logs. In TKDE, 2004.
[7]
Elasticsearch: Search and analyze data in real time. https://www.elastic.co/products/elasticsearch.
[8]
D. R. Ferreira and D. Gillblad. Discovering process models from unlabelled event logs. In BPM, 2009.
[9]
Q. Fu, J.-G. Lou, Y. Wang, and J. Li. Execution anomaly detection in distributed systems through unstructured log analysis. In ICDM, 2009.
[10]
M. Gomez-Rodriguez, J. Leskovec, and A. Krause. Inferring networks of diffusion and influence. In KDD, 2010.
[11]
C. W. Gunther and W. M. van der Aalst. Fuzzy mining - adaptive process simplification based on multi-perspective metrics. In BPM, 2007.
[12]
P. Indyk and R. Motwani. Approximate nearest neighbor - towards removing the curse of dimensionality. In STOC, 1998.
[13]
X. Ju, L. Soares, K. G. Shin, K. D. Ryu, and D. D. Silva. On fault resilience of openstack. In SOCC, 2013.
[14]
Kafka: A high-throughput distributed messaging system. http://kafka.apache.org.
[15]
T. Li, F. Liang, S. Ma, and W. Peng. An integrated framework on mining log files for computing system management. In KDD, 2005.
[16]
Logstash: Collect, enrich and transform data. https://www.elastic.co/products/logstash.
[17]
J.-G. Lou, Q. Fu, Y. Wang, and J. Li. Mining dependency in distributed systems through unstructured logs analysis. In SIGOPS Operation Systems Review, 2010.
[18]
J.-G. Lou, Q. Fu, S. Yang, J. Li, and B. Wu. Mining program workflow from interleaved traces. In KDD, 2010.
[19]
A. Makanju, A. Z. Heywood, and E. E. Milios. Clustering event logs using iterative partitioning. In KDD, 2009.
[20]
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. In DMKD, 1997.
[21]
Openstack: Open-source software for creating public and private clouds. www.openstack.org.
[22]
J. Pei, J. Han, B. Mortazavi, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. Mining sequential patterns efficiently by prefix-projected pattern growth. In ICDE, 2001.
[23]
T. Reidimeister, M. Jiang, and P. A. Ward. Mining unstructured log files for recurrent fault diagnosis. In IM, 2011.
[24]
Spark: Lightning-fast cluster computing. http://spark.apache.org.
[25]
J. Stearley. Towards informatic analysis of syslogs. In Cluster, 2004.
[26]
R. Vaarandi. A data clustering algorithm for mining patterns from event logs. In IPOM, 2003.
[27]
R. Vaarandi. A breadth first algorithm for mining frequent patterns from event logs. In Intell. Comm., 2004.
[28]
R. Vaarandi and M. Pihelgas. Logcluster - a data clustering and pattern mining algorithm for event logs. In CNSM, 2015.
[29]
T. Wu, Y. Chen, and J. Han. Association mining in large datasets: A re-examination of its measures. In PKDD, 2009.
[30]
W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan. Online system problem detection by mining patterns of console logs. In ICDM, 2009.
[31]
W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan. Detecting large scale system problems by mining console logs. In ICML, 2010.

Cited By

View all
  • (2024)MetaLog: Generalizable Cross-System Anomaly Detection from Logs with Meta-LearningProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639205(1-12)Online publication date: 20-May-2024
  • (2024)PBScaler: A Bottleneck-Aware Autoscaling Framework for Microservice-Based ApplicationsIEEE Transactions on Services Computing10.1109/TSC.2024.337620217:2(604-616)Online publication date: Mar-2024
  • (2024)FGF-Bert: A modified Bert-based approach with a fine-grained feature fusion strategy for Binary Code Similarity Detection2024 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN60899.2024.10650405(1-8)Online publication date: 30-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 August 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. application monitoring
  2. control flow mining
  3. distributed applications
  4. execution logs
  5. log analytics
  6. spark
  7. template

Qualifiers

  • Research-article

Conference

KDD '16
Sponsor:

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)157
  • Downloads (Last 6 weeks)13
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)MetaLog: Generalizable Cross-System Anomaly Detection from Logs with Meta-LearningProceedings of the IEEE/ACM 46th International Conference on Software Engineering10.1145/3597503.3639205(1-12)Online publication date: 20-May-2024
  • (2024)PBScaler: A Bottleneck-Aware Autoscaling Framework for Microservice-Based ApplicationsIEEE Transactions on Services Computing10.1109/TSC.2024.337620217:2(604-616)Online publication date: Mar-2024
  • (2024)FGF-Bert: A modified Bert-based approach with a fine-grained feature fusion strategy for Binary Code Similarity Detection2024 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN60899.2024.10650405(1-8)Online publication date: 30-Jun-2024
  • (2024)Probabilistic Temporal Fusion Transformers for Large-Scale KPI Anomaly DetectionIEEE Access10.1109/ACCESS.2024.335320112(9123-9137)Online publication date: 2024
  • (2024)End-to-end log statement generation at block-levelJournal of Systems and Software10.1016/j.jss.2024.112146216(112146)Online publication date: Oct-2024
  • (2024)ServiceAnomaly: An anomaly detection approach in microservices using distributed traces and profiling metricsJournal of Systems and Software10.1016/j.jss.2023.111917209(111917)Online publication date: Mar-2024
  • (2024)Autonomous selection of the fault classification models for diagnosing microservice applicationsFuture Generation Computer Systems10.1016/j.future.2023.12.005153(326-339)Online publication date: Apr-2024
  • (2024)Hilogx: noise-aware log-based anomaly detection with human feedbackThe VLDB Journal10.1007/s00778-024-00843-233:3(883-900)Online publication date: 28-Mar-2024
  • (2024)Log‐based anomaly detection for distributed systems: State of the art, industry experience, and open issuesJournal of Software: Evolution and Process10.1002/smr.2650Online publication date: 7-Feb-2024
  • (2023)Anomaly Detection in Microservice-Based SystemsApplied Sciences10.3390/app1313789113:13(7891)Online publication date: 5-Jul-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media