research-article

CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs

Published: 25 March 2016

Abstract

Cloud infrastructures provide a rich set of management tasks that operate computing, storage, and networking resources in the cloud. Monitoring the executions of these tasks is crucial for cloud providers to promptly find and understand problems that compromise cloud availability. However, such monitoring is challenging because multiple distributed service components are involved in each execution. CloudSeer enables effective workflow monitoring. It takes a lightweight, non-intrusive approach that works purely on the interleaved logs that already exist widely in cloud infrastructures. CloudSeer first builds an automaton for the workflow of each management task based on normal executions; it then checks log messages against the set of automata in a streaming manner to detect workflow divergences. Divergences found during checking indicate potential execution problems, which may or may not be accompanied by error log messages. For each potential problem, CloudSeer outputs the context information needed for further diagnosis, including the affected task automaton and related log messages that hint at where the problem occurs. Our experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.
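The abstract outlines a two-phase technique: learn one automaton per management task from normal executions, then check a live, interleaved log stream against the whole set of automata and report divergences together with context. The paper's actual algorithms are not given on this page, so the following is only a minimal Python sketch of that idea under stated assumptions: each log message has already been parsed into a template string (such as the hypothetical `"api:create"`) and tagged with a hypothetical per-execution identifier (`exec_id`), which sidesteps much of the interleaving problem the paper addresses; and each automaton is a simple prefix-tree acceptor built from normal runs, a swapped-in simplification that does not model concurrent orderings. All class, method, and template names are illustrative, not CloudSeer's.

```python
"""Sketch: automaton-based workflow checking over interleaved logs.
Assumptions: pre-parsed templates, hypothetical exec_id tags, prefix-tree automata."""
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Automaton:
    """Deterministic automaton over log templates; state 0 is the initial state."""
    task: str
    transitions: dict = field(default_factory=dict)  # (state, template) -> next state
    accepting: set = field(default_factory=set)
    n_states: int = 1

    @classmethod
    def from_normal_runs(cls, task, runs):
        """Build a prefix-tree acceptor from template sequences of normal runs."""
        aut = cls(task)
        for run in runs:
            state = 0
            for template in run:
                key = (state, template)
                if key not in aut.transitions:
                    aut.transitions[key] = aut.n_states
                    aut.n_states += 1
                state = aut.transitions[key]
            aut.accepting.add(state)  # a normal run ends in this state
        return aut

    def step(self, state, template):
        """Next state, or None if the template is unexpected in this state."""
        return self.transitions.get((state, template))


class StreamingChecker:
    """Checks an interleaved message stream against a set of task automata."""

    def __init__(self, automata):
        self.automata = list(automata)
        self.candidates = {}              # exec_id -> [(automaton, state), ...]
        self.history = defaultdict(list)  # exec_id -> raw messages seen so far

    def feed(self, exec_id, template, message):
        """Consume one message; return a divergence report, or None if accepted."""
        self.history[exec_id].append(message)
        # A new execution could belong to any task: start from every automaton.
        current = self.candidates.get(exec_id) or [(a, 0) for a in self.automata]
        advanced = []
        for aut, state in current:
            nxt = aut.step(state, template)
            if nxt is not None:
                advanced.append((aut, nxt))
        if advanced:
            self.candidates[exec_id] = advanced
            return None
        # No candidate accepts the message: report a divergence with its context.
        report = {"exec_id": exec_id, "tasks": [a.task for a, _ in current],
                  "unexpected": message, "context": list(self.history[exec_id])}
        self.candidates.pop(exec_id, None)
        self.history.pop(exec_id, None)
        return report

    def unfinished(self):
        """IDs of tracked executions not in an accepting state of any candidate."""
        return [eid for eid, cands in self.candidates.items()
                if not any(s in a.accepting for a, s in cands)]


if __name__ == "__main__":
    # Hypothetical templates for two management tasks (not real OpenStack log lines).
    boot = Automaton.from_normal_runs("boot_vm", [
        ["api:create", "scheduler:select_host", "compute:spawn", "compute:active"],
    ])
    delete = Automaton.from_normal_runs("delete_vm", [
        ["api:delete", "compute:terminate", "compute:deleted"],
    ])
    checker = StreamingChecker([boot, delete])
    # Two executions interleaved in one stream; req-1 diverges at its third message.
    stream = [
        ("req-1", "api:create"), ("req-2", "api:delete"),
        ("req-1", "scheduler:select_host"), ("req-2", "compute:terminate"),
        ("req-1", "compute:error"), ("req-2", "compute:deleted"),
    ]
    for eid, tmpl in stream:
        report = checker.feed(eid, tmpl, f"{eid} {tmpl}")
        if report:
            print(report["exec_id"], report["tasks"], report["unexpected"])
    # prints: req-1 ['boot_vm'] req-1 compute:error
    print(checker.unfinished())  # prints: []
```

Keeping a set of candidate automata per execution lets the checker defer deciding which task an execution belongs to until its messages disambiguate it. `unfinished()` only lists executions that have not yet completed; a production monitor would presumably also need timeouts to decide when an execution that stops logging counts as a problem.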

Information & Contributors

Information

Published In

ACM SIGARCH Computer Architecture News, Volume 44, Issue 2
ASPLOS'16
May 2016
774 pages
ISSN:0163-5964
DOI:10.1145/2980024
  • ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2016
    824 pages
    ISBN:9781450340915
    DOI:10.1145/2872362
    General Chair: Tom Conte
    Program Chair: Yuanyuan Zhou
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2016
Published in SIGARCH Volume 44, Issue 2

Author Tags

  1. cloud infrastructures
  2. distributed systems
  3. log analysis
  4. workflow monitoring

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 101
  • Downloads (last 6 weeks): 4
Reflects downloads up to 21 Nov 2024.

Citations

Cited By

  • (2024) Comprehensive Analysis and Evaluation of Anomalous User Activity in Web Server Logs. Sensors 24:3 (746). DOI: 10.3390/s24030746. Online publication date: 24-Jan-2024.
  • (2024) Log Anomaly Detection by Adversarial Autoencoders With Graph Feature Fusion. IEEE Transactions on Reliability 73:1 (637-649). DOI: 10.1109/TR.2023.3305376. Online publication date: Mar-2024.
  • (2024) MOMR: A Threat in Web Application Due to the Malicious Orchestration of Microservice Requests. ICC 2024 - IEEE International Conference on Communications (3304-3309). DOI: 10.1109/ICC51166.2024.10623095. Online publication date: 9-Jun-2024.
  • (2024) DSGN. Information Sciences: an International Journal 680:C. DOI: 10.1016/j.ins.2024.121174. Online publication date: 1-Oct-2024.
  • (2024) Hilogx: noise-aware log-based anomaly detection with human feedback. The VLDB Journal 33:3 (883-900). DOI: 10.1007/s00778-024-00843-2. Online publication date: 28-Mar-2024.
  • (2024) TWLog: Task Workflow-Based Log Anomaly Detection. Web and Big Data (3-16). DOI: 10.1007/978-981-97-7244-5_1. Online publication date: 31-Aug-2024.
  • (2023) ADAL-NN: Anomaly Detection and Localization Using Deep Relational Learning in Distributed Systems. Applied Sciences 13:12 (7297). DOI: 10.3390/app13127297. Online publication date: 19-Jun-2023.
  • (2023) sem2vec: Semantics-aware Assembly Tracelet Embedding. ACM Transactions on Software Engineering and Methodology 32:4 (1-34). DOI: 10.1145/3569933. Online publication date: 27-May-2023.
  • (2023) Exploring Better Black-Box Test Case Prioritization via Log Analysis. ACM Transactions on Software Engineering and Methodology 32:3 (1-32). DOI: 10.1145/3569932. Online publication date: 26-Apr-2023.
  • (2023) A Critical Study on Data Leakage in Recommender System Offline Evaluation. ACM Transactions on Information Systems 41:3 (1-27). DOI: 10.1145/3569930. Online publication date: 7-Feb-2023.
