DOI: 10.1145/3338906.3338931
research-article

Robust log-based anomaly detection on unstable log data

Published: 12 August 2019

Abstract

Logs are widely used by large and complex software-intensive systems for troubleshooting, and log-based anomaly detection has been studied extensively. To detect anomalies, existing methods mainly construct a detection model from log event data extracted from historical logs. However, we find that these methods do not work well in practice. They rely on a closed-world assumption: that log data is stable over time and that the set of distinct log events is known in advance. Our empirical study shows that, in practice, log data often contains previously unseen log events or log sequences. This instability comes from two sources: 1) the evolution of logging statements, and 2) processing noise in log data. In this paper, we propose a new log-based anomaly detection approach called LogRobust. LogRobust extracts the semantic information of log events and represents them as semantic vectors. It then detects anomalies with an attention-based Bi-LSTM model, which captures the contextual information in log sequences and automatically learns the importance of different log events. In this way, LogRobust is able to identify and handle unstable log events and sequences. We have evaluated LogRobust using logs collected from the Hadoop system and from an actual online service system of Microsoft. The experimental results show that the proposed approach effectively addresses the problem of log instability and achieves accurate and robust results on real-world, ever-changing log data.
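
The contrast the abstract draws between a closed-world event lookup and semantic vectors can be illustrated with a small sketch. This is not the authors' implementation: the word vectors, the plain averaging scheme, the hand-picked attention scores, and every function and variable name below are hypothetical choices made only for illustration, and the Bi-LSTM that would learn the attention scores is omitted.

```python
import math

# Hypothetical 3-dimensional word vectors with made-up values.
# A real system would load pretrained word embeddings instead.
WORD_VECTORS = {
    "receiving": [0.9, 0.1, 0.0],
    "block":     [0.1, 0.9, 0.0],
    "from":      [0.0, 0.1, 0.2],
    "source":    [0.0, 0.2, 0.1],
    "deleting":  [0.0, 0.1, 0.9],
}
DIM = 3


def event_vector(event):
    """Average the vectors of the known words in a log event (toy scheme)."""
    words = [w for w in event.lower().split() if w in WORD_VECTORS]
    if not words:
        return [0.0] * DIM
    return [sum(WORD_VECTORS[w][i] for w in words) / len(words)
            for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attention_pool(vectors, scores):
    """Softmax the scores, then return the weighted sum of the vectors."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(dim)]
    return pooled, weights


# Closed-world view: every event must already be in a fixed table.
KNOWN_EVENTS = {"Receiving block from": 0, "Deleting block": 1}

old_event = "Receiving block from"
new_event = "Receiving block from source"  # an evolved logging statement

# The evolved event has no index under the closed-world assumption.
closed_world_id = KNOWN_EVENTS.get(new_event)

# Its semantic vector, however, stays close to the original event's vector.
similarity = cosine(event_vector(old_event), event_vector(new_event))

# Attention-style pooling over a short sequence with hand-picked scores.
sequence = [old_event, "Deleting block", new_event]
pooled, weights = attention_pool([event_vector(e) for e in sequence],
                                 [0.1, 2.0, 0.1])
```

In this toy setup the evolved event is missing from the lookup table, yet its vector remains more similar to the original event than to an unrelated one, which is the property a semantic representation is meant to provide under the stated assumptions.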

Published In

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Anomaly Detection
  2. Data Quality
  3. Deep Learning
  4. Log Analysis
  5. Log Instability

Qualifiers

  • Research-article

Conference

ESEC/FSE '19

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
