Abstract
Log data records system states and behavior in rich detail and can be used to diagnose system failures. Anomaly detection on large-scale log data plays a key role in building secure and trustworthy systems, and anomaly detection models based on machine learning have achieved good results in practical applications. However, logs generated by modern large-scale distributed systems are more complex than ever before in both data size and variety. As a result, a traditional anomaly detection model built on a single machine learning algorithm faces the model aging problem. We design an anomaly detection model that combines multiple machine learning algorithms. Using conformal prediction, we calculate the confidence of each algorithm for each log to be detected and use statistical analysis to tag the log with a trusted label. The approach was tested on the public HDFS_100k log dataset, and the results show that our model is more accurate.
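To make the confidence idea concrete, the following is a minimal sketch of how a conformal p-value can turn a detector's nonconformity score into a confidence, and how the outputs of several detectors might then be combined. The abstract does not specify the nonconformity measure or the statistical rule used to choose the trusted label, so everything here is an assumption: the function names (`p_value`, `predict_with_confidence`, `trusted_label`), the toy scores, and the "take the most confident detector" rule are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Tuple


def p_value(calibration_scores: List[float], test_score: float) -> float:
    """Conformal p-value: fraction of scores (calibration plus the test
    point itself) that are at least as nonconforming as the test score."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def predict_with_confidence(
    calibration: Dict[str, List[float]],
    test_scores: Dict[str, float],
) -> Tuple[str, float]:
    """Return (label, confidence) for one detector.

    calibration[label] holds nonconformity scores of calibration logs
    whose true label is `label`; test_scores[label] is the test log's
    nonconformity score when it is hypothesised to have that label.
    Confidence is 1 minus the second-largest p-value.
    """
    p = {lab: p_value(calibration[lab], test_scores[lab]) for lab in calibration}
    ranked = sorted(p, key=p.get, reverse=True)
    return ranked[0], 1.0 - p[ranked[1]]


def trusted_label(per_detector: List[Tuple[str, float]]) -> str:
    """Assumed combination rule: keep the label of the most confident detector."""
    return max(per_detector, key=lambda pair: pair[1])[0]


# Toy calibration scores shared by two hypothetical detectors.
calib = {"normal": [0.1, 0.2, 0.3, 0.4], "anomaly": [0.6, 0.7, 0.8, 0.9]}

detector_a = predict_with_confidence(calib, {"normal": 0.85, "anomaly": 0.65})
detector_b = predict_with_confidence(calib, {"normal": 0.25, "anomaly": 0.85})
final = trusted_label([detector_a, detector_b])
```

In this toy run the two detectors disagree, and the combination rule keeps the label from whichever detector reports the higher confidence. A real system would derive the nonconformity scores from each underlying algorithm rather than from hand-written numbers.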
Acknowledgment
This work is partially supported by the National Key Research and Development Program of China (No. 2018YFB2100300, 2016YFC0400709), the National Natural Science Foundation (No. 61872200), the Natural Science Foundation of Tianjin (18YFYZCG00060) and Nankai University (91922299).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, X., Jin, Z., Han, Q., Huang, S., Li, T. (2019). A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_8
DOI: https://doi.org/10.1007/978-3-030-37352-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37351-1
Online ISBN: 978-3-030-37352-8
eBook Packages: Computer Science, Computer Science (R0)