Abstract
Logs contain runtime information for both systems and users. Because many logs are written in natural language, log-based analysis typically must first parse them into a structured format. Existing parsing approaches often take two steps. First, they group similar words (tokens) or sentences. Second, they extract log templates by replacing the tokens that differ with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs but lack a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose our online log parsing approach, Cognition. It first redefines variable placeholders via a strict lower bound to avoid ambiguity. Then, it applies our template correction technique to merge and absorb similar templates. It eliminates the interference of commonly used parameters and thus isolates template quantity. Evaluation on 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also reduces time cost by up to 52.1% on average compared with the others.
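The generic two-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not Cognition's algorithm: grouping logs by token count, the `<*>` placeholder notation, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

PLACEHOLDER = "<*>"  # assumed wildcard notation, not taken from the paper


def group_by_length(logs):
    """Step 1 (assumed heuristic): group logs that have the same token count."""
    groups = defaultdict(list)
    for line in logs:
        tokens = line.split()
        groups[len(tokens)].append(tokens)
    return groups


def extract_template(token_lists):
    """Step 2: keep tokens shared by every log, replace the rest with a placeholder."""
    template = []
    for column in zip(*token_lists):
        # A position is constant only if all logs in the group agree on it.
        template.append(column[0] if len(set(column)) == 1 else PLACEHOLDER)
    return " ".join(template)


def parse(logs):
    """Return the set of templates, one per group."""
    return {extract_template(group) for group in group_by_length(logs).values()}
```

With this naive grouping, two unrelated two-token messages such as "Disk full" and "Link down" collapse into the template `<*> <*>`, which is an instance of the placeholder abuse the abstract identifies as a weakness of loosely defined template extraction.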
Cite this article
Tian, R., Diao, ZL., Jiang, HY. et al. Cognition: Accurate and Consistent Linear Log Parsing Using Template Correction. J. Comput. Sci. Technol. 38, 1036–1050 (2023). https://doi.org/10.1007/s11390-021-1691-3
DOI: https://doi.org/10.1007/s11390-021-1691-3