DOI: 10.1109/ICSE48619.2023.00204

Log Parsing with Prompt-Based Few-Shot Learning

Published: 26 July 2023

Abstract

Logs generated by large-scale software systems provide crucial information for engineers to understand system status and diagnose problems. Log parsing, which converts raw log messages into structured data, is the first step towards automated log analytics. Existing log parsers extract the common parts of log messages as log templates using statistical features. However, they often fail to identify the correct templates and parameters because: 1) they overlook the semantic meaning of log messages, and 2) they require domain-specific knowledge for different log datasets. To address these limitations, in this paper we propose LogPPT, which captures the patterns of templates using prompt-based few-shot learning. LogPPT utilises a novel prompt tuning method to recognise keywords and parameters based on a small number of labelled logs. In addition, an adaptive random sampling algorithm is designed to select a small yet diverse training set. We have conducted extensive experiments on 16 public log datasets. The experimental results show that LogPPT is effective and efficient for log parsing.
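
The following is a minimal, self-contained Python sketch (not the authors' implementation) of the two ideas the abstract describes: treating each token of a log message as either a template keyword or a parameter, and selecting a small but diverse labelled set with an adaptive-random-style sampler. All function names, the regex heuristic, the token-overlap distance, and the example HDFS-style message below are illustrative assumptions; LogPPT itself replaces the heuristic with a prompt-tuned language model, and its exact sampling algorithm is given in the paper.

    import random
    import re

    # --- Log parsing: raw message -> (template, parameters) -------------------
    # Toy heuristic stand-in for the keyword-vs-parameter decision that LogPPT
    # makes with prompt tuning: flag value-like tokens (numbers, block ids,
    # paths/IPs) as parameters and keep the rest as template keywords.
    VALUE_LIKE = re.compile(r"^(/|\d|blk_|0x)")

    def naive_parse(log_message: str):
        template_tokens, parameters = [], []
        for token in log_message.split():
            if VALUE_LIKE.match(token):
                template_tokens.append("<*>")
                parameters.append(token)
            else:
                template_tokens.append(token)
        return " ".join(template_tokens), parameters

    template, params = naive_parse(
        "Received block blk_3587508140051953248 of size 67108864 from /10.251.42.84"
    )
    print(template)  # Received block <*> of size <*> from <*>
    print(params)    # ['blk_3587508140051953248', '67108864', '/10.251.42.84']

    # --- Adaptive-random-style sampling of a diverse labelling set ------------
    # Illustrates the intent of the adaptive random sampling step: each round,
    # keep the candidate whose closest already-selected log is farthest away,
    # so the few labelled examples cover diverse message shapes.
    def token_distance(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return 1.0 - len(ta & tb) / max(len(ta | tb), 1)

    def adaptive_sample(logs, k, candidates_per_round=10, seed=0):
        rng = random.Random(seed)
        selected = [rng.choice(logs)]
        while len(selected) < k:
            candidates = rng.sample(logs, min(candidates_per_round, len(logs)))
            best = max(candidates,
                       key=lambda c: min(token_distance(c, s) for s in selected))
            selected.append(best)
        return selected

The sketch only fixes the input/output format in mind; the labelling scheme (virtual label tokens, the prompt template) and the evaluated sampling procedure are described in the paper itself.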

Published In

ICSE '23: Proceedings of the 45th International Conference on Software Engineering
May 2023, 2713 pages
ISBN: 9781665457019
General Chair: John Grundy
Program Co-chairs: Lori Pollock, Massimiliano Di Penta

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Author Tags

  1. log parsing
  2. few-shot learning
  3. prompt-tuning
  4. deep learning

Qualifiers

  • Research-article

Conference

ICSE '23: 45th International Conference on Software Engineering
May 14 - 20, 2023
Melbourne, Victoria, Australia

Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions (15%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 80
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 10 Nov 2024

Cited By
  • (2024) End-to-End AutoML for Unsupervised Log Anomaly Detection. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 1680-1692. https://doi.org/10.1145/3691620.3695535. Online publication date: 27-Oct-2024.
  • (2024) Demonstration-Free: Towards More Practical Log Parsing with Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 153-165. https://doi.org/10.1145/3691620.3694994. Online publication date: 27-Oct-2024.
  • (2024) A Comparative Study on Large Language Models for Log Parsing. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 234-244. https://doi.org/10.1145/3674805.3686684. Online publication date: 24-Oct-2024.
  • (2024) PreLog: A Pre-trained Model for Log Analytics. Proceedings of the ACM on Management of Data, 2(3), pp. 1-28. https://doi.org/10.1145/3654966. Online publication date: 30-May-2024.
  • (2024) LILAC: Log Parsing using LLMs with Adaptive Parsing Cache. Proceedings of the ACM on Software Engineering, 1(FSE), pp. 137-160. https://doi.org/10.1145/3643733. Online publication date: 12-Jul-2024.
