DOI: 10.1109/ICSE48619.2023.00204

Log Parsing with Prompt-Based Few-Shot Learning

Published: 26 July 2023

Abstract

Logs generated by large-scale software systems provide crucial information for engineers to understand system status and diagnose problems. Log parsing, which converts raw log messages into structured data, is the first step towards automated log analytics. Existing log parsers extract the common parts of log messages as log templates using statistical features. However, they often fail to identify the correct templates and parameters because: 1) they overlook the semantic meaning of log messages, and 2) they require domain-specific knowledge for different log datasets. To address these limitations, in this paper we propose LogPPT, which captures the patterns of templates using prompt-based few-shot learning. LogPPT utilises a novel prompt tuning method to recognise keywords and parameters based on a small number of labelled logs. In addition, an adaptive random sampling algorithm is designed to select a small yet diverse training set. We have conducted extensive experiments on 16 public log datasets. The experimental results show that LogPPT is effective and efficient for log parsing.
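
The following is a minimal, self-contained Python sketch (not the authors' implementation) of the two ideas the abstract describes: treating each token of a log message as either a template keyword or a parameter, and selecting a small but diverse labelled set with an adaptive-random-style sampler. All function names, the regex heuristic, the token-overlap distance, and the example HDFS-style message below are illustrative assumptions; LogPPT itself replaces the heuristic with a prompt-tuned language model, and its exact sampling algorithm is given in the paper.

    import random
    import re

    # --- Log parsing: raw message -> (template, parameters) -------------------
    # Toy heuristic stand-in for the keyword-vs-parameter decision that LogPPT
    # makes with prompt tuning: flag value-like tokens (numbers, block ids,
    # paths/IPs) as parameters and keep the rest as template keywords.
    VALUE_LIKE = re.compile(r"^(/|\d|blk_|0x)")

    def naive_parse(log_message: str):
        template_tokens, parameters = [], []
        for token in log_message.split():
            if VALUE_LIKE.match(token):
                template_tokens.append("<*>")
                parameters.append(token)
            else:
                template_tokens.append(token)
        return " ".join(template_tokens), parameters

    template, params = naive_parse(
        "Received block blk_3587508140051953248 of size 67108864 from /10.251.42.84"
    )
    print(template)  # Received block <*> of size <*> from <*>
    print(params)    # ['blk_3587508140051953248', '67108864', '/10.251.42.84']

    # --- Adaptive-random-style sampling of a diverse labelling set ------------
    # Illustrates the intent of the adaptive random sampling step: each round,
    # keep the candidate whose closest already-selected log is farthest away,
    # so the few labelled examples cover diverse message shapes.
    def token_distance(a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        return 1.0 - len(ta & tb) / max(len(ta | tb), 1)

    def adaptive_sample(logs, k, candidates_per_round=10, seed=0):
        rng = random.Random(seed)
        selected = [rng.choice(logs)]
        while len(selected) < k:
            candidates = rng.sample(logs, min(candidates_per_round, len(logs)))
            best = max(candidates,
                       key=lambda c: min(token_distance(c, s) for s in selected))
            selected.append(best)
        return selected

The sketch only fixes the input/output format in mind; the labelling scheme (virtual label tokens, the prompt template) and the evaluated sampling procedure are described in the paper itself.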

Published In

ICSE '23: Proceedings of the 45th International Conference on Software Engineering
May 2023, 2713 pages
ISBN: 9781665457019
General Chair: John Grundy
Program Co-chairs: Lori Pollock, Massimiliano Di Penta

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Author Tags

  1. log parsing
  2. few-shot learning
  3. prompt-tuning
  4. deep learning

Qualifiers

  • Research-article

Conference

ICSE '23: 45th International Conference on Software Engineering
May 14 - 20, 2023
Melbourne, Victoria, Australia

Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions (15%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 80
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 10 Nov 2024

Cited By
  • (2024) End-to-End AutoML for Unsupervised Log Anomaly Detection. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 1680-1692. https://doi.org/10.1145/3691620.3695535. Online publication date: 27-Oct-2024.
  • (2024) Demonstration-Free: Towards More Practical Log Parsing with Large Language Models. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 153-165. https://doi.org/10.1145/3691620.3694994. Online publication date: 27-Oct-2024.
  • (2024) A Comparative Study on Large Language Models for Log Parsing. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 234-244. https://doi.org/10.1145/3674805.3686684. Online publication date: 24-Oct-2024.
  • (2024) PreLog: A Pre-trained Model for Log Analytics. Proceedings of the ACM on Management of Data, 2(3), pp. 1-28. https://doi.org/10.1145/3654966. Online publication date: 30-May-2024.
  • (2024) LILAC: Log Parsing using LLMs with Adaptive Parsing Cache. Proceedings of the ACM on Software Engineering, 1(FSE), pp. 137-160. https://doi.org/10.1145/3643733. Online publication date: 12-Jul-2024.
