
DOI: 10.1145/3597503.3623326
research-article
Open access

UniLog: Automatic Logging via LLM and In-Context Learning

Published: 06 February 2024

Abstract

Logging, which aims to determine the position of logging statements, the verbosity levels, and the log messages, is a crucial process for software reliability enhancement. In recent years, numerous automatic logging tools have been designed to assist developers with individual logging sub-tasks (e.g., suggesting whether to log in try-catch blocks). These tools are useful in certain situations yet cannot provide a comprehensive logging solution in general. Moreover, although recent research has started to explore end-to-end logging, it is still largely constrained by the high cost of fine-tuning, hindering its practical usefulness in software development. To address these problems, this paper proposes UniLog, an automatic logging framework based on the in-context learning (ICL) paradigm of large language models (LLMs). Specifically, UniLog can generate an appropriate logging statement with only a prompt containing five demonstration examples, without any model tuning. In addition, UniLog can further enhance its logging ability after warmup with only a few hundred random samples. We evaluated UniLog on a large dataset containing 12,012 code snippets extracted from 1,465 GitHub repositories. The results show that UniLog achieved state-of-the-art performance in automatic logging: (1) 76.9% accuracy in selecting logging positions, (2) 72.3% accuracy in predicting verbosity levels, and (3) a 27.1 BLEU-4 score in generating log messages. Meanwhile, UniLog requires less than 4% of the parameter-tuning time needed to fine-tune the same LLM.
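
The abstract outlines UniLog's core mechanism: a prompt that concatenates five demonstration examples (a code snippet paired with its logging statement) with the query snippet and is submitted to an LLM without any parameter tuning. The sketch below illustrates how such an in-context-learning prompt could be assembled; the template wording, the <LOG> position marker, and the query_llm helper are illustrative assumptions, not the paper's actual prompt format.

```python
# A minimal sketch (not the authors' implementation) of assembling an
# in-context-learning prompt for automatic logging, assuming a simple
# "### Code / ### Logging statement" template and a <LOG> position marker.

from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    code: str  # code snippet with the logging position marked
    log: str   # reference logging statement (level + message)


def build_icl_prompt(demos: List[Demonstration], query_code: str) -> str:
    """Concatenate the demonstration pairs, then append the query snippet."""
    parts = []
    for d in demos:
        parts.append(f"### Code:\n{d.code}\n### Logging statement:\n{d.log}\n")
    parts.append(f"### Code:\n{query_code}\n### Logging statement:\n")
    return "\n".join(parts)


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; substitute the model or API you actually use."""
    raise NotImplementedError


if __name__ == "__main__":
    # The abstract reports five demonstrations per prompt; only one is shown here.
    demos = [
        Demonstration(
            code="try { writer.close(); } catch (IOException e) { /* <LOG> */ }",
            log='logger.error("Failed to close writer", e);',
        ),
        # ... four more demonstration pairs would follow
    ]
    prompt = build_icl_prompt(demos, "if (cache.isEmpty()) { /* <LOG> */ }")
    print(prompt)               # inspect the assembled prompt
    # print(query_llm(prompt))  # then submit it to the model of choice
```

In practice, the five demonstrations would be drawn from a demonstration pool (the abstract mentions warmup on a few hundred random samples) rather than hard-coded as above.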






    Information

    Published In

    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN: 9798400702174
    DOI: 10.1145/3597503


    In-Cooperation

    • Faculty of Engineering of University of Porto

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 February 2024


    Author Tags

    1. logging
    2. large language model
    3. in-context learning

    Qualifiers

    • Research-article

    Conference

    ICSE '24

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%




    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 1,482
    • Downloads (Last 6 weeks): 358
    Reflects downloads up to 23 Sep 2024


    Citations

    Cited By

    • (2025) Automated description generation for software patches. Information and Software Technology, 177, 107543. DOI: 10.1016/j.infsof.2024.107543. Online publication date: Jan-2025.
    • (2024) Leveraging Large Language Models and BERT for Log Parsing and Anomaly Detection. Mathematics, 12(17), 2758. DOI: 10.3390/math12172758. Online publication date: 5-Sep-2024.
    • (2024) Calico: Automated Knowledge Calibration and Diagnosis for Elevating AI Mastery in Code Tasks. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1785-1797. DOI: 10.1145/3650212.3680399. Online publication date: 11-Sep-2024.
    • (2024) Toward Facilitating Search in VR With the Assistance of Vision Large Language Models. Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology, 1-14. DOI: 10.1145/3641825.3687742. Online publication date: 9-Oct-2024.
