research-article
Open access

Only diff Is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model

Published: 12 July 2024

Abstract

Commit messages play a vital role in software development and maintenance. Although previous research has introduced various Commit Message Generation (CMG) approaches, they often fail to consider the broader software context associated with code changes. As a result, the commit messages they generate often contain insufficient information and are hard to read. To address these shortcomings, we approached CMG as a knowledge-intensive reasoning task. We employed ReAct prompting with a cutting-edge Large Language Model (LLM) to generate high-quality commit messages. Our tool retrieves a wide range of software context information, enabling the LLM to create commit messages that are factually grounded and comprehensive. Additionally, we gathered commit message quality expectations from software practitioners and incorporated them into our approach to further enhance message quality. We named our approach Omniscient Message Generator (OMG). Human evaluation demonstrates its overall effectiveness: OMG achieved an average improvement of 30.2% over human-written messages and a 71.6% improvement over state-of-the-art CMG methods.
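The abstract frames CMG as a knowledge-intensive reasoning task solved with ReAct prompting. Below is a minimal Python sketch of a generic ReAct-style loop, not OMG's actual implementation. The model alternates "Thought" and "Action" lines, each action calls a context-retrieval tool whose result is appended to the prompt as an "Observation", and `Finish[...]` ends the loop with the commit message. The tool names (`get_issue_report`, `get_changed_method_body`), the `Action: tool[arg]` text format, the issue key `PROJ-123`, and the scripted stand-in for the LLM are all hypothetical, chosen only for illustration.

```python
import re
from typing import Callable, Dict, Iterator

# Hypothetical context-retrieval tools. These return canned strings; a real
# system would query, e.g., an issue tracker or the repository history.
def get_issue_report(issue_key: str) -> str:
    return f"[stub] summary of issue {issue_key}"


def get_changed_method_body(method_name: str) -> str:
    return f"[stub] body of {method_name}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "get_issue_report": get_issue_report,
    "get_changed_method_body": get_changed_method_body,
}

# Assumed action syntax: "Action: tool_name[argument]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react_commit_message(diff: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Run a ReAct-style Thought/Action/Observation loop until Finish[...]."""
    transcript = f"Diff:\n{diff}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # expected: "Thought: ...\nAction: tool[arg]"
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            observation = "No valid action found; use Action: tool[arg]."
        else:
            tool, arg = match.group(1), match.group(2)
            if tool == "Finish":
                return arg  # the argument of Finish is the commit message
            if tool in TOOLS:
                observation = TOOLS[tool](arg)
            else:
                observation = f"Unknown tool: {tool}"
        # Feed the tool result back so the next reasoning step can use it.
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("No commit message produced within the step budget")


def scripted_llm(replies: Iterator[str]) -> Callable[[str], str]:
    """Stand-in for a real LLM call: returns pre-written replies in order."""
    return lambda _prompt: next(replies)


if __name__ == "__main__":
    llm = scripted_llm(iter([
        "Thought: The diff mentions PROJ-123; read the issue.\n"
        "Action: get_issue_report[PROJ-123]",
        "Thought: I now know why foo() was replaced.\n"
        "Action: Finish[Replace foo() with bar() (PROJ-123)]",
    ]))
    print(react_commit_message("- foo()\n+ bar()", llm))
    # prints: Replace foo() with bar() (PROJ-123)
```

Passing the model in as a plain callable keeps the loop independent of any particular LLM client, so it can be exercised with scripted replies before being wired to a real model and real retrieval code.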

Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024
2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published in PACMSE (Proceedings of the ACM on Software Engineering), Volume 1, Issue FSE

Badges

• Distinguished Paper

Author Tags

1. Commit message generation
2. Large language model
Article Metrics (reflects downloads up to 24 Nov 2024)

• Total citations: 0
• Total downloads: 869
• Downloads (last 12 months): 869
• Downloads (last 6 weeks): 229