
CREF: An LLM-Based Conversational Software Repair Framework for Programming Tutors

Published: 11 September 2024 · DOI: 10.1145/3650212.3680328

Abstract

With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, existing repair benchmarks may overlap with LLM training data, raising the risk of data leakage. To evaluate LLMs’ realistic repair capabilities, (i) we introduce TutorCode, an extensive, non-crawled benchmark comprising 1,239 defective C++ programs together with associated information such as tutor guidance, solution descriptions, failing test cases, and the corrected code. We assess LLMs’ repair performance on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR). (ii) We then conduct a comprehensive investigation into which types of extra information help LLMs improve their repair performance; among these, tutor guidance proves the most effective. To fully harness LLMs’ conversational capabilities and the benefits of augmented information, (iii) we introduce CREF, a novel conversational semi-automatic repair framework that assists human programming tutors. CREF improves AVG-5 by 17.2%-24.6% over the baseline and achieves an AVG-5 of 76.6% when using GPT-4. These results highlight the potential of tutor interactions and historical conversations for enhancing LLMs’ repair capabilities. The successful application of CREF in a real-world educational setting demonstrates its effectiveness in reducing tutors’ workload and improving students’ learning experience, and shows promise for code review and other software engineering tasks.
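
To make the abstract's evaluation setup concrete, here is a minimal Python sketch of how the repair-correctness metrics could be computed. It assumes (our reading, not a definition quoted from the paper) that TOP-5 counts a defect as repaired if any of five sampled patches passes all of its tests, and that AVG-5 averages the per-sample success rate across defects:

    # Minimal sketch (not the authors' code) of TOP-5 / AVG-5 repair correctness,
    # assuming each defect has five candidate patches with pass/fail outcomes.
    from typing import Dict, List

    def repair_metrics(results: Dict[str, List[bool]]) -> Dict[str, float]:
        """results maps a defect id to the pass/fail outcomes of its five patches."""
        n = len(results)
        top5 = sum(any(passes) for passes in results.values()) / n
        avg5 = sum(sum(passes) / len(passes) for passes in results.values()) / n
        return {"TOP-5": 100 * top5, "AVG-5": 100 * avg5}

    # Hypothetical usage: the first defect is fixed by 3 of 5 patches, the second by none.
    print(repair_metrics({
        "defect_001": [True, False, True, False, True],
        "defect_002": [False, False, False, False, False],
    }))  # -> {'TOP-5': 50.0, 'AVG-5': 30.0}

The conversational, semi-automatic repair loop can likewise be sketched in the spirit of CREF: the model proposes a patch, failing tests and optional tutor guidance are fed back, and the dialogue history is preserved across rounds. query_llm and run_tests below are hypothetical placeholders for a chat-model call and the benchmark's test harness, not APIs from the paper:

    # Hedged sketch of a CREF-style conversational repair loop; query_llm and
    # run_tests are hypothetical stubs, not the paper's implementation.
    from typing import List, Optional

    def query_llm(messages: List[dict]) -> str:
        raise NotImplementedError  # placeholder for an actual chat-model call

    def run_tests(code: str) -> List[str]:
        raise NotImplementedError  # placeholder: returns names of failing tests

    def conversational_repair(buggy_code: str,
                              tutor_guidance: Optional[str] = None,
                              max_rounds: int = 3) -> Optional[str]:
        messages = [{"role": "user", "content": f"Fix this C++ program:\n{buggy_code}"}]
        if tutor_guidance:
            messages.append({"role": "user", "content": f"Tutor hint: {tutor_guidance}"})
        for _ in range(max_rounds):
            candidate = query_llm(messages)    # propose a patch
            failing = run_tests(candidate)     # validate against the test suite
            if not failing:
                return candidate               # repaired within the round budget
            messages.append({"role": "assistant", "content": candidate})
            messages.append({"role": "user",
                             "content": f"These tests still fail: {failing}. Please revise."})
        return None                            # unrepaired after max_rounds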


Published In

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
September 2024
1928 pages
ISBN: 9798400706127
DOI: 10.1145/3650212
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Large Language Model
  2. Open Source
  3. Program Repair

Qualifiers

  • Research-article

Conference

ISSTA '24

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
