Agent-Driven Automatic Software Improvement

Published: 18 June 2024
DOI: 10.1145/3661167.3661171

Abstract

With software maintenance accounting for 50% of the cost of developing software, enhancing code quality and reliability has become more critical than ever. In response to this challenge, this doctoral research proposal explores innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) to perform software maintenance tasks. The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation. One distinct challenge is the last-mile problem: errors that arise at the final stage of producing functionally and contextually relevant code. Furthermore, this project aims to address the inherent limitations of current LLMs on source code through a collaborative framework in which agents can correct and learn from each other’s errors. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned with the task of automated software improvement. Our main goal is to achieve a leap forward in the field of automatic software improvement by developing new tools and frameworks that enhance the efficiency and reliability of software development.
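
To make the proposed loop concrete, the following is a minimal sketch (not the authors' implementation) of the agent-driven improvement cycle the abstract describes: a fixer agent proposes a patch, a test harness returns feedback, and a second reviewer agent critiques the patch before the next iteration. The llm callable, the prompts, and the run_tests harness are hypothetical placeholders introduced only for illustration.

    # Hypothetical sketch of an iterative, multi-agent repair loop.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Feedback:
        passed: bool
        log: str

    def run_tests(code: str) -> Feedback:
        # Placeholder harness: a real system would run the project's test
        # suite in a sandbox and return the failure log. Here we only check
        # that the candidate parses.
        try:
            compile(code, "<candidate>", "exec")
            return Feedback(passed=True, log="syntax ok")
        except SyntaxError as exc:
            return Feedback(passed=False, log=str(exc))

    def improve(code: str, llm: Callable[[str], str], max_rounds: int = 3) -> str:
        """Iteratively repair `code`, feeding test and reviewer feedback back to a fixer agent."""
        for _ in range(max_rounds):
            feedback = run_tests(code)
            if feedback.passed:
                break
            # Fixer agent: propose a patch conditioned on the failure log.
            code = llm(f"Fix this code.\n\nCode:\n{code}\n\nTest output:\n{feedback.log}")
            # Reviewer agent: a second agent critiques the patch; its comments
            # feed the next revision (the cross-agent correction the abstract
            # refers to).
            review = llm(f"Review this patch for last-mile errors:\n{code}")
            code = llm(f"Revise the code using this review:\n{review}\n\nCode:\n{code}")
        return code

In a full system, run_tests would execute the project's test suite, and the accumulated feedback/patch pairs could later serve as fine-tuning data for the underlying LLM, in line with the alignment goal stated in the abstract.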


Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN:9798400717017
DOI:10.1145/3661167
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Automatic Maintenance
  2. Automatic Software Improvement
  3. LLM-based Agents
  4. ML4Code
  5. Multi-Agent Systems

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Funding Sources

  • Research Council of Norway

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
