Agent-Driven Automatic Software Improvement

Published: 18 June 2024
DOI: 10.1145/3661167.3661171

Abstract

With software maintenance accounting for 50% of the cost of developing software, enhancing code quality and reliability has become more critical than ever. In response to this challenge, this doctoral research proposal explores innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) to perform software maintenance tasks. The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation. One distinct challenge is the last-mile problem: errors that arise at the final stage of producing functionally and contextually relevant code. Furthermore, this project aims to address the inherent limitations of current LLMs on source code through a collaborative framework in which agents can correct and learn from each other’s errors. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned with the task of automated software improvement. Our main goal is to achieve a leap forward in the field of automatic software improvement by developing new tools and frameworks that enhance the efficiency and reliability of software development.
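
To make the proposed loop concrete, the following is a minimal sketch (not the authors' implementation) of the agent-driven improvement cycle the abstract describes: a fixer agent proposes a patch, a test harness returns feedback, and a second reviewer agent critiques the patch before the next iteration. The llm callable, the prompts, and the run_tests harness are hypothetical placeholders introduced only for illustration.

    # Hypothetical sketch of an iterative, multi-agent repair loop.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Feedback:
        passed: bool
        log: str

    def run_tests(code: str) -> Feedback:
        # Placeholder harness: a real system would run the project's test
        # suite in a sandbox and return the failure log. Here we only check
        # that the candidate parses.
        try:
            compile(code, "<candidate>", "exec")
            return Feedback(passed=True, log="syntax ok")
        except SyntaxError as exc:
            return Feedback(passed=False, log=str(exc))

    def improve(code: str, llm: Callable[[str], str], max_rounds: int = 3) -> str:
        """Iteratively repair `code`, feeding test and reviewer feedback back to a fixer agent."""
        for _ in range(max_rounds):
            feedback = run_tests(code)
            if feedback.passed:
                break
            # Fixer agent: propose a patch conditioned on the failure log.
            code = llm(f"Fix this code.\n\nCode:\n{code}\n\nTest output:\n{feedback.log}")
            # Reviewer agent: a second agent critiques the patch; its comments
            # feed the next revision (the cross-agent correction the abstract
            # refers to).
            review = llm(f"Review this patch for last-mile errors:\n{code}")
            code = llm(f"Revise the code using this review:\n{review}\n\nCode:\n{code}")
        return code

In a full system, run_tests would execute the project's test suite, and the accumulated feedback/patch pairs could later serve as fine-tuning data for the underlying LLM, in line with the alignment goal stated in the abstract.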


Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN:9798400717017
DOI:10.1145/3661167
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Automatic Maintenance
  2. Automatic Software Improvement
  3. LLM-based Agents
  4. ML4Code
  5. Multi-Agent Systems

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Funding Sources

  • Research Council of Norway

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
