Abstract
Large language models such as OpenAI’s GPT and Google’s Bard offer new opportunities for supporting software engineering processes. Large-language-model-assisted software engineering promises to support developers conversationally with expert knowledge across the whole software lifecycle. Current applications range from requirements extraction, ambiguity resolution, code and test case generation, code review, and translation to the verification and repair of software vulnerabilities. In this paper, we present our position on the potential benefits and challenges of adopting large language models in software engineering. In particular, we focus on possible applications of large language models to requirements engineering, system design, code and test generation, code quality reviews, and software process management. We also give a short review of the state of the art in large language model support for software construction and illustrate our position with a case study on the object-oriented development of a simple “search and rescue” scenario.
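To make the kind of case study mentioned above concrete, the sketch below shows what a minimal object-oriented “search and rescue” scenario might look like, of the sort one could ask a large language model to generate or review. All names (`Position`, `Rescuer`, `step_towards`) are hypothetical illustrations and are not taken from the paper itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """A cell on a 2D grid world."""
    x: int
    y: int

    def neighbors(self):
        # 4-connected grid moves: up, down, left, right.
        return [Position(self.x + 1, self.y), Position(self.x - 1, self.y),
                Position(self.x, self.y + 1), Position(self.x, self.y - 1)]


class Rescuer:
    """A rescuer agent that greedily walks toward a known victim position."""

    def __init__(self, start: Position):
        self.pos = start

    def step_towards(self, victim: Position) -> Position:
        # Greedy move: pick the neighbor with minimal Manhattan distance.
        self.pos = min(self.pos.neighbors(),
                       key=lambda p: abs(p.x - victim.x) + abs(p.y - victim.y))
        return self.pos


# Run the scenario: the rescuer moves until it reaches the victim.
rescuer = Rescuer(Position(0, 0))
victim = Position(2, 1)
while rescuer.pos != victim:
    rescuer.step_towards(victim)
```

Because each greedy step strictly reduces the Manhattan distance on an obstacle-free grid, the loop is guaranteed to terminate; obstacle handling and multiple victims are exactly the kinds of refinements a conversational assistant could be prompted to add incrementally.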
Notes
Despite the validity of the Church–Turing thesis, more powerful tools enable more products in practice.
Acknowledgements
We thank the anonymous reviewer for their constructive criticism and helpful suggestions.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Belzner, L., Gabor, T., Wirsing, M. (2024). Large Language Model Assisted Software Engineering: Prospects, Challenges, and a Case Study. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2023. Lecture Notes in Computer Science, vol 14380. Springer, Cham. https://doi.org/10.1007/978-3-031-46002-9_23
Print ISBN: 978-3-031-46001-2
Online ISBN: 978-3-031-46002-9