research-article

HITS: High-coverage LLM-based Unit Test Generation via Method Slicing

Authors:

Zhi JinAuthors Info & Claims

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

Pages 1258 - 1268

https://doi.org/10.1145/3691620.3695501

Published: 27 October 2024 Publication History

Editorial Notes

The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on February 18, 2025. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

Large language models (LLMs) have behaved well in generating unit tests for Java projects. However, the performance for covering the complex focal methods within the projects is poor. Complex methods comprise many conditions and loops, requiring the test cases to be various enough to cover all lines and branches. However, existing test generation methods with LLMs provide the whole method-to-test to the LLM without assistance on input analysis. The LLM has difficulty inferring the test inputs to cover all conditions, resulting in missing lines and branches. To tackle the problem, we propose decomposing the focal methods into slices and asking the LLM to generate test cases slice by slice. Our method simplifies the analysis scope, making it easier for the LLM to cover more lines and branches in each slice. We build a dataset comprising complex focal methods collected from the projects used by existing state-of-the-art approaches. Our experiment results show that our method significantly outperforms current test case generation methods with LLMs and the typical SBST method Evosuite regarding both line and branch coverage scores.

Supplementary Material

PDF File (3695501-cvor.pdf)

Download
728.41 KB

References

[1]

1990. IEEE Standard Glossary of Software Engineering Terminology. IEEE Std 610.12-1990 (1990), 1--84.

[2]

Saranya Alagarsamy, Chakkrit Tantithamthavorn, and Aldeida Aleti. 2023. A3test: Assertion-augmented automated test case generation. arXiv preprint arXiv:2302.10352 (2023).

[3]

Roberto Baldoni, Emilio Coppa, Daniele Cono D'elia, Camil Demetrescu, and Irene Finocchi. 2018. A survey of symbolic execution techniques. ACM Computing Surveys (CSUR) 51, 3 (2018), 1--39.

Digital Library

[4]

Cristian Cadar, Daniel Dunbar, Dawson R Engler, et al. 2008. Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, Vol. 8. 209--224.

Digital Library

[5]

Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing mayhem on binary code. In 2012 IEEE Symposium on Security and Privacy. IEEE, 380--394.

Digital Library

[6]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).

[7]

Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. 2024. Teaching Large Language Models to Self-Debug. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=KuPixIqPiq

[8]

Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. 2012. The S2E platform: Design, implementation, and applications. ACM Transactions on Computer Systems (TOCS) 30, 1 (2012), 1--49.

Digital Library

[9]

Ermira Daka and Gordon Fraser. 2014. A Survey on Unit Testing Practices and Problems. In 2014 IEEE 25th International Symposium on Software Reliability Engineering. 201--211.

Digital Library

[10]

Gordon Fraser and Andrea Arcuri. 2011. Evolutionary Generation of Whole Test Suites. In Proceedings of the 11th International Conference on Quality Software, QSIC 2011, Madrid, Spain, July 13--14, 2011, Manuel Núñez, Robert M. Hierons, and Mercedes G. Merayo (Eds.). IEEE Computer Society, 31--40.

Digital Library

[11]

Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: Directed automated random testing. In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation. 213--223.

Digital Library

[12]

James Alexander George Hamilton and Sebastian Danicic. 2012. Dependence communities in source code. In 28th IEEE International Conference on Software Maintenance, ICSM 2012, Trento, Italy, September 23--28, 2012. IEEE Computer Society, 579--582.

Digital Library

[13]

Mark Harman, Yue Jia, and Yuanyuan Zhang. 2015. Achievements, open problems and challenges for search based software testing. In 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST). IEEE, 1--12.

[14]

Xing Hu, Zhuang Liu, Xin Xia, Zhongxin Liu, Tongtong Xu, and Xiaohu Yang. 2023. Identify and Update Test Cases When Production Code Changes: A Transformer-Based Approach. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1111--1122.

[15]

Yong Jiang, Ziye Zhu, and Yun Li. 2023. Decomposing Source Codes by Program Slicing for Bug Localization. In International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18--23, 2023. IEEE, 1--8.

[16]

Caroline Lemieux, Jeevana Priya Inala, Shuvendu K Lahiri, and Siddhartha Sen. 2023. Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 919--931.

Digital Library

[17]

Hui Liu, Mingzhu Shen, Jiaqi Zhu, Nan Niu, Ge Li, and Lu Zhang. 2020. Deep learning based program generation from requirements text: Are we there yet? IEEE Transactions on Software Engineering 48, 4 (2020), 1268--1289.

[18]

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. 2023. Self-Refine: Iterative Refinement with Self-Feedback. In Thirty-seventh Conference on Neural Information Processing Systems. https://openreview.net/forum?id=S37hOerQLB

[19]

Thomas J. McCabe. 1976. A Complexity Measure. IEEE Trans. Software Eng. 2, 4 (1976), 308--320.

Digital Library

[20]

Phil McMinn. 2011. Search-based software testing: Past, present and future. In 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation Workshops. IEEE, 153--163.

Digital Library

[21]

Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball. 2007. Feedback-Directed Random Test Generation. In 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20--26, 2007. IEEE Computer Society, 75--84.

Digital Library

[22]

Strategic Planning. 2002. The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology 1 (2002).

[23]

Nikitha Rao, Kush Jain, Uri Alon, Claire Le Goues, and Vincent J Hellendoorn. 2023. CAT-LM training language models on aligned code and tests. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 409--420.

Digital Library

[24]

Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, and Baishakhi Ray. 2024. Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM. CoRR abs/2402.00097 (2024). arXiv:2402.00097

[25]

BA Sanusi, SO Olabiyisi, AO Afolabi, AO Olowoye, et al. 2020. Development of an Enhanced Automated Software Complexity Measurement System. Journal of Advances in Computational Intelligence Theory 1, 3 (2020).

[26]

Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2023. An empirical evaluation of using large language models for automated unit test generation. IEEE Transactions on Software Engineering (2023).

[27]

Yutian Tang, Zhijie Liu, Zhichao Zhou, and Xiapu Luo. 2024. Chatgpt vs sbst: A comparative assessment of unit test suite generation. IEEE Transactions on Software Engineering (2024).

[28]

Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, and Neel Sundaresan. 2020. Unit test case generation with transformers and focal context. arXiv preprint arXiv:2009.05617 (2020).

[29]

Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. 2024. Software testing with large language models: Survey, landscape, and vision. IEEE Transactions on Software Engineering (2024).

[30]

Tao Xie, Nikolai Tillmann, Jonathan De Halleux, and Wolfram Schulte. 2009. Fitness-guided path exploration in dynamic symbolic execution. In 2009 IEEE/IFIP International Conference on Dependable Systems & Networks. IEEE, 359--368.

[31]

Zhuokui Xie, Yinghao Chen, Chen Zhi, Shuiguang Deng, and Jianwei Yin. 2023. ChatUniTest: a ChatGPT-based automated unit test generation tool. CoRR abs/2305.04764 (2023). arXiv:2305.04764

[32]

Yanming Yang, Xin Xia, David Lo, and John Grundy. 2022. A survey on deep learning for software engineering. ACM Computing Surveys (CSUR) 54, 10s (2022), 1--73.

Digital Library

[33]

Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, and Xin Peng. 2023. No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation. CoRR abs/2305.04207 (2023). arXiv:2305.04207

Index Terms

HITS: High-coverage LLM-based Unit Test Generation via Method Slicing
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Recommendations

On the Evaluation of Large Language Models in Unit Test Generation
ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new ...
Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools
LLM4Code '24: Proceedings of the 1st International Workshop on Large Language Models for Code

Generating unit tests is a crucial task in software development, demanding substantial time and effort from programmers. The advent of Large Language Models (LLMs) introduces a novel avenue for unit test script generation. This research aims to ...
Evaluating and Improving ChatGPT for Unit Test Generation

Unit testing plays an essential role in detecting bugs in functionally-discrete program units (e.g., methods). Manually writing high-quality unit tests is time-consuming and laborious. Although the traditional techniques are able to generate tests with ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering

October 2024

2587 pages

ISBN:9798400712487

DOI:10.1145/3691620

General Chair:
Vladimir Filkov,
Program Co-chairs:
Baishakhi Ray
Columbia University, USA; AWS AI Lab
,
Minghui Zhou
Peking University, China

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2024

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ASE '24

Sponsor:

ASE '24: 39th IEEE/ACM International Conference on Automated Software Engineering

October 27 - November 1, 2024

CA, Sacramento, USA

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
404
Total Downloads

Downloads (Last 12 months)404
Downloads (Last 6 weeks)116

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten