Faster Feedback with AI? A Test Prioritization Study

Research Article · Open Access
DOI: 10.1145/3660829.3660837
Published: 09 July 2024

Abstract

Feedback during programming is desirable, but its usefulness depends on immediacy and relevance to the task. Unit and regression testing are practices to ensure programmers can obtain feedback on their changes; however, running a large test suite is rarely fast, and only a few results are relevant.
Identifying tests relevant to a change can help programmers in two ways: upcoming issues can be detected earlier during programming, and relevant tests can serve as examples to help programmers understand the code they are editing.
In this work, we describe an approach to evaluate how well large language models (LLMs) and embedding models can judge the relevance of a test to a change. We construct a dataset by applying faulty variations of real-world code changes and measuring whether a model can nominate the failing tests beforehand.
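
As a minimal sketch of this evaluation idea (not the paper's actual pipeline; the test names and data below are invented), one can check how early the tests that actually fail under a seeded faulty change appear in a ranking produced before the suite is run:

    # Hypothetical sketch: score a prioritized test order against the tests
    # that actually fail once a seeded faulty change is applied.
    def recall_at_k(ranked_tests: list[str], failing_tests: set[str], k: int) -> float:
        """Fraction of the failing tests that appear in the top k of the ranking."""
        if not failing_tests:
            return 0.0
        return len(set(ranked_tests[:k]) & failing_tests) / len(failing_tests)

    # Example: a ranking that surfaces one of two failing tests within the top 3.
    ranking = ["test_parse", "test_render", "test_cache", "test_auth"]
    failing = {"test_parse", "test_auth"}
    print(recall_at_k(ranking, failing, k=3))  # 0.5
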
We find that, while embedding models perform best at this task, even simple information retrieval models are surprisingly competitive. In contrast, pre-trained LLMs are of limited use, as they focus on confounding aspects such as coding style.
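
To make the baseline comparison concrete, the following is a hedged sketch of the kind of lexical information retrieval approach referred to above, ranking tests by TF-IDF cosine similarity between a change description and each test's source; the test bodies and change text are invented for illustration and are not the study's actual corpus or models:

    # Illustrative IR baseline: rank tests by TF-IDF cosine similarity
    # between the text of a change and each test's source.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    tests = {
        "test_parse": "def test_parse(): assert parse('1+2*3') == 7",
        "test_render": "def test_render(): assert render(PAGE) == EXPECTED_HTML",
    }
    change = "fix operator precedence in parse for '+' and '*'"

    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]+")
    matrix = vectorizer.fit_transform(list(tests.values()) + [change])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

    # Tests most lexically similar to the change come first in the ranking.
    for name, score in sorted(zip(tests, scores), key=lambda x: -x[1]):
        print(f"{score:.2f}  {name}")

A baseline of this kind needs no training or GPU time, which is what makes the cost comparison with AI models meaningful.
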
We argue that the high computational cost of AI models is not always justified, and tool developers should also consider non-AI models for code-related retrieval and recommendation tasks. Lastly, we generalize from unit tests to live examples and outline how our approach can benefit live programming environments.



Published In

Programming '24: Companion Proceedings of the 8th International Conference on the Art, Science, and Engineering of Programming
March 2024, 159 pages
ISBN: 9798400706349
DOI: 10.1145/3660829
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. embedding models
  2. generative ai
  3. large language models
  4. test prioritization
  5. testing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

‹Programming› '24
