Faster Feedback with AI? A Test Prioritization Study

Research Article · Open Access
DOI: 10.1145/3660829.3660837
Published: 09 July 2024

Abstract

Feedback during programming is desirable, but its usefulness depends on immediacy and relevance to the task. Unit and regression testing are practices to ensure programmers can obtain feedback on their changes; however, running a large test suite is rarely fast, and only a few results are relevant.
Identifying tests relevant to a change can help programmers in two ways: upcoming issues can be detected earlier during programming, and relevant tests can serve as examples to help programmers understand the code they are editing.
In this work, we describe an approach to evaluate how well large language models (LLMs) and embedding models can judge the relevance of a test to a change. We construct a dataset by applying faulty variations of real-world code changes and measuring whether a model can nominate the failing tests beforehand.
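
As a minimal sketch of this evaluation idea (not the paper's actual pipeline; the test names and data below are invented), one can check how early the tests that actually fail under a seeded faulty change appear in a ranking produced before the suite is run:

    # Hypothetical sketch: score a prioritized test order against the tests
    # that actually fail once a seeded faulty change is applied.
    def recall_at_k(ranked_tests: list[str], failing_tests: set[str], k: int) -> float:
        """Fraction of the failing tests that appear in the top k of the ranking."""
        if not failing_tests:
            return 0.0
        return len(set(ranked_tests[:k]) & failing_tests) / len(failing_tests)

    # Example: a ranking that surfaces one of two failing tests within the top 3.
    ranking = ["test_parse", "test_render", "test_cache", "test_auth"]
    failing = {"test_parse", "test_auth"}
    print(recall_at_k(ranking, failing, k=3))  # 0.5
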
We find that, while embedding models perform best at this task, even simple information retrieval models are surprisingly competitive. In contrast, pre-trained LLMs are of limited use, as they focus on confounding aspects such as coding style.
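
To make the baseline comparison concrete, the following is a hedged sketch of the kind of lexical information retrieval approach referred to above, ranking tests by TF-IDF cosine similarity between a change description and each test's source; the test bodies and change text are invented for illustration and are not the study's actual corpus or models:

    # Illustrative IR baseline: rank tests by TF-IDF cosine similarity
    # between the text of a change and each test's source.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    tests = {
        "test_parse": "def test_parse(): assert parse('1+2*3') == 7",
        "test_render": "def test_render(): assert render(PAGE) == EXPECTED_HTML",
    }
    change = "fix operator precedence in parse for '+' and '*'"

    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]+")
    matrix = vectorizer.fit_transform(list(tests.values()) + [change])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

    # Tests most lexically similar to the change come first in the ranking.
    for name, score in sorted(zip(tests, scores), key=lambda x: -x[1]):
        print(f"{score:.2f}  {name}")

A baseline of this kind needs no training or GPU time, which is what makes the cost comparison with AI models meaningful.
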
We argue that the high computational cost of AI models is not always justified, and tool developers should also consider non-AI models for code-related retrieval and recommendation tasks. Lastly, we generalize from unit tests to live examples and outline how our approach can benefit live programming environments.



Published In

Programming '24: Companion Proceedings of the 8th International Conference on the Art, Science, and Engineering of Programming
March 2024, 159 pages
ISBN: 9798400706349
DOI: 10.1145/3660829
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. embedding models
  2. generative ai
  3. large language models
  4. test prioritization
  5. testing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

‹Programming› '24
