DOI: 10.1145/3691620.3695541
Open access

Reducing Test Runtime by Transforming Test Fixtures

Published: 27 October 2024

Abstract

Software testing is a fundamental part of software development, but the cost of running tests can be high. Existing approaches to speeding up testing, such as test-suite reduction or regression test selection, run only a subset of tests from the full test suite, but they risk skipping key tests that are needed to detect faults in the code.

We propose a new technique that transforms test code to speed up test runtime while still running all the tests. The insight is that testing frameworks such as JUnit for Java allow developers to define test fixtures, i.e., methods that run before or after every test to set up or tear down test state, but these fixtures need not run before and after each individual test. It may be sufficient to do the setup and teardown once, at the beginning and end, respectively, of all tests. Our technique, TestBoost, transforms the test fixtures within a test class to instead run once before/after all tests in the test class, thereby running the fixtures less frequently while still running all tests and ensuring that they all still pass, as they did before. Our evaluation on 697 test classes from 34 projects shows that, for the cases with a statistically significant positive improvement, we reduce the runtime per test class by 28.39% on average. Using these transformed test classes results in an average 18.24% reduction in test-suite runtime. We find that the coverage of the transformed test classes changes by <1%, and of the 15 pull requests we submitted, 9 have already been merged.
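To make the fixture transformation concrete, below is a minimal before/after sketch for a JUnit 4 test class. The class name, fixture contents, and tests are hypothetical illustrations, not code from the paper or the TestBoost tool.

import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.List;

import org.junit.BeforeClass;
import org.junit.Test;

public class ExampleTest {
    // In JUnit 4, a @BeforeClass fixture must be a public static method operating
    // on static state, so the fixture field here is static.
    private static List<String> fixture;

    // Original (per-test) fixture:
    //   @Before
    //   public void setUp() { fixture = new ArrayList<>(); fixture.add("x"); }
    // re-runs before every @Test method. The transformed fixture below runs the
    // same setup once for the whole class, so the (possibly expensive) setup cost
    // is paid only once while every test still executes.
    @BeforeClass
    public static void setUp() {
        fixture = new ArrayList<>();
        fixture.add("x");
    }

    @Test
    public void testContains() {
        assertTrue(fixture.contains("x"));
    }

    @Test
    public void testNotEmpty() {
        assertTrue(!fixture.isEmpty());
    }
}

A transformation like this is only behavior-preserving when no test mutates the shared fixture state in a way that later tests depend on, which is why, as described above, a transformed test class is only useful if all of its tests still pass as they did before.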




Published In

ASE '24: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering
October 2024
2587 pages
ISBN: 9798400712487
DOI: 10.1145/3691620
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2024

Author Tags

  1. regression testing
  2. test fixtures
  3. testing speedup

Qualifiers

  • Research-article

Conference

ASE '24

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

