

Developer-centric test amplification: The interplay between automatic generation and human exploration

Published: 01 July 2022

Abstract

Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separate from the maintained test suite. In this paper, we leverage the developer’s familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important in designing a developer-centric test amplification approach, i.e., one that provides test cases that developers take over into their test suite. We conduct 16 semi-structured interviews with software developers, supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot to generate test cases that are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified test cases from their familiar environment. From our interviews, we gather 52 observations that we summarize into 23 result categories, and we give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.
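To make the idea of test amplification concrete, the sketch below shows a hand-written JUnit 4 test and one amplified variant of the kind a tool such as DSpot can derive from it by varying inputs and adding assertions. This is a minimal, hypothetical illustration: the ShoppingCart class, method names, and values are invented for this sketch and are not taken from the paper or from DSpot's actual output.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

public class ShoppingCartTest {

    // Minimal class under test, inlined only to keep the sketch self-contained.
    static class ShoppingCart {
        private final List<String> items = new ArrayList<>();
        private double total = 0.0;

        void addItem(String name, double price) {
            items.add(name);
            total += price;
        }

        double total() { return total; }
        int itemCount() { return items.size(); }
        boolean containsItem(String name) { return items.contains(name); }
    }

    // Original, manually written developer test.
    @Test
    public void addItemIncreasesTotal() {
        ShoppingCart cart = new ShoppingCart();
        cart.addItem("book", 10.0);
        assertEquals(10.0, cart.total(), 0.001);
    }

    // Amplified variant: the input is mutated (an extra zero-priced item is added)
    // and new assertions make behavior explicit that the original test only
    // exercised implicitly. An amplification tool derives such variants
    // automatically from the test above.
    @Test
    public void addItemIncreasesTotal_amplified() {
        ShoppingCart cart = new ShoppingCart();
        cart.addItem("book", 10.0);
        cart.addItem("bookmark", 0.0);
        assertEquals(10.0, cart.total(), 0.001);
        assertEquals(2, cart.itemCount());
        assertTrue(cart.containsItem("bookmark"));
    }
}
```

In the developer-centric setting the paper describes, such an amplified variant would be surfaced to the developer, e.g., through the TestCube plugin inside IntelliJ, who then decides whether to take it over into the maintained test suite.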


Cited By

  • (2024) Understandable Test Generation Through Capture/Replay and LLMs. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pp. 261–263. https://doi.org/10.1145/3639478.3639789. Online publication date: 14-Apr-2024.
  • (2024) Mind the Gap: What Working With Developers on Fuzz Tests Taught Us About Coverage Gaps. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, pp. 157–167. https://doi.org/10.1145/3639477.3639721. Online publication date: 14-Apr-2024.
  • (2022) An Empirical Comparison of EvoSuite and DSpot for Improving Developer-Written Test Suites with Respect to Mutation Score. Search-Based Software Engineering, pp. 19–34. https://doi.org/10.1007/978-3-031-21251-2_2. Online publication date: 17-Nov-2022.

Published In

Empirical Software Engineering, Volume 27, Issue 4
Jul 2022
848 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 2022
Accepted: 02 December 2021

Author Tags

  1. Software testing
  2. Test amplification
  3. Test exploration
  4. Test generation
  5. Developer-centric design

Qualifiers

  • Research-article
