A Large-Scale Evaluation of Automated Unit Test Generation Using EvoSuite

G Fraser, A Arcuri - ACM Transactions on Software Engineering and …, 2014 - dl.acm.org
ACM Transactions on Software Engineering and Methodology (TOSEM), 2014 - dl.acm.org
Research on software testing produces many innovative automated techniques, but because software testing is by necessity incomplete and approximate, any new technique faces the challenge of an empirical assessment. In the past, we have demonstrated scientific advance in automated unit test generation with the EvoSuite tool by evaluating it on manually selected open-source projects or examples that represent a particular problem addressed by the underlying technique. However, demonstrating scientific advance is not necessarily the same as demonstrating practical value; even if EvoSuite worked well on the software projects we selected for evaluation, it might not scale up to the complexity of real systems. Ideally, one would use large "real-world" software systems to minimize the threats to external validity when evaluating research tools. However, neither choosing such software systems nor applying research prototypes to them is a trivial task.
In this article we present the results of a large experiment in unit test generation using the EvoSuite tool on 100 randomly chosen open-source projects, the 10 most popular open-source projects according to the SourceForge Web site, seven industrial projects, and 11 automatically generated software projects. The study confirms that EvoSuite can achieve good levels of branch coverage (on average, 71% per class) in practice. However, the study also exemplifies how the choice of software systems for an empirical study can influence the results of the experiments. This observation can help researchers make more conscious choices when selecting subject systems. Furthermore, our experiments demonstrate how practical limitations interfere with scientific advances: branch coverage on an unbiased sample is affected by predominant environmental dependencies. The surprisingly large effect of such practical engineering problems in unit testing will hopefully lead to a greater appreciation of work in this area, thus supporting the transfer of knowledge from software testing research to practice.
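The figure "on average, 71% per class" is a mean of per-class branch-coverage ratios. It is not one pooled ratio over all branches in a project. The two can differ sharply when a few large classes are poorly covered, for example because of environmental dependencies.

The Java sketch below contrasts the two aggregations. It rests on several assumptions:

- The class names and branch counts are hypothetical. They are not data from the study.
- `ClassCoverage`, `averagePerClass`, and `pooled` are illustrative names only. They are not part of EvoSuite's API.
- A class with zero branches is assumed to count as fully covered. This is a convention chosen here, not one taken from the paper.

```java
import java.util.List;
import java.util.Locale;

public class BranchCoverageSummary {

    /** Hypothetical per-class record: branches covered by generated tests vs. total branches. */
    record ClassCoverage(String className, int coveredBranches, int totalBranches) {
        double ratio() {
            // Assumed convention: a class without branches counts as fully covered.
            return totalBranches == 0 ? 1.0 : (double) coveredBranches / totalBranches;
        }
    }

    /** Mean of per-class ratios: every class has equal weight, regardless of size. */
    static double averagePerClass(List<ClassCoverage> classes) {
        return classes.stream().mapToDouble(ClassCoverage::ratio).average().orElse(0.0);
    }

    /** Pooled ratio: all covered branches over all branches, so large classes dominate. */
    static double pooled(List<ClassCoverage> classes) {
        int covered = 0;
        int total = 0;
        for (ClassCoverage c : classes) {
            covered += c.coveredBranches();
            total += c.totalBranches();
        }
        return total == 0 ? 1.0 : (double) covered / total;
    }

    public static void main(String[] args) {
        // Hypothetical sample: one small, well-covered class and one large class whose
        // branches are assumed to be blocked by, e.g., file-system dependencies.
        List<ClassCoverage> sample = List.of(
            new ClassCoverage("SmallUtil", 9, 10),
            new ClassCoverage("FileBackedCache", 10, 100));
        System.out.println(String.format(Locale.ROOT, "per-class average: %.2f", averagePerClass(sample)));
        System.out.println(String.format(Locale.ROOT, "pooled: %.2f", pooled(sample)));
    }
}
```

Running `main` prints `per-class average: 0.50` and `pooled: 0.17`. Both numbers come from the same hypothetical data, which is why it matters to state which aggregation a coverage figure uses.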