Abstract
Finding test data to cover structural test coverage criteria such as branch coverage is largely a manual and hence expensive activity. A potential low cost alternative is to generate the required test data automatically. Search-based test data generation is one approach that has attracted recent interest. This approach is based on the definition of an evaluation or cost function that is able to discriminate between candidate test cases with respect to achieving a given test goal. The cost function is implemented by appropriate instrumentation of the program under test. The candidate test is then executed on the instrumented program. This provides an evaluation of the candidate test in terms of the “distance” between the computation achieved by the candidate test and the computation required to achieve the test goal. Providing the cost function is able to discriminate reliably between candidate tests that are close or far from covering the test goal and the goal is feasible, a search process is able to converge to a solution, i.e., a test case that satisfies the coverage goal. For some programs, however, an informative cost function is difficult to define. The operations performed by these programs are such that the cost function returns a constant value for a very wide range of inputs. A typical example of this problem arises in the instrumentation of branch predicates that depend on the value of a Boolean-valued (flag) variable although the problem is not limited to programs that contain flag variables. Although methods are known for overcoming the problems of flag variables in particular cases, the more general problem of a near constant cost function has not been tackled. This paper presents a new heuristic for directing the search when the cost function at a test goal is not able to differentiate between candidate test inputs. The heuristic directs the search toward test cases that produce rare or scarce data states. Scarce inputs for the cost function are more likely to produce new cost values. The proposed method is evaluated empirically for a number of example programs for which existing methods are inadequate.
Similar content being viewed by others
References
Alshraideh, M., & Bottaci, L. (2006). Automatic software test data generation for string data using heuristic search with domain specific search operators. Software Testing, Verification and Reliability, 16(3), 175–203.
Baresel, A., Harman, M., Binkley, D., & Korel, B. (2004). Evolutionary testing in the presence of loop-assigned flags: A testability transformation approach. In G. S. Avrunin & G. Rothermel (Eds.), Proceedings of 2004 ACM SIGSOFT international symposium on software testing and analysis, ISSTA 2004, (pp. 108–118). New York: ACM.
Booker, L. B. (1982). Intelligent behavior as an adaptation to the task environment. PhD thesis, Ann Arbor, MI: The University of Michigan.
Bottaci, L. (2002). Instrumenting programs with flag variables for test data search by genetic algorithm. In Proceedings of genetic and evolutionary computation conference (GECCO 2002), (pp. 1337–1342). San Mateo, CA: Morgan Kaufmann.
Bottaci, L. (2003). Predicate expression cost functions to guide evolutionary search for test data. In Proceedings of genetic and evolutionary computation conference (GECCO 2003), (pp. 2455–2464). Chicago, IL: Springer Verlag.
Bottaci, L. (2005). Use of branch cost functions to diversify the search for test data. In Proceedings of the UK software testing workshop (UKTest 2005), (pp. 151–163). UK: University of Sheffield.
Bouchachia, A. (2007). An immune genetic algorithm for software test data generation. In Proceedings of the 7th international conference on hybrid intelligent systems (HIS 2007), (pp. 84–89). Washington, DC.
Bueno, P., Paulo, M. S., Wong, W. E., & Jino, M. (2007). Improving random test sets using the diversity oriented test data generation. In Proceedings of the 2nd international workshop on random testing: co-located with the 22nd IEEE/ACM international conference on automated software engineering (ASE 2007), (pp. 10–17). New York, NY: ACM.
Chen, T. Y., Kuo, F. C., Merkel, R. G., & Tse, T. (2009). Adaptive random testing: The art of test case diversity. Journal of Systems and Software (in press), Corrected Proof:–, doi:10.1016/j.jss.2009.02.022, http://www.sciencedirect.com/science/article/B6V0N-4VRX633-1/2/50b8286d83d4352947ae21e1c6930c00
Collins, R. J., & Jefferson, D. R. (1991). Selection in massively parallel genetic. In Belew, R. K., Booker, L. B. (Eds.), Proceedings of the fourth international on genetic algorithms, (pp. 249–256). San Mateo, CA: Morgan Kaufmann.
Coward, P. D. (1991). Symbolic execution and testing. Information and Software Technique, 33(1), 53–64.
Davis, L. (1991). Handbook of Genetic Algorithms. NY: International Thomson Computer Press.
de Halleux, J., & Tillmann, N. (2008). Parameterized unit testing with pex. In B. Beckert & R. Hähnle (Eds.), Tests and proofs, proceedings of the second international conference, TAP 2008, (Vol. 4966, pp. 171–181). Springer, Lecture Notes in Computer Science.
De Jong, L. A. (1975). An analysis of the behavior of a class of genetic adaptive systems. PhD thesis, Ann Arbor, MI: The University of Michigan.
De Jong, E. D., Watson, R., & Pollack, J. (2001). Reducing bloat and promoting diversity using multi-objective methods. In L. Spector et al. (Eds.), Proceedings of the genetic and evolutionary computation conference, (pp. 11–18). San Francisco, CA: Morgan Kaufmann.
Deb, K., & Goldberg, D. E. (1989). An investigation of niche and species formation in genetic function optimization. In Proceedings of the third international conference on genetic algorithms, (pp. 42–50). San Mateo, CA: Morgan Kaufmann.
Duran, J. E., & Ntafos, S. C. (1984). An evaluation of random testing. IEEE Transactions on Software Engineering, 10(4), 438–444.
Feldt, R., Torkar, R., Gorschek, T., & Afzal, W. (2008). Searching for cognitively diverse tests: Towards universal test diversity metrics. In IEEE international conference on software testing verification and validation workshop, ICSTW ’08, (pp. 178–186).
Ferguson, R., & Korel, B. (1996). The chaining approach for software test data generation. ACM Transactions on Software Engineering and Methodology, 5(1), 63–86.
Goldberg, D., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the second international conference on genetic algorithms, (pp. 148–154). San Francisco, CA: Morgan Kaufmann.
Goldberg, D. E. (1989). Genetic Algorithms in Search Optimization and Machine Learning. Reading, MA: Addison Wesley.
Harman, M., Hu, L., Hierons, R., Baresel, A., & Sthamer, H. (2002). Improving evolutionary testing by flag removal. In Proceedings of genetic and evolutionary computation conference GECCO 2002, New York, USA, pp. 1351–1358.
Harman, M., Hu, L., Hierons, R., Wegener, J., Baresel, A., Sthamer, H., & Roper, M. (2004). Testability transformation. IEEE Transactions on Software Engineering, 30(1), 3–16.
Jones, B.F., Sthamer, H., & Eyres, D. (1996). Automatic structural testing using genetic algorithms. Software Engineering Journal, 11(5), 299–306.
Keijzer, M. (1996). Efficiently representing populations in genetic programming. In Angeline, P., Kinnear, K. Jr. (Eds.), Advances in Genetic Programming, (Vol. 2, pp. 259–278), Cambridge: MIT Press.
Korel, B. (1990). Automated software test data generation. IEEE Transactions on Software Engineering, 16(8), 870–879.
Korel, B., Harman, M., Chung, S., Apirukvorapinit, P., Gupta, R., & Zhang, Q. (2005). Data dependence based testability transformation in automated test generation. In 16th IEEE international symposium on software reliability engineering (pp. 245–254)
Koza, J. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
Langdon, W. B. (1998). Genetic programming and data structures: Genetic programming + data structures = automatic programming!, genetic programming, (Vol. 1). Boston: Kluwer.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of fifth berkeley symposium on mathematical statistics and probability, (Vol. 1, pp. 281–297). Berkeley: University of California Press.
Malaiya, Y. (1995). Antirandom testing: Getting the most out of black-box testing. In Proceeding of international symposium on software reliability engineering, (pp. 86–95). citeseer.ist.psu.edu/article/malaiya95antirandom.html.
Mattiussi, C., Waibel, M., & Floreano, D. (2004). Measures of diversity for populations and distances between individuals with highly reorganizable genomes. Evolutionary Computation, 12(4), 495–515.
McMinn, P. (2004). Search-based software test data generation: A survey. Software Testing, Verification and Reliability, 14(2), 105–156.
McMinn, P., Binkley, D., Harman, M., & Tonella, P. (2006). The species per path approach to searchbased test data generation. In proceedings of the international symposium on software testing and analysis (ISSTA 2006), (pp. 13–24). Portland, ME.
O’Reilly, U. M. (1997). Using a distance metric on genetic programs to understand genetic operators. In IEEE international conference on systems, man, and cybernetics, computational cybernetics and simulation, 5, 4092–4097.
Roper, M. (1997). Computer aided software testing using genetic algorithms. In Proceeding of the 10th international software quality week conference, San Francisco, USA, pp. 27–30.
Rosca, J. (1995). Entropy-driven adaptive representation. In Rosca, J. (Ed.), Proceedings of the workshop on genetic programming: From theory to real-world applications, Tahoe City, CA, pp. 23–32.
Voas, J., & Miller, K. (1995). Software testability: The new verification. IEEE Software, 12(3), 17–28.
Wappler, S., Baresel, A., & Wegener, J. (2007). Improving evolutionary testing in the presence of function-assigned flags. In Proceedings of the testing: Academic and industrial conference practice and research techniques, (pp. 23–34). IEEE Digital Library.
Wegener, J., Baresel, A., & Sthamer, H. (2001). Evolutionary test environment for automatic structural testing. Information and Software Technology Special Issue on Software Engineering using Metaheuristic Innovative Algorithms, 43(14), 841–854.
Wu, S. H., Jandhyala, S., Malaiya, Y. K., & Jayasumana, A. P. (2008). Antirandom testing: A distance-based approach. VLSI Design, 2008(2), 1–9.
Acknowledgments
The authors would like to express their gratitude to the anonymous referees for their valuable comments and suggestions for improving the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Alshraideh, M., Bottaci, L. & Mahafzah, B.A. Using program data-state scarcity to guide automatic test data generation. Software Qual J 18, 109–144 (2010). https://doi.org/10.1007/s11219-009-9083-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11219-009-9083-x