DOI: 10.1145/3674805.3686669

Empirical Evaluation of Frequency Based Statistical Models for Estimating Killable Mutants

Published: 24 October 2024

Abstract

Background. Mutation analysis is the premier technique for evaluating test suite quality and estimating residual software defects. However, the reliability of mutation analysis is hampered by equivalent mutants, which are undetectable by test cases. Reliably detecting and eliminating equivalent mutants is difficult as it is highly program and location dependent. Statistical estimation of killable mutants seems to be a promising approach to tackle this problem. Aims. Frequency-based species estimation methods have been proposed as a solution for several related problems in software testing. This paper investigates whether such frequency-based estimation methods can accurately estimate the number of killable mutants. Method. We conducted a large-scale empirical study on the ability of twelve widely known frequency-based estimators to predict the number of killable mutants in ten mature software projects. Result. Our investigation finds limited or no evidence that any of the statistical estimators is able to consistently predict the number of killable mutants in the projects evaluated. Conclusion. We found that the investigated estimators lack sufficient predictive power and cannot produce reliable and useful estimates of killable mutants.
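
To make "frequency-based estimation" concrete, the sketch below computes the bias-corrected Chao1 lower bound (Chao 1984), a classic species richness estimator. It is an illustration only: the abstract does not say which twelve estimators were evaluated or how kill data were mapped to frequency counts, so the function name `chao1`, the input `kill_counts`, and the idea of treating "number of times a mutant was killed" as an abundance count are all assumptions made here.

```python
from collections import Counter


def chao1(kill_counts):
    """Bias-corrected Chao1 lower bound on the total number of classes.

    kill_counts: hypothetical input, one entry per mutant, giving how many
    times that mutant was observed killed. Zero entries (mutants never
    killed) are ignored, since they are exactly what is being estimated.
    """
    observed = [c for c in kill_counts if c > 0]
    s_obs = len(observed)      # mutants killed at least once
    freq = Counter(observed)   # frequency of frequencies
    f1 = freq[1]               # mutants killed exactly once
    f2 = freq[2]               # mutants killed exactly twice
    # Bias-corrected form; stays defined when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Seven mutants observed killed; three only once, two exactly twice.
estimate = chao1([1, 1, 1, 2, 2, 3, 5])
print(estimate)
```

The estimator relies only on how many items were seen once and twice. Many singletons relative to doubletons suggest that more items remain unseen, which in this setting would mean more killable mutants that the test suite has not yet killed.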

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
October 2024
633 pages
ISBN:9798400710476
DOI:10.1145/3674805
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Australian Research Council

Conference

ESEM '24
Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 24
  • Downloads (Last 12 months): 24
  • Downloads (Last 6 weeks): 24

Reflects downloads up to 22 Nov 2024
