Empirical Evaluation of Frequency Based Statistical Models for Estimating Killable Mutants

Authors: Konstantin Kuznetsov, Saikrishna Dhiddi, Rahul Gopinath

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

Pages 61 - 71

https://doi.org/10.1145/3674805.3686669

Published: 24 October 2024

Abstract

Background. Mutation analysis is the premier technique for evaluating test suite quality and estimating residual software defects. However, its reliability is hampered by equivalent mutants, which no test case can detect. Reliably detecting and eliminating equivalent mutants is difficult because it is highly program and location dependent. Statistical estimation of killable mutants is a promising approach to this problem.

Aims. Frequency-based species estimation methods have been proposed as a solution for several related problems in software testing. This paper investigates whether such methods can accurately estimate the number of killable mutants.

Method. We conducted a large-scale empirical study of the ability of twelve widely known frequency-based estimators to predict the number of killable mutants in ten mature software projects.

Result. Our investigation finds limited or no evidence that any of the statistical estimators can consistently predict the number of killable mutants in the projects evaluated.

Conclusion. The investigated estimators lack sufficient predictive power and cannot produce reliable and useful estimates of killable mutants.
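To make the idea of a frequency-based estimator concrete, the following is a minimal sketch of Chao1, a classic lower-bound estimator from the species-richness literature. The abstract does not name the twelve estimators studied, so treating Chao1 as a representative is an assumption. The mapping used here (each mutant is a "species", and its abundance is the number of tests that kill it) and the names `chao1` and `kill_counts` are also illustrative assumptions, not the paper's implementation.

```python
from collections import Counter


def chao1(kill_counts):
    """Estimate the number of killable mutants with the Chao1 estimator.

    kill_counts: one integer per mutant, giving how many tests killed it
    (0 means the mutant survived the whole test suite). This mapping from
    mutants to species abundances is an illustrative assumption.
    """
    detected = [c for c in kill_counts if c > 0]
    s_obs = len(detected)          # mutants killed at least once
    freq = Counter(detected)
    f1 = freq[1]                   # mutants killed by exactly one test
    f2 = freq[2]                   # mutants killed by exactly two tests
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no mutant was killed exactly twice.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical data: 11 mutants, 3 of which survived every test.
kill_counts = [1, 1, 2, 2, 2, 2, 3, 5, 0, 0, 0]
estimate = chao1(kill_counts)
print(f"observed killed: 8, estimated killable: {estimate}")
```

In this made-up example the estimator suggests that roughly 8.5 of the 11 mutants are killable, so some of the 3 survivors would be expected to be killable by tests not yet written. The paper's finding is that estimates of this kind were not reliable on the projects evaluated.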

Index Terms

Empirical Evaluation of Frequency Based Statistical Models for Estimating Killable Mutants
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Empirical software validation
      2. Software defect analysis
        1. Software testing and debugging

Information & Contributors

Information

Published In

ESEM '24: Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement

October 2024

633 pages

ISBN:9798400710476

DOI:10.1145/3674805

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2024

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Australian Research Council

Conference

ESEM '24

Sponsor:

SIGSOFT

ESEM '24: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

October 24 - 25, 2024

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%
