research-article
DOI: 10.1145/3239235.3240500

Are mutants really natural?: a study on how "naturalness" helps mutant selection

Published: 11 October 2018

Abstract

Background: Code is repetitive and predictable in a way similar to natural language. In this sense code is "natural", and this "naturalness" can be captured by natural language modelling techniques. Such models promise to capture program semantics and to identify source code parts that "smell", i.e., parts that are strange, badly written, and generally error-prone (likely to be defective). Aims: We investigate the use of natural language modelling techniques in mutation testing (a testing technique that uses artificial faults). We thus seek to identify how well artificial faults simulate real ones and, ultimately, to understand how natural artificial faults can be. Our intuition is that natural mutants, i.e., mutants that are predictable (that follow the implicit coding norms of developers), are semantically useful and generally valuable to testers. We also expect mutants located in unnatural code locations (which are generally linked with error-proneness) to be of higher value than those located in natural code locations. Method: Based on this idea, we propose mutant selection strategies that rank mutants according to a) their naturalness (the naturalness of the mutated code), b) the naturalness of their locations (the naturalness of the original program statements), and c) their impact on the naturalness of the code they apply to (the naturalness difference between the original and mutated statements). We empirically evaluate these strategies on a benchmark set of 5 open-source projects, involving more than 100k mutants and 230 real faults. Based on the fault set we estimate the utility (i.e., the capability to reveal faults) of mutants selected on the basis of their naturalness, and compare it against the utility of randomly selected mutants. Results: Our analysis shows no link between naturalness and the fault-revelation utility of mutants.
We also find that naturalness-based mutant selection performs similarly to (slightly worse than) random mutant selection. Conclusions: Our findings are negative, but we consider them interesting because they refute a strong intuition: fault revelation is independent of the mutants' naturalness.
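The "naturalness" score in this line of work is typically the cross-entropy of a token sequence under an n-gram language model trained on a code corpus: low cross-entropy means predictable (natural) code. As a rough illustration only, the sketch below ranks hypothetical mutants by the cross-entropy of their mutated statements. The bigram model, add-one smoothing, and toy corpus are simplifications for the example; the actual study uses a proper tokenizer, larger n-grams, and more sophisticated smoothing.

```python
import math
from collections import Counter

def train_bigram(tokens):
    # Unigram and bigram counts from a token corpus, plus vocabulary size.
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    return uni, bi, len(set(tokens))

def cross_entropy(tokens, uni, bi, vocab):
    # Average negative log2 probability under the bigram model with
    # add-one smoothing; lower cross-entropy = more "natural" code.
    h = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bi[(prev, cur)] + 1) / (uni[prev] + vocab)
        h -= math.log2(p)
    return h / max(1, len(tokens) - 1)

# Toy training corpus of code tokens (hypothetical, for illustration).
corpus = "for i in range ( n ) : total += a [ i ]".split() * 20
uni, bi, v = train_bigram(corpus)

# Two hypothetical mutated statements; rank them by naturalness.
mutants = {
    "total += a [ i ]": "total += a [ i ]".split(),
    "total -= a [ i ]": "total -= a [ i ]".split(),
}
ranked = sorted(mutants, key=lambda m: cross_entropy(mutants[m], uni, bi, v))
# The statement that follows the corpus norms ranks as most natural.
```

A naturalness-based selection strategy would then pick mutants from one end of `ranked` (most or least natural first); the paper's negative result is that neither ordering beats random selection on fault revelation.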




Published In

ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
October 2018, 487 pages
ISBN: 9781450358231
DOI: 10.1145/3239235

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. fault revelation
  2. language models
  3. mutation testing

Conference

ESEM '18

Acceptance Rates

Overall Acceptance Rate 130 of 594 submissions, 22%

Cited By

  • (2024) Delta4Ms: Improving mutation-based fault localization by eliminating mutant bias. Software Testing, Verification and Reliability 34(4). DOI: 10.1002/stvr.1872
  • (2022) Do bugs lead to unnaturalness of source code? In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1085-1096. DOI: 10.1145/3540250.3549149
  • (2022) An Empirical Study on Higher-Order Mutation-Based Fault Localization. International Journal of Software Engineering and Knowledge Engineering 32(1), 1-35. DOI: 10.1142/S0218194022500012
  • (2022) Enhancement of Mutation Testing via Fuzzy Clustering and Multi-Population Genetic Algorithm. IEEE Transactions on Software Engineering 48(6), 2141-2156. DOI: 10.1109/TSE.2021.3052987
  • (2022) Can Higher-Order Mutants Improve the Performance of Mutation-Based Fault Localization? IEEE Transactions on Reliability 71(2), 1157-1173. DOI: 10.1109/TR.2022.3162039
  • (2022) Improving the Performance of Mutation-based Fault Localization via Mutant Bias: Practical Experience Report. In 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), 309-320. DOI: 10.1109/ISSRE55969.2022.00038
  • (2022) µBert: Mutation Testing using Pre-Trained Language Models. In 2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 160-169. DOI: 10.1109/ICSTW55395.2022.00039
  • (2020) Do Programmers Prefer Predictable Expressions in Code? Cognitive Science 44(12). DOI: 10.1111/cogs.12921
  • (2020) Increasing Mutation Testing Effectiveness by Combining Lower Order Mutants to Construct Higher Order Mutants. In Computational Collective Intelligence, 205-216. DOI: 10.1007/978-3-030-63007-2_16
  • (2019) On the Effectiveness of Using Elitist Genetic Algorithm in Mutation Testing. Symmetry 11(9), 1145. DOI: 10.3390/sym11091145
