
Research article · DOI: 10.1145/3338906.3338945

Understanding flaky tests: the developer’s perspective

Published: 12 August 2019

Abstract

Flaky tests are software tests that exhibit a seemingly random outcome (pass or fail) despite exercising unchanged code. In this work, we examine software developers' perceptions of the nature, relevance, and challenges of flaky tests.

We asked 21 professional developers to classify 200 flaky tests they had previously fixed, in terms of the nature and origin of the flakiness as well as the fixing effort. We also examined the developers' fixing strategies. Subsequently, we conducted an online survey with 121 developers with a median industrial programming experience of five years. Our research shows that: (i) flakiness is due to several different causes, four of which have never been reported before, despite being the most costly to fix; (ii) flakiness is perceived as significant by the vast majority of developers, regardless of their team's size and project's domain, and it can affect resource allocation, scheduling, and the perceived reliability of the test suite; and (iii) the challenges developers report mostly concern reproducing the flaky behavior and identifying the cause of the flakiness. Public preprint [http://arxiv.org/abs/1907.01466], data and materials [https://doi.org/10.5281/zenodo.3265785].
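To make the phenomenon concrete, below is a minimal illustrative sketch (our addition, not an example taken from the paper): a hypothetical JUnit 4 test that is flaky because its assertion races against an asynchronous task, followed by a deterministic rewrite. The class and method names are invented for illustration.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    import org.junit.Test;

    public class FlakyCounterTest {

        // Flaky: the assertion races against the background task, so the
        // outcome depends on thread scheduling, not on the code under test.
        @Test
        public void incrementsAsynchronously_flaky() throws Exception {
            AtomicInteger counter = new AtomicInteger(0);
            ExecutorService pool = Executors.newSingleThreadExecutor();
            pool.submit(counter::incrementAndGet);

            Thread.sleep(1); // may or may not be long enough for the task to finish
            assertEquals(1, counter.get()); // passes or fails nondeterministically
        }

        // Deterministic rewrite: wait for the task to complete before asserting.
        @Test
        public void incrementsAsynchronously_fixed() throws Exception {
            AtomicInteger counter = new AtomicInteger(0);
            ExecutorService pool = Executors.newSingleThreadExecutor();
            pool.submit(counter::incrementAndGet);

            pool.shutdown();
            assertTrue(pool.awaitTermination(5, TimeUnit.SECONDS));
            assertEquals(1, counter.get()); // the increment has completed by now
        }
    }

Replacing a fixed sleep with explicit synchronization on the awaited event, as in the second test, is a widely recommended remedy for this class of flakiness.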




Published In

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Empirical Studies
  2. Flaky Tests
  3. Mixed-Method Research

Qualifiers

  • Research-article

Conference

ESEC/FSE '19

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)


Article Metrics

  • Downloads (last 12 months): 201
  • Downloads (last 6 weeks): 27

Reflects downloads up to 24 Nov 2024.

Cited By

  • (2025) Combinatorial transition testing in dynamically adaptive systems: Implementation and test oracle. Journal of Systems and Software, 221, 112260. DOI: 10.1016/j.jss.2024.112260. Online publication date: Mar-2025.
  • (2025) How and why developers implement OS-specific tests. Empirical Software Engineering, 30(1). DOI: 10.1007/s10664-024-10571-4. Online publication date: 1-Feb-2025.
  • (2024) Efficient Incremental Code Coverage Analysis for Regression Test Suites. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1882–1894. DOI: 10.1145/3691620.3695551. Online publication date: 27-Oct-2024.
  • (2024) Reducing Test Runtime by Transforming Test Fixtures. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1757–1769. DOI: 10.1145/3691620.3695541. Online publication date: 27-Oct-2024.
  • (2024) Towards a Robust Waiting Strategy for Web GUI Testing for an Industrial Software System. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2065–2076. DOI: 10.1145/3691620.3695269. Online publication date: 27-Oct-2024.
  • (2024) Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 572–581. DOI: 10.1145/3674805.3695407. Online publication date: 24-Oct-2024.
  • (2024) Practitioners' Expectations on Automated Test Generation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1618–1630. DOI: 10.1145/3650212.3680386. Online publication date: 11-Sep-2024.
  • (2024) Reproducing Timing-Dependent GUI Flaky Tests in Android Apps via a Single Event Delay. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1504–1515. DOI: 10.1145/3650212.3680377. Online publication date: 11-Sep-2024.
  • (2024) Neurosymbolic Repair of Test Flakiness. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1402–1414. DOI: 10.1145/3650212.3680369. Online publication date: 11-Sep-2024.
  • (2024) Can ChatGPT Repair Non-Order-Dependent Flaky Tests? Proceedings of the 1st International Workshop on Flaky Tests, 22–29. DOI: 10.1145/3643656.3643900. Online publication date: 14-Apr-2024.
