DOI: 10.1145/1540438.1540448

Revisiting the evaluation of defect prediction models

Published: 18 May 2009

Abstract

Defect prediction models aim to identify error-prone parts of a software system as early as possible. Many such models have been proposed; their evaluation, however, is still an open question, as recent publications show.
An important aspect often ignored during evaluation is the effort reduction gained by using such models. Models are usually evaluated per module by performance measures used in information retrieval, such as recall, precision, or the area under the ROC curve (AUC). These measures assume that the costs associated with additional quality assurance activities are the same for each module, which is not reasonable in practice. For example, costs for unit testing and code reviews are roughly proportional to the size of a module.
In this paper, we investigate this discrepancy using optimal and trivial models. We describe a trivial model that takes only the module size measured in lines of code into account, and compare it to five classification methods. The trivial model performs surprisingly well when evaluated using AUC. However, when an effort-sensitive performance measure is used, it becomes apparent that the trivial model is in fact the worst.
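A minimal sketch of the discrepancy described above (illustrative only: the toy data, module sizes, and the helper effort_aware_auc are assumptions made for this example, not the paper's experimental setup or its exact effort-sensitive measure). A "trivial" model that scores each module by its size in lines of code can reach a respectable per-module AUC, yet looks much worse once inspection effort is charged in proportion to module size.

# Sketch: per-module AUC vs. a simple effort-aware measure for a LOC-based model.
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data (illustrative): lines of code per module and whether it was defective.
loc       = np.array([1200, 900, 400, 300, 150, 120, 80, 60, 40, 20])
defective = np.array([   1,   1,   0,   1,   0,   0,  0,  1,  0,  0])

# "Trivial" model: predicted risk is simply module size.
risk = loc.astype(float)

# Classic per-module AUC treats every module as equally costly to inspect.
print("per-module AUC:", roc_auc_score(defective, risk))

def effort_aware_auc(risk, loc, defective):
    """Area under the 'fraction of defects found vs. fraction of LOC inspected'
    curve when modules are inspected in order of decreasing predicted risk."""
    order  = np.argsort(-risk)
    effort = np.cumsum(loc[order]) / loc.sum()              # x: cumulative LOC inspected
    found  = np.cumsum(defective[order]) / defective.sum()  # y: defects found so far
    return np.trapz(np.r_[0.0, found], np.r_[0.0, effort])  # prepend the origin (0, 0)

# Effort-aware view: large modules are expensive to inspect, so ranking purely by
# size front-loads the cost and the curve rises slowly.
print("effort-aware AUC:", effort_aware_auc(risk, loc, defective))

On such toy data the size-based ranking looks strong under per-module AUC but considerably weaker once effort is accounted for, which is the discrepancy the paper examines.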



Published In

PROMISE '09: Proceedings of the 5th International Conference on Predictor Models in Software Engineering
May 2009
268 pages
ISBN: 9781605586342
DOI: 10.1145/1540438

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cost-sensitive performance measures
  2. defect prediction

Qualifiers

  • Research-article


Conference

PROMISE '09: 5th International Workshop on Predictor Models in Software Engineering
May 18 - 19, 2009
Vancouver, British Columbia, Canada

Acceptance Rates

Overall Acceptance Rate 98 of 213 submissions, 46%
