Abstract
Reducing the number of bugs is a crucial issue during software development and maintenance. Software process and product metrics are good indicators of software complexity, and they have been used to build bug prediction models that help developers maintain software quality. In this paper we empirically evaluate the use of hunk metrics as predictors of bugs. We present a bug prediction technique that works at the level of the smallest units of code change, called hunks. We build bug prediction models using random forests, an efficient machine learning classifier. Hunk metrics are used to train the classifier, and each hunk metric is evaluated for its bug prediction capability. Our classifier can classify individual hunks as buggy or bug-free with 86% accuracy, 83% buggy hunk precision, and 77% buggy hunk recall. We find that history-based and change-level hunk metrics are better predictors of bugs than code-level hunk metrics.
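The three figures reported above (accuracy, buggy hunk precision, buggy hunk recall) can be made concrete with a small sketch. It assumes the standard definitions, with "buggy" treated as the positive class; the function name `score_hunk_predictions`, the `HunkScores` container, and the ten toy labels are hypothetical illustrations and are not taken from the paper's data or tooling.

```python
from dataclasses import dataclass


@dataclass
class HunkScores:
    accuracy: float
    precision: float  # precision for the "buggy" class
    recall: float     # recall for the "buggy" class


def score_hunk_predictions(actual, predicted, positive="buggy"):
    """Compute accuracy plus precision/recall for the positive (buggy) class.

    `actual` and `predicted` are equal-length sequences of class labels,
    one entry per hunk. This is a generic sketch, not the paper's code.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # buggy, flagged
    fp = sum(a != positive and p == positive for a, p in pairs)  # clean, flagged
    fn = sum(a == positive and p != positive for a, p in pairs)  # buggy, missed
    tn = sum(a != positive and p != positive for a, p in pairs)  # clean, passed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return HunkScores(accuracy, precision, recall)


# Hypothetical toy labels for ten hunks (illustration only).
actual = ["buggy"] * 4 + ["clean"] * 6
predicted = ["buggy", "buggy", "buggy", "clean", "buggy"] + ["clean"] * 5

scores = score_hunk_predictions(actual, predicted)
print(scores)
```

Under these definitions, precision is the fraction of hunks flagged as buggy that really are buggy, and recall is the fraction of truly buggy hunks that the classifier flags. Accuracy counts correct decisions over both classes.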
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferzund, J., Ahsan, S.N., Wotawa, F. (2009). Empirical Evaluation of Hunk Metrics as Bug Predictors. In: Abran, A., Braungarten, R., Dumke, R.R., Cuadrado-Gallego, J.J., Brunekreef, J. (eds) Software Process and Product Measurement. IWSM 2009. Lecture Notes in Computer Science, vol 5891. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05415-0_18
DOI: https://doi.org/10.1007/978-3-642-05415-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05414-3
Online ISBN: 978-3-642-05415-0
eBook Packages: Computer Science (R0)