Abstract
In some studies, Support Vector Machines (SVMs) have proven promising for predicting fault-prone software components. Nevertheless, the performance of the method depends on the setting of several parameters. To address this issue, we propose using a Genetic Algorithm (GA) to search for a suitable configuration of SVM parameters that optimizes prediction performance. The approach was assessed through an empirical analysis based on jEdit data from the PROMISE repository. We analyzed both the inter-release and the intra-release performance of the proposed method. As benchmarks we used SVMs configured with Grid-search and several other machine learning techniques. The results show that the proposed approach improves performance, increasing Recall without worsening Precision. This behavior was especially notable in the inter-release setting when compared with the other prediction techniques.
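To make the idea of a GA-driven parameter search concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the encoding, the genetic operators, the parameter ranges, or the fitness function, so everything here is an assumption: the two genes (`log2_C`, `log2_gamma`), their bounds, uniform crossover, Gaussian mutation, tournament selection, and elitism are illustrative choices, and `surrogate_fitness` is a hypothetical stand-in for a cross-validated SVM score (for example one built from Recall and Precision).

```python
import random

# Hypothetical search ranges on a log2 scale, borrowed from common
# grid-search practice; the paper's actual ranges are not in the abstract.
BOUNDS = {"log2_C": (-5.0, 15.0), "log2_gamma": (-15.0, 3.0)}


def random_individual(rng):
    """Draw one candidate configuration uniformly inside BOUNDS."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.2, sigma=1.0):
    """Perturb each gene with probability `rate`, clipping to BOUNDS."""
    out = {}
    for k, (lo, hi) in BOUNDS.items():
        v = ind[k]
        if rng.random() < rate:
            v = min(hi, max(lo, v + rng.gauss(0.0, sigma)))
        out[k] = v
    return out


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly sampled individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=30, generations=40, seed=0):
    """Maximize `fitness` over BOUNDS; returns (best individual, best score)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = dict(ind), s
        # Elitism: the best configuration seen so far survives unchanged.
        nxt = [dict(best)]
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            nxt.append(mutate(child, rng))
        pop = nxt
    return best, best_score


def surrogate_fitness(ind):
    """Hypothetical stand-in for a cross-validated score of an SVM trained
    with C = 2**log2_C and gamma = 2**log2_gamma. The peak at (3, -7) is
    arbitrary and only gives the search something to climb."""
    return -((ind["log2_C"] - 3.0) ** 2 + (ind["log2_gamma"] + 7.0) ** 2)


best, score = genetic_search(surrogate_fitness)
print(best, score)
```

In a real experiment the surrogate would be replaced by a function that trains an SVM with the candidate parameters on the training release and returns a validation score. Searching on a log scale is a common convention for these parameters, since useful values typically span several orders of magnitude.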
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Martino, S., Ferrucci, F., Gravino, C., Sarro, F. (2011). A Genetic Algorithm to Configure Support Vector Machines for Predicting Fault-Prone Components. In: Caivano, D., Oivo, M., Baldassarre, M.T., Visaggio, G. (eds) Product-Focused Software Process Improvement. PROFES 2011. Lecture Notes in Computer Science, vol 6759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21843-9_20
DOI: https://doi.org/10.1007/978-3-642-21843-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21842-2
Online ISBN: 978-3-642-21843-9
eBook Packages: Computer Science (R0)