Tuning for software analytics

Authors:

Xipeng Shen

Information and Software Technology, Volume 76, Issue C

Pages 135 - 146

https://doi.org/10.1016/j.infsof.2016.04.017

Published: 01 August 2016

Abstract

Context: Data miners are widely used in software engineering to, for example, generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are both easy to use and useful. But one of the "black arts" of data mining is setting the tunings that control the miner.

Objective: We seek a simple, automatic, and very effective method for finding those tunings.

Method: For each experiment with different data sets (from open source Java systems), we first ran differential evolution as an optimizer to explore the tuning space, then tested the resulting tunings using hold-out data.

Results: Contrary to our prior expectations, we found that this tuning was remarkably simple: it required only tens, not thousands, of attempts to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that alter detection precision from 0% to 60%.

Conclusion: Since (1) the improvements are so large, and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue.
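The Method step above (search the tuning space with differential evolution, then check the chosen tunings on hold-out data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two hypothetical tunings (`threshold`, `min_leaf`), their bounds, the control values `pop_size`, `f`, and `cr`, and the two toy scoring functions, which stand in for "train the learner with these tunings and measure its performance" on tuning data and on hold-out data respectively.

```python
import random


def differential_evolution(score, bounds, pop_size=10, f=0.75, cr=0.3,
                           generations=5, seed=1):
    """Maximize score(candidate) over box-bounded numeric tunings.

    Returns (best_candidate, best_score, number_of_score_evaluations).
    """
    rng = random.Random(seed)
    # Start from random candidates drawn uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [score(c) for c in pop]
    evaluations = pop_size
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct candidates other than the current one.
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(len(bounds))  # always mutate one position
            trial = list(pop[i])
            for k, (lo, hi) in enumerate(bounds):
                if k == forced or rng.random() < cr:
                    # Mutation a + f * (b - c), clipped back into the bounds.
                    trial[k] = min(hi, max(lo, a[k] + f * (b[k] - c[k])))
            trial_score = score(trial)
            evaluations += 1
            # Keep the trial only if it beats the candidate it was built from.
            if trial_score > scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], evaluations


def toy_tuning_score(tunings):
    """Hypothetical stand-in for scoring a learner on the tuning data."""
    threshold, min_leaf = tunings
    return 1.0 - (threshold - 0.6) ** 2 - ((min_leaf - 12.0) / 50.0) ** 2


def toy_holdout_score(tunings):
    """Hypothetical stand-in for scoring the same tunings on hold-out data."""
    threshold, min_leaf = tunings
    return 1.0 - (threshold - 0.55) ** 2 - ((min_leaf - 14.0) / 50.0) ** 2


# Hypothetical tuning ranges: (threshold, min_leaf).
BOUNDS = [(0.0, 1.0), (1.0, 50.0)]

best, tuning_score, evaluations = differential_evolution(toy_tuning_score, BOUNDS)
holdout_score = toy_holdout_score(best)  # evaluated once, after the search
print("best tunings:", best)
print("tuning score:", tuning_score, "hold-out score:", holdout_score)
print("score evaluations:", evaluations)
```

With these illustrative settings the search spends `pop_size * (generations + 1)` score evaluations, which is in the "tens, not thousands" regime the abstract describes. In a real study each evaluation would train and score an actual learner, and the hold-out data would play no part in the search.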



Published In

Information and Software Technology, Volume 76, Issue C, August 2016 (147 pages)

ISSN: 0950-5849

Copyright © Elsevier B.V.

Publisher: Butterworth-Heinemann, United States
