research-article

Tuning for software analytics

Published: 01 August 2016

Abstract

Context: Data miners have been widely used in software engineering to, say, generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are both easy and useful to use. But one of the "black arts" of data mining is setting the tunings that control the miner.

Objective: We seek a simple, automatic, and very effective method for finding those tunings.

Method: For each experiment with different data sets (from open source Java systems), we first ran differential evolution as an optimizer to explore the tuning space, then tested the resulting tunings on hold-out data.

Results: Contrary to our prior expectations, we found that finding these tunings was remarkably simple: only tens, not thousands, of attempts were needed to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that raise detection precision from 0% to 60%.

Conclusion: Since (1) the improvements are so large, and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue.
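The Method step above (run differential evolution over the tuning space, then check the chosen tunings on hold-out data) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the classic DE/rand/1/bin scheme, and the control parameters (population 10, F = 0.75, CR = 0.3, 5 generations), the `differential_evolution` helper, and the two-knob `toy_tuning_score` function are hypothetical choices made for illustration. With these settings the search makes 10 + 5 × 10 = 60 calls to the scoring function, which is the "tens, not thousands" budget the abstract describes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def differential_evolution(
    score: Callable[[List[float]], float],
    bounds: Bounds,
    pop_size: int = 10,      # assumption: not stated in the abstract
    f: float = 0.75,         # assumption: mutation (differential weight)
    cr: float = 0.3,         # assumption: crossover probability
    generations: int = 5,    # assumption: keeps the budget at tens of calls
    seed: int = 1,
) -> Tuple[List[float], float, int]:
    """Maximize `score` inside a box using classic DE/rand/1/bin.

    Returns (best_candidate, best_score, number_of_score_calls).
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 population members")
    rng = random.Random(seed)
    dim = len(bounds)

    # Initial population: uniform random points inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [score(c) for c in pop]
    evals = pop_size

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    mutant = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(hi, max(lo, mutant)))  # clip to bounds
                else:
                    trial.append(pop[i][d])
            trial_score = score(trial)
            evals += 1
            # Greedy selection: a member is replaced only by an equal or
            # better trial, so no member's score ever decreases.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = max(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best], evals


def toy_tuning_score(params: List[float]) -> float:
    # Hypothetical stand-in for "precision of a learner on tuning data":
    # two knobs, with a single peak of 0.6 at (0.3, 12).
    threshold, depth = params
    return 0.6 - (threshold - 0.3) ** 2 - ((depth - 12.0) / 20.0) ** 2


best, best_score, n_calls = differential_evolution(
    toy_tuning_score, bounds=[(0.0, 1.0), (1.0, 50.0)]
)
print(f"evaluations used: {n_calls}")  # prints: evaluations used: 60
print(f"best knobs: {best}, score: {best_score:.3f}")
```

In a real study, `score` would train the learner with the candidate tunings and return a measure such as precision computed on tuning data only; the best tuning would then be evaluated once on separate hold-out data, as the Method describes. Clipping out-of-range mutants to the bounds is one common way to keep candidates valid; integer-valued knobs (for example, tree depth) would additionally need rounding.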

Published In

Information and Software Technology, Volume 76, Issue C
August 2016
147 pages

Publisher

Butterworth-Heinemann

United States

Author Tags

  1. CART
  2. Defect prediction
  3. Differential evolution
  4. Random forest
  5. Search-based software engineering

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Nov 2024.

Cited By

  • (2024) A systematic review of hyperparameter tuning techniques for software quality prediction models. Intelligent Data Analysis, 28:5 (1131-1149). DOI: 10.3233/IDA-230653. Online publication date: 19-Sep-2024.
  • (2024) Automatically Recommend Code Updates: Are We There Yet? ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3678167. Online publication date: 16-Jul-2024.
  • (2024) Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks. ACM Transactions on Software Engineering and Methodology, 33:6 (1-45). DOI: 10.1145/3649596. Online publication date: 27-Jun-2024.
  • (2024) Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health. ACM Transactions on Software Engineering and Methodology, 33:3 (1-22). DOI: 10.1145/3630252. Online publication date: 14-Mar-2024.
  • (2024) VALIDATE. Information and Software Technology, 170:C. DOI: 10.1016/j.infsof.2024.107448. Online publication date: 1-Jun-2024.
  • (2024) Parameter tuning for software fault prediction with different variants of differential evolution. Expert Systems with Applications: An International Journal, 237:PC. DOI: 10.1016/j.eswa.2023.121251. Online publication date: 1-Mar-2024.
  • (2024) An empirical study of data sampling techniques for just-in-time software defect prediction. Automated Software Engineering, 31:2. DOI: 10.1007/s10515-024-00455-8. Online publication date: 22-Jun-2024.
  • (2024) Software defect prediction: future directions and challenges. Automated Software Engineering, 31:1. DOI: 10.1007/s10515-024-00424-1. Online publication date: 27-Feb-2024.
  • (2023) Assessing the Early Bird Heuristic (for Predicting Project Quality). ACM Transactions on Software Engineering and Methodology, 32:5 (1-39). DOI: 10.1145/3583565. Online publication date: 24-Jul-2023.
  • (2023) Finding Trends in Software Research. IEEE Transactions on Software Engineering, 49:4 (1397-1410). DOI: 10.1109/TSE.2018.2870388. Online publication date: 1-Apr-2023.
