

Accuracy and efficiency comparisons of single- and multi-cycled software classification models

Published: 01 January 2009

Abstract

Software classification models are regarded as an essential support tool for software measurement and analysis processes. Most established models are single-cycled in the model usage stage: the measurement data for all of the model's variables must be collected simultaneously and used to classify an unseen case within a single decision cycle. Conversely, a multi-cycled model allows the measurement data to be collected gradually and used across more than one decision cycle, and thus intuitively promises better classification efficiency at the cost of poorer classification accuracy. Software project managers often have difficulty choosing the classification model best suited to their specific environments and needs, yet this topic is not adequately explored in the software measurement and analysis literature. Using the industrial NASA KC2 software measurement dataset, this paper quantitatively compares the classification accuracy and efficiency of single-cycled models based on discriminant analysis (DA) and logistic regression (LR) with multi-cycled models based on decision trees (DT; the C4.5 and Exhaustive CHAID (ECHAID) algorithms). The experimental results suggest that the re-appraisal cost of the Type I misclassification rate (MR), the software failure cost of the Type II MR, and the data collection cost of the software measurements should be considered together when choosing a classification model.
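As a concrete illustration of the trade-off described above, the sketch below contrasts a single-cycled model (logistic regression, which needs every metric measured up front) with a multi-cycled model (a decision tree, which only needs the metrics tested along each case's root-to-leaf path), and scores both with a combined cost of Type I misclassifications, Type II misclassifications, and metric collection. This is a minimal sketch using scikit-learn on synthetic data; the cost weights, data, and model settings are illustrative assumptions, not the paper's experimental setup (which applies DA, LR, C4.5 and ECHAID to NASA KC2).

```python
# Minimal sketch: single-cycled vs. multi-cycled classification cost.
# All numeric cost weights and the synthetic dataset are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

C_TYPE1 = 1.0    # re-appraisal cost per Type I error (fault-free flagged)
C_TYPE2 = 10.0   # failure cost per Type II error (fault-prone missed)
C_METRIC = 0.2   # data collection cost per software metric measured

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def expected_cost(y_true, y_pred, metrics_per_case):
    """Combined cost of both misclassification types plus measurement."""
    type1 = np.sum((y_pred == 1) & (y_true == 0))
    type2 = np.sum((y_pred == 0) & (y_true == 1))
    collection = C_METRIC * np.sum(metrics_per_case)
    return C_TYPE1 * type1 + C_TYPE2 * type2 + collection

# Single-cycled: every metric must be collected for every unseen case.
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
lr_metrics = np.full(len(X_te), X.shape[1])
print("LR cost:", expected_cost(y_te, lr.predict(X_te), lr_metrics))

# Multi-cycled: only the metrics tested on each case's decision path
# need to be collected, one decision cycle at a time.
dt = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
path = dt.decision_path(X_te)       # sparse (n_samples, n_nodes) matrix
node_feature = dt.tree_.feature     # feature tested at each node; <0 = leaf
dt_metrics = np.array([
    len({node_feature[n]
         for n in path.indices[path.indptr[i]:path.indptr[i + 1]]
         if node_feature[n] >= 0})
    for i in range(X_te.shape[0])
])
print("DT cost:", expected_cost(y_te, dt.predict(X_te), dt_metrics))
```

Because a depth-4 tree tests at most four metrics per case, its collection cost term is far smaller than the logistic regression's; whether that saving outweighs any accuracy loss depends on the relative sizes of the three cost weights, which is the choice the paper examines.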


Cited By

  • Decision-Tree Models for Predicting Time Performance in Software-Intensive Projects, International Journal of Information Technology Project Management 8(2), 64-86 (2018). DOI: 10.4018/IJITPM.2017040105.


Information & Contributors

Information

Published In

Publisher

Butterworth-Heinemann

United States

Publication History

Published: 01 January 2009

Author Tags

  1. Classification accuracy and efficiency
  2. Multi-cycle
  3. Single-cycle
  4. Software classification model
  5. Software measurement and analysis

Qualifiers

  • Article
