

Accuracy and efficiency comparisons of single- and multi-cycled software classification models

Published: 01 January 2009

Abstract

Software classification models are regarded as an essential support tool for software measurement and analysis processes. Most established models are single-cycled in the model usage stage: the measurement data for all of the model's variables must be collected simultaneously and used to classify an unseen case within a single decision cycle. Conversely, a multi-cycled model allows the measurement data to be collected gradually and used across more than one decision cycle, and thus intuitively promises better classification efficiency at the cost of poorer classification accuracy. Software project managers often have difficulty choosing the classification model best suited to their specific environments and needs, yet this topic is not adequately explored in the software measurement and analysis literature. Using the industrial NASA KC2 software measurement dataset, this paper quantitatively compares the classification accuracy and efficiency of single-cycled models based on discriminant analysis (DA) and logistic regression (LR) with multi-cycled models based on decision trees (DT; the C4.5 and Exhaustive CHAID (ECHAID) algorithms). The experimental results suggest that the re-appraisal cost of the Type I misclassification rate (MR), the software failure cost of the Type II MR, and the data collection cost of the software measurements should be considered together when choosing a classification model.
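As a concrete illustration of the trade-off described above, the sketch below contrasts a single-cycled model (logistic regression, which needs every metric measured up front) with a multi-cycled model (a decision tree, which only needs the metrics tested along each case's root-to-leaf path), and scores both with a combined cost of Type I misclassifications, Type II misclassifications, and metric collection. This is a minimal sketch using scikit-learn on synthetic data; the cost weights, data, and model settings are illustrative assumptions, not the paper's experimental setup (which applies DA, LR, C4.5 and ECHAID to NASA KC2).

```python
# Minimal sketch: single-cycled vs. multi-cycled classification cost.
# All numeric cost weights and the synthetic dataset are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

C_TYPE1 = 1.0    # re-appraisal cost per Type I error (fault-free flagged)
C_TYPE2 = 10.0   # failure cost per Type II error (fault-prone missed)
C_METRIC = 0.2   # data collection cost per software metric measured

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def expected_cost(y_true, y_pred, metrics_per_case):
    """Combined cost of both misclassification types plus measurement."""
    type1 = np.sum((y_pred == 1) & (y_true == 0))
    type2 = np.sum((y_pred == 0) & (y_true == 1))
    collection = C_METRIC * np.sum(metrics_per_case)
    return C_TYPE1 * type1 + C_TYPE2 * type2 + collection

# Single-cycled: every metric must be collected for every unseen case.
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
lr_metrics = np.full(len(X_te), X.shape[1])
print("LR cost:", expected_cost(y_te, lr.predict(X_te), lr_metrics))

# Multi-cycled: only the metrics tested on each case's decision path
# need to be collected, one decision cycle at a time.
dt = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
path = dt.decision_path(X_te)       # sparse (n_samples, n_nodes) matrix
node_feature = dt.tree_.feature     # feature tested at each node; <0 = leaf
dt_metrics = np.array([
    len({node_feature[n]
         for n in path.indices[path.indptr[i]:path.indptr[i + 1]]
         if node_feature[n] >= 0})
    for i in range(X_te.shape[0])
])
print("DT cost:", expected_cost(y_te, dt.predict(X_te), dt_metrics))
```

Because a depth-4 tree tests at most four metrics per case, its collection cost term is far smaller than the logistic regression's; whether that saving outweighs any accuracy loss depends on the relative sizes of the three cost weights, which is the choice the paper examines.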


Cited By

  • Decision-Tree Models for Predicting Time Performance in Software-Intensive Projects, International Journal of Information Technology Project Management 8(2), 64-86 (2018). DOI: 10.4018/IJITPM.2017040105.


Information & Contributors

Information

Published In

Publisher

Butterworth-Heinemann

United States

Publication History

Published: 01 January 2009

Author Tags

  1. Classification accuracy and efficiency
  2. Multi-cycle
  3. Single-cycle
  4. Software classification model
  5. Software measurement and analysis

Qualifiers

  • Article
