Abstract
Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality improvement efforts to modules that are likely to be fp during operations, thereby making cost-effective use of software quality testing and enhancement resources. Since several classification techniques are available, a comparative study of commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The use of the expected cost of misclassification (ECM) is introduced as a single, unified measure for comparing the performances of different software quality classification models. A function of the costs of the Type I (an nfp module misclassified as fp) and Type II (a fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model depends on the system-specific costs of misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized-complete block design modeling approach is used, in which the system release is treated as a block, while the modeling method is treated as a factor.
It is observed that the predictive performances of the models are significantly different across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple-pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
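As a rough illustration of the ECM measure described above, the sketch below computes an average misclassification cost per module from Type I and Type II error counts. It is a minimal sketch, not the paper's exact formulation; the function name, the error counts, and the cost ratio used in the example are hypothetical.

```python
def expected_cost_of_misclassification(n_type1, n_type2, n_total, c1, c2):
    """Expected cost of misclassification (ECM): the average
    misclassification cost per module, combining Type I errors
    (nfp modules misclassified as fp, cost c1 each) and Type II
    errors (fp modules misclassified as nfp, cost c2 each)."""
    return (c1 * n_type1 + c2 * n_type2) / n_total

# Hypothetical example: 1000 modules, 40 Type I and 10 Type II errors,
# with a Type II error assumed to cost 20 times a Type I error.
ecm = expected_cost_of_misclassification(40, 10, 1000, c1=1.0, c2=20.0)
print(ecm)  # (1.0 * 40 + 20.0 * 10) / 1000 = 0.24
```

Because ECM is a function of the cost ratio c2/c1, recomputing it over a range of ratios (as the study does) shows how the preferred model can change with the system-specific cost of a Type II misclassification.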
Cite this article
Khoshgoftaar, T.M., Seliya, N. Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study. Empirical Software Engineering 9, 229–257 (2004). https://doi.org/10.1023/B:EMSE.0000027781.18360.9b