Abstract
Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality improvement efforts to modules that are likely to be fp during operations, thereby making cost-effective use of software quality testing and enhancement resources. Since several classification techniques are available, a comparative study of commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The use of the expected cost of misclassification (ECM) is introduced as a single, unified measure for comparing the performances of different software quality classification models. A function of the costs of the Type I (an nfp module misclassified as fp) and Type II (a fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model depends on the system-specific costs of misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized-complete block design modeling approach is used, in which the system release is treated as a block, while the modeling method is treated as a factor.
It is observed that the predictive performances of the models are significantly different across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple-pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
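As a rough illustration of the ECM measure described above, the sketch below computes an average misclassification cost per module from Type I and Type II error counts. It is a minimal sketch, not the paper's exact formulation; the function name, the error counts, and the cost ratio used in the example are hypothetical.

```python
def expected_cost_of_misclassification(n_type1, n_type2, n_total, c1, c2):
    """Expected cost of misclassification (ECM): the average
    misclassification cost per module, combining Type I errors
    (nfp modules misclassified as fp, cost c1 each) and Type II
    errors (fp modules misclassified as nfp, cost c2 each)."""
    return (c1 * n_type1 + c2 * n_type2) / n_total

# Hypothetical example: 1000 modules, 40 Type I and 10 Type II errors,
# with a Type II error assumed to cost 20 times a Type I error.
ecm = expected_cost_of_misclassification(40, 10, 1000, c1=1.0, c2=20.0)
print(ecm)  # (1.0 * 40 + 20.0 * 10) / 1000 = 0.24
```

Because ECM is a function of the cost ratio c2/c1, recomputing it over a range of ratios (as the study does) shows how the preferred model can change with the system-specific cost of a Type II misclassification.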
Cite this article
Khoshgoftaar, T.M., Seliya, N. Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study. Empirical Software Engineering 9, 229–257 (2004). https://doi.org/10.1023/B:EMSE.0000027781.18360.9b