
Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study


Abstract

Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can help direct quality improvement efforts to the modules that are likely to be fp during operations, thereby making cost-effective use of software quality testing and enhancement resources. Since several classification techniques are available, a comparative study of commonly used techniques is useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and tools: logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The expected cost of misclassification (ECM) is introduced as a single unified measure for comparing the performances of different software quality classification models. A function of the costs of Type I (an nfp module misclassified as fp) and Type II (a fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models under varying cost ratios is important because the usefulness of a model depends on the system-specific costs of misclassification. Moreover, models should be compared and selected for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized complete block design is used, in which the system release is treated as a block and the modeling method as a factor. We observe that the predictive performance of the models differs significantly across the system releases, implying that, in the software engineering domain, prediction models are influenced by the characteristics of the data and of the system being modeled. Multiple pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is compared with a simple classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
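
The ECM measure and the blocked ANOVA design lend themselves to a brief illustration. The sketch below is not the authors' code: it assumes the commonly used normalized form NECM = (N_I + c * N_II) / N, where the Type I cost is normalized to 1 and c = C_II / C_I is the cost ratio, and it uses fabricated error counts purely to show how a randomized complete block ANOVA (release as block, classification method as factor) can be fitted with pandas and statsmodels.

```python
# Illustrative sketch only -- not the authors' implementation.
# Assumes NECM = (N_I + c * N_II) / N with the Type I cost normalized
# to 1 and cost ratio c = C_II / C_I.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def necm(n_type1, n_type2, n_total, cost_ratio):
    """Normalized expected cost of misclassification.

    n_type1    -- nfp modules misclassified as fp (Type I errors)
    n_type2    -- fp modules misclassified as nfp (Type II errors)
    cost_ratio -- c = C_II / C_I (a Type II error costs c times more)
    """
    return (n_type1 + cost_ratio * n_type2) / n_total

# Fabricated Type I / Type II error counts per method, perturbed per
# release; these stand in for each model's real confusion-matrix
# counts on each release's modules.
base_errors = {
    "logistic": (40, 12), "cbr": (35, 15), "cart": (30, 14),
    "splus": (33, 13), "sprint_sliq": (28, 16),
    "c45": (31, 15), "treedisc": (29, 17),
}
rng = np.random.default_rng(seed=0)
rows = []
for release in ["R1", "R2", "R3", "R4"]:
    for method, (t1, t2) in base_errors.items():
        rows.append({
            "method": method,
            "release": release,
            "necm": necm(t1 + rng.integers(0, 6), t2 + rng.integers(0, 4),
                         n_total=1000, cost_ratio=50),
        })
df = pd.DataFrame(rows)

# Randomized complete block design: release is the block, method the
# factor; a significant method effect licenses pairwise comparisons.
fit = ols("necm ~ C(method) + C(release)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```

At a high cost ratio such as the illustrative c = 50 above, Type II misclassifications dominate NECM, which is why the choice of cost ratio can change which model looks best and why the paper evaluates the models across a range of cost ratios.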




Cite this article

Khoshgoftaar, T.M., Seliya, N. Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study. Empirical Software Engineering 9, 229–257 (2004). https://doi.org/10.1023/B:EMSE.0000027781.18360.9b
