
Optimized fuzzy clustering‐based k‐nearest neighbors imputation for mixed missing data in software development effort estimation

Authors:

Ibtissam Abnane,

Ali Idri,

Alain Abran

Journal of Software: Evolution and Process, Volume 36, Issue 4

https://doi.org/10.1002/smr.2529

Published: 04 January 2023

Abstract

Context

Software development effort estimation (SDEE) is one of the most challenging aspects of project management. The presence of missing data (MD) in software attributes makes SDEE even more complex. K‐nearest neighbors imputation (KNNI) has been widely used in SDEE to deal with MD. However, classical KNNI has low tolerance to imprecision and uncertainty, especially for categorical features: it represents software attributes mainly with numbers or classical intervals and relies on similarity measures originally designed for numerical attributes.
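To make the classical process concrete, the following minimal Python sketch imputes each missing value from the k most similar complete projects: numerical gaps receive the neighbors' mean and categorical gaps their most frequent value. It assumes a heterogeneous Euclidean-overlap distance (range-normalized difference for numerical attributes, 0/1 mismatch for categorical ones) computed only over the attributes the incomplete project actually has. The function names, the toy project records, and k = 2 are illustrative assumptions, not the paper's configuration.

```python
import math
from collections import Counter
from statistics import mean


def heom(a, b, cols, numeric, ranges):
    """Heterogeneous Euclidean-overlap distance between rows a and b over `cols`."""
    total = 0.0
    for c in cols:
        if c in numeric:
            # Range-normalized absolute difference for numerical attributes
            d = abs(a[c] - b[c]) / ranges[c] if ranges[c] else 0.0
        else:
            # Overlap measure: 0 if the categories match, 1 otherwise
            d = 0.0 if a[c] == b[c] else 1.0
        total += d * d
    return math.sqrt(total)


def knn_impute(data, k=2, numeric=frozenset()):
    """Return a copy of `data` whose None entries are filled by classical KNNI.

    Donors are the complete rows. Numerical gaps get the neighbors' mean,
    categorical gaps the neighbors' most frequent value.
    """
    donors = [row for row in data if None not in row]
    ranges = {c: max(r[c] for r in donors) - min(r[c] for r in donors) for c in numeric}
    result = []
    for row in data:
        missing = [c for c, v in enumerate(row) if v is None]
        if not missing:
            result.append(list(row))
            continue
        observed = [c for c in range(len(row)) if c not in missing]
        # Rank donors by distance measured on the attributes this row actually has
        neighbors = sorted(donors, key=lambda d: heom(row, d, observed, numeric, ranges))[:k]
        filled = list(row)
        for c in missing:
            values = [n[c] for n in neighbors]
            filled[c] = mean(values) if c in numeric else Counter(values).most_common(1)[0][0]
        result.append(filled)
    return result


# Toy projects (illustrative only): [size in KLOC, team experience, effort]
projects = [
    [10, "low", 100],
    [12, "low", 120],
    [30, "high", 300],
    [32, "high", 320],
    [11, None, 110],
    [None, "high", 310],
]
imputed = knn_impute(projects, k=2, numeric={0, 2})
print(imputed[4], imputed[5])  # [11, 'low', 110] [31, 'high', 310]
```

Note that the categorical attribute only ever contributes a 0 or 1 to the distance, which is exactly the coarse treatment of categorical data the paragraph above criticizes.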

Objectives

This paper evaluates an optimized fuzzy clustering‐based KNNI (FC‐KNNI) and compares it with classical KNNI on mixed (numerical and categorical) data in the context of SDEE.
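Fuzzy clustering replaces a crisp cluster assignment with graded memberships. As a hedged illustration of that ingredient only, the sketch below computes the standard fuzzy c-means membership of a project in each cluster, u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), using only the project's observed attributes so that incomplete records can still be placed. The cluster centers are assumed to be already known, the fuzzifier m = 2 is an assumption, and how FC‐KNNI optimizes its clusters and uses these memberships to select neighbors is not reproduced here.

```python
import math


def fcm_memberships(x, centers, m=2.0, cols=None):
    """Fuzzy c-means membership of record `x` in each cluster center.

    Only the columns in `cols` (default: all) enter the distance, so a record
    with missing values can be placed using its observed attributes.
    """
    cols = range(len(x)) if cols is None else cols
    dists = [math.sqrt(sum((x[c] - ctr[c]) ** 2 for c in cols)) for ctr in centers]
    # A record lying exactly on a center belongs fully to that cluster
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)); the memberships sum to 1
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]


# Hypothetical centers over [size, effort]; the record's effort is unknown
centers = [[10, 100], [31, 310]]
u = fcm_memberships([11, None], centers, cols=[0])
print([round(v, 4) for v in u])  # [0.9975, 0.0025]
```

A project whose observed size is close to the first center belongs almost entirely to that cluster while keeping a small membership in the second. One plausible use, not confirmed by the abstract, is to search for KNNI donors in the clusters where the incomplete project has the highest membership.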

Methods

We investigate the effect of two imputation techniques (FC‐KNNI and KNNI) on five SDEE techniques: case‐based reasoning (CBR), fuzzy case‐based reasoning (F‐CBR), support vector regression, multilayer perceptron, and reduced‐error pruning tree. The evaluation uses six publicly available SDEE datasets and two performance measures: standardized accuracy (SA) and Pred (0.25). The Wilcoxon statistical test is also used to assess the significance of the results.
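Both accuracy measures can be computed from actual and estimated efforts. The sketch below assumes the definitions commonly used in SDEE: SA = (1 - MAR / MAR_P0) × 100, where MAR is the mean absolute residual and MAR_P0 is the mean MAR of random guessing, and Pred (0.25) as the percentage of projects whose magnitude of relative error |actual - estimate| / actual is at most 0.25. As a labeled simplification, MAR_P0 is computed as its exact expectation (each project "estimated" by another project's actual effort, averaged over all such choices) rather than by repeated random sampling. The effort values are made up for illustration.

```python
def mar(actual, predicted):
    """Mean absolute residual."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mar_p0(actual):
    """Exact expected MAR of random guessing: each project is 'estimated' by
    the actual effort of a different project chosen uniformly at random."""
    n = len(actual)
    total = sum(abs(a - b) for i, a in enumerate(actual)
                for j, b in enumerate(actual) if i != j)
    return total / (n * (n - 1))


def standardized_accuracy(actual, predicted):
    """SA = (1 - MAR / MAR_P0) * 100; 0 means no better than random guessing."""
    return (1.0 - mar(actual, predicted) / mar_p0(actual)) * 100.0


def pred(actual, predicted, level=0.25):
    """Percentage of projects whose MRE = |actual - predicted| / actual is <= level."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


actual = [100, 200, 300]     # made-up efforts
predicted = [110, 180, 400]  # made-up estimates
print(standardized_accuracy(actual, predicted), pred(actual, predicted))  # roughly 67.5 and 66.67
```

Here two of the three estimates fall within 25% of the actual effort, and the mean absolute residual is about a third of what random guessing would give, so SA is well above zero.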

Results

The results are promising: an imputation technique designed for mixed data performs better than methods originally designed for numerical data. FC‐KNNI significantly outperforms KNNI regardless of the SDEE technique and dataset used. Another important finding is that F‐CBR improves the analogy process compared with CBR.
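A claim such as "significantly outperforms" rests on a paired test over the same projects. The following sketch implements a two-sided Wilcoxon signed-rank test on paired absolute errors using the normal approximation without tie or continuity corrections, which is a simplification; statistical packages such as SciPy's `scipy.stats.wilcoxon` offer exact and corrected variants. The error values are invented for illustration and are not results from the paper.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test on paired samples (normal approximation).

    Zero differences are dropped and tied |differences| receive average ranks.
    Returns (W, p), where W is the smaller of the positive and negative rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of equal |differences| and give them the average rank
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = float(min(w_plus, w_minus))
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal tail
    return w, p


# Illustrative absolute errors of one estimator on ten projects under each imputation
knni_errors = [30, 25, 40, 35, 50, 45, 28, 33, 41, 38]
fc_knni_errors = [20, 22, 35, 30, 41, 44, 20, 30, 36, 29]
w, p = wilcoxon_signed_rank(knni_errors, fc_knni_errors)
print(w, round(p, 4))
```

In this toy input every paired difference favors the second imputation, so W is 0 and the approximate p-value falls below 0.01.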

Conclusion

Introducing fuzzy sets and fuzzy clustering into the analogy process improves its performance in terms of SA and Pred (0.25).

Graphical Abstract

This paper investigates the use of k‐nearest neighbors imputation (KNNI) to deal with missing data in software development effort estimation (SDEE). In its classical form, KNNI has low tolerance to imprecision and uncertainty, especially when dealing with categorical features. We evaluate an optimized fuzzy clustering‐based KNNI (FC‐KNNI) and compare it with classical KNNI on mixed data in the context of SDEE. The results are promising: an imputation technique designed for mixed data performs better than methods originally designed for numerical data.

