Evaluating ensemble imputation in software effort estimation

Authors:

Ibtissam Abnane,

Ali Idri,

Imane Chlioui,

Alain Abran

Empirical Software Engineering, Volume 28, Issue 2

https://doi.org/10.1007/s10664-022-10260-0

Published: 15 March 2023

Abstract

Choosing the appropriate missing data (MD) imputation technique for a given software development effort estimation (SDEE) technique is not a trivial task. In fact, the impact of MD imputation on the estimation output depends on the dataset and the SDEE technique used, and no imputation technique is best in all contexts. Thus, an attractive solution is to use more than one imputation technique and combine their results to obtain a final imputation outcome. This concept is called ensemble imputation and can significantly improve the effort estimation accuracy. This study proposes and constructs 11 heterogeneous ensemble imputation techniques, whose members are two, three, or four of the following single imputation techniques: K-nearest neighbors (KNN), expectation maximization (EM), support vector regression (SVR), and decision trees (DTs). The effects of single/ensemble imputation techniques on SDEE performance were evaluated over six SDEE datasets: COCOMO81, ISBSG, Desharnais, China, Kemerer, and Miyazaki. Five SDEE performance measures were used: standardized accuracy (SA), predictor at 25% (Pred(0.25)), mean balanced relative error (MBRE), mean inverted balanced relative error (MIBRE), and logarithmic standard deviation (LSD). Moreover, we used: (1) the Scott-Knott (SK) statistical test to cluster and compare the results, and (2) the Borda count method to rank the SDEE techniques belonging to the best SK cluster.
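
As an illustration of the ensemble imputation idea described above, the following is a minimal sketch, not the authors' implementation. It assumes that member outputs are combined by simple averaging (the abstract does not state the combination rule), and it uses a column-mean imputer as a stand-in second member in place of the EM, SVR, or DT imputers named in the study. All function names are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill each missing value (None) with the mean of that feature
    over the k nearest rows that have the feature observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue
                # Distance uses only the features observed in both rows.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    donors.append((dist, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def mean_impute(rows):
    """Stand-in member (assumption): fill each missing value with the column mean."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            continue
        col_mean = sum(observed) / len(observed)
        for i, r in enumerate(rows):
            if r[j] is None:
                filled[i][j] = col_mean
    return filled


def ensemble_impute(rows, imputers):
    """Average the values proposed by each member imputer (assumed combination rule).
    Observed values are never modified."""
    outputs = [imputer(rows) for imputer in imputers]
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                proposals = [out[i][j] for out in outputs if out[i][j] is not None]
                if proposals:
                    result[i][j] = sum(proposals) / len(proposals)
    return result


# Toy data (made up for illustration): columns are [size, effort].
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],
    [10.0, 100.0],
]
completed = ensemble_impute(data, [knn_impute, mean_impute])
print(completed[2][1])
```

In this toy case the KNN member is not pulled toward the outlying fourth project while the mean member is, so the averaged value sits between the two proposals. Swapping in other members only requires passing a different list of functions to `ensemble_impute`.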

The results showed that ensemble imputers significantly improved the performance of SDEE techniques compared to single imputation techniques. We also found that adding one or more imputers to the ensemble imputers generally led to a significant improvement in the SDEE performance. When the performance improvement is not significant, it is better to use the ensemble imputer with the minimum number of members because it is less complex. For ensemble imputers, the results suggest that no particular ensemble imputer gave the best results in all contexts. Overall, SVR imputation was the best imputation technique used to construct ensemble imputers for the SDEE. For the SDEE techniques, the best results were obtained by the DTs and SVR variants using ensemble imputation.
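
Three of the accuracy measures named in the abstract can be computed as in the sketch below. It uses their commonly cited definitions, which the abstract itself does not spell out, and it assumes strictly positive actual and estimated effort values. SA and LSD are omitted because they need additional inputs (a random-guessing baseline and a residual variance estimate). Function names are hypothetical.

```python
def pred(actual, estimated, level=0.25):
    """Pred(level): share of projects whose magnitude of relative error
    |actual - estimated| / actual is at most `level`."""
    hits = sum(1 for a, e in zip(actual, estimated) if abs(a - e) / a <= level)
    return hits / len(actual)


def mbre(actual, estimated):
    """Mean balanced relative error: |a - e| / min(a, e), averaged over projects."""
    return sum(abs(a - e) / min(a, e) for a, e in zip(actual, estimated)) / len(actual)


def mibre(actual, estimated):
    """Mean inverted balanced relative error: |a - e| / max(a, e), averaged over projects."""
    return sum(abs(a - e) / max(a, e) for a, e in zip(actual, estimated)) / len(actual)


# Made-up effort values for illustration only.
actual_effort = [100.0, 200.0, 400.0]
estimated_effort = [110.0, 160.0, 800.0]

print(pred(actual_effort, estimated_effort))
print(mbre(actual_effort, estimated_effort))
print(mibre(actual_effort, estimated_effort))
```

Higher Pred(0.25) is better, while lower MBRE and MIBRE are better. MBRE penalizes the large overestimate on the third project more heavily than MIBRE does, because it divides by the smaller of the two values.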
