Feature selection for software effort estimation with localized neighborhood mutual information

Qin Liu¹,
Jiakai Xiao² &
Hongming Zhu¹

Abstract

Feature selection is usually applied before case-based reasoning (CBR) is used for software effort estimation (SEE). Unfortunately, most feature selection methods treat CBR as a black box, so there is no guarantee that CBR is appropriate for the selected feature subset. The key to solving this problem is to measure how well the CBR assumption holds for a given feature set. In this paper, a measure called localized neighborhood mutual information (LNI) is proposed for this purpose, and a greedy method called LNI-based feature selection (LFS) is designed to select features with it. Experiments with leave-one-out cross validation (LOOCV) on 6 benchmark datasets demonstrate that: (1) CBR makes effective estimates with the LFS-selected subset compared with a randomized baseline method; (2) compared with three representative feature selection methods, LFS achieves the best mean absolute residual (MAR) on 3 out of 6 datasets, with a 14% average improvement; and (3) compared with the same three methods, LFS achieves the best mean magnitude of relative error (MMRE) on 5 out of 6 datasets, with a 24% average improvement.
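The evaluation protocol named in the abstract (CBR scored by LOOCV with MAR and MMRE) can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the choice of k, the function names and the toy data are all assumptions made for illustration. The LNI measure itself is not defined in this abstract, so it is not reproduced here.

```python
import math

def cbr_estimate(train_X, train_y, query, features, k=1):
    """Estimate effort as the mean effort of the k most similar past projects.

    Assumption: similarity is Euclidean distance over the chosen feature indices.
    """
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def loocv_errors(X, y, features, k=1):
    """Return (MAR, MMRE) of CBR under leave-one-out cross validation."""
    abs_residuals, rel_errors = [], []
    for i in range(len(X)):
        # Hold out project i and estimate it from all remaining projects.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = cbr_estimate(train_X, train_y, X[i], features, k)
        abs_residuals.append(abs(y[i] - pred))
        rel_errors.append(abs(y[i] - pred) / y[i])
    n = len(X)
    return sum(abs_residuals) / n, sum(rel_errors) / n

# Hypothetical toy data: column 0 tracks effort, column 1 is noise.
X = [[1.0, 9.0], [2.0, 1.0], [4.0, 8.0], [7.0, 2.0]]
y = [10.0, 20.0, 40.0, 70.0]

mar_one, mmre_one = loocv_errors(X, y, features=[0])
mar_both, mmre_both = loocv_errors(X, y, features=[0, 1])
print("feature 0 only:", mar_one, mmre_one)
print("features 0 and 1:", mar_both, mmre_both)
```

Comparing the two calls shows how the choice of feature subset changes which past projects count as neighbors, and therefore changes both error measures.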

Author information

Authors and Affiliations

School of Software Engineering, Tongji University, Shanghai, 201804, China
Qin Liu & Hongming Zhu
Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
Jiakai Xiao

Corresponding author

Correspondence to Qin Liu.

About this article

Cite this article

Liu, Q., Xiao, J. & Zhu, H. Feature selection for software effort estimation with localized neighborhood mutual information. Cluster Comput 22 (Suppl 3), 6953–6961 (2019). https://doi.org/10.1007/s10586-018-1884-x

Received: 05 November 2017
Accepted: 17 January 2018
Published: 19 February 2018
Issue Date: May 2019
DOI: https://doi.org/10.1007/s10586-018-1884-x
