A Hybrid GP-KNN Imputation for Symbolic Regression with Missing Values

Baligh Al-Helali¹⁶,
Qi Chen¹⁶,
Bing Xue¹⁶ &
…
Mengjie Zhang¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11320))

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

2779 Accesses
20 Citations

Abstract

In data science, missingness is a serious challenge when dealing with real-world data sets. Although many imputation approaches have been proposed to tackle missing values in machine learning, most studies focus on the classification task rather than the regression task. To the best of our knowledge, no study has been conducted to investigate the use of imputation methods when performing symbolic regression on incomplete real-world data sets. In this work, we propose a new imputation method called GP-KNN which is a hybrid method employing two concepts: Genetic Programming Imputation (GPI) and K-Nearest Neighbour (KNN). GP-KNN considers both the feature and instance relevance. The experimental results show that the proposed method has a better performance comparing to state-of-the-art imputation methods in most of the considered cases with respect to both imputation accuracy and symbolic regression performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Article 07 February 2021

Genetic Programming-Based Selection of Imputation Methods in Symbolic Regression with Missing Values

Genetic Programming-Based Simultaneous Feature Selection and Imputation for Symbolic Regression with Incomplete Data

References

Austel, V., et al.: Globally optimal symbolic regression. arXiv preprint arXiv:1710.10720 (2017)
Beretta, L., Santaniello, A.: Nearest neighbor imputation algorithms: a critical evaluation. BMC Med. Inform. Decis. Mak. 16(3), 74 (2016). https://doi.org/10.1186/s12911-016-0318-z
Article Google Scholar
Brandejsky, T.: Model identification from incomplete data set describing state variable subset only - the problem of optimizing and predicting heuristic incorporation into evolutionary system. In: Zelinka, I., Chen, G., Rössler, O., Snasel, V., Abraham, A. (eds.) Nostradamus 2013: Prediction, Modeling and Analysis of Complex Systems. AISC, vol. 210, pp. 181–189. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-319-00542-3_19
Chapter MATH Google Scholar
van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–68 (2010)
Google Scholar
Chen, Q., Zhang, M., Xue, B.: Feature selection to improve generalization of genetic programming for high-dimensional symbolic regression. IEEE Trans. Evol. Comput. 21(5), 792–806 (2017). https://doi.org/10.1109/TEVC.2017.2683489
Article Google Scholar
Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Dick, G.: Bloat and generalisation in symbolic regression. In: Dick, G., et al. (eds.) SEAL 2014. LNCS, vol. 8886, pp. 491–502. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13563-2_42
Chapter Google Scholar
Donders, A.R.T., van der Heijden, G.J., Stijnen, T., Moons, K.G.: Review: a gentle introduction to imputation of missing values. J. Clin. Epidemiol. 59(10), 1087–1091 (2006)
Article Google Scholar
Eggermont, J., et al.: Data mining using genetic programming: classification and symbolic regression. Institute for Programming research and Algorithmics, Leiden Institute of Advanced Computer Science, Faculty of Mathematics & Natural Sciences, Leiden University (2005)
Google Scholar
Fortin, F.A., Rainville, F.M.D., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13(Jul), 2171–2175 (2012)
MathSciNet MATH Google Scholar
Haitovsky, Y.: Missing data in regression analysis. J. R. Stat. Soc. Ser. B (Methodol.) 30, 67–82 (1968)
MATH Google Scholar
Horton, N.J., Kleinman, K.P.: Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am. Stat. 61(1), 79–90 (2007)
Article MathSciNet Google Scholar
Koza, J.R.: Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4(2), 87–112 (1994)
Article Google Scholar
Loh, P.L., Wainwright, M.J.: High-dimensional regression with noisy and missing data: provable guarantees with non-convexity. In: Advances in Neural Information Processing Systems, pp. 2726–2734 (2011)
Google Scholar
Pennachin, C., Looks, M., de Vasconcelos, J.: Improved time series prediction and symbolic regression with affine arithmetic. In: Riolo, R., Vladislavleva, E., Moore, J. (eds.) Genetic Programming Theory and Practice IX. GEVO, pp. 97–112. Springer, New York (2011). https://doi.org/10.1007/978-1-4614-1770-5_6
Chapter Google Scholar
Pornprasertmanit, S., Miller, P., Schoemann, A., Quick, C., Jorgensen, T., Pornprasertmanit, M.S.: Package ‘simsem’ (2016)
Google Scholar
Tran, C.T., Zhang, M., Andreae, P.: Multiple imputation for missing data using genetic programming. In: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 583–590. ACM (2015)
Google Scholar
Tran, C.T., Zhang, M., Andreae, P.: A genetic programming-based imputation method for classification with missing data. In: Heywood, M.I., McDermott, J., Castelli, M., Costa, E., Sim, K. (eds.) EuroGP 2016. LNCS, vol. 9594, pp. 149–163. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30668-1_10
Chapter Google Scholar
Tran, C.T., Zhang, M., Andreae, P., Xue, B.: Multiple imputation and genetic programming for classification with incomplete data. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 521–528. ACM (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington, 6400, New Zealand
Baligh Al-Helali, Qi Chen, Bing Xue & Mengjie Zhang

Authors

Baligh Al-Helali
View author publications
You can also search for this author in PubMed Google Scholar
Qi Chen
View author publications
You can also search for this author in PubMed Google Scholar
Bing Xue
View author publications
You can also search for this author in PubMed Google Scholar
Mengjie Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Baligh Al-Helali .

Editor information

Editors and Affiliations

University of Canterbury, Christchurch, New Zealand
Tanja Mitrovic
School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Bing Xue
RMIT University, Melbourne, VIC, Australia
Xiaodong Li

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Al-Helali, B., Chen, Q., Xue, B., Zhang, M. (2018). A Hybrid GP-KNN Imputation for Symbolic Regression with Missing Values. In: Mitrovic, T., Xue, B., Li, X. (eds) AI 2018: Advances in Artificial Intelligence. AI 2018. Lecture Notes in Computer Science(), vol 11320. Springer, Cham. https://doi.org/10.1007/978-3-030-03991-2_33

Download citation

DOI: https://doi.org/10.1007/978-3-030-03991-2_33
Published: 10 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03990-5
Online ISBN: 978-3-030-03991-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Hybrid GP-KNN Imputation for Symbolic Regression with Missing Values

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Genetic Programming-Based Selection of Imputation Methods in Symbolic Regression with Missing Values

Genetic Programming-Based Simultaneous Feature Selection and Imputation for Symbolic Regression with Incomplete Data

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Hybrid GP-KNN Imputation for Symbolic Regression with Missing Values

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Genetic Programming-Based Selection of Imputation Methods in Symbolic Regression with Missing Values

Genetic Programming-Based Simultaneous Feature Selection and Imputation for Symbolic Regression with Incomplete Data

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation