Abstract
This paper introduces Multi-Region Symbolic Regression (MR-SR), a general framework that divides the input data space of a symbolic regression problem into subspaces (regions), generates different solutions to fit these regions, and then combines them. MR-SR has three main components: (1) a strategy for finding the different regions of the input data space; (2) a method for generating the functions for each region; and (3) a strategy for combining the models found by (2). The main contribution of this paper lies in how we generate the functions for each region. We model function generation as a multi-objective problem, where each objective corresponds to the quality of the evolved function in one region, so the number of objectives equals the number of regions of the input data space. We tested MR-SR in two scenarios with different goals. In the first, we used the new approach to solve the symbolic regression problem with standard GP, with the main goal of reducing the error. In the second, we used the method for a different purpose: to reduce the dimensionality of the semantic space of a variation of GP, namely Geometric Semantic Genetic Programming (GSGP). Results on 10 datasets showed that, for GP with 2 regions, the method using k-means clustering and a model switching strategy (which makes predictions using the best evolved function for the region of interest) obtained better results on 5 of the 10 datasets. For GSGP, the framework was less effective due to the lack of diversity among the evolved solutions.
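The three components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the evolutionary search itself is omitted, the two candidate functions are hand-written stand-ins for evolved GP individuals, the centroids are assumed to have already been produced by k-means, and all function names (`assign_region`, `region_errors`, `dominates`, `switch_predict`) are hypothetical. The sketch only shows how one error objective per region can be computed, how two objective vectors can be compared by Pareto dominance, and how model switching picks the best function for the region a point falls into.

```python
import math

def assign_region(x, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

def region_errors(model, X, y, centroids):
    """One objective per region: RMSE of `model` on the points of that region."""
    squared = [[] for _ in centroids]
    for xi, yi in zip(X, y):
        squared[assign_region(xi, centroids)].append((model(xi) - yi) ** 2)
    # Assumption: an empty region contributes 0.0 (it cannot discriminate models).
    return [math.sqrt(sum(s) / len(s)) if s else 0.0 for s in squared]

def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse everywhere, better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

def switch_predict(best_per_region, x, centroids):
    """Model switching: predict with the best function for the region of x."""
    return best_per_region[assign_region(x, centroids)](x)

# Toy problem y = |x| with two regions (centroids assumed given, e.g. by k-means).
X = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
y = [2.0, 1.0, 1.0, 2.0]
centroids = [(-1.5,), (1.5,)]

# Hand-written stand-ins for functions evolved by GP.
candidates = [lambda x: -x[0], lambda x: x[0]]

# Objective vector of each candidate: one RMSE per region.
errors = [region_errors(f, X, y, centroids) for f in candidates]

# For each region, keep the candidate with the lowest error in that region.
best_per_region = [
    candidates[min(range(len(candidates)), key=lambda i: errors[i][k])]
    for k in range(len(centroids))
]

predictions = [switch_predict(best_per_region, x, centroids) for x in X]
```

In this toy case each candidate is perfect in one region and poor in the other, so neither dominates the other, and switching between them reproduces the target on every point.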
Acknowledgements
This work was partially supported by CNPq and Fapemig.
Cite this article
Casadei, F., Pappa, G.L. Multi-region symbolic regression: combining functions under a multi-objective approach. Nat Comput 20, 753–773 (2021). https://doi.org/10.1007/s11047-021-09851-5
DOI: https://doi.org/10.1007/s11047-021-09851-5