Abstract
We analyse the search behaviour of genetic programming (GP) for symbolic regression (SR) in search spaces that are small enough to allow exhaustive enumeration, and use an improved exhaustive symbolic regression algorithm to generate the set of semantically unique expression structures, which is orders of magnitude smaller than the original SR search space. The efficiency of GP and a hypothetical random search in this set of unique expressions is compared, whereby the efficiency is quantified via the number of function evaluations performed until a given error threshold is reached, and the percentage of unique expressions evaluated during the search after simplification to a canonical form. The results for two real-world datasets with a single input variable show that GP in such limited search space explores only a small fraction of the search space, and evaluates semantically equivalent expressions repeatedly. GP has a smaller success probability than the idealised random search for such small search spaces.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
\({\text {powabs}}(x,y) =|x|^y\).
- 2.
- 3.
- 4.
- 5.
These experiments were executed on a Dell G5 with a Core i7-9750H and 16 GB of RAM.
- 6.
Details can be found in the supplement.
References
Banzhaf, W., Hu, T., Ochoa, G.: How the combinatorics of neutral spaces leads genetic programming to discover simple solutions. In: Winkler, S., Trujillo, L., Ofria, C., Hu, T. (eds.) Genetic Programming Theory and Practice XX, pp. 65–86. Springer, Singapore (2024). https://doi.org/10.1007/978-981-99-8413-8_4
Bartlett, D.J., Desmond, H.: Marginalised Normal Regression: unbiased curve fitting in the presence of x-errors. Open J. Astrophys. 6, 42 (2023). https://doi.org/10.21105/astro.2309.00948
Bartlett, D.J., Desmond, H., Ferreira, P.G.: Exhaustive symbolic regression. IEEE Trans. Evol. Comput. 28, 950–964 (2023)
Burlacu, B., Affenzeller, M., Kronberger, G., Kommenda, M.: Online diversity control in symbolic regression via a fast hash-based tree similarity measure. In: 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 2175–2182. IEEE (2019)
Burlacu, B., Kammerer, L., Affenzeller, M., Kronberger, G.: Hash-based tree similarity and simplification in genetic programming for symbolic regression. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds.) EUROCAST 2019, Part I. LNCS, vol. 12013, pp. 361–369. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45093-9_44
Burlacu, B., Kronberger, G., Kommenda, M.: Operon C++ an efficient genetic programming framework for symbolic regression. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, pp. 1562–1570 (2020)
Cao, L., Zheng, Z., Ding, C., Cai, J., Jiang, M.: Genetic programming symbolic regression with simplification-pruning operator for solving differential equations. In: Luo, B., Cheng, L., Wu, Z.G., Li, H., Li, C. (eds.) ICONIP 2023. CCIS, vol. 1962, pp. 287–298. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-8132-8_22
Cranmer, M.: Interpretable machine learning for science with PySR and SymbolicRegression.Jl (2023). https://doi.org/10.48550/ARXIV.2305.01582
Daida, J.M., Hilss, A.M.: Identifying structural mechanisms in standard genetic programming. In: Cantú-Paz, E., et al. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 1639–1651. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45110-2_58
Daida, J.M., Li, H., Tang, R., Hilss, A.M.: What makes a problem GP-Hard? Validating a hypothesis of structural causes. In: Cantú-Paz, E., et al. (eds.) GECCO 2003, Part II. LNCS, vol. 2724, pp. 1665–1677. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45110-2_60
Desmond, H., Bartlett, D.J., Ferreira, P.G.: On the functional form of the radial acceleration relation. Monthly Not. RAS 521(2), 1817–1831 (2023). https://doi.org/10.1093/mnras/stad597
Ebner, M.: On the search space of genetic programming and its relation to Nature’s search space. In: Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 2, pp. 1357–1361. IEEE (1999)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-44874-8
de França, F.O.: A greedy search tree heuristic for symbolic regression. Inf. Sci. 442, 18–32 (2018)
de França, F.O.: Transformation-interaction-rational representation for symbolic regression. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 920–928 (2022)
de Franca, F.O., Aldeia, G.S.I.: Interaction-transformation evolutionary algorithm for symbolic regression. Evol. Comput. 29(3), 367–390 (2021)
de Franca, F.O., Kronberger, G.: Reducing overparameterization of symbolic regression models with equality saturation. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1064–1072 (2023)
Guimerà, R., et al.: A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6(5) (2020). https://doi.org/10.1126/sciadv.aav6971
Gustafson, S., Burke, E.K., Krasnogor, N.: On improving genetic programming for symbolic regression. In: 2005 IEEE Congress on Evolutionary Computation, vol. 1, pp. 912–919. IEEE (2005)
Halim, A.H., Ismail, I., Das, S.: Performance assessment of the metaheuristic optimization algorithms: an exhaustive review. Artif. Intell. Rev. 54(3), 2323–2409 (2021). https://doi.org/10.1007/s10462-020-09906-6
Hansen, N., Auger, A., Ros, R., Mersmann, O., Tušar, T., Brockhoff, D.: COCO: a platform for comparing continuous optimizers in a black-box setting. Optim. Methods Softw. 36, 114–144 (2021). https://doi.org/10.1080/10556788.2020.1808977
Hu, T., Banzhaf, W.: Neutrality, robustness, and evolvability in genetic programming. In: Riolo, R., Worzel, B., Goldman, B., Tozier, B. (eds.) Genetic Programming Theory and Practice XIV. GEC, pp. 101–117. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97088-2_7
Hu, T., Ochoa, G., Banzhaf, W.: Phenotype search trajectory networks for linear genetic programming. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds.) EuroGP 2023. LNCS, vol. 13986, pp. 52–67. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-29573-7_4
Kammerer, L., Kronberger, G., Burlacu, B., Winkler, S.M., Kommenda, M., Affenzeller, M.: Symbolic regression by exhaustive search: reducing the search space using syntactical constraints and efficient semantic structure deduplication. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds.) Genetic Programming Theory and Practice XVII. GEC, pp. 79–99. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39958-0_5
Kartelj, A., Djukanović, M.: RILS-ROLS: robust symbolic regression via iterated local search and ordinary least squares. J. Big Data 10(1), 71 (2023)
Kommenda, M., Burlacu, B., Kronberger, G., Affenzeller, M.: Parameter identification for symbolic regression using nonlinear least squares. Genet. Program Evolvable Mach. 21(3), 471–501 (2020)
Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
Kronberger, G., Burlacu, B., Kommenda, M., Winkler, S.M., Affenzeller, M.: Symbolic Regression. Chapman & Hall / CRC Press, Boca Raton (2024)
Kronberger, G., de Franca, F.O.: Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming. J. Symb. Comput. (2024)
Langdon, W.B.: Genetic programming convergence. Genet. Program Evolvable Mach. 23(1), 71–104 (2021). https://doi.org/10.1007/s10710-021-09405-9
Lelli, F., McGaugh, S.S., Schombert, J.M., Pawlowski, M.S.: One law to rule them all: the radial acceleration relation of galaxies. Astrophys. J. 836(2), 152 (2017). https://doi.org/10.3847/1538-4357/836/2/152
Margossian, C.C.: A review of automatic differentiation and its efficient implementation. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9(4), e1305 (2019)
McGaugh, S.S.: A tale of two paradigms: the mutual incommensurability of \({\varLambda }\)CDM and MOND. Can. J. Phys. 93(2), 250–259 (2015). https://doi.org/10.1139/cjp-2014-0203
McPhee, N.F., Ohs, B., Hutchison, T.: Semantic Building Blocks in Genetic Programming. In: O’Neill, M., Vanneschi, L., Gustafson, S., Esparcia Alcázar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) EuroGP 2008. LNCS, vol. 4971, pp. 134–145. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78671-9_12
Nelson, C.G.: Techniques for program verification. Stanford University (1980)
Niehaus, J., Igel, C., Banzhaf, W.: Reducing the number of fitness evaluations in graph genetic programming using a canonical graph indexed database. Evol. Comput. 15(2), 199–221 (2007)
Nikuradse, J.: Laws of flow in rough pipes. Technical report, National Advisory Committee for Aeronautics Washington, NACA TM 1292 - Translation of “Strömungsgesetze in rauhen Rohren" VDI-Forschungsheft 361. Beilage zu “Forschung auf dem Gebiete des Ingenieurwesens” Ausgabe B Band 4, July/August 1933 (1950)
Randall, D.L., Townsend, T.S., Hochhalter, J.D., Bomarito, G.F.: Bingo: a customizable framework for symbolic regression with genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 2282–2288 (2022)
Reichardt, I., Pallarès, J., Sales-Pardo, M., Guimerà, R.: Bayesian machine scientist to compare data collapses for the Nikuradse dataset. Phys. Rev. Lett. 124(8) (2020). https://doi.org/10.1103/physrevlett.124.084503
Rivero, D., Fernandez-Blanco, E., Pazos, A.: DoMe: a deterministic technique for equation development and symbolic regression. Expert Syst. Appl. 198, 116712 (2022)
Seidyo Imai Aldeia, G., Olivetti de Franca, F., La Cava, W.G.: Inexact simplification of symbolic regression expressions with locality-sensitive hashing. arXiv e-prints pp. arXiv–2404 (2024)
Sipper, M.: Tiny genetic programming in Python (2019). https://github.com/moshesipper/tiny_gp
Virgolin, M., Alderliesten, T., Witteveen, C., Bosman, P.A.: Scalable genetic programming by gene-pool optimal mixing and input-space entropy-based building-block learning. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1041–1048 (2017)
Wang, H., Vermetten, D., Ye, F., Doerr, C., Bäck, T.: IOHanalyzer: detailed performance analyses for iterative optimization heuristics. ACM Trans. Evol. Learn. Optim. 2(1) (2022). https://doi.org/10.1145/3510426
Willsey, M., Nandi, C., Wang, Y.R., Flatt, O., Tatlock, Z., Panchekha, P.: Egg: fast and extensible equality saturation. Proc. ACM Program. Lang. 5(POPL), 1–29 (2021)
Worm, T., Chiu, K.: Prioritized grammar enumeration: symbolic regression by dynamic programming. In: Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pp. 1021–1028 (2013)
Acknowledgments
We thank Pedro Ferreira for enabling this collaboration and his ideas for improving this work, Bogdan Burlacu for his help when making changes to Operon, and Alessandro Cheli for help with the Metatheory.jl package. Gabriel Kronberger is supported by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, the Federal Ministry for Labour and Economy, and the regional government of Upper Austria within the COMET project ProMetHeus (904919) supported by the Austrian Research Promotion Agency (FFG). Fabricio Olivetti de Franca is supported by CNPq through the grant 301596/2022-0. Harry Desmond is supported by a Royal Society University Research Fellowship (grant no. 211046). Deaglan J. Bartlett is supported by the Simons Collaboration on “Learning the Universe.”
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kronberger, G., Olivetti de Franca, F., Desmond, H., Bartlett, D.J., Kammerer, L. (2024). The Inefficiency of Genetic Programming for Symbolic Regression. In: Affenzeller, M., et al. Parallel Problem Solving from Nature – PPSN XVIII. PPSN 2024. Lecture Notes in Computer Science, vol 15148. Springer, Cham. https://doi.org/10.1007/978-3-031-70055-2_17
Download citation
DOI: https://doi.org/10.1007/978-3-031-70055-2_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70054-5
Online ISBN: 978-3-031-70055-2
eBook Packages: Computer ScienceComputer Science (R0)