DOI: 10.1145/3583131.3590368
Research article | Open access

Mini-Batching, Gradient-Clipping, First- versus Second-Order: What Works in Gradient-Based Coefficient Optimisation for Symbolic Regression?

Published: 12 July 2023

Abstract

The aim of Symbolic Regression (SR) is to discover interpretable expressions that accurately describe data. The accuracy of an expression depends on both its structure and coefficients. To keep the structure simple enough to be interpretable, effective coefficient optimisation becomes key. Gradient-based optimisation is clearly effective at training neural networks in Deep Learning (DL), which can essentially be viewed as large, over-parameterised expressions: in this paper, we study how gradient-based optimisation techniques as often used in DL transfer to SR. In particular, we first assess what techniques work well across random SR expressions, independent of any specific SR algorithm. We find that mini-batching and gradient-clipping can be helpful (similar to DL), while second-order optimisers outperform first-order ones (different from DL). Next, we consider whether including gradient-based optimisation in Genetic Programming (GP), a classic SR algorithm, is beneficial. On five real-world datasets, in a generation-based comparison, we find that second-order optimisation outperforms coefficient mutation (or no optimisation). However, in time-based comparisons, performance gaps shrink substantially because the computational expense of second-order optimisation causes GP to perform fewer generations. The interplay of computational costs between the optimisation of structure and coefficients is thus a critical aspect to consider.
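
To make the comparison concrete, below is a minimal, self-contained sketch (not taken from the paper or its supplementary code) of the two coefficient-optimisation styles the abstract contrasts, applied to a hypothetical fixed expression structure f(x) = a*sin(b*x1) + c*x2 on synthetic data: a first-order, DL-style loop with mini-batching, gradient clipping, and Adam (via PyTorch), and a second-order Levenberg-Marquardt fit of the same coefficients (via SciPy). The expression, data, and hyperparameters are illustrative assumptions, not the paper's experimental setup.

    # Sketch: first- vs second-order coefficient optimisation for a fixed SR expression.
    # Hypothetical expression structure: f(x) = a * sin(b * x1) + c * x2.
    import numpy as np
    import torch
    from scipy.optimize import least_squares

    # Synthetic data with true coefficients (1.5, 0.8, 2.0) and small noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(512, 2))
    y = 1.5 * np.sin(0.8 * X[:, 0]) + 2.0 * X[:, 1] + rng.normal(0, 0.1, 512)

    # --- First-order (DL-style): Adam with mini-batching and gradient clipping ---
    coeffs = torch.ones(3, requires_grad=True)          # (a, b, c), naive init
    opt = torch.optim.Adam([coeffs], lr=0.05)
    Xt = torch.tensor(X, dtype=torch.float32)
    yt = torch.tensor(y, dtype=torch.float32)
    for epoch in range(200):
        perm = torch.randperm(len(Xt))
        for batch in perm.split(64):                     # mini-batches of 64 samples
            a, b, c = coeffs
            pred = a * torch.sin(b * Xt[batch, 0]) + c * Xt[batch, 1]
            loss = torch.mean((pred - yt[batch]) ** 2)
            opt.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_([coeffs], max_norm=1.0)  # gradient clipping
            opt.step()
    print("first-order estimate:", coeffs.detach().numpy())

    # --- Second-order: Levenberg-Marquardt on the full batch of residuals ---
    def residuals(theta):
        a, b, c = theta
        return a * np.sin(b * X[:, 0]) + c * X[:, 1] - y

    sol = least_squares(residuals, x0=np.ones(3), method="lm")
    print("second-order estimate:", sol.x)

Both routes optimise only the coefficients of an already-chosen structure; in the GP setting studied in the paper, this inner optimisation competes for time with the evolution of the structure itself, which is why the generation-based and time-based comparisons can disagree.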

Supplementary Material

PDF File (p1127-harrison-suppl.pdf)
Supplemental material.


Cited By

  • (2024) Automatic design of interpretable control laws through parametrized Genetic Programming with adjoint state method gradient evaluation. Applied Soft Computing 159, 111654. DOI: 10.1016/j.asoc.2024.111654. Online publication date: Jul-2024.
  • (2024) Simultaneous Model-Based Evolution of Constants and Expression Structure in GP-GOMEA for Symbolic Regression. Parallel Problem Solving from Nature – PPSN XVIII, 238-255. DOI: 10.1007/978-3-031-70055-2_15. Online publication date: 14-Sep-2024.


Published In

cover image ACM Conferences
GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
July 2023
1667 pages
ISBN:9798400701191
DOI:10.1145/3583131
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. gradient descent
  2. symbolic regression
  3. genetic programming
  4. explainable AI
  5. coefficient optimisation

Qualifiers

  • Research-article

Funding Sources

  • NWO

Conference

GECCO '23

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Article Metrics

  • Downloads (last 12 months): 190
  • Downloads (last 6 weeks): 13
Reflects downloads up to 23 Sep 2024
