DOI: 10.1145/3583131.3590368
Research article | Open access

Mini-Batching, Gradient-Clipping, First- versus Second-Order: What Works in Gradient-Based Coefficient Optimisation for Symbolic Regression?

Published: 12 July 2023

Abstract

The aim of Symbolic Regression (SR) is to discover interpretable expressions that accurately describe data. The accuracy of an expression depends on both its structure and coefficients. To keep the structure simple enough to be interpretable, effective coefficient optimisation becomes key. Gradient-based optimisation is clearly effective at training neural networks in Deep Learning (DL), which can essentially be viewed as large, over-parameterised expressions: in this paper, we study how gradient-based optimisation techniques as often used in DL transfer to SR. In particular, we first assess what techniques work well across random SR expressions, independent of any specific SR algorithm. We find that mini-batching and gradient-clipping can be helpful (similar to DL), while second-order optimisers outperform first-order ones (different from DL). Next, we consider whether including gradient-based optimisation in Genetic Programming (GP), a classic SR algorithm, is beneficial. On five real-world datasets, in a generation-based comparison, we find that second-order optimisation outperforms coefficient mutation (or no optimisation). However, in time-based comparisons, performance gaps shrink substantially because the computational expense of second-order optimisation causes GP to perform fewer generations. The interplay of computational costs between the optimisation of structure and coefficients is thus a critical aspect to consider.
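
To make the comparison concrete, below is a minimal, self-contained sketch (not taken from the paper or its supplementary code) of the two coefficient-optimisation styles the abstract contrasts, applied to a hypothetical fixed expression structure f(x) = a*sin(b*x1) + c*x2 on synthetic data: a first-order, DL-style loop with mini-batching, gradient clipping, and Adam (via PyTorch), and a second-order Levenberg-Marquardt fit of the same coefficients (via SciPy). The expression, data, and hyperparameters are illustrative assumptions, not the paper's experimental setup.

    # Sketch: first- vs second-order coefficient optimisation for a fixed SR expression.
    # Hypothetical expression structure: f(x) = a * sin(b * x1) + c * x2.
    import numpy as np
    import torch
    from scipy.optimize import least_squares

    # Synthetic data with true coefficients (1.5, 0.8, 2.0) and small noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(512, 2))
    y = 1.5 * np.sin(0.8 * X[:, 0]) + 2.0 * X[:, 1] + rng.normal(0, 0.1, 512)

    # --- First-order (DL-style): Adam with mini-batching and gradient clipping ---
    coeffs = torch.ones(3, requires_grad=True)          # (a, b, c), naive init
    opt = torch.optim.Adam([coeffs], lr=0.05)
    Xt = torch.tensor(X, dtype=torch.float32)
    yt = torch.tensor(y, dtype=torch.float32)
    for epoch in range(200):
        perm = torch.randperm(len(Xt))
        for batch in perm.split(64):                     # mini-batches of 64 samples
            a, b, c = coeffs
            pred = a * torch.sin(b * Xt[batch, 0]) + c * Xt[batch, 1]
            loss = torch.mean((pred - yt[batch]) ** 2)
            opt.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_([coeffs], max_norm=1.0)  # gradient clipping
            opt.step()
    print("first-order estimate:", coeffs.detach().numpy())

    # --- Second-order: Levenberg-Marquardt on the full batch of residuals ---
    def residuals(theta):
        a, b, c = theta
        return a * np.sin(b * X[:, 0]) + c * X[:, 1] - y

    sol = least_squares(residuals, x0=np.ones(3), method="lm")
    print("second-order estimate:", sol.x)

Both routes optimise only the coefficients of an already-chosen structure; in the GP setting studied in the paper, this inner optimisation competes for time with the evolution of the structure itself, which is why the generation-based and time-based comparisons can disagree.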

Supplementary Material

PDF File (p1127-harrison-suppl.pdf)
Supplemental material.


Cited By

  • (2024) Automatic design of interpretable control laws through parametrized Genetic Programming with adjoint state method gradient evaluation. Applied Soft Computing 159, 111654. DOI: 10.1016/j.asoc.2024.111654. Online publication date: Jul-2024.
  • (2024) Simultaneous Model-Based Evolution of Constants and Expression Structure in GP-GOMEA for Symbolic Regression. Parallel Problem Solving from Nature – PPSN XVIII, 238-255. DOI: 10.1007/978-3-031-70055-2_15. Online publication date: 14-Sep-2024.


Published In

cover image ACM Conferences
GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
July 2023
1667 pages
ISBN:9798400701191
DOI:10.1145/3583131
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. gradient descent
  2. symbolic regression
  3. genetic programming
  4. explainable AI
  5. coefficient optimisation

Qualifiers

  • Research-article

Funding Sources

  • NWO

Conference

GECCO '23

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Article Metrics

  • Downloads (last 12 months): 190
  • Downloads (last 6 weeks): 13
Reflects downloads up to 23 Sep 2024
