DOI: 10.1145/2001858.2002060
tutorial

Overfitting detection and adaptive covariant parsimony pressure for symbolic regression

Published: 12 July 2011

Abstract

Covariant parsimony pressure is a theoretically motivated method aimed primarily at controlling bloat. In this contribution we describe an adaptive method for controlling covariant parsimony pressure with the goal of reducing overfitting in symbolic regression. The method rests on the assumption that overfitting can be reduced by controlling the evolution of program length. Additionally, we propose an overfitting detection criterion based on the correlation between the fitness values on the training set and on a validation set, computed over all models in the population.
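The detection criterion can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient or which threshold is used, so the rank (Spearman) correlation, the `threshold` value of 0.5, and all function names below are assumptions made for illustration only.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant (assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(ranks(x), ranks(y))


def is_overfitting(training_fitness: Sequence[float],
                   validation_fitness: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Flag overfitting when training and validation fitness, taken over all
    models in the population, are only weakly (or negatively) correlated.
    The threshold is a hypothetical value, not one taken from the paper."""
    return spearman(training_fitness, validation_fitness) < threshold
```

The intuition behind such a criterion is that, while the population generalizes, models that rank well on the training set also tend to rank well on the validation set; a drop in that agreement suggests that training fitness is being improved at the expense of validation fitness.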
The proposed method uses covariant parsimony pressure to decrease the average program length when overfitting occurs and allows the average program length to increase in the absence of overfitting. The approach is applied to two real-world datasets. The experimental results show that the correlation of training and validation fitness can be used as an indicator of overfitting and that the proposed adaptation of covariant parsimony pressure alleviates overfitting in symbolic regression experiments on both datasets.
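The adaptation idea can be sketched as follows, under stated assumptions. The sketch uses the coefficient that Poli and McPhee derive for covariant parsimony pressure, c = Cov(length, fitness) / Var(length), which is intended to hold the mean program length constant under fitness-proportionate selection when fitness is maximized. The abstract does not give the adaptation rule itself, so the `tighten` and `relax` scale factors and all function names are hypothetical.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Population covariance of x and y; covariance(x, x) is the variance."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def parsimony_coefficient(lengths: Sequence[float],
                          fitness: Sequence[float]) -> float:
    """c = Cov(length, fitness) / Var(length).
    Returns 0.0 when all programs have the same length (assumption)."""
    variance = covariance(lengths, lengths)
    if variance == 0:
        return 0.0
    return covariance(lengths, fitness) / variance


def adjusted_fitness(lengths: Sequence[float],
                     fitness: Sequence[float],
                     overfitting: bool,
                     tighten: float = 1.5,
                     relax: float = 0.5) -> list[float]:
    """Return fitness - c * length for every program.

    A scale of 1 gives the coefficient that holds mean length constant.
    When Cov(length, fitness) is positive, a scale above 1 makes length
    negatively correlated with adjusted fitness (pressure towards shorter
    programs), and a scale below 1 leaves a positive correlation (growth
    is allowed). Both scale values are hypothetical, not from the paper.
    """
    scale = tighten if overfitting else relax
    c = scale * parsimony_coefficient(lengths, fitness)
    return [f - c * l for f, l in zip(fitness, lengths)]
```

In a generational loop, the `overfitting` flag would come from the detection criterion evaluated on the current population, and selection would then operate on the adjusted fitness values instead of the raw ones.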



    Published In

    GECCO '11: Proceedings of the 13th annual conference companion on Genetic and evolutionary computation
    July 2011
    1548 pages
    ISBN:9781450306904
    DOI:10.1145/2001858
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. overfitting
    2. parsimony pressure
    3. symbolic regression

    Qualifiers

    • Tutorial

    Conference

    GECCO '11

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

    Cited By

    • (2024) Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting. IEEE Access, 12:70676-70689. DOI: 10.1109/ACCESS.2024.3402543. Online publication date: 2024
    • (2023) A computational framework for physics-informed symbolic regression with straightforward integration of domain knowledge. Scientific Reports, 13(1). DOI: 10.1038/s41598-023-28328-2. Online publication date: 23-Jan-2023
    • (2023) Alleviating overfitting in transformation-interaction-rational symbolic regression with multi-objective optimization. Genetic Programming and Evolvable Machines, 24(2). DOI: 10.1007/s10710-023-09461-3. Online publication date: 20-Oct-2023
    • (2022) Machine learning-enabled self-consistent parametrically-upscaled crystal plasticity model for Ni-based superalloys. Computer Methods in Applied Mechanics and Engineering, 402 (115384). DOI: 10.1016/j.cma.2022.115384. Online publication date: Dec-2022
    • (2021) Soft target and functional complexity reduction: A hybrid regularization method for genetic programming. Expert Systems with Applications, 177 (114929). DOI: 10.1016/j.eswa.2021.114929. Online publication date: Sep-2021
    • (2020) Robustness And Overfitting Behavior Of Implicit Background Models. 2020 IEEE International Conference on Image Processing (ICIP), pages 3274-3278. DOI: 10.1109/ICIP40778.2020.9191361. Online publication date: Oct-2020
    • (2020) Machine Learning-Aided Parametrically Homogenized Crystal Plasticity Model (PHCPM) for Single Crystal Ni-Based Superalloys. JOM. DOI: 10.1007/s11837-020-04344-9. Online publication date: 16-Sep-2020
    • (2019) A Survey of Statistical Machine Learning Elements in Genetic Programming. IEEE Transactions on Evolutionary Computation, 23(6):1029-1048. DOI: 10.1109/TEVC.2019.2900916. Online publication date: Dec-2019
