Article
DOI: 10.1007/978-3-642-29139-5_19

Random sampling technique for overfitting control in genetic programming

Published: 11 April 2012

Abstract

Generalization is one of the areas of Genetic Programming (GP) that has seen fewer research efforts than in other Machine Learning methods. Generalization is the ability of a solution to perform well on unseen cases. It is one of the most important goals of any Machine Learning method, yet in GP it has only recently started to receive more attention. In this work we perform a comparative analysis of a particularly interesting configuration of the Random Sampling Technique (RST) against the Standard GP approach. Experiments are conducted on three multidimensional real-world symbolic regression datasets, the first two in the pharmacokinetics domain and the third in the forestry domain. The results show that the RST decreases overfitting on all three datasets. It also improves testing fitness on two of the three datasets, and it does so while producing considerably smaller and less complex solutions. We discuss the possible reasons for the good performance of the RST, as well as its possible limitations.
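
The abstract does not describe the RST configuration that was studied, so the following is only a minimal sketch of the general idea of random sampling, under stated assumptions: each generation, every candidate is scored on a freshly drawn random subset of the training cases instead of on the full training set. The function names (`rmse`, `draw_subset`, `evaluate_generation`), the error measure, the subset size, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def rmse(model, cases):
    """Root mean squared error of `model` over (input, target) pairs."""
    squared = [(model(x) - y) ** 2 for x, y in cases]
    return math.sqrt(sum(squared) / len(squared))


def draw_subset(training_cases, subset_size, rng):
    """Draw a random subset of the training cases, without replacement."""
    return rng.sample(training_cases, subset_size)


def evaluate_generation(population, training_cases, subset_size, rng):
    """Score every candidate on one subset shared by the whole generation."""
    subset = draw_subset(training_cases, subset_size, rng)
    return [rmse(model, subset) for model in population]


# Toy setup (assumption): noiseless target y = 2x and two fixed candidates
# standing in for evolved GP individuals.
rng = random.Random(0)
training_cases = [(float(x), 2.0 * x) for x in range(10)]
population = [lambda x: 2.0 * x, lambda x: x + 1.0]

for generation in range(3):
    # A new subset is drawn every generation, so no candidate is ever
    # selected on the full training set.
    scores = evaluate_generation(population, training_cases, 3, rng)
```

Because the subset changes between generations, a candidate that merely memorizes a few training cases is less likely to keep a good score, which is one plausible reading of why sampling could reduce overfitting. The subset size and how often it is redrawn are the parameters one would tune in such a scheme.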



Published In

EuroGP'12: Proceedings of the 15th European conference on Genetic Programming
April 2012
278 pages
ISBN: 9783642291388
Editors: Alberto Moraglio, Sara Silva, Krzysztof Krawiec, Penousal Machado, Carlos Cotta

Sponsors

  • University of Malaga

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

  1. generalization
  2. genetic programming
  3. overfitting

Cited By

  • (2024) A Comprehensive Analysis of Down-sampling for Genetic Programming-based Program Synthesis. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 487-490. DOI: 10.1145/3638530.3654134. Online publication date: 14-Jul-2024
  • (2022) Subsampling for partial least-squares regression via an influence function. Knowledge-Based Systems 245:C. DOI: 10.1016/j.knosys.2022.108661. Online publication date: 7-Jun-2022
  • (2020) Refined typed genetic programming as a user interface for genetic programming. Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, pp. 251-252. DOI: 10.1145/3377929.3390042. Online publication date: 8-Jul-2020
  • (2019) Random subsampling improves performance in lexicase selection. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 2028-2031. DOI: 10.1145/3319619.3326900. Online publication date: 13-Jul-2019
  • (2018) Neural estimation of interaction outcomes. Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1055-1062. DOI: 10.1145/3205455.3205600. Online publication date: 2-Jul-2018
  • (2017) Unsure when to stop? Proceedings of the Genetic and Evolutionary Computation Conference, pp. 929-936. DOI: 10.1145/3071178.3071328. Online publication date: 1-Jul-2017
  • (2016) Non-negative Matrix Factorization for Unsupervised Derivation of Search Objectives in Genetic Programming. Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 749-756. DOI: 10.1145/2908812.2908888. Online publication date: 20-Jul-2016
  • (2015) Model Selection and Overfitting in Genetic Programming. Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 1527-1528. DOI: 10.1145/2739482.2764678. Online publication date: 11-Jul-2015
  • (2014) Genetic programming with data migration for symbolic regression. Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 1361-1366. DOI: 10.1145/2598394.2609857. Online publication date: 12-Jul-2014
  • (2014) An Improved Multi-Gene Genetic Programming Approach for the Evolution of Generalized Model in Modelling of Rapid Prototyping Process. Proceedings, Part I, of the 27th International Conference on Modern Advances in Applied Intelligence - Volume 8481, pp. 218-226. DOI: 10.1007/978-3-319-07455-9_23. Online publication date: 3-Jun-2014
