Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Controlling code growth by dynamically shaping the genotype size distribution

Published: 01 December 2015 Publication History

Abstract

Genetic programming is a hyperheuristic optimization approach that seeks to evolve various forms of symbolic computer programs, in order to solve a wide range of problems. However, the approach can be severely hindered by a significant computational burden and stagnation of the evolution caused by uncontrolled code growth. This paper introduces HARM-GP, a novel operator equalization method that conducts an adaptive shaping of the genotype size distribution of individuals in order to effectively control code growth. Its probabilistic nature minimizes the computational overheads on the evolutionary process while its generic formulation allows it to remain independent of both the problem and the genetic variation operators. Comparative results over twelve problems with different dynamics, and over nine other algorithms taken from the literature, show that HARM-GP is excellent at controlling code growth while maintaining good overall performance. Results also demonstrate the effectiveness of HARM-GP at limiting overfitting in real-world supervised learning problems.

References

[1]
E. Alfaro-Cid, A. Esparcia-Alcázar, K. Sharman, F. Fernandez de Vega, J. J. Merelo, Prune and plant: a new bloat control method for genetic programming. in Proceedings of the International Conference on Hybrid Intelligent Systems (HIS), pp. 31---35 (2008)
[2]
E. Alfaro-Cid, J.J. Merelo, F. Fernandez de Vega, A. Esparcia-Alcazar, K. Sharman, Bloat control operators and diversity in genetic programming: a comparative study. Evol. Comput. 18(2), 305---332 (2010)
[3]
E. Alpaydin, Introduction to Machine Learning (MIT Press, Cambridge, 2004)
[4]
N. Amil, N. Bredeche, C. Gagné, S. Gelly, M. Schoenauer, O. Teytaud, A statistical learning perspective of genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 327---338 (2009)
[5]
J. Bacardit, M. Stout, N. Krasnogor, J. D. Hirst, J. Blazewicz, Coordination number prediction using learning classifier systems: performance and interpretability. in Proceedings of Genetic and Evolutionary Computation Conference (GECCO), pp. 247---254 (2006)
[6]
W. Banzhaf, W.B. Langdon, Some considerations on the reason for bloat. Genet. Program. Evolvable Mach. 3(1), 81---91 (2002)
[7]
W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Genetic Programming: An Introduction (Morgan Kaufmann, Los Altos, 1997)
[8]
S. Bleuler, M. Brack, L. Thiele, E. Zitzler, Multiobjective genetic programming: reducing bloat using SPEA2. in Proceedings of the Congress on evolutionary computation (CEC), 1, pp. 536---543 (2001)
[9]
T. Blickle, L. Thiele, Genetic programming and redundancy. in Genetic Algorithms within the Framework of Computation (Workshop at KI-94) (1994)
[10]
R. Bock, A. Chilingarian, M. Gaug, F. Hakl, T. Hengstebeck, M. Jiřina, J. Klaschka, E. Kotră¿, P. Savick¿, S. Towers et al., Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 516(2), 511---528 (2004)
[11]
M. Brameier, W. Banzhaf, A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans. Evol. Comput. 5(1), 17---26 (2001)
[12]
E.K. Burke, M. Hyde, G. Kendall, J. Woodward, A genetic programming hyper-heuristic approach for evolving 2-D strip packing heuristics. IEEE Trans. Evol. Comput. 14(6), 942---958 (2010)
[13]
S. Dignum, R. Poli, Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1588---1595 (2007)
[14]
S. Dignum, R. Poli, Crossover, sampling, bloat and the harmful effects of size limits. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 158---169 (2008)
[15]
S. Dignum, R. Poli, Operator equalisation and bloat free GP. in Proceedings of the European conference on Genetic Programming (EuroGP), pp. 110---121 (2008)
[16]
K. Dimitrios, K. Aigli, T. Konstantinos, L. Spiros, T. Athanasios, M. Seferina, Where we stand, where we are moving: surveying computational techniques for identifying miRNA genes and uncovering their regulatory role. J. Biomed. Inform. 46(3), 563---573 (2013)
[17]
P.G. Espejo, S. Ventura, F. Herrera, A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 40(2), 121---144 (2010)
[18]
J. Fitzgerald, R. Azad, C. Ryan, Bootstrapping to reduce bloat and improve generalisation in genetic programming. in Companion Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 141---142 (2013)
[19]
F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, C. Gagné, DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171---2175 (2012)
[20]
C. Gagné, M. Parizeau, Genericity in evolutionary computation software tools: principles and case study. Int. J. Artif. Intel. Tools 15(2), 173---194 (2006)
[21]
S. Gelly, O. Teytaud, N. Bredeche, M. Schoenauer, A statistical learning theory approach of bloat. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1783---1784 (2005)
[22]
L. Guo, D. Rivero, J. Dorado, C.R. Munteanu, A. Pazos, Automatic feature extraction using genetic programming: an application to epileptic EEG classification. Expert Systems Appl. 38(8), 10425---10436 (2011)
[23]
K. Harries, P. Smith, Code growth, explicitly defined introns and alternative selection schemes. Evol. Comput. 6(4), 346---364 (1998)
[24]
M. Keijzer, Improving symbolic regression with interval arithmetic and linear scaling. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 70---82 (2003)
[25]
K. Kinnear Jr., Evolving a sort: lessons in genetic programming. in Proceedings of the IEEE International Conference on Neural Networks (ICNN), pp. 881---888 (1993)
[26]
D. Kinzett, M. Johnston, M. Zhang, Numerical simplification for bloat control and analysis of building blocks in genetic programming. Evol. Intell. 2(4), 151---168 (2009)
[27]
A. Kordon, G. Smits, E. Jordaan, E. Rightor, Robust soft sensors based on integration of genetic programming, analytical neural networks, and support vector machines. in Proceedings of the IEEE International Conference on E-Commerce Technology, 1 (2002)
[28]
J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, 1992)
[29]
J.R. Koza, Human-competitive results produced by genetic programming. Genet. Program. Evolv. Mach. 11(3---4), 251 (2010)
[30]
W. Langdon, R. Poli, Fitness causes bloat. in Soft Computing in Engineering Design and Manufacturing, (Springer, London, 1998), pp. 13---22
[31]
W. Langdon, T. Soule, R. Poli, J. Foster. The evolution of size and shape. in Advances in Genetic Programming III, chapter 8 (MIT Press, 1999) pp. 163---190
[32]
W.B. Langdon, R. Poli, Foundations of Genetic Programming (Springer, Berlin, 2002)
[33]
S. M. Lee, D. S. Kim, J. H. Kim, J. S. Park, Spam detection using feature selection and parameters optimization. in Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), (Washington, DC, USA, 2010) pp. 883---888
[34]
S. Luke, L. Panait, Fighting bloat with nonparametric parsimony pressure. in Proceedings of Parallel Problem Solving from Nature (PPSN), pp. 411---421 (2002)
[35]
S. Luke, L. Panait, Lexicographic parsimony pressure. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 829---836 (2002)
[36]
S. Luke, L. Panait, A comparison of bloat control methods for genetic programming. Evol. Comput. 14(3), 309---344 (2006)
[37]
J. F. Miller, P. Thomson, Cartesian genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 121---132 (2000)
[38]
P. Nordin, W. Banzhaf et al., Complexity compression and evolution. in Proceedings of the International Conference on Genetic Algorithms (ICGA), pp. 310---317 (1995)
[39]
M. O'Neill, L. Vanneschi, S. Gustafson, W. Banzhaf, Open issues in genetic programming. Genet. Program. Evolv. Mach. 11(3---4), 339---363 (2010)
[40]
L. Pagie, P. Hogeweg, Evolutionary consequences of coevolving targets. Evol. Comput. 5(4), 401---418 (1997)
[41]
L. Panait, S. Luke, Alternative bloat control methods. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 630---641 (2004)
[42]
R. Poli, A simple but theoretically-motivated method to control bloat in genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 204---217 (2003)
[43]
R. Poli, Covariant tarpeian method for bloat control in genetic programming. in Genetic Programming Theory and Practice VIII, (Springer, 2011), pp. 71---89
[44]
R. Poli, W. B. Langdon, S. Dignum, On the limiting distribution of program sizes in tree-based genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 193---204 (2007)
[45]
R. Poli, W. B. Langdon, N. F. McPhee, A field guide to genetic programming. freely. http://www.gp-field-guide.org.uk (2008)
[46]
R. Poli, N. F. McPhee, Exact schema theorems for gp with one-point and standard crossover operating on linear structures and their application to the study of the evolution of size. in Proceedings of the European Conference on Genetic Programming (EuroGP) (2001)
[47]
R. Poli, N.F. McPhee, General schema theory for genetic programming with subtree-swapping crossover: Part II. Evol. Comput. 11(2), 169---206 (2003)
[48]
R. Poli, N. F. McPhee, Parsimony pressure made easy. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) pp. 1267---1274 (2008)
[49]
R.Y. Rubinstein, D.P. Kroese, Simulation and the Monte Carlo method (Wiley, New York, 2008)
[50]
S. Silva, Reassembling operator equalisation: a secret revealed. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1395---1402 (2011)
[51]
S. Silva, E. Costa, Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories. Genet. Program. Evolv. Mach. 10(2), 141---179 (2009)
[52]
S. Silva, S. Dignum, Extending operator equalisation: fitness based self adaptive length distribution for bloat free GP. in Proc. of the European Conference on Genetic Programming (EuroGP), pp. 159---170 (2009)
[53]
S. Silva, S. Dignum, L. Vanneschi, Operator equalisation for bloat free genetic programming and a survey of bloat control methods. Genet. Program. Evolv. Mach. 13(2), 197---238 (2011)
[54]
S. Silva, L. Vanneschi, Operator equalisation, bloat and overfitting: a study on human oral bioavailability prediction. in Proceedings of Genetic and Evolutionary Computation Conference (GECCO), pp. 1115---1122 (2009)
[55]
S. Silva, L. Vanneschi, The importance of being flat--studying the program length distributions of operator equalisation. in Genetic Programming Theory and Practice IX, (Springer, 2011), pp. 211---233
[56]
T. Soule, J.A. Foster, Effects of code growth and parsimony pressure on populations in genetic programming. Evolut. Comput. 6(4), 293---309 (1998)
[57]
T. Soule, R.B. Heckendorn, An analysis of the causes of code growth in genetic programming. Genet. Program. Evolv. Machin. 3(3), 283---309 (2002)
[58]
W. Tackett, Recombination, selection, and the genetic construction of computer programs. PhD thesis, (University of Southern California, 1994)
[59]
A. Teller, M. Veloso, Program evolution for data mining. Int. J. Expert Syst. Res. Appl. 8, 213---236 (1995)
[60]
M. Tomassini, L. Vanneschi, J. Cuendet, F. Fernández, A new technique for dynamic size populations in genetic programming. in Proceedings of the Congress on Evolutionary Computation (CEC), 1, pp. 486---493 (2004)
[61]
L. Trujillo, E. Naredo, Y. Martínez, Preliminary study of bloat in genetic programming with behavior-based search. in EVOLVE-A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation IV, (Springer, 2013) pp. 293---305
[62]
L. Trujillo, S. Silva, P. Legrand, L. Vanneschi, An empirical study of functional complexity as an indicator of overfitting in genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 262---273 (2011)
[63]
N.Q. Uy, N.X. Hoai, M. O'Neill, R.I. McKay, E. Galván-López, Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolv. Machin. 12(2), 91---119 (2011)
[64]
L. Vanneschi, M. Castelli, S. Silva, Measuring bloat, overfitting and functional complexity in genetic programming. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 877---884 (2010)
[65]
E.J. Vladislavleva, G.F. Smits, D. Den Hertog, Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Trans. Evolut. Comput. 13(2), 333---349 (2009)
[66]
P.A. Whigham, G. Dick, Implicitly controlling bloat in genetic programming. IEEE Trans. Evolut. Comput. 14(2), 173---190 (2010)
[67]
D.R. White, J. McDermott, M. Castelli, L. Manzoni, B.W. Goldman, G. Kronberger, W. Jaśkowski, U.-M. O'Reilly, S. Luke, Better GP benchmarks: community survey results and proposals. Genetic Program. Evolv. Mach. 14(1), 3---29 (2013)
[68]
L. Wilkinson, A. Anand, D. N. Tuan, CHIRP: a new classifier based on composite hypercubes on iterated random projections. in Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp. 6---14 (2011)
[69]
J. Yu, J. Yu, A.A. Almal, S.M. Dhanasekaran, D. Ghosh, W.P. Worzel, A.M. Chinnaiyan, Feature selection and molecular classification of cancer using genetic programming. Neoplasia 9(4), 292 (2007)
[70]
M. Zhang, P. Wong, Genetic programming for medical classification: a program simplification approach. Genetic Program. Evolv. Mach. 9(3), 229---255 (2008)

Cited By

View all
  • (2018)Semantics Based Substituting Technique for Reducing Code Bloat in Genetic ProgrammingProceedings of the 9th International Symposium on Information and Communication Technology10.1145/3287921.3287948(77-83)Online publication date: 6-Dec-2018

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Genetic Programming and Evolvable Machines
Genetic Programming and Evolvable Machines  Volume 16, Issue 4
December 2015
162 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 2015

Author Tags

  1. Bloat control
  2. Genetic programming
  3. Monte Carlo methods

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2018)Semantics Based Substituting Technique for Reducing Code Bloat in Genetic ProgrammingProceedings of the 9th International Symposium on Information and Communication Technology10.1145/3287921.3287948(77-83)Online publication date: 6-Dec-2018

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media