Abstract
Although Gaussian process regression (GPR) is a powerful Bayesian nonparametric regression model for engineering problems, its predictive performance depends strongly on the kernel chosen for its covariance function, and choosing a proper kernel is still challenging even for experts. To choose a proper kernel automatically, this study proposes compositional kernel (CPK) learning using tree-based genetic programming (GEP). The optimal structure of the kernel is defined as a compositional representation based on sums and products of eight base-kernels. Since the CPK can be encoded as a tree structure, tree-based GEP is employed to discover an optimal tree structure of the CPK. To avoid overly complex solutions in GEP, the proposed method introduces a dynamic maximum tree-depth technique. The novelty of the proposed method is that it learns the relationship between inputs and outputs more flexibly and efficiently than existing methods. To evaluate its learning capability, seven test functions were first investigated under various noise levels, and the predictive accuracy of the proposed method was compared with that of existing methods. Reliability problems in both parallel and series systems were then introduced to evaluate the performance of the proposed method for efficient reliability assessment. The results show that the proposed method generally outperforms, or performs comparably to, the best of the existing methods. They also show that a proper kernel function can significantly improve the performance of GPR as the training data grow; in other words, the proposed method can learn the function being fitted with fewer training samples than existing methods. In this context, the proposed method enables powerful and automatic predictive modeling based on GPR for engineering problems.
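To make the tree encoding concrete, the following is a minimal Python sketch of how a CPK can be represented as an expression tree whose internal nodes are the sum and product operators and whose leaves are base-kernels, together with a depth function of the kind used by a dynamic maximum tree-depth rule. This is not the released implementation (see the GitHub repository under "Replication of results"); the node structure, base-kernel set, and fixed hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Illustrative base-kernels with fixed hyperparameters (assumed for brevity)
base_kernels = {
    'SE':  lambda x, x2: np.exp(-0.5 * (x - x2) ** 2),
    'PER': lambda x, x2: np.exp(-2.0 * np.sin(np.pi * (x - x2)) ** 2),
    'WN':  lambda x, x2: 1e-2 * np.equal(x, x2).astype(float),
}

class Node:
    """A CPK expression tree: internal nodes hold '+' or '*';
    leaves hold the name of a base-kernel."""
    def __init__(self, op=None, left=None, right=None, leaf=None):
        self.op, self.left, self.right, self.leaf = op, left, right, leaf

def evaluate(node, x, x2):
    """Evaluate the covariance encoded by the tree at inputs (x, x2)."""
    if node.leaf is not None:                       # leaf: a base-kernel
        return base_kernels[node.leaf](x, x2)
    kl = evaluate(node.left, x, x2)                 # recurse into subtrees
    kr = evaluate(node.right, x, x2)
    return kl + kr if node.op == '+' else kl * kr

def depth(node):
    """Tree depth; a dynamic maximum tree-depth rule rejects offspring
    whose depth exceeds the current limit."""
    if node.leaf is not None:
        return 1
    return 1 + max(depth(node.left), depth(node.right))

# (SE * PER) + WN: a smooth periodic signal with additive noise
tree = Node(op='+',
            left=Node(op='*', left=Node(leaf='SE'), right=Node(leaf='PER')),
            right=Node(leaf='WN'))

X = np.linspace(0.0, 4.0, 10)
K = evaluate(tree, X[:, None], X[None, :])   # 10 x 10 covariance matrix
print(depth(tree))                           # 3
```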
References
Au S-K, Wang Y (2014) Engineering risk assessment with subset simulation. Wiley, Singapore
Babaee H, Perdikaris P, Chryssostomidis C, Karniadakis GE (2016) Multi-fidelity modelling of mixed convection based on experimental correlations and numerical simulations. J Fluid Mech 809:895–917. https://doi.org/10.1017/jfm.2016.718
Ben-Ari EN, Steinberg DM (2007) Modeling data from computer experiments: an empirical comparison of kriging with MARS and projection pursuit regression. Qual Eng 19:327–338. https://doi.org/10.1080/08982110701580930
Bichon BJ, McFarland JM, Mahadevan S (2011) Efficient surrogate models for reliability analysis of systems with multiple failure modes. Reliab Eng Syst Saf 96:1386–1395. https://doi.org/10.1016/j.ress.2011.05.008
Canas LS et al (2018) Gaussian processes with optimal kernel construction for neuro-degenerative clinical onset prediction. In: SPIE Medical Imaging, vol 10575. SPIE
Chiles J-P, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley series in probability and statistics applied probability and statistics section. Wiley, New York
Costabal FS, Matsuno K, Yao J, Perdikaris P, Kuhl E (2019) Machine learning in drug development: characterizing the effect of 30 drugs on the QT interval using Gaussian process regression, sensitivity analysis, and uncertainty quantification. Comput Methods Appl Mech Eng 348:313–333. https://doi.org/10.1016/j.cma.2019.01.033
Dette H, Pepelyshev A (2010) Generalized Latin hypercube design for computer experiments. Technometrics 52:421–429. https://doi.org/10.1198/Tech.2010.09157
DiazDelaO FA, Adhikari S (2011) Gaussian process emulators for the stochastic finite element method. Int J Numer Methods Eng 87:521–540. https://doi.org/10.1002/nme.3116
Diosan L, Rogozan A, Pecuchet JP (2007) Evolving kernel functions for SVMs by genetic programming. In: Sixth international conference on machine learning and applications (ICMLA 2007), 13–15 Dec. 2007. pp 19–24. https://doi.org/10.1109/ICMLA.2007.70
Duvenaud D, Lloyd J, Grosse R, Tenenbaum J, Zoubin G (2013) Structure discovery in nonparametric regression through compositional kernel search. Paper presented at the proceedings of the 30th international conference on machine learning, proceedings of machine learning research
Forrester AIJ, Sóbester AS, Keane AJ (2008) Engineering design via surrogate modelling: a practical guide. Wiley, Chichester
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19:1–67. https://doi.org/10.1214/aos/1176347963
Ginsbourger D, Durrande N, Roustant O (2013) Kernels and designs for modelling invariant functions: from group invariance to additivity. In: mODa 10—advances in model-oriented design and analysis. Springer, Heidelberg, pp 107–115
Guo MW, Hesthaven JS (2018) Reduced order modeling for nonlinear structural analysis using Gaussian process regression. Comput Methods Appl Mech Eng 341:807–826. https://doi.org/10.1016/j.cma.2018.07.017
Hamdaoui M, Le Quilliec G, Breitkopf P, Villon P (2014) POD surrogates for real-time multi-parametric sheet metal forming problems. Int J Mater Form 7:337–358. https://doi.org/10.1007/s12289-013-1132-0
Hamdaoui M, Oujebbour F-Z, Habbal A, Breitkopf P, Villon P (2015) Kriging surrogates for evolutionary multi-objective optimization of CPU intensive sheet metal forming applications. Int J Mater Form 8:469–480. https://doi.org/10.1007/s12289-014-1190-y
Harper WV, Gupta SK (1983) Sensitivity/uncertainty analysis of a borehole scenario comparing Latin hypercube sampling and deterministic sensitivity approaches. Battelle Memorial Institute, Office of Nuclear Waste Isolation, Columbus, OH
Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control and artificial intelligence. MIT Press
Hwang Y, Tong A, Choi J (2016) Automatic construction of nonparametric relational regression models for multiple time series. Paper presented at the proceedings of the 33rd international conference on machine learning, proceedings of machine learning research
Jin SS, Jung HJ (2016) Self-adaptive sampling for sequential surrogate modeling of time-consuming finite element analysis. Smart Struct Syst 17:611–629
Kersaudy P, Sudret B, Varsier N, Picon O, Wiart J (2015) A new surrogate modeling technique combining kriging and polynomial chaos expansions—application to uncertainty analysis in computational dosimetry. J Comput Phys 286:103–117. https://doi.org/10.1016/j.jcp.2015.01.034
Kleijnen JPC, van Beers WCM (2004) Application-driven sequential designs for simulation experiments: kriging metamodelling. J Oper Res Soc 55:876–883. https://doi.org/10.1057/palgrave.jors.2601747
Klenske ED, Zeilinger MN, Schölkopf B, Hennig P (2013) Nonparametric dynamics estimation for time periodic systems. In: 2013 51st annual allerton conference on communication, control, and computing (Allerton), 2–4 Oct. 2013. pp 486–493. https://doi.org/10.1109/Allerton.2013.6736564
Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. Complex adaptive systems. MIT Press, Cambridge
Kronberger G, Kommenda M (2013) Evolution of covariance functions for Gaussian process regression using genetic programming. In: Computer aided systems theory—EUROCAST 2013. Springer, Berlin, pp 308–315
Lawrence N (2005) Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J Mach Learn Res 6:1783–1816
Lee I, Choi KK, Gorsich D (2010) System reliability-based design optimization using the MPP-based dimension reduction method. Struct Multidiscip Optim 41:823–839. https://doi.org/10.1007/s00158-009-0459-0
Lloyd JR, Duvenaud D, Grosse R, Tenenbaum JB, Ghahramani Z (2014) Automatic construction and natural-language description of nonparametric regression models. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2894066th edn. AAAI Press, Quebec City, pp 1242–1250
Madár J, Abonyi J, Szeifert F (2005) Genetic programming for the identification of nonlinear input−output models. Ind Eng Chem Res 44:3178–3186. https://doi.org/10.1021/ie049626e
Menezes R, Garcia-Soidán P, Febrero-Bande M (2005) A comparison of approaches for valid variogram achievement. Comput Stat 20:623–642. https://doi.org/10.1007/bf02741319
Moon H, Dean AM, Santner TJ (2012) Two-stage sensitivity-based group screening in computer experiments. Technometrics 54:376–387. https://doi.org/10.1080/00401706.2012.725994
Morris MD, Mitchell TJ, Ylvisaker D (1993) Bayesian design and analysis of computer experiments: use of derivatives in surface prediction. Technometrics 35:243–255. https://doi.org/10.1080/00401706.1993.10485320
Perdikaris P, Raissi M, Damianou A, Lawrence ND, Karniadakis GE (2017) Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proc R Soc A 473:20160751. https://doi.org/10.1098/rspa.2016.0751
Picheny V, Wagner T, Ginsbourger D (2013) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidiscip Optim 48:607–626. https://doi.org/10.1007/s00158-013-0919-4
Quiñonero-Candela J, Rasmussen CE (2005) A unifying view of sparse approximate Gaussian process regression. J Mach Learn Res 6:1939–1959
Raissi M, Perdikaris P, Karniadakis GE (2017) Inferring solutions of differential equations using noisy multi-fidelity data. J Comput Phys 335:736–746. https://doi.org/10.1016/j.jcp.2017.01.060
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, Cambridge
Schulz E, Speekenbrink M, Krause A (2018) A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J Math Psychol 85:1–16. https://doi.org/10.1016/j.jmp.2018.03.001
Silva S, Almeida J (2003) Dynamic maximum tree depth—a simple technique for avoiding bloat in tree-based GP. Lect Notes Comput Sci 2724:1776–1787
Smith N, Mahadevan S (2005) Integrating system-level and component-level designs under uncertainty. J Spacecr Rocket 42:752–760. https://doi.org/10.2514/1.6662
Snelson E, Ghahramani Z (2005) Sparse Gaussian processes using pseudo-inputs. Paper presented at the proceedings of the 18th international conference on neural information processing systems, Vancouver, British Columbia, Canada
Soule T, Foster JA (1998) Effects of code growth and parsimony pressure on populations in genetic programming. Evol Comput 6:293–309. https://doi.org/10.1162/evco.1998.6.4.293
Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer, New York. https://doi.org/10.1007/978-1-4612-1494-6
Ugray Z, Lasdon L, Plummer J, Glover F, Kelly J, Martí R (2007) Scatter search and local NLP solvers: a multistart framework for global optimization. INFORMS J Comput 19:328–340. https://doi.org/10.1287/ijoc.1060.0175
Ward EJ (2008) A review and comparison of four commonly used Bayesian and maximum likelihood model selection tools. Ecol Model 211:1–10. https://doi.org/10.1016/j.ecolmodel.2007.10.030
Welch WJ, Robert JB, Sacks J, Wynn HP, Mitchell TJ, Morris MD (1992) Screening, predicting, and computer experiments. Technometrics 34:15–25. https://doi.org/10.2307/1269548
William E, Northern J (2008) Genetic programming lab (GPLab) tool set version 3.0. In: 2008 IEEE Region 5 Conference, 17–20 April 2008. pp 1–6. https://doi.org/10.1109/TPSD.2008.4562729
Wu B, Zhang W-Q, Chen L, Liang J-H (2010) A GP-based kernel construction and optimization method for RVM. In: 2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE), 26–28 Feb 2010. pp 419–423. https://doi.org/10.1109/ICCAE.2010.5451646
Replication of results
The codes are available from the following GitHub repository: https://github.com/seungsab/CPKL_using_Tree-GEP.
Funding
This research was supported by a grant from a Strategic Research Project (Smart Monitoring System for Concrete Structures Using FRP Nerve Sensor) funded by the Korea Institute of Civil Engineering and Building Technology.
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
Additional information
Responsible Editor: Mehmet Polat Saka
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1. Base kernels for the covariance function
The distance-based similarity between function values is fundamental to nonparametric regression, and the properties of this similarity are encoded by the kernel. The kernel for the covariance function is therefore a crucial ingredient of GPR, since it encodes our prior assumptions about the function we wish to learn. This appendix briefly reviews the eight kernels used here for richer representation and examines their properties. Hereafter, these kernels are referred to as base-kernels. Their properties and expressible structures are shown in Fig. 24.
1.1 Squared exponential kernel (SE)
SE is the most popular kernel for the covariance function of GPR. Since the SE kernel is infinitely differentiable, it encodes a prior assumption of a strongly smooth function. It is also known as the "radial basis" or "Gaussian correlation" kernel, and it is given by

$$ k_{\mathrm{SE}}\left(x,{x}^{\prime}\right)={\sigma}^2\exp \left(-\frac{{\left(x-{x}^{\prime}\right)}^2}{2{l}^2}\right) $$

where \( {\sigma}^2 \) and l are the scale factor and the length scale for each input, respectively; \( {\sigma}^2 \) determines the average distance of the function away from its mean, while l defines the correlation length scale for each input (the length of the "wiggles" in the function).
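As a concrete reference, here is a minimal NumPy sketch of the SE kernel above; the function name and default hyperparameters are illustrative, not the paper's released code (see "Replication of results").

```python
import numpy as np

def k_se(x, x2, sigma2=1.0, ell=1.0):
    """Squared exponential kernel: sigma2 * exp(-(x - x2)^2 / (2 ell^2))."""
    return sigma2 * np.exp(-0.5 * ((x - x2) / ell) ** 2)
```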
1.2 Rational quadratic kernel (RQ)
RQ is equivalent to a sum of many SE kernels with different length scales. The RQ kernel encodes functions that vary smoothly across many length scales, representing smoothness with multiscale variation. This kernel is given by

$$ k_{\mathrm{RQ}}\left(x,{x}^{\prime}\right)={\sigma}^2{\left(1+\frac{{\left(x-{x}^{\prime}\right)}^2}{2\alpha {l}^2}\right)}^{-\alpha } $$

where \( {\sigma}^2 \), l, and α are the scale factor, the correlation length scale for each input, and the shape parameter of the RQ kernel, respectively; \( {\sigma}^2 \) and l play similar roles to those in the SE kernel, and α adjusts the multiscale variation (i.e., the relative weighting of large-scale and small-scale variations).
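A matching sketch of the RQ kernel, under the same conventions and caveats as the SE sketch above:

```python
def k_rq(x, x2, sigma2=1.0, ell=1.0, alpha=1.0):
    """Rational quadratic kernel; alpha -> infinity recovers the SE kernel."""
    return sigma2 * (1.0 + (x - x2) ** 2 / (2.0 * alpha * ell ** 2)) ** (-alpha)
```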
1.3 Matérn kernel (ME)
ME is a generalization of the SE kernel with an additional hyperparameter (υ) controlling the smoothness (i.e., roughness) of the function. Stein (1999) recommended the ME kernel as an alternative to the SE kernel, since the smoothness assumption of the SE kernel is unrealistic for modeling physical processes. The Matérn kernels with υ = 3/2 and υ = 5/2 are the most popular for GPR, and they are described as

$$ k_{\mathrm{ME}3/2}\left(x,{x}^{\prime}\right)={\sigma}^2\left(1+\frac{\sqrt{3}r}{l}\right)\exp \left(-\frac{\sqrt{3}r}{l}\right) $$

and

$$ k_{\mathrm{ME}5/2}\left(x,{x}^{\prime}\right)={\sigma}^2\left(1+\frac{\sqrt{5}r}{l}+\frac{5{r}^2}{3{l}^2}\right)\exp \left(-\frac{\sqrt{5}r}{l}\right) $$

where r = |x − x′|, and \( {\sigma}^2 \) and l play similar roles to those in the SE kernel.
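Sketches of the two Matérn variants, with the same illustrative conventions:

```python
import numpy as np

def k_me32(x, x2, sigma2=1.0, ell=1.0):
    """Matern kernel with smoothness 3/2 (once-differentiable sample paths)."""
    a = np.sqrt(3.0) * np.abs(x - x2) / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

def k_me52(x, x2, sigma2=1.0, ell=1.0):
    """Matern kernel with smoothness 5/2 (twice-differentiable sample paths)."""
    a = np.sqrt(5.0) * np.abs(x - x2) / ell
    return sigma2 * (1.0 + a + a ** 2 / 3.0) * np.exp(-a)
```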
1.4 Constant kernel (CON)
CON can be used as part of a product-kernel to scale the magnitude of the other kernel, or as part of a sum-kernel to represent the mean of the GPR. It is given by

$$ k_{\mathrm{CON}}\left(x,{x}^{\prime}\right)={\sigma}^2 $$

where \( {\sigma}^2 \) acts as a scale factor within a product-kernel or as the mean of the GPR within a sum-kernel.
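A one-line sketch of the CON kernel (broadcasting over input pairs; again an illustrative helper, not the released code):

```python
import numpy as np

def k_con(x, x2, sigma2=1.0):
    """Constant kernel: the same covariance sigma2 for every input pair."""
    return sigma2 * np.ones(np.broadcast(x, x2).shape)
```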
1.5 Linear kernel (LIN)
LIN is nonstationary: it depends on the absolute location of the inputs rather than only on their separation. This nonstationarity allows the LIN kernel to model a polynomial trend as part of a product-kernel; the LIN kernel is therefore used to encode a trend in the function. This kernel is defined by

$$ k_{\mathrm{LIN}}\left(x,{x}^{\prime}\right)={\sigma}_b^2+{\sigma}_v^2x{x}^{\prime } $$

where \( {\sigma}_b^2 \) and \( {\sigma}_v^2 \) denote the bias and the slope coefficient for each input, respectively.
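A sketch of the LIN kernel, following the earlier conventions:

```python
def k_lin(x, x2, sigma2_b=0.0, sigma2_v=1.0):
    """Linear (nonstationary) kernel: bias plus scaled dot-product term."""
    return sigma2_b + sigma2_v * x * x2
```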
1.6 Periodic kernel (PER)
PER is a stationary kernel that encodes repeating patterns in the function. This kernel is given by

$$ k_{\mathrm{PER}}\left(x,{x}^{\prime}\right)={\sigma}^2\exp \left(-\frac{2{\sin}^2\left(\pi \left(x-{x}^{\prime}\right)/p\right)}{l^2}\right) $$

where \( {\sigma}^2 \) and l play similar roles to those in the SE kernel, and p is the period, which determines the distance between repetitions of the function.
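A sketch of the PER kernel, same caveats:

```python
import numpy as np

def k_per(x, x2, sigma2=1.0, ell=1.0, p=1.0):
    """Periodic kernel with period p."""
    return sigma2 * np.exp(-2.0 * np.sin(np.pi * (x - x2) / p) ** 2 / ell ** 2)
```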
1.7 White noise kernel (WN)
WN encodes uncorrelated noise in the function. The WN kernel can be obtained as the limit of the SE kernel as its length scale goes to zero, and it is widely used to model additive noise in the function. This kernel is defined by

$$ k_{\mathrm{WN}}\left(x,{x}^{\prime}\right)={\sigma}_{\mathrm{WN}}^2{\delta}_{x,{x}^{\prime }} $$

where \( {\sigma}_{\mathrm{WN}}^2 \) and \( {\delta}_{x,{x}^{\prime }} \) are the noise variance and the Kronecker delta function, respectively. The WN kernel has only one hyperparameter, \( {\sigma}_{\mathrm{WN}}^2 \), which determines the level of the noise.
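To close this appendix, a sketch of the WN kernel followed by an example of composing the base-kernel sketches above into a CPK by sums and products, in the spirit of the paper's compositional representation. The particular composition and the fixed hyperparameters below are illustrative assumptions, and `k_se` and `k_per` refer to the sketches given earlier in this appendix.

```python
import numpy as np

def k_wn(x, x2, sigma2_wn=1e-2):
    """White noise kernel: Kronecker delta scaled by the noise variance."""
    return sigma2_wn * np.equal(x, x2).astype(float)

# Example CPK = (SE x PER) + WN, built from the base-kernel sketches above:
# a smooth periodic signal observed with additive noise.
def k_cpk(x, x2):
    return k_se(x, x2, 1.0, 0.5) * k_per(x, x2, 1.0, 1.0, 2.0) + k_wn(x, x2)

X = np.linspace(0.0, 5.0, 20)
K = k_cpk(X[:, None], X[None, :])   # 20 x 20 covariance matrix of the CPK
```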
Appendix 2. Analytical test functions
1.1 Branin function (Forrester et al. 2008) (d = 2)
where x1 ∈ [−5, 10] and x2 ∈ [0, 15].
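For reference, a minimal NumPy sketch of the standard Branin form over this domain; the function name and constant grouping are illustrative, not taken from the paper:

```python
import numpy as np

def branin(x1, x2):
    """Standard Branin function, x1 in [-5, 10], x2 in [0, 15]."""
    a, b, c = 1.0, 5.1 / (4.0 * np.pi ** 2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8.0 * np.pi)
    return (a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2
            + s * (1.0 - t) * np.cos(x1) + s)
```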
1.2 Friedman function (Friedman 1991) (d = 5)
where xi ∈ [0, 1] for i = 1, …, 5.
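A minimal sketch assuming the standard form of the Friedman (1991) function:

```python
import numpy as np

def friedman(x1, x2, x3, x4, x5):
    """Friedman (1991) test function on [0, 1]^5."""
    return (10.0 * np.sin(np.pi * x1 * x2) + 20.0 * (x3 - 0.5) ** 2
            + 10.0 * x4 + 5.0 * x5)
```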
1.3 Dette and Pepelyshev function (Dette and Pepelyshev 2010) (d = 8)
where xi ∈ [0, 1] for i = 1, …, 8.
1.4 Welch et al. function (Welch et al. 1992) (d = 20)
where xi ∈ [−0.5, 0.5] for i = 1, …, 20.
1.5 Output-transformerless (OTL) circuit model (d = 6)
The OTL circuit model has six inputs and estimates the mid-point voltage of an output-transformerless push-pull circuit. This model has been used as a test function in surrogate modeling (Ben-Ari and Steinberg 2007). It provides an analytical expression for the mid-point voltage, defined as
where x1 ∈ [50, 150], x2 ∈ [25, 70], x3 ∈ [0.5, 3], x4 ∈ [1.2, 2.5], x5 ∈ [0.25, 1.2], and x6 ∈ [50, 300].
1.6 Borehole model (d = 8)
The borehole model has eight inputs and estimates the water flow through a borehole drilled from the ground surface through two aquifers. This model has been used as a test function in surrogate modeling (Kersaudy et al. 2015; Morris et al. 1993) and sensitivity analysis (Harper and Gupta 1983). The water-flow rate is computed by (36) from the properties of the aquifers and the borehole
where x1 ∈ [0.05, 0.15], x2 ∈ [100, 50,000], x3 ∈ [63,070, 115,600], x4 ∈ [990, 1110], x5 ∈ [63.1, 116], x6 ∈ [700, 820], x7 ∈ [1120, 1680], and x8 ∈ [9855, 12,045].
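A minimal sketch assuming the standard borehole water-flow form, with inputs renamed by their usual physical meaning (rw, r, Tu, Hu, Tl, Hl, L, Kw correspond to x1 through x8 above); the naming is an assumption for readability:

```python
import numpy as np

def borehole(rw, r, Tu, Hu, Tl, Hl, L, Kw):
    """Standard borehole water-flow model (inputs x1..x8 in the order above)."""
    lnr = np.log(r / rw)
    return (2.0 * np.pi * Tu * (Hu - Hl)
            / (lnr * (1.0 + 2.0 * L * Tu / (lnr * rw ** 2 * Kw) + Tu / Tl)))
```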
1.7 Wing weight model (d = 10)
The wing weight model has 10 inputs and estimates the weight of a light aircraft wing. This model has been used as a test function for input variable screening (Forrester et al. 2008). The wing weight is computed by (37)
where x1 ∈ [150, 200], x2 ∈ [220, 300], x3 ∈ [6, 10], x4 ∈ [−10, 10], x5 ∈ [16, 45], x6 ∈ [0.5, 1], x7 ∈ [0.08, 0.18], x8 ∈ [2.5, 6], x9 ∈ [1700, 2500], and x10 ∈ [0.025, 0.08].
Appendix 3. Predictive performance for global surrogate modeling
1.1 Numerical verification #1: interpolation for noiseless data (β = 0)
1.2 Numerical verification #2: regression for noisy data (β = 0.005)
1.3 Numerical verification #3: regression for noisy data (β = 0.05)
1.4 Computational experiment #1: system reliability in a parallel system
1.5 Computational experiment #2: system reliability in a series system
Appendix 4. Computational cost with different sample sizes and dimensionalities
About this article
Cite this article
Jin, SS. Compositional kernel learning using tree-based genetic programming for Gaussian process regression. Struct Multidisc Optim 62, 1313–1351 (2020). https://doi.org/10.1007/s00158-020-02559-7
DOI: https://doi.org/10.1007/s00158-020-02559-7