Abstract
The prediction of global optima for non-convex and expensive objective functions is a pervasive challenge across many engineering applications and research areas. Bayesian optimization (BO) is a powerful method for solving optimization problems of this type, as it replaces the expensive search space of the objective function with a less expensive Gaussian process or alternative surrogate model. However, selecting the form and hyperparameters of this surrogate model to optimally represent the design space and maximize the convergence rate is a difficult and non-intuitive challenge. In this work, we conduct a systematic breakdown of the computational costs of the BO framework to reveal how these choices of surrogate formulation and hyperparameters influence overall convergence and prediction quality. We consider two qualitatively different modifications of BO to evaluate for improved performance, specifically gradient-enhanced BO (GEBO) and anisotropy-enhanced automatic relevance determination (ARD). GEBO utilizes available gradient information about the objective function to improve the quality of the surrogate representation and selection of the next evaluation point, but with the trade-off of additional expense. In contrast, ARD utilizes an anisotropic Gaussian process surrogate and relevancy criteria to reduce the search space of the surrogate model and improve convergence by solving a smaller problem. After a systematic analysis of the hyperparameters for both strategies, the methods were benchmarked by solving a fluid mechanics airfoil shape optimization problem and a structural mechanics origami actuator problem. These optimization problems involve 38 to 84 design variables. GEBO exhibited around 3\(\times\) speedup for all benchmark problems compared to BO without modification, while ARD-enriched BO exhibited a 1.55× speedup on select problems. Manifold analysis of the design space revealed that ARD performed best on problems with a contiguous reduced dimension. Collectively, these results highlight the trade-offs and cost distribution difference between GE and ARD modification for BO and provide guidelines for implementation in new problems.
Similar content being viewed by others
References
Acerbi L, Ma WJ (2017) Practical Bayesian optimization for model fitting with bayesian adaptive direct search. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 1834–1844
Albring T (2021) Unconstrained shape design of a transonic inviscid airfoil at a cte. aoa. https://su2code.github.io/tutorials/Inviscid_2D_Unconstrained_NACA0012/. Accessed 17 Sept 2021
Albring T, Sagebaum M, Gauger N (2016) Efficient aerodynamic design using the discrete adjoint method in su2. In: 17th AIAA/ISSMO multidisciplinary analysis and optimization conference. https://doi.org/10.2514/6.2016-3518
Alléon G, Benzi M, Giraud L (1997) Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics. Numer Algorithms 16:1–15. https://doi.org/10.1023/A:1019170609950
Backhaus J, Aulich M, Frey C, Lengyel T, Voß C (2012) Gradient enhanced surrogate models based on adjoint cfd methods for the design of a counter rotating turbofan. Turbo expo: power for land, sea, and air volume 8: turbomachinery, parts A, B, and C:2319–2329. https://doi.org/10.1115/GT2012-69706, https://arxiv.org/abs/https://asmedigitalcollection.asme.org/GT/proceedings-pdf/GT2012/44748/2319/4233028/2319_1.pdf
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Nips, pp 585–591
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
Benzi M (2002) Preconditioning techniques for large linear systems: a survey. J Comput Phys 182(2):418–477. https://doi.org/10.1006/jcph.2002.7176
Bhosekar A, Ierapetritou M (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput Chem Eng 108:250–267. https://doi.org/10.1016/j.compchemeng.2017.09.017
Bischof C, Bücker M, Hovland P, Naumann U, Utke J (2008) Advances in automatic differentiation, lecture notes in computational science and engineering, vol 64. Springer. https://doi.org/10.1007/978-3-540-68942-3
Brand M (2003) Charting a manifold. In: Advances in neural information processing systems, vol 15. MIT Press, pp 985–992, https://proceedings.neurips.cc/paper/2002/file/8929c70f8d710e412d38da624b21c3c8-Paper.pdf
Brochu E, Cora VM, de Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599
Byrd RH, Gilbert JC, Nocedal J (2000) A trust region method based on interior point techniques for nonlinear programming. Math Program B 89(1):149–185. https://doi.org/10.1007/PL00011391
Cao Y, Li S, Petzold L, Serban R (2003) Adjoint sensitivity analysis for differential-algebraic equations: the adjoint DAE system and its numerical solution. SIAM J Sci Comput 24(3):1076–1089. https://doi.org/10.1137/S1064827501380630
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024
Choffin B, Ueda N (2018) Scaling Bayesian optimization up to higher dimensions: a review and comparison of recent algorithms. In: 2018 IEEE 28th international workshop on machine learning for signal processing (MLSP), pp 1–6
Corliss G, Faure C, Griewank A, Hascoët L, Naumann U (2002) Automatic differentiation of algorithms: from simulation to optimization. Springer, Berlin. https://doi.org/10.1007/978-1-4613-0075-5
Cox DD, John S (1997) SDO: a statistical method for global optimization. Multidisciplinary design optimization: state-of-the-art, pp 315–329. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.4169
Cuculo V, D’Amelio A, Lanzarotti R, Boccignone G (2018) Personality gaze patterns unveiled via automatic relevance determination. In: Federation of international conferences on software technologies: applications and foundations, Springer, pp 171–184
Despotovic V, Tomas Schommer C (2020) Speech based estimation of Parkinson’s disease using Gaussian processes and automatic relevance determination. Neurocomputing 401:173–181. https://doi.org/10.1016/j.neucom.2020.03.058
Economon T, Palacios F, Copeland S, Lukaczyk T, Alonso J (2016) Su2: an open-source suite for multiphysics simulation and design. AIAA J 54(3):828–846. https://doi.org/10.2514/1.J053813
Enciu P, Gerbaud L, Wurtz F (2010) Automatic differentiation applied for optimization of dynamical systems. IEEE Trans Magn 46(8):2943–2946. https://doi.org/10.1109/TMAG.2010.2044770
Eriksson D, Lee EH, Dong K, Bindel D, Wilson AG (2018) Scaling Gaussian process regression with derivatives. In: Advances in neural information processing systems, pp 6867–6877
Forrester A, Keane A (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1):50–79. https://doi.org/10.1016/j.paerosci.2008.11.001
Garcia-Santiago X, Burger S, Rockstuhl C, Schneider PI (2020) Bayesian optimization with improved scalability and derivative information for efficient design of nanophotonic structures. J Lightwave Technol. https://doi.org/10.1109/JLT.2020.3023450
Gill P, Murray W, Wright M (1991) Numerical linear algebra and optimization, vol 1. Addison-Wesley, Redwood City
Gillman A, Fuchi K, Buskohl PR (2018) Origami topology optimization with nonlinear truss model. https://www.mathworks.com/matlabcentral/fileexchange/69612-origami-topology-optimization-w-nonlinear-trussmodel
Gillman AS, Fuchi K, Buskohl PR (2019) Discovering sequenced origami folding through nonlinear mechanics and topology optimization. J Mech Des Trans ASME 141(4):1–11. https://doi.org/10.1115/1.4041782
Greco M, Gesualdo FAR, Venturini WS, Coda HB (2006) Nonlinear positional formulation for space truss analysis. Finite Elem Anal Des 42(12):1079–1086. https://doi.org/10.1016/j.finel.2006.04.007
Han ZH, Zhang Y, Song CX, Zhang KS (2017) Weighted gradient-enhanced kriging for high-dimensional surrogate modeling and design optimization. AIAA J 55(12):4330–4346. https://doi.org/10.2514/1.J055842
Hao P, Feng S, Li Y, Wang B, Chen H (2020) Adaptive infill sampling criterion for multi-fidelity gradient-enhanced kriging model. Struct Multidisc Optim 62(1):353–373. https://doi.org/10.1007/s00158-020-02493-8
Hart PE, Stork DG, Duda RO (2000) Pattern classification. Wiley, Hoboken
Hicks RM, Henne PA (1978) Wing design by numerical optimization. J Aircr 15(7):407–412. https://doi.org/10.2514/3.58379
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Hyoung-Seog C, Juan AJ (2002) Using gradients to construct cokriging approximation models for high-dimensional design optimization problems. In: 40th AIAA aerospace sciences meeting & exhibit, https://arc.aiaa.org/doi/abs/10.2514/6.2002-317,
Jameson A (1988) Aerodynamic design via control theory. J Sci Comput 3:233–260. https://doi.org/10.1007/BF01061285
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Global Optim 13:455–492. https://doi.org/10.1023/A:1008306431147
Jović A, Brkić K, Bogunović N (2015) A review of feature selection methods with applications. In: 2015 38th international convention on information and communication technology, electronics and microelectronics (MIPRO), pp 1200–1205. https://doi.org/10.1109/MIPRO.2015.7160458
Koziel S, Ciaurri DE, Leifsson L (2011) Surrogate-based methods. Springer, Berlin, pp 33–59. https://doi.org/10.1007/978-3-642-20859-1_3
Kushner HJ (1964) A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J Basic Eng 86(1):97–106. https://doi.org/10.1115/1.3653121
Laurent L, Le Riche R, Soulier B, Boucard PA (2019) An overview of gradient-enhanced metamodels with applications. Arch Comput Methods Eng 26(1):61–106. https://doi.org/10.1007/s11831-017-9226-3
Leary SJ, Bhaskar A, Keane AJ (2004) Global approximation and optimization using adjoint computational fluid dynamics codes. AIAA J 42(3):631–41. https://eprints.soton.ac.uk/22795/
Liming C, Haobo Q, Liang G, Chen J, Zan Y (2020) Optimization of expensive black-box problems via gradient-enhanced kriging. Comput Methods Appl Mech Eng 362(112):861. https://doi.org/10.1016/j.cma.2020.112861
Liu K, Li Y, Hu X, Lucu M, Widanage WD (2020) Gaussian process regression with automatic relevance determination kernel for calendar aging prediction of lithium-ion batteries. IEEE Trans Ind Inf 16(6):3767–3777. https://doi.org/10.1109/TII.2019.2941747
Lizotte DJ (2008) Practical Bayesian optimization. PhD thesis, University of Alberta
March A, Willcox K, Wang Q (2011) Gradient-based multifidelity optimisation for aircraft design using Bayesian model calibration. Aeronaut J 115:729–738. https://doi.org/10.1017/S0001924000006473
Marchant R, Ramos F (2012) Bayesian optimisation for intelligent environmental monitoring. In: 2012 IEEE/RSJ international conference on intelligent robots and systems, pp 2242–2249, https://doi.org/10.1109/IROS.2012.6385653
MATLAB (2019) MATLAB version 9.10.0.1613233 (R2021a) The Mathworks, Inc., Natick, Massachusetts
MATLAB Genetic Algorithm Toolbox (2019) Matlab genetic algorithm toolbox. The MathWorks, Natick
Mbuvha R, Boulkaibet I, Marwala T (2019) Automatic relevance determination Bayesian neural networks for credit card default modelling arXiv:1906.06382
McInnes L, Healy J, Melville J (2020) Umap: uniform manifold approximation and projection for dimension reduction. arxiv:1802.03426
Modha DS, Spangler WS (2003) Feature weighting in k-means clustering. Mach Learn 52(3):217–237
Nazrul H, Dhruba Kumar B, Jugal Kumar K (2014) Mifs-nd: a mutual information-based feature selection method. Exp Syst Appl 41(14):6371–6385. https://doi.org/10.1016/j.eswa.2014.04.019
Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin
Nikolaidis P, Chatzis S (2021) Gaussian process-based Bayesian optimization for data-driven unit commitment. Int J Electr Power Energy Syst 130(106):930. https://doi.org/10.1016/j.ijepes.2021.106930
Picheny V, Wagner T, Ginsbourger D (2013) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidisc Optim 48:607–626. https://doi.org/10.1007/s00158-013-0919-4
Powell WB (2007) Approximate dynamic programming: solving the curses of dimensionality. Wiley, New York. https://doi.org/10.1002/9781118029176
Rana S, Li C, Gupta S, Nguyen V, Venkatesh S (2017) High dimensional Bayesian optimization with elastic Gaussian process. In: 34th international conference on machine learning, ICML 2017 6:4407–4415
Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Rust J (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Sengupta B, Friston K, Penny W (2014) Efficient gradient computation for dynamical models. NeuroImage 98:521–527. https://doi.org/10.1016/j.neuroimage.2014.04.040
Shahriari B, Swersky K, Wang Z, Adams RP, Freitas ND (2015) Taking the human out of the loop: a review of Bayesian optimization. Clim Change 104(1):1–30. https://doi.org/10.1017/CBO9781107415324.004
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Shende S, Gillman A, Yoo D, Buskohl P, Vemaganti K (2021) Bayesian topology optimization for efficient design of origami folding structures. Struct Multidisc Optim 63(4):1907–1926. https://doi.org/10.1007/s00158-020-02787-x
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp 2951–2959
Strassen V (1969) Gaussian elimination is not optimal. Numer Math 13(4):354–356
Svanberg K (1987) The method of moving asymptotes: a new method for structural optimization. Int J Numer Meth Eng 24(2):359–373
Svante W, Kim E, Paul G (1987) Principal component analysis. Chemom Intell Lab Syst 2(1):37–52. https://doi.org/10.1016/0169-7439(87)80084-9
Teh Y, Roweis S (2003) Automatic alignment of local representations. In: Advances in neural information processing systems, vol 15. MIT Press, pp 841–848. https://proceedings.neurips.cc/paper/2002/file/3a1dd98341fafc1dfe9bcf36360e6b84-Paper.pdf
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Ulaganathan S, Couckuyt I, Ferranti F, Laermans E, Dhaene T (2014) Performance study of multi-fidelity gradient enhanced kriging. Struct Multidisc Optim 51:1–17. https://doi.org/10.1007/s00158-014-1192-x
Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10(66–71):13
Wang Q (2013) Forward and adjoint sensitivity computation of chaotic dynamical systems. J Comput Phys 235:1–13. https://doi.org/10.1016/j.jcp.2012.09.007
Wu A, Aoi MC, Pillow JW (2017a) Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature. Machine Learning, pp 1–20. arxiv:1704.00060
Wu J, Poloczek M, Wilson AG, Frazier PI (2017b) Bayesian optimization with gradients. Adv Neural Inf Process Syst 3:5268–5279
Xia J, Xin Z (2017) Effective and robust preconditioning of general SPD matrices via structured incomplete factorization. SIAM J Matrix Anal Appl 38(4):1298–1322. https://doi.org/10.1137/17M1124152
Yamazaki W, Rumpfkeil M, Mavriplis D (2012) Design optimization utilizing gradient/hessian enhanced surrogate model. In: 28th AIAA applied aerodynamics conference. https://doi.org/10.2514/6.2010-4363
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 856–863
Zhao J, Chen L, Pedrycz W, Wang W (2019) Variational inference-based automatic relevance determination kernel for embedded feature selection of noisy industrial data. IEEE Trans Industr Electron 66(1):416–428. https://doi.org/10.1109/TIE.2018.2815997
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflict of interests to declare.
Replication of results
The MATLAB code for solving origami problems using gradient-based methods can be downloaded from MathWorks. The MATLAB code for the integration of Bayesian optimization to solve origami problems and the airfoil problem can be made available to interested parties upon request to the authors.
Additional information
Responsible Editor: Lei Wang
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Shende, S., Gillman, A., Buskohl, P. et al. Systematic cost analysis of gradient- and anisotropy-enhanced Bayesian design optimization. Struct Multidisc Optim 65, 235 (2022). https://doi.org/10.1007/s00158-022-03324-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00158-022-03324-8