Systematic cost analysis of gradient- and anisotropy-enhanced Bayesian design optimization

Sourabh Shende¹,
Andrew Gillman²,
Philip Buskohl² &
…
Kumar Vemaganti ORCID: orcid.org/0000-0001-5729-2858¹

397 Accesses
2 Citations
Explore all metrics

Abstract

The prediction of global optima for non-convex and expensive objective functions is a pervasive challenge across many engineering applications and research areas. Bayesian optimization (BO) is a powerful method for solving optimization problems of this type, as it replaces the expensive search space of the objective function with a less expensive Gaussian process or alternative surrogate model. However, selecting the form and hyperparameters of this surrogate model to optimally represent the design space and maximize the convergence rate is a difficult and non-intuitive challenge. In this work, we conduct a systematic breakdown of the computational costs of the BO framework to reveal how these choices of surrogate formulation and hyperparameters influence overall convergence and prediction quality. We consider two qualitatively different modifications of BO to evaluate for improved performance, specifically gradient-enhanced BO (GEBO) and anisotropy-enhanced automatic relevance determination (ARD). GEBO utilizes available gradient information about the objective function to improve the quality of the surrogate representation and selection of the next evaluation point, but with the trade-off of additional expense. In contrast, ARD utilizes an anisotropic Gaussian process surrogate and relevancy criteria to reduce the search space of the surrogate model and improve convergence by solving a smaller problem. After a systematic analysis of the hyperparameters for both strategies, the methods were benchmarked by solving a fluid mechanics airfoil shape optimization problem and a structural mechanics origami actuator problem. These optimization problems involve 38 to 84 design variables. GEBO exhibited around 3$\times$ speedup for all benchmark problems compared to BO without modification, while ARD-enriched BO exhibited a 1.55× speedup on select problems. Manifold analysis of the design space revealed that ARD performed best on problems with a contiguous reduced dimension. Collectively, these results highlight the trade-offs and cost distribution difference between GE and ARD modification for BO and provide guidelines for implementation in new problems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Bayesian optimization for mixed-variable, multi-objective problems

Article 11 November 2022

Bi-objective Bayesian optimization of engineering problems with cheap and expensive cost functions

Article 25 January 2022

Efficient inverse design optimization through multi-fidelity simulations, machine learning, and boundary refinement strategies

Article Open access 09 September 2024

References

Acerbi L, Ma WJ (2017) Practical Bayesian optimization for model fitting with bayesian adaptive direct search. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 1834–1844
Albring T (2021) Unconstrained shape design of a transonic inviscid airfoil at a cte. aoa. https://su2code.github.io/tutorials/Inviscid_2D_Unconstrained_NACA0012/. Accessed 17 Sept 2021
Albring T, Sagebaum M, Gauger N (2016) Efficient aerodynamic design using the discrete adjoint method in su2. In: 17th AIAA/ISSMO multidisciplinary analysis and optimization conference. https://doi.org/10.2514/6.2016-3518
Alléon G, Benzi M, Giraud L (1997) Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics. Numer Algorithms 16:1–15. https://doi.org/10.1023/A:1019170609950
Article MathSciNet MATH Google Scholar
Backhaus J, Aulich M, Frey C, Lengyel T, Voß C (2012) Gradient enhanced surrogate models based on adjoint cfd methods for the design of a counter rotating turbofan. Turbo expo: power for land, sea, and air volume 8: turbomachinery, parts A, B, and C:2319–2329. https://doi.org/10.1115/GT2012-69706, https://arxiv.org/abs/https://asmedigitalcollection.asme.org/GT/proceedings-pdf/GT2012/44748/2319/4233028/2319_1.pdf
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Nips, pp 585–591
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
MATH Google Scholar
Benzi M (2002) Preconditioning techniques for large linear systems: a survey. J Comput Phys 182(2):418–477. https://doi.org/10.1006/jcph.2002.7176
Article MathSciNet MATH Google Scholar
Bhosekar A, Ierapetritou M (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput Chem Eng 108:250–267. https://doi.org/10.1016/j.compchemeng.2017.09.017
Article Google Scholar
Bischof C, Bücker M, Hovland P, Naumann U, Utke J (2008) Advances in automatic differentiation, lecture notes in computational science and engineering, vol 64. Springer. https://doi.org/10.1007/978-3-540-68942-3
Brand M (2003) Charting a manifold. In: Advances in neural information processing systems, vol 15. MIT Press, pp 985–992, https://proceedings.neurips.cc/paper/2002/file/8929c70f8d710e412d38da624b21c3c8-Paper.pdf
Brochu E, Cora VM, de Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599
Byrd RH, Gilbert JC, Nocedal J (2000) A trust region method based on interior point techniques for nonlinear programming. Math Program B 89(1):149–185. https://doi.org/10.1007/PL00011391
Article MathSciNet MATH Google Scholar
Cao Y, Li S, Petzold L, Serban R (2003) Adjoint sensitivity analysis for differential-algebraic equations: the adjoint DAE system and its numerical solution. SIAM J Sci Comput 24(3):1076–1089. https://doi.org/10.1137/S1064827501380630
Article MathSciNet MATH Google Scholar
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024
Article Google Scholar
Choffin B, Ueda N (2018) Scaling Bayesian optimization up to higher dimensions: a review and comparison of recent algorithms. In: 2018 IEEE 28th international workshop on machine learning for signal processing (MLSP), pp 1–6
Corliss G, Faure C, Griewank A, Hascoët L, Naumann U (2002) Automatic differentiation of algorithms: from simulation to optimization. Springer, Berlin. https://doi.org/10.1007/978-1-4613-0075-5
Book MATH Google Scholar
Cox DD, John S (1997) SDO: a statistical method for global optimization. Multidisciplinary design optimization: state-of-the-art, pp 315–329. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.4169
Cuculo V, D’Amelio A, Lanzarotti R, Boccignone G (2018) Personality gaze patterns unveiled via automatic relevance determination. In: Federation of international conferences on software technologies: applications and foundations, Springer, pp 171–184
Despotovic V, Tomas Schommer C (2020) Speech based estimation of Parkinson’s disease using Gaussian processes and automatic relevance determination. Neurocomputing 401:173–181. https://doi.org/10.1016/j.neucom.2020.03.058
Article Google Scholar
Economon T, Palacios F, Copeland S, Lukaczyk T, Alonso J (2016) Su2: an open-source suite for multiphysics simulation and design. AIAA J 54(3):828–846. https://doi.org/10.2514/1.J053813
Article Google Scholar
Enciu P, Gerbaud L, Wurtz F (2010) Automatic differentiation applied for optimization of dynamical systems. IEEE Trans Magn 46(8):2943–2946. https://doi.org/10.1109/TMAG.2010.2044770
Article Google Scholar
Eriksson D, Lee EH, Dong K, Bindel D, Wilson AG (2018) Scaling Gaussian process regression with derivatives. In: Advances in neural information processing systems, pp 6867–6877
Forrester A, Keane A (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1):50–79. https://doi.org/10.1016/j.paerosci.2008.11.001
Article Google Scholar
Garcia-Santiago X, Burger S, Rockstuhl C, Schneider PI (2020) Bayesian optimization with improved scalability and derivative information for efficient design of nanophotonic structures. J Lightwave Technol. https://doi.org/10.1109/JLT.2020.3023450
Article Google Scholar
Gill P, Murray W, Wright M (1991) Numerical linear algebra and optimization, vol 1. Addison-Wesley, Redwood City
MATH Google Scholar
Gillman A, Fuchi K, Buskohl PR (2018) Origami topology optimization with nonlinear truss model. https://www.mathworks.com/matlabcentral/fileexchange/69612-origami-topology-optimization-w-nonlinear-trussmodel
Gillman AS, Fuchi K, Buskohl PR (2019) Discovering sequenced origami folding through nonlinear mechanics and topology optimization. J Mech Des Trans ASME 141(4):1–11. https://doi.org/10.1115/1.4041782
Article Google Scholar
Greco M, Gesualdo FAR, Venturini WS, Coda HB (2006) Nonlinear positional formulation for space truss analysis. Finite Elem Anal Des 42(12):1079–1086. https://doi.org/10.1016/j.finel.2006.04.007
Article MathSciNet Google Scholar
Han ZH, Zhang Y, Song CX, Zhang KS (2017) Weighted gradient-enhanced kriging for high-dimensional surrogate modeling and design optimization. AIAA J 55(12):4330–4346. https://doi.org/10.2514/1.J055842
Article Google Scholar
Hao P, Feng S, Li Y, Wang B, Chen H (2020) Adaptive infill sampling criterion for multi-fidelity gradient-enhanced kriging model. Struct Multidisc Optim 62(1):353–373. https://doi.org/10.1007/s00158-020-02493-8
Article MathSciNet Google Scholar
Hart PE, Stork DG, Duda RO (2000) Pattern classification. Wiley, Hoboken
MATH Google Scholar
Hicks RM, Henne PA (1978) Wing design by numerical optimization. J Aircr 15(7):407–412. https://doi.org/10.2514/3.58379
Article Google Scholar
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Article MathSciNet Google Scholar
Hyoung-Seog C, Juan AJ (2002) Using gradients to construct cokriging approximation models for high-dimensional design optimization problems. In: 40th AIAA aerospace sciences meeting & exhibit, https://arc.aiaa.org/doi/abs/10.2514/6.2002-317,
Jameson A (1988) Aerodynamic design via control theory. J Sci Comput 3:233–260. https://doi.org/10.1007/BF01061285
Article MATH Google Scholar
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Global Optim 13:455–492. https://doi.org/10.1023/A:1008306431147
Article MathSciNet MATH Google Scholar
Jović A, Brkić K, Bogunović N (2015) A review of feature selection methods with applications. In: 2015 38th international convention on information and communication technology, electronics and microelectronics (MIPRO), pp 1200–1205. https://doi.org/10.1109/MIPRO.2015.7160458
Koziel S, Ciaurri DE, Leifsson L (2011) Surrogate-based methods. Springer, Berlin, pp 33–59. https://doi.org/10.1007/978-3-642-20859-1_3
Book MATH Google Scholar
Kushner HJ (1964) A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J Basic Eng 86(1):97–106. https://doi.org/10.1115/1.3653121
Article Google Scholar
Laurent L, Le Riche R, Soulier B, Boucard PA (2019) An overview of gradient-enhanced metamodels with applications. Arch Comput Methods Eng 26(1):61–106. https://doi.org/10.1007/s11831-017-9226-3
Article MathSciNet Google Scholar
Leary SJ, Bhaskar A, Keane AJ (2004) Global approximation and optimization using adjoint computational fluid dynamics codes. AIAA J 42(3):631–41. https://eprints.soton.ac.uk/22795/
Liming C, Haobo Q, Liang G, Chen J, Zan Y (2020) Optimization of expensive black-box problems via gradient-enhanced kriging. Comput Methods Appl Mech Eng 362(112):861. https://doi.org/10.1016/j.cma.2020.112861
Article MathSciNet MATH Google Scholar
Liu K, Li Y, Hu X, Lucu M, Widanage WD (2020) Gaussian process regression with automatic relevance determination kernel for calendar aging prediction of lithium-ion batteries. IEEE Trans Ind Inf 16(6):3767–3777. https://doi.org/10.1109/TII.2019.2941747
Article Google Scholar
Lizotte DJ (2008) Practical Bayesian optimization. PhD thesis, University of Alberta
March A, Willcox K, Wang Q (2011) Gradient-based multifidelity optimisation for aircraft design using Bayesian model calibration. Aeronaut J 115:729–738. https://doi.org/10.1017/S0001924000006473
Article Google Scholar
Marchant R, Ramos F (2012) Bayesian optimisation for intelligent environmental monitoring. In: 2012 IEEE/RSJ international conference on intelligent robots and systems, pp 2242–2249, https://doi.org/10.1109/IROS.2012.6385653
MATLAB (2019) MATLAB version 9.10.0.1613233 (R2021a) The Mathworks, Inc., Natick, Massachusetts
MATLAB Genetic Algorithm Toolbox (2019) Matlab genetic algorithm toolbox. The MathWorks, Natick
Mbuvha R, Boulkaibet I, Marwala T (2019) Automatic relevance determination Bayesian neural networks for credit card default modelling arXiv:1906.06382
McInnes L, Healy J, Melville J (2020) Umap: uniform manifold approximation and projection for dimension reduction. arxiv:1802.03426
Modha DS, Spangler WS (2003) Feature weighting in k-means clustering. Mach Learn 52(3):217–237
Article Google Scholar
Nazrul H, Dhruba Kumar B, Jugal Kumar K (2014) Mifs-nd: a mutual information-based feature selection method. Exp Syst Appl 41(14):6371–6385. https://doi.org/10.1016/j.eswa.2014.04.019
Article Google Scholar
Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin
Book Google Scholar
Nikolaidis P, Chatzis S (2021) Gaussian process-based Bayesian optimization for data-driven unit commitment. Int J Electr Power Energy Syst 130(106):930. https://doi.org/10.1016/j.ijepes.2021.106930
Article Google Scholar
Picheny V, Wagner T, Ginsbourger D (2013) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidisc Optim 48:607–626. https://doi.org/10.1007/s00158-013-0919-4
Article Google Scholar
Powell WB (2007) Approximate dynamic programming: solving the curses of dimensionality. Wiley, New York. https://doi.org/10.1002/9781118029176
Book MATH Google Scholar
Rana S, Li C, Gupta S, Nguyen V, Venkatesh S (2017) High dimensional Bayesian optimization with elastic Gaussian process. In: 34th international conference on machine learning, ICML 2017 6:4407–4415
Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, Cambridge
MATH Google Scholar
Rust J (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516
Article MathSciNet Google Scholar
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Article Google Scholar
Sengupta B, Friston K, Penny W (2014) Efficient gradient computation for dynamical models. NeuroImage 98:521–527. https://doi.org/10.1016/j.neuroimage.2014.04.040
Article Google Scholar
Shahriari B, Swersky K, Wang Z, Adams RP, Freitas ND (2015) Taking the human out of the loop: a review of Bayesian optimization. Clim Change 104(1):1–30. https://doi.org/10.1017/CBO9781107415324.004
Article Google Scholar
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Book Google Scholar
Shende S, Gillman A, Yoo D, Buskohl P, Vemaganti K (2021) Bayesian topology optimization for efficient design of origami folding structures. Struct Multidisc Optim 63(4):1907–1926. https://doi.org/10.1007/s00158-020-02787-x
Article MathSciNet Google Scholar
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp 2951–2959
Strassen V (1969) Gaussian elimination is not optimal. Numer Math 13(4):354–356
Article MathSciNet Google Scholar
Svanberg K (1987) The method of moving asymptotes: a new method for structural optimization. Int J Numer Meth Eng 24(2):359–373
Article MathSciNet Google Scholar
Svante W, Kim E, Paul G (1987) Principal component analysis. Chemom Intell Lab Syst 2(1):37–52. https://doi.org/10.1016/0169-7439(87)80084-9
Article Google Scholar
Teh Y, Roweis S (2003) Automatic alignment of local representations. In: Advances in neural information processing systems, vol 15. MIT Press, pp 841–848. https://proceedings.neurips.cc/paper/2002/file/3a1dd98341fafc1dfe9bcf36360e6b84-Paper.pdf
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Article Google Scholar
Ulaganathan S, Couckuyt I, Ferranti F, Laermans E, Dhaene T (2014) Performance study of multi-fidelity gradient enhanced kriging. Struct Multidisc Optim 51:1–17. https://doi.org/10.1007/s00158-014-1192-x
Article Google Scholar
Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10(66–71):13
Google Scholar
Wang Q (2013) Forward and adjoint sensitivity computation of chaotic dynamical systems. J Comput Phys 235:1–13. https://doi.org/10.1016/j.jcp.2012.09.007
Article MathSciNet MATH Google Scholar
Wu A, Aoi MC, Pillow JW (2017a) Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature. Machine Learning, pp 1–20. arxiv:1704.00060
Wu J, Poloczek M, Wilson AG, Frazier PI (2017b) Bayesian optimization with gradients. Adv Neural Inf Process Syst 3:5268–5279
Google Scholar
Xia J, Xin Z (2017) Effective and robust preconditioning of general SPD matrices via structured incomplete factorization. SIAM J Matrix Anal Appl 38(4):1298–1322. https://doi.org/10.1137/17M1124152
Article MathSciNet MATH Google Scholar
Yamazaki W, Rumpfkeil M, Mavriplis D (2012) Design optimization utilizing gradient/hessian enhanced surrogate model. In: 28th AIAA applied aerodynamics conference. https://doi.org/10.2514/6.2010-4363
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 856–863
Zhao J, Chen L, Pedrycz W, Wang W (2019) Variational inference-based automatic relevance determination kernel for embedded feature selection of noisy industrial data. IEEE Trans Industr Electron 66(1):416–428. https://doi.org/10.1109/TIE.2018.2815997
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mechanical & Materials Engineering, University of Cincinnati, 2600 Clifton Ave, Cincinnati, OH, 45221-0072, USA
Sourabh Shende & Kumar Vemaganti
Materials & Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, Dayton, OH, 45433, USA
Andrew Gillman & Philip Buskohl

Authors

Sourabh Shende
View author publications
You can also search for this author in PubMed Google Scholar
Andrew Gillman
View author publications
You can also search for this author in PubMed Google Scholar
Philip Buskohl
View author publications
You can also search for this author in PubMed Google Scholar
Kumar Vemaganti
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kumar Vemaganti.

Ethics declarations

Conflict of interest

The authors have no conflict of interests to declare.

Replication of results

The MATLAB code for solving origami problems using gradient-based methods can be downloaded from MathWorks. The MATLAB code for the integration of Bayesian optimization to solve origami problems and the airfoil problem can be made available to interested parties upon request to the authors.

Additional information

Responsible Editor: Lei Wang

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shende, S., Gillman, A., Buskohl, P. et al. Systematic cost analysis of gradient- and anisotropy-enhanced Bayesian design optimization. Struct Multidisc Optim 65, 235 (2022). https://doi.org/10.1007/s00158-022-03324-8

Download citation

Received: 26 March 2022
Revised: 03 June 2022
Accepted: 23 June 2022
Published: 11 August 2022
DOI: https://doi.org/10.1007/s00158-022-03324-8

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Bayesian optimization for mixed-variable, multi-objective problems

Bi-objective Bayesian optimization of engineering problems with cheap and expensive cost functions

Efficient inverse design optimization through multi-fidelity simulations, machine learning, and boundary refinement strategies

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Replication of results

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Systematic cost analysis of gradient- and anisotropy-enhanced Bayesian design optimization

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Bayesian optimization for mixed-variable, multi-objective problems

Bi-objective Bayesian optimization of engineering problems with cheap and expensive cost functions

Efficient inverse design optimization through multi-fidelity simulations, machine learning, and boundary refinement strategies

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Replication of results

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation