DOI: 10.5555/2999325.2999464

Practical Bayesian optimization of machine learning algorithms

Published: 03 December 2012

Abstract

The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
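
To make the framework in the abstract concrete, here is a minimal sketch of the core loop: model validation error with a Gaussian process, then choose each new hyperparameter setting by maximizing expected improvement. This uses scikit-learn rather than the authors' implementation; the one-dimensional `objective`, the candidate grid, and all names are illustrative stand-ins for an expensive train-and-validate run.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Hypothetical stand-in for an expensive train-and-validate run that
    returns a validation error for hyperparameter value x (lower is better)."""
    return np.sin(3.0 * x) + 0.1 * (x - 1.0) ** 2


def expected_improvement(X_cand, gp, y_best):
    """EI for minimization under the GP's predictive Gaussian at each candidate."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive std
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
lo, hi = 0.0, 2.0

# Seed the model with a few random evaluations, as in random search.
X = rng.uniform(lo, hi, size=(3, 1))
y = np.array([objective(x.item()) for x in X])

# Matern(nu=2.5) mirrors the paper's recommendation of a Matern 5/2 kernel
# over the overly smooth squared-exponential default.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(X, y)
    X_cand = np.linspace(lo, hi, 500).reshape(-1, 1)  # dense grid is fine in 1-D
    ei = expected_improvement(X_cand, gp, y.min())
    x_next = X_cand[np.argmax(ei)]  # run the most promising experiment next
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, objective(x_next.item()))

best = np.argmin(y)
print(f"best x = {X[best, 0]:.3f}, best validation error = {y[best]:.3f}")
```

The paper's extensions build on this same loop: a cost-aware acquisition ("expected improvement per second") divides EI by a model of experiment duration to favor cheap runs, GP hyperparameters are integrated out by Monte Carlo rather than point-estimated, and pending parallel experiments are handled by integrating the acquisition over their possible outcomes.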

Published In

NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2
December 2012, 3328 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
