DOI: 10.5555/2999325.2999464

Practical Bayesian optimization of machine learning algorithms

Published: 03 December 2012

Abstract

The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
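
To make the framework in the abstract concrete, here is a minimal sketch of the core loop: model validation error with a Gaussian process, then choose each new hyperparameter setting by maximizing expected improvement. This uses scikit-learn rather than the authors' implementation; the one-dimensional `objective`, the candidate grid, and all names are illustrative stand-ins for an expensive train-and-validate run.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Hypothetical stand-in for an expensive train-and-validate run that
    returns a validation error for hyperparameter value x (lower is better)."""
    return np.sin(3.0 * x) + 0.1 * (x - 1.0) ** 2


def expected_improvement(X_cand, gp, y_best):
    """EI for minimization under the GP's predictive Gaussian at each candidate."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive std
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
lo, hi = 0.0, 2.0

# Seed the model with a few random evaluations, as in random search.
X = rng.uniform(lo, hi, size=(3, 1))
y = np.array([objective(x.item()) for x in X])

# Matern(nu=2.5) mirrors the paper's recommendation of a Matern 5/2 kernel
# over the overly smooth squared-exponential default.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):
    gp.fit(X, y)
    X_cand = np.linspace(lo, hi, 500).reshape(-1, 1)  # dense grid is fine in 1-D
    ei = expected_improvement(X_cand, gp, y.min())
    x_next = X_cand[np.argmax(ei)]  # run the most promising experiment next
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, objective(x_next.item()))

best = np.argmin(y)
print(f"best x = {X[best, 0]:.3f}, best validation error = {y[best]:.3f}")
```

The paper's extensions build on this same loop: a cost-aware acquisition ("expected improvement per second") divides EI by a model of experiment duration to favor cheap runs, GP hyperparameters are integrated out by Monte Carlo rather than point-estimated, and pending parallel experiments are handled by integrating the acquisition over their possible outcomes.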

Published In

NIPS'12: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2
December 2012, 3328 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
