
Random search for hyper-parameter optimization

Published: 01 February 2012

Abstract

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
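
The low-effective-dimensionality argument above is easy to demonstrate in a few lines. The following is a minimal sketch, not the paper's experimental setup: the toy validation_loss, its optimum location and coefficients, the 16-trial budget, and the repetition count are all illustrative assumptions. Only the first hyper-parameter really matters, so a grid wastes most of its budget revisiting the same few values on the axis that counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy validation loss over two hyper-parameters in [0, 1]^2 (an
# illustrative assumption, not the paper's benchmark). Only the first
# hyper-parameter matters much; the second is nearly irrelevant,
# mirroring the paper's "low effective dimensionality" observation.
def validation_loss(x, y):
    return (x - 0.52) ** 2 + 0.01 * (y - 0.31) ** 2

budget = 16  # identical trial budget for both strategies

# Grid search: a 4x4 grid spends the 16 trials on only 4 distinct
# values of the important hyper-parameter x.
axis = np.linspace(0.0, 1.0, 4)
grid_best = min(validation_loss(x, y) for x in axis for y in axis)

# Random search: every one of the 16 trials contributes a fresh value
# of x, covering the important axis four times as densely. Average the
# best-found loss over many repetitions to smooth out sampling noise.
reps = 1000
random_best = np.mean([
    min(validation_loss(x, y)
        for x, y in rng.uniform(0.0, 1.0, size=(budget, 2)))
    for _ in range(reps)
])

print(f"grid search best loss:          {grid_best:.5f}")
print(f"random search best loss (mean): {random_best:.5f}")
```

Under the same budget, the grid projects onto only four distinct values of the axis that matters, while random search projects onto sixteen; this projection argument is the heart of the paper's case for preferring random search when only a few hyper-parameters are important.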

Published In

Published: 01 February 2012, in JMLR Volume 13
Publisher: JMLR.org

Author Tags

  1. deep learning
  2. global optimization
  3. model selection
  4. neural networks
  5. response surface modeling

Cited By

  • (2024) A stacked architecture-based fuzzy classifier with data position transformation using fuzzy cognitive maps. Journal of Intelligent & Fuzzy Systems 46(1):2037-2052. DOI: 10.3233/JIFS-236087
  • (2024) Discriminative spatial-temporal feature learning for modeling network intrusion detection systems. Journal of Computer Security 32(1):1-30. DOI: 10.3233/JCS-220031
  • (2024) Komodo Mlipir Algorithm for Optimizing Hyperparameters of a Convolutional Neural Network. Proceedings of the 2024 10th International Conference on Computing and Artificial Intelligence, 356-364. DOI: 10.1145/3669754.3669809
  • (2024) Efficacy of using a dynamic length representation vs. a fixed-length for neuroarchitecture search. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 1888-1894. DOI: 10.1145/3638530.3664149
  • (2024) Using Bayesian Optimization to Improve Hyperparameter Search in TPOT. Proceedings of the Genetic and Evolutionary Computation Conference, 340-348. DOI: 10.1145/3638529.3654061
  • (2024) Evolving Molecular Graph Neural Networks with Hierarchical Evaluation Strategy. Proceedings of the Genetic and Evolutionary Computation Conference, 1417-1425. DOI: 10.1145/3638529.3654055
  • (2024) NEvoFed: A Decentralized Approach to Federated NeuroEvolution of Heterogeneous Neural Networks. Proceedings of the Genetic and Evolutionary Computation Conference, 295-303. DOI: 10.1145/3638529.3654029
  • (2024) Customizing Graph Neural Network for CAD Assembly Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1746-1757. DOI: 10.1145/3637528.3671788
  • (2024) FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5260-5271. DOI: 10.1145/3637528.3671573
  • (2024) AutoML for Large Capacity Modeling of Meta's Ranking Systems. Companion Proceedings of the ACM Web Conference 2024, 374-382. DOI: 10.1145/3589335.3648336
