
Random search for hyper-parameter optimization

Published: 01 February 2012

Abstract

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
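
The low-effective-dimensionality argument above is easy to demonstrate in a few lines. The following is a minimal sketch, not the paper's experimental setup: the toy validation_loss, its optimum location and coefficients, the 16-trial budget, and the repetition count are all illustrative assumptions. Only the first hyper-parameter really matters, so a grid wastes most of its budget revisiting the same few values on the axis that counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy validation loss over two hyper-parameters in [0, 1]^2 (an
# illustrative assumption, not the paper's benchmark). Only the first
# hyper-parameter matters much; the second is nearly irrelevant,
# mirroring the paper's "low effective dimensionality" observation.
def validation_loss(x, y):
    return (x - 0.52) ** 2 + 0.01 * (y - 0.31) ** 2

budget = 16  # identical trial budget for both strategies

# Grid search: a 4x4 grid spends the 16 trials on only 4 distinct
# values of the important hyper-parameter x.
axis = np.linspace(0.0, 1.0, 4)
grid_best = min(validation_loss(x, y) for x in axis for y in axis)

# Random search: every one of the 16 trials contributes a fresh value
# of x, covering the important axis four times as densely. Average the
# best-found loss over many repetitions to smooth out sampling noise.
reps = 1000
random_best = np.mean([
    min(validation_loss(x, y)
        for x, y in rng.uniform(0.0, 1.0, size=(budget, 2)))
    for _ in range(reps)
])

print(f"grid search best loss:          {grid_best:.5f}")
print(f"random search best loss (mean): {random_best:.5f}")
```

Under the same budget, the grid projects onto only four distinct values of the axis that matters, while random search projects onto sixteen; this projection argument is the heart of the paper's case for preferring random search when only a few hyper-parameters are important.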

Published In

Published: 01 February 2012, in JMLR Volume 13
Publisher: JMLR.org

Author Tags

  1. deep learning
  2. global optimization
  3. model selection
  4. neural networks
  5. response surface modeling

Cited By

  • (2024) A stacked architecture-based fuzzy classifier with data position transformation using fuzzy cognitive maps. Journal of Intelligent & Fuzzy Systems 46(1):2037-2052. DOI: 10.3233/JIFS-236087
  • (2024) Discriminative spatial-temporal feature learning for modeling network intrusion detection systems. Journal of Computer Security 32(1):1-30. DOI: 10.3233/JCS-220031
  • (2024) Komodo Mlipir Algorithm for Optimizing Hyperparameters of a Convolutional Neural Network. Proceedings of the 2024 10th International Conference on Computing and Artificial Intelligence, 356-364. DOI: 10.1145/3669754.3669809
  • (2024) Efficacy of using a dynamic length representation vs. a fixed-length for neuroarchitecture search. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 1888-1894. DOI: 10.1145/3638530.3664149
  • (2024) Using Bayesian Optimization to Improve Hyperparameter Search in TPOT. Proceedings of the Genetic and Evolutionary Computation Conference, 340-348. DOI: 10.1145/3638529.3654061
  • (2024) Evolving Molecular Graph Neural Networks with Hierarchical Evaluation Strategy. Proceedings of the Genetic and Evolutionary Computation Conference, 1417-1425. DOI: 10.1145/3638529.3654055
  • (2024) NEvoFed: A Decentralized Approach to Federated NeuroEvolution of Heterogeneous Neural Networks. Proceedings of the Genetic and Evolutionary Computation Conference, 295-303. DOI: 10.1145/3638529.3654029
  • (2024) Customizing Graph Neural Network for CAD Assembly Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1746-1757. DOI: 10.1145/3637528.3671788
  • (2024) FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5260-5271. DOI: 10.1145/3637528.3671573
  • (2024) AutoML for Large Capacity Modeling of Meta's Ranking Systems. Companion Proceedings of the ACM Web Conference 2024, 374-382. DOI: 10.1145/3589335.3648336
