
DOI: 10.1145/3135974.3135994

HyperDrive: exploring hyperparameters with POP scheduling

Published: 11 December 2017

Abstract

The quality of machine learning (ML) and deep learning (DL) models is highly sensitive to many adjustable parameters that are set before training begins, commonly called hyperparameters. Efficient hyperparameter exploration is of great importance to practitioners who need to find high-quality models within affordable time and cost budgets. It is, however, a challenging process due to a huge search space, expensive training runtimes, sparsity of good configurations, and scarcity of time and resources. We develop a scheduling algorithm, POP, that quickly distinguishes among promising, opportunistic, and poor hyperparameter configurations. It infuses probabilistic model-based classification with dynamic scheduling and early termination to jointly optimize quality and cost. We also build a comprehensive hyperparameter exploration infrastructure, HyperDrive, to support existing and future scheduling algorithms for a wide range of usage scenarios across different ML/DL frameworks and learning domains. We evaluate POP and HyperDrive using complex and deep models. The results show that POP speeds up the training process by up to 6.7x compared with basic approaches such as random/grid search, and by up to 2.1x compared with state-of-the-art approaches, while achieving similar model quality to prior work.
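The abstract describes POP's core control loop: periodically classify each running configuration as promising, opportunistic, or poor using a probabilistic forecast of its final quality; keep promising ones on dedicated resources, run opportunistic ones on spare capacity, and terminate poor ones early. The sketch below illustrates that loop in Python. It is a minimal sketch only: the forecasting rule, the threshold values, and every function and constant name (forecast_final, prob_beats, classify, PROMISING_PROB, POOR_PROB) are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Illustrative sketch of a POP-style scheduler (assumed details, not the
# paper's implementation). Each running configuration is periodically
# classified as promising, opportunistic, or poor based on a probabilistic
# forecast of its final accuracy; poor configurations are terminated early.

PROMISING_PROB = 0.5  # assumed threshold: forecast likely beats the best
POOR_PROB = 0.1       # assumed threshold: forecast very unlikely to win

def forecast_final(curve, total_epochs):
    """Crude forecast: extrapolate the recent accuracy trend and return a
    (mean, spread) belief about the final accuracy."""
    if len(curve) < 2:
        return curve[-1], 0.5
    recent = curve[-min(5, len(curve)):]
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    remaining = total_epochs - len(curve)
    mean = min(1.0, curve[-1] + slope * remaining)
    spread = max(0.01, abs(slope) * remaining)  # uncertainty grows with horizon
    return mean, spread

def prob_beats(mean, spread, best):
    """Coarse stand-in for P(final accuracy > best): a clipped linear CDF."""
    z = (mean - best) / spread
    return min(1.0, max(0.0, 0.5 + 0.5 * z))

def classify(curve, best, total_epochs):
    mean, spread = forecast_final(curve, total_epochs)
    p = prob_beats(mean, spread, best)
    if p >= PROMISING_PROB:
        return "promising"     # keep on dedicated resources
    if p <= POOR_PROB:
        return "poor"          # terminate early, free the resources
    return "opportunistic"     # run only on spare capacity

# Toy driver: simulate four configurations with noisy learning curves.
if __name__ == "__main__":
    random.seed(0)
    total_epochs = 30
    trials = {f"cfg{i}": {"curve": [0.1], "rate": random.uniform(0.005, 0.03)}
              for i in range(4)}
    best = 0.1
    for epoch in range(total_epochs):
        for t in trials.values():  # one training step per surviving trial
            t["curve"].append(min(1.0, t["curve"][-1] + t["rate"]
                                  + random.gauss(0, 0.005)))
            best = max(best, t["curve"][-1])
        for name in list(trials):
            label = classify(trials[name]["curve"], best, total_epochs)
            if label == "poor":
                print(f"epoch {epoch}: terminating {name} early")
                del trials[name]
```

The design point the abstract hints at is that classification is probabilistic rather than a hard cutoff on current accuracy: a configuration that currently trails but whose forecast still has a reasonable chance of beating the best model is demoted to opportunistic rather than killed outright.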



Published In

Middleware '17: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference
December 2017
268 pages
ISBN: 9781450347204
DOI: 10.1145/3135974

In-Cooperation

  • USENIX Association
  • IFIP

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cluster scheduling
  2. hyperparameter exploration

Qualifiers

  • Research-article

Conference

Middleware '17: 18th International Middleware Conference
December 11-15, 2017
Las Vegas, Nevada, USA

Acceptance Rates

Middleware '17 paper acceptance rate: 20 of 85 submissions (24%).
Overall acceptance rate: 203 of 948 submissions (21%).




Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 3
Reflects downloads up to 12 Nov 2024.


Cited By

  • (2024) Online Training Flow Scheduling for Geo-Distributed Machine Learning Jobs Over Heterogeneous and Dynamic Networks. IEEE Transactions on Cognitive Communications and Networking 10:1, 277-291. DOI: 10.1109/TCCN.2023.3326331. Online publication date: Feb-2024.
  • (2023) Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads. Proceedings of the VLDB Endowment 17:4, 712-725. DOI: 10.14778/3636218.3636227. Online publication date: 1-Dec-2023.
  • (2023) Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys. DOI: 10.1145/3638757. Online publication date: 27-Dec-2023.
  • (2023) Waterwave: A GPU Memory Flow Engine for Concurrent DNN Training. IEEE Transactions on Computers 72:10, 2938-2950. DOI: 10.1109/TC.2023.3278530. Online publication date: Oct-2023.
  • (2023) Efficient Supernet Training Using Path Parallelism. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1249-1261. DOI: 10.1109/HPCA56546.2023.10071099. Online publication date: Feb-2023.
  • (2022) CoGNN. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.5555/3571885.3571936. Online publication date: 13-Nov-2022.
  • (2022) Hippo. Proceedings of the VLDB Endowment 15:5, 1038-1052. DOI: 10.14778/3510397.3510402. Online publication date: 18-May-2022.
  • (2022) Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud. Proceedings of the 51st International Conference on Parallel Processing, 1-11. DOI: 10.1145/3545008.3545027. Online publication date: 29-Aug-2022.
  • (2022) Elastic Parameter Server: Accelerating ML Training With Scalable Resource Scheduling. IEEE Transactions on Parallel and Distributed Systems 33:5, 1128-1143. DOI: 10.1109/TPDS.2021.3104242. Online publication date: 1-May-2022.
  • (2022) Machine Learning Feature Based Job Scheduling for Distributed Machine Learning Clusters. IEEE/ACM Transactions on Networking, 1-16. DOI: 10.1109/TNET.2022.3190797. Online publication date: 2022.
