DOI: 10.1145/3423211.3425692

PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters

Published: 11 December 2020

Abstract

DNN learning jobs are common in today's clusters due to advances in AI-driven services such as machine translation and image recognition. The most critical phase of these jobs for model performance and learning cost is the tuning of hyperparameters. Existing approaches use techniques such as early-stopping criteria to reduce the impact of tuning on learning cost. However, these strategies do not consider the impact that certain hyperparameters and system parameters have on training time. This paper presents PipeTune, a framework for DNN learning jobs that addresses the trade-offs between these two types of parameters. PipeTune takes advantage of the high parallelism and recurring characteristics of such jobs to minimize learning cost via pipelined, simultaneous tuning of both hyper and system parameters. Our experimental evaluation on three different types of workloads indicates that PipeTune achieves up to a 22.6% reduction in tuning time and a 1.7× speedup in training time. PipeTune not only improves performance but also lowers energy consumption by up to 29%.
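To make the pipelining idea concrete, here is a minimal, hypothetical Python sketch of the approach the abstract describes. All names (train_epoch, tune_system_params, the thread-count parameter) and the toy cost model are illustrative assumptions, not PipeTune's actual API: each hyperparameter trial uses its own first epochs to profile candidate system parameters and finishes with the best-profiled setting, so system-parameter tuning overlaps training instead of running as a separate sequential phase.

```python
# Hypothetical sketch of pipelined hyper/system parameter tuning.
# Names and the toy cost model are illustrative, not PipeTune's API.
import time
from concurrent.futures import ThreadPoolExecutor

def train_epoch(lr, threads):
    """Mock one training epoch; returns (loss, epoch_time)."""
    epoch_time = 1.0 / threads + 0.05      # toy model: more threads -> faster epoch
    time.sleep(0.01)                       # stand-in for real training work
    loss = 1.0 / (1.0 + lr * threads)      # toy loss, just so the trial returns something
    return loss, epoch_time

def tune_system_params(profile):
    """Pick the system setting (thread count) with the lowest profiled epoch time."""
    return min(profile, key=profile.get)

def run_trial(lr, epochs=6, candidate_threads=(1, 2, 4, 8)):
    """One hyperparameter trial whose early epochs double as system-parameter probes."""
    profile = {}
    # Pipeline stage 1: the first epochs profile candidate system parameters.
    for t in candidate_threads:
        loss, epoch_time = train_epoch(lr, t)
        profile[t] = epoch_time
    best_threads = tune_system_params(profile)
    # Pipeline stage 2: remaining epochs reuse the tuned system parameters.
    for _ in range(epochs - len(candidate_threads)):
        loss, _ = train_epoch(lr, best_threads)
    return lr, best_threads, loss

if __name__ == "__main__":
    # Hyperparameter trials run in parallel; each tunes its system parameters
    # in-flight rather than in a dedicated, sequential tuning pass.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for lr, threads, loss in pool.map(run_trial, [0.01, 0.1, 1.0]):
            print(f"lr={lr:<5} threads={threads} final_loss={loss:.3f}")
```

A sequential baseline would profile system parameters in a separate pass before or after each hyperparameter trial; overlapping the two, as sketched above, is the intuition behind the tuning-time savings the abstract reports.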



Information

Published In

Middleware '20: Proceedings of the 21st International Middleware Conference
December 2020
455 pages
ISBN: 9781450381536
DOI: 10.1145/3423211
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 December 2020

Author Tags

  1. Deep Neural Networks training
  2. Parameter tuning
  3. Accuracy-time trade-off

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '20: 21st International Middleware Conference
December 7 - 11, 2020
Delft, Netherlands

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 12 Nov 2024

Cited By

  • (2024) Bandwidth Characterization of DeepSpeed on Distributed Large Language Model Training. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 241-256. https://doi.org/10.1109/ISPASS61541.2024.00031. Online publication date: 5-May-2024.
  • (2024) Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning (Practical Experience Report). 2024 19th European Dependable Computing Conference (EDCC), 97-102. https://doi.org/10.1109/EDCC61798.2024.00029. Online publication date: 8-Apr-2024.
  • (2023) Application of deep neural network in the strength prediction of cemented paste backfill based on a global dataset. Construction and Building Materials 391, 131827. https://doi.org/10.1016/j.conbuildmat.2023.131827. Online publication date: Aug-2023.
  • (2023) Energy-aware parameter tuning for mixed workloads in cloud server. Cluster Computing 27:4, 4805-4821. https://doi.org/10.1007/s10586-023-04212-6. Online publication date: 27-Dec-2023.
  • (2022) Performance Modeling for Short-Term Cache Allocation. Proceedings of the 51st International Conference on Parallel Processing, 1-11. https://doi.org/10.1145/3545008.3545094. Online publication date: 29-Aug-2022.
  • (2022) EdgeTune. Proceedings of the 23rd ACM/IFIP International Middleware Conference, 1-14. https://doi.org/10.1145/3528535.3533273. Online publication date: 7-Nov-2022.
  • (2021) Lorien. Proceedings of the ACM Symposium on Cloud Computing, 18-32. https://doi.org/10.1145/3472883.3486973. Online publication date: 1-Nov-2021.
  • (2021) Courier: Real-Time Optimal Batch Size Prediction for Latency SLOs in BigDL. Proceedings of the ACM/SPEC International Conference on Performance Engineering, 133-144. https://doi.org/10.1145/3427921.3450233. Online publication date: 9-Apr-2021.
