Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

Published: 25 July 2019

Abstract

Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision) and can impact the final model's accuracy on unseen data. Due to a lack of standard evaluation criteria that consider these trade-offs, it is difficult to directly compare these optimizations. To address this problem, we recently introduced DAWNBench, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset, a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. Additionally, even though DAWNBench entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPerf v0.5 benchmark.
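
As a concrete illustration of the metric, the short Python sketch below computes time-to-accuracy from a training log (the elapsed time at which validation accuracy first reaches a target threshold) and its coefficient of variation across repeated runs. The helper names and example data are hypothetical; this is not the official DAWNBench or MLPerf scoring code.

# Minimal sketch of time-to-accuracy (TTA) and its coefficient of variation.
# Illustrative only; helper names and data are hypothetical.
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple

def time_to_accuracy(log: Sequence[Tuple[float, float]],
                     target: float) -> Optional[float]:
    """Elapsed seconds at which validation accuracy first reaches `target`,
    given (elapsed_seconds, val_accuracy) pairs in time order; None if the
    run never reaches the target."""
    for elapsed, accuracy in log:
        if accuracy >= target:
            return elapsed
    return None

def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical validation logs from three repeated runs, targeting 93% accuracy.
runs = [
    [(600.0, 0.80), (1200.0, 0.91), (1800.0, 0.935)],
    [(600.0, 0.79), (1200.0, 0.92), (1750.0, 0.932)],
    [(600.0, 0.81), (1250.0, 0.90), (1900.0, 0.930)],
]
ttas = [time_to_accuracy(run, target=0.93) for run in runs]
assert all(t is not None for t in ttas)
print("TTA per run (s):", ttas)  # [1800.0, 1750.0, 1900.0]
print("Coefficient of variation:", round(coefficient_of_variation(ttas), 3))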

Published In

ACM SIGOPS Operating Systems Review, Volume 53, Issue 1
July 2019, 90 pages
ISSN: 0163-5980
DOI: 10.1145/3352020

Publisher

Association for Computing Machinery, New York, NY, United States
