Abstract
The increased availability of High-Performance Computing resources enables data scientists to deploy and evaluate data-driven approaches, notably in the field of deep learning, at a rapid pace. As deep neural networks become more complex and ingest increasingly large datasets, it becomes impractical to perform the training phase on a single machine instance because of memory constraints and extremely long training times. Rather than scaling up, scaling out the computing resources is a productive approach to improving performance. The paradigm of data parallelism allows us to split the training dataset into manageable chunks that can be processed in parallel. In this work, we evaluate the scaling performance of training a 3D generative adversarial network (GAN) on an IBM POWER8 cluster equipped with 12 NVIDIA P100 GPUs. The full training duration of the GAN, including evaluation, is reduced from 20 h and 16 min on a single GPU to 2 h and 14 min on all 12 GPUs. Taking only the training process into consideration, we achieve a scaling efficiency of 98.9% when scaling from 1 to 12 GPUs.
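As a rough illustration of how the end-to-end figures relate to scaling efficiency, the following minimal sketch computes speedup and efficiency from the two full-run durations quoted above. The helper names are hypothetical, and the definition used (speedup divided by the number of GPUs) is the conventional one, assumed here rather than taken from the paper. Because these durations include evaluation, the result is an end-to-end figure and is not expected to match the 98.9% reported for the training process alone.

```python
# Minimal sketch: speedup and scaling efficiency from wall-clock durations.
# Helper names are illustrative assumptions, not taken from the paper.

def to_minutes(hours: int, minutes: int) -> int:
    """Convert an (hours, minutes) duration to total minutes."""
    return hours * 60 + minutes


def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-worker time to multi-worker time."""
    return t_single / t_parallel


def scaling_efficiency(t_single: float, t_parallel: float, n_workers: int) -> float:
    """Speedup divided by the number of workers (1.0 means ideal scaling)."""
    return speedup(t_single, t_parallel) / n_workers


# Full run durations quoted in the abstract (training plus evaluation).
t1 = to_minutes(20, 16)   # 1 GPU
t12 = to_minutes(2, 14)   # 12 GPUs

s = speedup(t1, t12)
e = scaling_efficiency(t1, t12, 12)
print(f"speedup: {s:.2f}x, efficiency: {e:.1%}")
# prints "speedup: 9.07x, efficiency: 75.6%"
```

The gap between this end-to-end value and the training-only efficiency reflects the portions of the full run, such as evaluation, that are included in the wall-clock durations but excluded from the 98.9% figure.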
Notes
1. Except for the first epoch, which includes the cuDNN configuration time. For this reason, we start timing from the second epoch.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hesam, A., Vallecorsa, S., Khattak, G., Carminati, F. (2019). Evaluating POWER Architecture for Distributed Training of Generative Adversarial Networks. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_32
DOI: https://doi.org/10.1007/978-3-030-34356-9_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science (R0)