Abstract
In this article, we present JUWELS Booster, a recently commissioned high-performance computing system at the Jülich Supercomputing Centre. With its system architecture, most importantly its large number of powerful Graphics Processing Units (GPUs) and its fast InfiniBand interconnect, it is an ideal machine for large-scale Artificial Intelligence (AI) research and applications. We detail its system architecture, approaches to parallel and distributed model training, and benchmarks that indicate its outstanding performance. We exemplify its potential for research applications by presenting large-scale AI research highlights from several scientific fields that require such a facility.
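As a hedged illustration of the parallel, distributed model training mentioned above, the sketch below shows synchronous data-parallel training with PyTorch's DistributedDataParallel over the NCCL backend, a common pattern on multi-GPU InfiniBand systems such as JUWELS Booster. The toy model, tensor sizes, and launch command are assumptions for illustration, not the benchmark code used in the paper.

```python
# Minimal sketch (assumed setup, not the paper's benchmark code):
# synchronous data-parallel training with PyTorch DDP over NCCL.
# Launch with, e.g.:  torchrun --nproc_per_node=4 ddp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun exports RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR/PORT.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model; a real run would train e.g. a ResNet or Transformer.
    model = torch.nn.Linear(1024, 1024).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()

    for _ in range(10):
        x = torch.randn(64, 1024, device=local_rank)
        y = torch.randn(64, 1024, device=local_rank)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()  # gradients are all-reduced across GPUs via NCCL
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```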
S. Kesselheim, A. Herten, K. Krajsek, J. Ebert, J. Jitsev, M. Cherti, M. Langguth, B. Gong, S. Stadtler, A. Mozaffari, G. Cavallaro, R. Sedona, A. Schug—Equal contribution.
Notes
- 1. See, e.g., https://github.com/EleutherAI/the-pile.
- 3. PyTorch supports automatic differentiation for tensors distributed across computational devices via the remote procedure call (RPC) protocol [9]. However, the RPC framework does not compete with communication frameworks such as NCCL or MPI in terms of performance; a minimal sketch of the RPC mechanism is given below.
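The following is a minimal, hedged sketch of the RPC mechanism referenced in note 3, using PyTorch's torch.distributed.rpc module. The worker names, the toy function, and the launch command are illustrative assumptions, not the authors' setup; the point is only that tensors travel through the RPC layer rather than through collective libraries such as NCCL or MPI.

```python
# Illustrative sketch only (assumed setup, not from the paper): remote
# execution with torch.distributed.rpc. Launch with, e.g.:
#   torchrun --nproc_per_node=2 rpc_sketch.py
import os

import torch
import torch.distributed.rpc as rpc


def matmul_on_worker(a, b):
    # Runs on the callee. Operands are serialized and shipped over RPC,
    # which is why this path does not match NCCL/MPI collectives in speed.
    return a @ b


def main():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        a, b = torch.randn(512, 512), torch.randn(512, 512)
        # Synchronous remote call: executes matmul_on_worker on "worker1".
        result = rpc.rpc_sync("worker1", matmul_on_worker, args=(a, b))
        print(result.shape)
    rpc.shutdown()  # blocks until all workers have finished


if __name__ == "__main__":
    main()
```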
References
Intel Math Kernel Library. Reference Manual. Intel Corporation (2009)
NVIDIA CUBLAS Library Documentation (2017). https://docs.nvidia.com/cuda/cublas/. Accessed 14 Apr 2021
Pucci, F., Schug, A.: Shedding light on the dark matter of the biomolecular structural universe: Progress in RNA 3D structure prediction. Methods 162–163, 68–73 (2019). https://doi.org/10.1016/j.ymeth.2019.04.012
Abadi, M., et al.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems (2015). http://tensorflow.org/, Software available from tensorflow.org
Agarwal, S., Wang, H., Venkataraman, S., Papailiopoulos, D.: On the utility of gradient compression in distributed training systems. arXiv preprint arXiv:2103.00543 (2021)
Amodei, D., Hernandez, D., Sastry, G., Clark, J., Brockman, G., Sutskever, I.: AI and compute. Technical report, OpenAI Blog (2018)
Bauer, P., Thorpe, A., Brunet, G.: The quiet revolution of numerical weather prediction. Nature 525(7567), 47–55 (2015). https://doi.org/10.1038/nature14956
Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc. Natl. Acad. Sci. U.S.A. 116, 15849–15854 (2019). https://doi.org/10.1073/pnas.1903070116
Birrell, A.D., Nelson, B.J.: Implementing remote procedure calls. ACM Trans. Comput. Syst. 2(1), 39–59 (1984)
Brown, T., et al.: Language models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901. Curran Associates, Inc. (2020)
Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)
Canty, M.: Image Analysis, Classification and Change Detection in Remote Sensing: With Algorithms for ENVI/IDL and Python, 3rd edn. Taylor & Francis, New York (2014). ISBN: 9781466570375
Chen, T., Kornblith, S., Swersky, K., Norouzi, M., Hinton, G.: Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029 (2020)
Cherti, M., Jitsev, J.: Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images. arXiv preprint arXiv:2106.00116 (2021)
Chetlur, S., et al.: cuDNN: efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)
Cohen, J.P., Morrison, P., Dao, L., Roth, K., Duong, T.Q., Ghassemi, M.: Covid-19 image data collection: Prospective predictions are the future. J. Mach. Learn. Biomed. Imaging (2020)
Cuturello, F., Tiana, G., Bussi, G.: Assessing the accuracy of direct-coupling analysis for RNA contact prediction (2020). https://doi.org/10.1261/rna.074179.119
Dago, A.E., Schug, A., Procaccini, A., Hoch, J.A., Weigt, M., Szurmant, H.: Structural basis of histidine kinase autophosphorylation deduced by integrating genomics, molecular dynamics, and mutagenesis. Proc. Natl. Acad. Sci. 109(26), E1733–E1742 (2012)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, June 2009. https://doi.org/10.1109/CVPR.2009.5206848
Deng, L., Yu, D., Platt, J.: Scalable stacking and learning for building deep architectures. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2133–2136 (2012). https://doi.org/10.1109/ICASSP.2012.6288333
Dettmers, T.: 8-bit approximations for parallelism in deep learning. arXiv preprint arXiv:1511.04561 (2015)
De Leonardis, E., et al.: Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucl. Acids Res. 43(21), 10444–10455 (2015). https://doi.org/10.1093/nar/gkv932
Ginsburg, B., et al.: Stochastic gradient methods with layer-wise adaptive moments for training of deep networks (2020)
Goyal, P., et al.: Accurate, large minibatch SGD: training Imagenet in 1 hour. CoRR abs/1706.02677 (2017). http://arxiv.org/abs/1706.02677
Goyal, P., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour (2018)
Götz, M., et al.: HeAT - a distributed and GPU-accelerated tensor framework for data analytics. In: Proceedings of the 19th IEEE International Conference on Big Data, pp. 276–288. IEEE, December 2020
Hernandez, D., Kaplan, J., Henighan, T., McCandlish, S.: Scaling laws for transfer. arXiv preprint arXiv:2102.01293 (2021)
Hersbach, H., et al.: The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020). https://doi.org/10.1002/qj.3803
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 37, pp. 448–456. PMLR, Lille, France, 7–9 July 2015. http://proceedings.mlr.press/v37/ioffe15.html
Jülich Supercomputing Centre: JUWELS: Modular Tier-0/1 Supercomputer at the Jülich Supercomputing Centre. J. Large-Scale Res. Facil. 5(A171) (2019). http://dx.doi.org/10.17815/jlsrf-5-171
Kalvari, I., et al.: Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 46(D1), D335–D342 (2017). https://doi.org/10.1093/nar/gkx1038
Kaplan, J., et al.: Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020)
Kolesnikov, A., et al.: Big Transfer (BiT): general visual representation learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision - ECCV 2020, pp. 491–507. Springer, Cham (2020)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
Kurth, T., et al.: Exascale deep learning for climate analytics. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 649–660. IEEE (2018)
Laanait, N., et al.: Exascale deep learning for scientific inverse problems. arXiv preprint arXiv:1909.11150 (2019)
Lee, A.X., Zhang, R., Ebert, F., Abbeel, P., Finn, C., Levine, S.: Stochastic adversarial video prediction. arXiv preprint arXiv:1804.01523 (2018)
Lee, S., Purushwalkam, S., Cogswell, M., Crandall, D.J., Batra, D.: Why M heads are better than one: Training a diverse ensemble of deep networks. CoRR abs/1511.06314 (2015). http://arxiv.org/abs/1511.06314
Liu, H., Simonyan, K., Vinyals, O., Fernando, C., Kavukcuoglu, K.: Hierarchical representations for efficient architecture search. arXiv e-prints arXiv:1711.00436, November 2017
Lorenzo, P.R., Nalepa, J., Ramos, L., Ranilla, J.: Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion (2017)
Mattson, P., et al.: MLPerf: an industry standard benchmark suite for machine learning performance. IEEE Micro 40(2), 8–16 (2020)
Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, Version 3.1. High Performance Computing Center Stuttgart (HLRS) (2015). https://fs.hlrs.de/projects/par/mpi//mpi31/
Muller, U.A., Gunzinger, A.: Neural net simulation on parallel computers. In: Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN 1994), vol. 6, pp. 3961–3966 (1994). https://doi.org/10.1109/ICNN.1994.374845
Orhan, E., Gupta, V., Lake, B.M.: Self-supervised learning through the eyes of a child. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Patton, R.M., et al.: Exascale deep learning to accelerate cancer research. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 1488–1496. IEEE (2019)
Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 512–519, June 2014. https://doi.org/10.1109/CVPRW.2014.131
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., Prabhat: Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204 (2019). https://doi.org/10.1038/s41586-019-0912-1
Ren, J., et al.: ZeRO-Offload: democratizing billion-scale model training. arXiv preprint arXiv:2101.06840 (2021)
Rocklin, M.: Dask: parallel computation with blocked algorithms and task scheduling. In: Huff, K., Bergstra, J. (eds.) Proceedings of the 14th Python in Science Conference (SciPy 2015), pp. 130–136 (2015)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Schmitt, M., Hughes, L.H., Qiu, C., Zhu, X.X.: SEN12MS – a curated dataset of georeferenced multi-spectral Sentinel-1/2 imagery for deep learning and data fusion. arXiv preprint arXiv:1906.07789 (2019)
Schug, A., Weigt, M., Onuchic, J.N., Hwa, T., Szurmant, H.: High-resolution protein complexes from integrating genomic information with molecular simulation. Proc. Natl. Acad. Sci. 106(52), 22124–22129 (2009)
Senior, A.W., et al.: Improved protein structure prediction using potentials from deep learning. Nature 577(7792), 706–710 (2020). https://doi.org/10.1038/s41586-019-1923-7
Sergeev, A., Balso, M.D.: Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)
Shallue, C.J., Lee, J., Antognini, J., Sohl-Dickstein, J., Frostig, R., Dahl, G.E.: Measuring the effects of data parallelism on neural network training. J. Mach. Learn. Res. 20, 1–49 (2019)
Shi, X., et al.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Sriram, A., et al.: Covid-19 deterioration prediction via self-supervised representation learning and multi-image prediction. arXiv preprint arXiv:2101.04909 (2021)
Stodden, V., et al.: Enhancing reproducibility for computational methods. Science 354(6317), 1240–1241 (2016)
Subramoney, A., et al.: IGITUGraz/L2L: v1.0.0-beta, March 2019. https://doi.org/10.5281/zenodo.2590760
Sumbul, G., Charfuelan, M., Demir, B., Markl, V.: BigEarthNet: a large-scale benchmark archive for remote sensing image understanding. In: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2019). https://doi.org/10.1109/igarss.2019.8900532
Sumbul, G., Kang, J., Kreuziger, T., Marcelino, F., Costa, H., et al.: BigEarthNet dataset with a new class-nomenclature for remote sensing image understanding (2020). http://arxiv.org/abs/2001.06372
Uguzzoni, G., Lovis, S.J., Oteri, F., Schug, A., Szurmant, H., Weigt, M.: Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis. Proc. Natl. Acad. Sci. 114(13), E2662–E2671 (2017)
Vogels, T., Karimireddy, S.P., Jaggi, M.: PowerSGD: practical low-rank gradient compression for distributed optimization. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’ Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/d9fbed9da256e344c1fa46bb46c34c5f-Paper.pdf
Wang, L., Lin, Z.Q., Wong, A.: COVID-net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images. Sci. Rep. 10, 19549 (2020). https://doi.org/10.1038/s41598-020-76550-z
Wehbe, R.M., et al.: DeepCOVID-XR: an artificial intelligence algorithm to detect COVID-19 on chest radiographs trained and tested on a large U.S. clinical data set. Radiology 299, E167–E176 (2021). https://doi.org/10.1148/radiol.2020203511
Weigt, M., White, R.A., Szurmant, H., Hoch, J.A., Hwa, T.: Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Nat. Acad. Sci. 106(1), 67–72 (2009)
Zerihun, M.B., Pucci, F., Peter, E.K., Schug, A.: pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences. Bioinformatics 36(7), 2264–2265 (2020)
Zerihun, M.B., Pucci, F., Schug, A.: CoCoNet: boosting RNA contact prediction by convolutional neural networks. bioRxiv (2020)
Zhang, D., et al.: The AI index 2021 annual report, Technical report. AI Index Steering Committee, Human-Centered AI Institute, Stanford University, Stanford, CA (2021)
Zhang, S., Choromanska, A.E., LeCun, Y.: Deep learning with elastic averaging SGD. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015). https://proceedings.neurips.cc/paper/2015/file/d18f655c3fce66ca401d5f38b48c89af-Paper.pdf
Acknowledgements
This work was funded by the Helmholtz Association’s Initiative and Networking Fund under project number ZT-I-0003 and supported by Helmholtz AI computing resources (HAICORE). Funding was also obtained through grants ERC-2017-ADG 787576 (IntelliAQ) and BMBF 01 IS 18047A (DeepRain). Parts of this work were performed in the CoE RAISE and DEEP-EST projects, which receive funding from the EU’s Horizon 2020 Research and Innovation Framework Programme under grant agreements no. 951733 and no. 754304, respectively. We thank ECMWF for providing ERA-5 data. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS supercomputers JUWELS and JUWELS Booster at Jülich Supercomputing Centre (JSC), and we acknowledge computing resources from the Helmholtz Data Federation. Further computing time was provided on the supercomputer JUSUF within the frame of the JSC offer for epidemiology research on COVID-19.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Kesselheim, S. et al. (2021). JUWELS Booster – A Supercomputer for Large-Scale AI Research. In: Jagode, H., Anzt, H., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science(), vol 12761. Springer, Cham. https://doi.org/10.1007/978-3-030-90539-2_31
DOI: https://doi.org/10.1007/978-3-030-90539-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90538-5
Online ISBN: 978-3-030-90539-2