DOI: 10.1109/SC.2018.00055

The design, deployment, and evaluation of the CORAL pre-exascale systems

Published: 26 July 2019

Abstract

CORAL, the Collaboration of Oak Ridge, Argonne and Livermore, is fielding two similar IBM systems, Summit and Sierra, with NVIDIA GPUs that will replace the existing Titan and Sequoia systems. Summit and Sierra are currently ranked No. 1 and No. 3, respectively, on the Top500 list. We discuss the design and key differences of the systems. Our evaluation of the systems highlights the following. Applications that fit in HBM see the most benefit and may prefer more GPUs; however, for some applications, the CPU-GPU bandwidth is more important than the number of GPUs. The node-local burst buffer scales linearly and can achieve a 4X improvement over the parallel file system (PFS) for large jobs; smaller jobs, however, may benefit from writing directly to the PFS. Finally, several CPU-, network-, and memory-bound analytics codes and GPU-bound deep learning codes achieve up to an 11X and 79X speedup per node, respectively, over Titan.
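
The abstract's burst-buffer finding is, at heart, a write-bandwidth comparison between two storage tiers. The sketch below illustrates the shape of such a measurement: timing an identical synthetic checkpoint write against a node-local burst-buffer mount and a parallel file system mount. It is a minimal single-node illustration only; the mount points (/mnt/bb, /gpfs) and the checkpoint size are assumed placeholders, not the actual Summit/Sierra configuration or the paper's methodology.

    # Minimal sketch: compare checkpoint write bandwidth to a node-local
    # burst buffer vs. a parallel file system (PFS). The mount points are
    # hypothetical placeholders, not the actual Summit/Sierra paths.
    import os
    import time

    CHECKPOINT_BYTES = 256 * 1024 * 1024  # 256 MiB of synthetic checkpoint data

    def write_bandwidth(path, payload):
        """Write payload to path, fsync, and return bandwidth in GiB/s."""
        start = time.perf_counter()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force data to the device, not just the page cache
        elapsed = time.perf_counter() - start
        return len(payload) / (1 << 30) / elapsed

    if __name__ == "__main__":
        payload = os.urandom(CHECKPOINT_BYTES)  # incompressible data
        for target in ("/mnt/bb/ckpt.bin", "/gpfs/ckpt.bin"):  # assumed mounts
            try:
                print(f"{target}: {write_bandwidth(target, payload):.2f} GiB/s")
            except OSError as err:
                print(f"{target}: skipped ({err})")

A single-rank timing like this only captures the small-job case; the linear-scaling claim concerns many nodes writing to their local burst buffers concurrently, which a one-node sketch cannot show.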

Information

Published In

SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
November 2018
932 pages

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Publication History

Published: 26 July 2019

Qualifiers

  • Research-article

Conference

SC18

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 14
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 18 Nov 2024

Cited By

  • (2024) CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 426-436. DOI: 10.1145/3650200.3656591. Online publication date: 30-May-2024.
  • (2024) NVIDIA Grace Superchip Early Evaluation for HPC Applications. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, pp. 45-54. DOI: 10.1145/3636480.3637284. Online publication date: 11-Jan-2024.
  • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys, 56(2), pp. 1-41. DOI: 10.1145/3611007. Online publication date: 15-Sep-2023.
  • (2023) Frontier: Exploring Exascale. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1145/3581784.3607089. Online publication date: 12-Nov-2023.
  • (2022) Distributed Acceleration of Adhesive Dynamics Simulations. Proceedings of the 29th European MPI Users' Group Meeting, pp. 37-45. DOI: 10.1145/3555819.3555832. Online publication date: 14-Sep-2022.
  • (2022) Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm. Proceedings of the 29th European MPI Users' Group Meeting, pp. 11-17. DOI: 10.1145/3555819.3555821. Online publication date: 14-Sep-2022.
  • (2022) Frequency Recovery in Power Grids using High-Performance Computing. Workshop Proceedings of the 51st International Conference on Parallel Processing, pp. 1-6. DOI: 10.1145/3547276.3548632. Online publication date: 29-Aug-2022.
  • (2022) Automatic Differentiation of Parallel Loops with Formal Methods. Proceedings of the 51st International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3545008.3545089. Online publication date: 29-Aug-2022.
  • (2022) Efficiently emulating high-bitwidth computation with low-bitwidth hardware. Proceedings of the 36th ACM International Conference on Supercomputing, pp. 1-12. DOI: 10.1145/3524059.3532377. Online publication date: 28-Jun-2022.
  • (2022) Automatic differentiation of parallel loops with formal methods. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 463-464. DOI: 10.1145/3503221.3508442. Online publication date: 2-Apr-2022.