DOI: 10.1109/SC.2018.00055

The design, deployment, and evaluation of the CORAL pre-exascale systems

Published: 26 July 2019

Abstract

CORAL, the Collaboration of Oak Ridge, Argonne and Livermore, is fielding two similar IBM systems, Summit and Sierra, with NVIDIA GPUs that will replace the existing Titan and Sequoia systems. Summit and Sierra are currently ranked No. 1 and No. 3, respectively, on the Top500 list. We discuss the design and key differences of the systems. Our evaluation of the systems highlights the following. Applications that fit in HBM see the most benefit and may prefer more GPUs; however, for some applications, the CPU-GPU bandwidth is more important than the number of GPUs. The node-local burst buffer scales linearly and can achieve a 4X improvement over the parallel file system (PFS) for large jobs; smaller jobs, however, may benefit from writing directly to the PFS. Finally, several CPU-, network-, and memory-bound analytics codes and GPU-bound deep learning codes achieve up to an 11X and 79X speedup per node, respectively, over Titan.
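
The abstract's burst-buffer finding is, at heart, a write-bandwidth comparison between two storage tiers. The sketch below illustrates the shape of such a measurement: timing an identical synthetic checkpoint write against a node-local burst-buffer mount and a parallel file system mount. It is a minimal single-node illustration only; the mount points (/mnt/bb, /gpfs) and the checkpoint size are assumed placeholders, not the actual Summit/Sierra configuration or the paper's methodology.

    # Minimal sketch: compare checkpoint write bandwidth to a node-local
    # burst buffer vs. a parallel file system (PFS). The mount points are
    # hypothetical placeholders, not the actual Summit/Sierra paths.
    import os
    import time

    CHECKPOINT_BYTES = 256 * 1024 * 1024  # 256 MiB of synthetic checkpoint data

    def write_bandwidth(path, payload):
        """Write payload to path, fsync, and return bandwidth in GiB/s."""
        start = time.perf_counter()
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force data to the device, not just the page cache
        elapsed = time.perf_counter() - start
        return len(payload) / (1 << 30) / elapsed

    if __name__ == "__main__":
        payload = os.urandom(CHECKPOINT_BYTES)  # incompressible data
        for target in ("/mnt/bb/ckpt.bin", "/gpfs/ckpt.bin"):  # assumed mounts
            try:
                print(f"{target}: {write_bandwidth(target, payload):.2f} GiB/s")
            except OSError as err:
                print(f"{target}: skipped ({err})")

A single-rank timing like this only captures the small-job case; the linear-scaling claim concerns many nodes writing to their local burst buffers concurrently, which a one-node sketch cannot show.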

Information

Published In

SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
November 2018
932 pages

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Publication History

Published: 26 July 2019

Qualifiers

  • Research-article

Conference

SC18

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 14
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 18 Nov 2024

Cited By

  • (2024) CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 426-436. DOI: 10.1145/3650200.3656591. Online publication date: 30-May-2024.
  • (2024) NVIDIA Grace Superchip Early Evaluation for HPC Applications. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, pp. 45-54. DOI: 10.1145/3636480.3637284. Online publication date: 11-Jan-2024.
  • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys, 56(2), pp. 1-41. DOI: 10.1145/3611007. Online publication date: 15-Sep-2023.
  • (2023) Frontier: Exploring Exascale. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1145/3581784.3607089. Online publication date: 12-Nov-2023.
  • (2022) Distributed Acceleration of Adhesive Dynamics Simulations. Proceedings of the 29th European MPI Users' Group Meeting, pp. 37-45. DOI: 10.1145/3555819.3555832. Online publication date: 14-Sep-2022.
  • (2022) Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm. Proceedings of the 29th European MPI Users' Group Meeting, pp. 11-17. DOI: 10.1145/3555819.3555821. Online publication date: 14-Sep-2022.
  • (2022) Frequency Recovery in Power Grids using High-Performance Computing. Workshop Proceedings of the 51st International Conference on Parallel Processing, pp. 1-6. DOI: 10.1145/3547276.3548632. Online publication date: 29-Aug-2022.
  • (2022) Automatic Differentiation of Parallel Loops with Formal Methods. Proceedings of the 51st International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3545008.3545089. Online publication date: 29-Aug-2022.
  • (2022) Efficiently emulating high-bitwidth computation with low-bitwidth hardware. Proceedings of the 36th ACM International Conference on Supercomputing, pp. 1-12. DOI: 10.1145/3524059.3532377. Online publication date: 28-Jun-2022.
  • (2022) Automatic differentiation of parallel loops with formal methods. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 463-464. DOI: 10.1145/3503221.3508442. Online publication date: 2-Apr-2022.