
DOI: 10.1109/SC41406.2024.00037

An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers

Published: 17 November 2024

Abstract

Dragonfly-based networks are an extensively deployed network topology in large-scale high-performance computing due to their cost-effectiveness and efficiency. The US will soon have three exascale supercomputers for leadership-class workloads deployed with dragonfly networks. Compared to indirect networks of similar scale, a dragonfly network has considerably shorter cable lengths and fewer cables and switches, resulting in significant network cost savings for a given system size. However, these cost reductions also leave fewer global minimal paths and make routing more challenging. Additionally, large-scale dragonfly networks often require a taper at the global link level, resulting in less bisection bandwidth than is achievable in traditional non-blocking topologies of equivalent scale. While dragonfly networks have been extensively studied, they have yet to be fully evaluated on an extreme-scale (i.e., exascale) system that targets capability workloads. In this paper, we present the results of the first large-scale evaluation of a dragonfly network on an exascale system (Frontier) and compare its behavior to a similar-scale fat-tree network on a previous-generation TOP500 system (Summit). This evaluation aims to determine the effect of network cost optimizations by measuring a tapered topology's impact on capability workloads. Our evaluation is based on a collection of synthetic microbenchmarks, mini-apps, and full-scale applications, and it compares the scaling efficiencies of each benchmark between the dragonfly-based Frontier and the fat-tree-based Summit. Our results show that a dragonfly network is ~30% more cost-efficient than a fat-tree topology, which amortizes to ~3% of an exascale system's cost. Furthermore, while tapered dragonfly networks impose significant tradeoffs, the impacts are not as broad as initially thought: they appear mostly in applications with global communication patterns, particularly all-to-all (e.g., FFT-based algorithms), but also in applications with local communication patterns (e.g., nearest-neighbor algorithms) that are sensitive to network performance variability.
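The quoted cost figures imply a simple amortization: a ~30% saving on the interconnect translates to ~3% of total system cost only if the interconnect accounts for roughly a tenth of that cost. The sketch below is not from the paper; the 10% network cost fraction and the runtimes are illustrative assumptions. It shows that arithmetic together with a scaling-efficiency ratio of the kind commonly used when comparing the same benchmark across systems.

```python
# Back-of-the-envelope sketch (illustrative only, not from the paper).
# The network_cost_fraction and the runtimes below are assumed values.

def amortized_saving(network_saving: float, network_cost_fraction: float) -> float:
    """Fraction of total system cost saved by a cheaper interconnect."""
    return network_saving * network_cost_fraction

def scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Efficiency ratio: runtime at the base scale over runtime at the larger scale."""
    return t_base / t_scaled

if __name__ == "__main__":
    # Assume the interconnect is ~10% of total system cost (hypothetical figure).
    print(f"system-level saving: {amortized_saving(0.30, 0.10):.1%}")  # -> 3.0%

    # Hypothetical runtimes (seconds) for the same benchmark at base and full scale.
    print(f"system A efficiency: {scaling_efficiency(100.0, 112.0):.2f}")
    print(f"system B efficiency: {scaling_efficiency(100.0, 104.0):.2f}")
```

Under these assumed numbers, the system with the smaller slowdown at scale reports the higher efficiency, which is the kind of per-benchmark comparison the evaluation draws between Frontier and Summit.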



Published In

SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
November 2024, 1758 pages
ISBN: 9798350352917
Publisher: IEEE Press

Author Tags

  1. Dragonfly & Fat-tree network topologies
  2. HPC systems
  3. network cost optimization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC '24
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%

