Article

Technology-Driven, Highly-Scalable Dragonfly Topology

Authors:

William J. Dally,

Dennis Abts

ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture

Pages 77 - 88

https://doi.org/10.1109/ISCA.2008.19

Published: 01 June 2008

Abstract

Evolving technology and increasing pin bandwidth motivate the use of high-radix routers to reduce the diameter, latency, and cost of interconnection networks. High-radix networks, however, require longer cables than their low-radix counterparts. Because cables dominate network cost, the number of cables, and particularly the number of long, global cables, should be minimized to realize an efficient network. In this paper, we introduce the dragonfly topology, which uses a group of high-radix routers as a virtual router to increase the effective radix of the network. With this organization, each minimally routed packet traverses at most one global channel. By reducing the number of global channels, a dragonfly reduces cost by 20% compared to a flattened butterfly and by 52% compared to a folded Clos network in configurations with ≥ 16K nodes.

We also introduce two new variants of global adaptive routing that enable load-balanced routing in the dragonfly. Each router in a dragonfly must make an adaptive routing decision based on the state of a global channel connected to a different router. Because this routing decision is indirect, conventional adaptive routing algorithms give degraded performance. We introduce the use of selective virtual-channel discrimination and the use of credit round-trip latency to both sense and signal channel congestion. The combination of these two methods gives throughput and latency that approach those of an ideal adaptive routing algorithm.
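The "virtual router" idea in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the abstract defines no symbols, so the parameter names p (terminals per router), a (routers per group), and h (global channels per router) are assumptions borrowed from the notation of the paper's body, and the class and function names are hypothetical. It also assumes every pair of routers in a group shares a local channel and every pair of groups shares one global channel.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dragonfly:
    """Hypothetical parameter bundle; p, a, h are assumed names, not from the abstract."""
    p: int  # terminals attached to each router
    a: int  # routers in each group
    h: int  # global channels on each router

    @property
    def router_radix(self) -> int:
        # Ports per physical router: terminals + local links to the
        # other a - 1 routers in the group + global links.
        return self.p + (self.a - 1) + self.h

    @property
    def virtual_radix(self) -> int:
        # Ports of the group seen as one virtual router: all terminal
        # and global ports of its a routers (local links are internal).
        return self.a * (self.p + self.h)

    @property
    def max_groups(self) -> int:
        # Each of the a * h global channels reaches a distinct other group.
        return self.a * self.h + 1

    @property
    def max_terminals(self) -> int:
        return self.a * self.p * self.max_groups


def minimal_route(same_group: bool,
                  src_has_global_link: bool,
                  dst_has_global_link: bool) -> List[str]:
    """Channel types on a minimal route between two distinct routers.

    Assumes fully connected groups, so any local hop is a single channel.
    """
    if same_group:
        return ["local"]
    hops: List[str] = []
    if not src_has_global_link:
        hops.append("local")   # reach the router that owns the global channel
    hops.append("global")      # the only global channel on the route
    if not dst_has_global_link:
        hops.append("local")   # reach the destination router in its group
    return hops


if __name__ == "__main__":
    net = Dragonfly(p=2, a=4, h=2)
    print(net.router_radix, net.virtual_radix, net.max_groups, net.max_terminals)
    print(minimal_route(False, False, False))
```

With p = 2, a = 4, h = 2, routers of radix 7 form groups that behave like radix-16 virtual routers, and the longest minimal route is local, global, local, which is the "at most one global channel" property the abstract states.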

Index Terms

Technology-Driven, Highly-Scalable Dragonfly Topology
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Interconnect
      1. Input / output circuits

Published In

ISCA '08: Proceedings of the 35th Annual International Symposium on Computer Architecture

June 2008

449 pages

ISBN:9780769531748

ACM SIGARCH Computer Architecture News Volume 36, Issue 3
June 2008
449 pages
ISSN:0163-5964
DOI:10.1145/1394608

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 June 2008

Qualifiers

Article

Conference

ISCA08

Sponsor:

SIGARCH

ISCA08: The 35th Annual International Symposium on Computer Architecture

June 21 - 25, 2008

Acceptance Rates

ISCA '08 Paper Acceptance Rate 37 of 259 submissions, 14%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%
