Trade-off topology design for hierarchical network based on job characteristics

Wenxiang Yang¹ &
Jie Yu¹

Abstract

Supercomputers rely on a job scheduling and resource management (JSRM) system to allocate compute nodes to jobs. To reduce a job’s communication overhead, the JSRM system uses a detailed internal topology design to allocate closely connected compute nodes. However, overly complicated topology designs are laborious for the JSRM system to parse and cause excessive scheduling overhead. Optimal node allocation and low scheduling overhead cannot both be achieved, especially on ultra-scale supercomputers with increasingly sophisticated network topologies. We study a production supercomputer with a two-dimensional fat-tree network, systematically analyze the underlying correlation among node allocation, topology design, and job characteristics, and present multiple trade-off designs that adapt to different scenarios. Our methods and insights about topology design can be generalized to a large family of topologies, or even deployed directly in systems that use similar hierarchical networks. We propose three topology design guidelines based on the load of the JSRM system, job size, and communication characteristics, achieving a trade-off between communication cost and scheduling overhead. This study reveals that full topology details are not always necessary and that a holistic investigation of the system and job characteristics is required when designing the topology.

Data availability

The authors confirm that the data supporting the findings of this study are available within the article.

Author information

Authors and Affiliations

China Aerodynamics Research and Development Center, Mianyang, China
Wenxiang Yang & Jie Yu

Corresponding author

Correspondence to Wenxiang Yang.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, W., Yu, J. Trade-off topology design for hierarchical network based on job characteristics. CCF Trans. HPC 6, 459–471 (2024). https://doi.org/10.1007/s42514-024-00193-z

Received: 26 December 2023
Accepted: 16 April 2024
Published: 21 May 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s42514-024-00193-z
