Abstract
Supercomputers rely on a job scheduling and resource management (JSRM) system to allocate compute nodes to jobs. To reduce a job's communication overhead, the JSRM system uses its internal description of the network topology to allocate closely connected compute nodes. However, an overly detailed topology description is laborious for the JSRM system to parse and causes excessive scheduling overhead. Optimal node allocation and low scheduling overhead cannot be reconciled, especially on ultra-scale supercomputers with increasingly sophisticated network topologies. We study a production supercomputer with a two-dimensional fat-tree network, systematically analyze the correlation among node allocation, topology design, and job characteristics, and present multiple trade-off designs suited to different scenarios. Our methods and insights about topology design can be generalized to a large family of topologies, or deployed directly in systems that use similar hierarchical networks. We propose three topology design guidelines based on the load of the JSRM system, the job size, and the job's communication characteristics, which trade off communication cost against scheduling overhead. The study reveals that full topology details are not always necessary, and that designing the topology requires a holistic investigation of system and job characteristics.
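The trade-off described above can be illustrated with a toy model. The following is a minimal sketch, not the method of this study: it assumes a hypothetical two-level tree in which two nodes under the same leaf switch are 2 links apart and all other pairs are 4 links apart, and it compares a topology-unaware allocator with one that groups free nodes by leaf switch. All function names, the hop model, and the example cluster are illustrative assumptions.

```python
from itertools import combinations


def hops(a, b, leaf_of):
    """Links traversed between two nodes in the assumed two-level tree."""
    if a == b:
        return 0
    return 2 if leaf_of[a] == leaf_of[b] else 4


def mean_hops(nodes, leaf_of):
    """Average pairwise distance of an allocation (a simple communication-cost proxy)."""
    pairs = list(combinations(nodes, 2))
    return sum(hops(a, b, leaf_of) for a, b in pairs) / len(pairs)


def allocate_flat(free, n):
    """Topology-unaware allocation: take the first n free nodes by id."""
    return sorted(free)[:n]


def allocate_by_leaf(free, n, leaf_of):
    """Topology-aware allocation (hypothetical policy).

    Groups free nodes by leaf switch, which is the extra parsing work
    that a more detailed topology description imposes on the scheduler.
    """
    groups = {}
    for node in sorted(free):
        groups.setdefault(leaf_of[node], []).append(node)

    # Best fit: the smallest leaf group that can hold the whole job.
    fitting = [g for g in groups.values() if len(g) >= n]
    if fitting:
        return min(fitting, key=len)[:n]

    # Otherwise spread over as few leaf switches as possible.
    picked = []
    for g in sorted(groups.values(), key=len, reverse=True):
        picked.extend(g[: n - len(picked)])
        if len(picked) == n:
            break
    return picked


# Assumed example: 8 nodes, 4 per leaf switch, nodes 0 and 1 already busy.
leaf_of = {node: node // 4 for node in range(8)}
free = {2, 3, 4, 5, 6, 7}

flat = allocate_flat(free, 4)
aware = allocate_by_leaf(free, 4, leaf_of)
print("flat :", flat, mean_hops(flat, leaf_of))
print("aware:", aware, mean_hops(aware, leaf_of))
```

In this toy setting the topology-aware allocator keeps the job under one leaf switch, at the price of building and scanning per-switch groups on every scheduling decision. A coarser topology description would shrink that bookkeeping but could leave the job split across switches.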
Data availability
The authors confirm that the data supporting the findings of this study are available within the article.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Yang, W., Yu, J. Trade-off topology design for hierarchical network based on job characteristics. CCF Trans. HPC 6, 459–471 (2024). https://doi.org/10.1007/s42514-024-00193-z
DOI: https://doi.org/10.1007/s42514-024-00193-z