

Trade-off topology design for hierarchical network based on job characteristics

  • Review Paper
CCF Transactions on High Performance Computing

Abstract

Supercomputers rely on a job scheduling and resource management (JSRM) system to allocate compute nodes to jobs. To reduce a job’s communication overhead, the JSRM system relies on a detailed internal topology design to allocate closely connected compute nodes. However, overly complicated topology designs are expensive for the JSRM system to parse, causing excessive scheduling overhead. Optimal node allocation and low scheduling overhead are therefore hard to achieve at the same time, especially in ultra-scale supercomputers with increasingly sophisticated network topologies. We study a production supercomputer with a two-dimensional fat-tree network, systematically analyze the underlying correlation among node allocation, topology design and job characteristics, and present multiple trade-off designs suited to different scenarios. Our methods and insights into topology design can be generalized to a large family of topologies, or even deployed directly in systems that use similar hierarchical networks. We propose three topology design guidelines, based on the load of the JSRM system, the job size and the communication characteristics, that achieve a trade-off between communication cost and scheduling overhead. This study reveals that full topology details are not always necessary, and that designing the topology requires a holistic investigation of both system and job characteristics.
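To make the allocation side of this trade-off concrete, here is a minimal Python sketch, assuming a simple two-level hierarchy in which compute nodes hang off leaf switches. It is not the authors' method or any real scheduler's API: the `LeafSwitch` type and the `allocate_flat`, `allocate_tree` and `switches_spanned` functions are hypothetical. A topology-unaware allocator simply takes the first free nodes, while a topology-aware allocator tries to keep a job on as few leaf switches as possible (best fit on a single switch, otherwise largest switches first), at the cost of extra work per scheduling decision.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LeafSwitch:
    """Hypothetical leaf switch and the free compute nodes attached to it."""
    name: str
    free_nodes: list[str]


def allocate_flat(switches: list[LeafSwitch], n: int) -> list[str] | None:
    """Topology-unaware: take the first n free nodes in switch order."""
    pool = [node for sw in switches for node in sw.free_nodes]
    return pool[:n] if len(pool) >= n else None


def allocate_tree(switches: list[LeafSwitch], n: int) -> list[str] | None:
    """Topology-aware greedy heuristic that tries to minimize switches spanned."""
    # Case 1: the job fits under one switch; pick the tightest fit (best fit)
    # so that larger free blocks remain available for later, larger jobs.
    fitting = [sw for sw in switches if len(sw.free_nodes) >= n]
    if fitting:
        best = min(fitting, key=lambda sw: len(sw.free_nodes))
        return best.free_nodes[:n]
    # Case 2: spread the job, filling the switches with the most free nodes first.
    chosen: list[str] = []
    for sw in sorted(switches, key=lambda sw: len(sw.free_nodes), reverse=True):
        take = min(n - len(chosen), len(sw.free_nodes))
        chosen.extend(sw.free_nodes[:take])
        if len(chosen) == n:
            return chosen
    return None  # not enough free nodes in the whole system


def switches_spanned(switches: list[LeafSwitch], nodes: list[str]) -> int:
    """Count the distinct leaf switches an allocation touches."""
    owner = {node: sw.name for sw in switches for node in sw.free_nodes}
    return len({owner[node] for node in nodes})


if __name__ == "__main__":
    sws = [
        LeafSwitch("s0", ["n0", "n1"]),
        LeafSwitch("s1", ["n2", "n3", "n4", "n5"]),
        LeafSwitch("s2", ["n6", "n7", "n8"]),
    ]
    for alloc in (allocate_flat, allocate_tree):
        nodes = alloc(sws, 3)
        print(alloc.__name__, nodes, switches_spanned(sws, nodes))
```

With leaf switches holding 2, 4 and 3 free nodes, a 3-node job spans two switches under `allocate_flat` but only one under `allocate_tree`. The extra filtering and sorting is trivial in this toy model, but it grows with the number of switches and hierarchy levels the scheduler has to reason about, which is the scheduling-overhead side of the trade-off.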


Figs. 1–10 (images not reproduced)


Data availability

The authors confirm that the data supporting the findings of this study are available within the article.


Author information


Corresponding author

Correspondence to Wenxiang Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, W., Yu, J. Trade-off topology design for hierarchical network based on job characteristics. CCF Trans. HPC 6, 459–471 (2024). https://doi.org/10.1007/s42514-024-00193-z



  • DOI: https://doi.org/10.1007/s42514-024-00193-z
