Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Hierarchical task mapping for parallel applications on supercomputers

Published: 01 May 2015 Publication History

Abstract

As the scale of supercomputers grows, so does the size of the interconnect network. Topology-aware task mapping, which maps parallel application processes onto processors to reduce communication cost, becomes increasingly important. Previous works mainly focus on the task mapping between compute nodes (i.e., inter-node mapping), while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy, which performs both inter-node and intra-node mapping. We consider supercomputers with popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node mapping and intra-node mapping; (2) a recursive bipartitioning mapping algorithm for torus topology, which efficiently partitions the compute nodes according to their coordinates. Moreover, a hierarchical task mapping library is developed. Experimental results show that the proposed approach significantly improves the communication performance by up to 77 % with low runtime overhead.

References

[1]
LibTopoMap: A Generic Topology Mapping Library. http://www.unixer.de/research/mpitopo/libtopomap/
[2]
MPI: A message-passing interface standard. version 3.0. http://www.mpi-forum.org/
[3]
ALCF Intrepid. https://www.alcf.anl.gov/intrepid
[4]
METIS: Graph Partitioning Tool. http://glaros.dtc.umn.edu/gkhome/views/metis
[5]
NICS Kraken User Guide. https://www.xsede.org/web/guest/nics-kraken
[6]
TACC Stampede User Guide. http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide
[7]
Top 500 Supercomputer Sites. http://www.top500.org/
[8]
TOPOMap. http://bluesky.cs.iit.edu/topomap/
[9]
Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470---477.
[10]
Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS)
[11]
Aleliunas R, Rosenberg AL (1982) On embedding rectangular grids in square grids. IEEE Trans Comput C-31(9):907---913.
[12]
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3---12
[13]
Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349---357.
[14]
Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75---82
[15]
Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439---458
[16]
Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana.
[17]
Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1---8
[18]
Bokhari SH (1981) On the mapping problem. IEEE Trans Comput 30(3):207---214
[19]
Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180---186.
[20]
Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157---1165
[21]
Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815---1823
[22]
Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157---172.
[23]
Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw 38(1):1---25
[24]
Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27---36
[25]
Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210---221
[26]
Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75---84
[27]
Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199---210
[28]
Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96---129
[29]
Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73---94
[30]
Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix conference on computers and communications, pp 40---44.
[31]
Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput 34(10):892---901
[32]
Mercier G, Jeannot E (2011) Improving MPI applications performance on multicore clusters with rank reordering. In: Proceedings of the 18th European MPI users' group conference on recent advances in the message passing interface, pp 39---49
[33]
Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proceedings of the scalable high-performance computing conference, pp 486---493
[34]
pellegrini F, Roman J (1996) Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. High-performance computing and networking. Lecture notes in computer science, vol 1067, pp 493---498
[35]
Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement: theory and applications. Springer, Berlin
[36]
Rashti MJ, Green J, Balaji P, Afsahi A, Gropp W (2011) Multi-core and network aware MPI topology functions. In: Proceedings of the 18th European MPI users' group conference on recent advances in the message passing interface, pp 50---60
[37]
Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363---371
[38]
Subramoni H, Potluri S, Kandalla K, Barth B, Vienne J, Keasler J, Tomko K, Schulz K, Moody A, Panda D (2012) Design of a scalable infiniband topology service to enable network-topology-aware placement of processes. In: Proceedings of international conference on high performance computing, networking, storage and analysis, pp 1---12.
[39]
Tang W, Lan Z, Desai N, Buettner D, Yu Y (2011) Reducing fragmentation on torus-connected supercomputers. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 828---839
[40]
Tr$$\ddot{a}$$äff JL (2002) Implementing the MPI process topology mechanism. In: Proceedings of ACM/IEEE conference on supercomputing, pp 28:1---28:14
[41]
Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proceedings of IEEE International conference on cluster computing (CLUSTER), pp 8---16
[42]
Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proceedings of international conference on high performance computing, networking, storage and analysis, SC '12, pp 75:1---75:10
[43]
Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proceedings of ACM/IEEE conference on supercomputing, p 52
[44]
Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 933---944
[45]
Zou H, Yu Y, Tang W, Chen HWM (2014) Flexanalytics: a flexible data analytics framework for big data applications with I/O performance improvement. Big Data Res 1(0): 4---13 (Special Issue on Scalable Computing for Big Data)

Cited By

View all
  • (2023)GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media AccessProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3613828(8984-8993)Online publication date: 26-Oct-2023
  • (2019)LPMSWorkshop Proceedings of the 48th International Conference on Parallel Processing10.1145/3339186.3339208(1-10)Online publication date: 5-Aug-2019
  • (2018)Topology-aware job mappingInternational Journal of High Performance Computing Applications10.5555/3195474.319547632:1(14-27)Online publication date: 1-Jan-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The Journal of Supercomputing
The Journal of Supercomputing  Volume 71, Issue 5
May 2015
327 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 May 2015

Author Tags

  1. Communication optimization
  2. Fat-tree network
  3. Hierarchical task mapping
  4. Parallel applications
  5. Topology-aware task mapping
  6. Torus network

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media AccessProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3613828(8984-8993)Online publication date: 26-Oct-2023
  • (2019)LPMSWorkshop Proceedings of the 48th International Conference on Parallel Processing10.1145/3339186.3339208(1-10)Online publication date: 5-Aug-2019
  • (2018)Topology-aware job mappingInternational Journal of High Performance Computing Applications10.5555/3195474.319547632:1(14-27)Online publication date: 1-Jan-2018
  • (2017)Topology-aware resource management for HPC applicationsProceedings of the 18th International Conference on Distributed Computing and Networking10.1145/3007748.3007768(1-10)Online publication date: 5-Jan-2017
  • (2017)Topology mapping of irregular parallel applications on torus-connected supercomputersThe Journal of Supercomputing10.1007/s11227-016-1876-773:4(1691-1714)Online publication date: 1-Apr-2017

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media