article

Hierarchical task mapping for parallel applications on supercomputers

Authors:

Xuanxing Xiong,

Zhiling LanAuthors Info & Claims

The Journal of Supercomputing, Volume 71, Issue 5

Pages 1776 - 1802

https://doi.org/10.1007/s11227-014-1324-5

Published: 01 May 2015 Publication History

Abstract

As the scale of supercomputers grows, so does the size of the interconnect network. Topology-aware task mapping, which maps parallel application processes onto processors to reduce communication cost, becomes increasingly important. Previous works mainly focus on the task mapping between compute nodes (i.e., inter-node mapping), while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy, which performs both inter-node and intra-node mapping. We consider supercomputers with popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node mapping and intra-node mapping; (2) a recursive bipartitioning mapping algorithm for torus topology, which efficiently partitions the compute nodes according to their coordinates. Moreover, a hierarchical task mapping library is developed. Experimental results show that the proposed approach significantly improves the communication performance by up to 77 % with low runtime overhead.

References

[1]

LibTopoMap: A Generic Topology Mapping Library. http://www.unixer.de/research/mpitopo/libtopomap/

[2]

MPI: A message-passing interface standard. version 3.0. http://www.mpi-forum.org/

[3]

ALCF Intrepid. https://www.alcf.anl.gov/intrepid

[4]

METIS: Graph Partitioning Tool. http://glaros.dtc.umn.edu/gkhome/views/metis

[5]

NICS Kraken User Guide. https://www.xsede.org/web/guest/nics-kraken

[6]

TACC Stampede User Guide. http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide

[7]

Top 500 Supercomputer Sites. http://www.top500.org/

[8]

TOPOMap. http://bluesky.cs.iit.edu/topomap/

[9]

Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470---477.

[10]

Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS)

Digital Library

[11]

Aleliunas R, Rosenberg AL (1982) On embedding rectangular grids in square grids. IEEE Trans Comput C-31(9):907---913.

Digital Library

[12]

Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3---12

Digital Library

[13]

Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349---357.

[14]

Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75---82

Digital Library

[15]

Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439---458

Digital Library

[16]

Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana.

Digital Library

[17]

Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1---8

[18]

Bokhari SH (1981) On the mapping problem. IEEE Trans Comput 30(3):207---214

Digital Library

[19]

Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180---186.

Digital Library

[20]

Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157---1165

[21]

Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815---1823

Digital Library

[22]

Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157---172.

Digital Library

[23]

Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw 38(1):1---25

Digital Library

[24]

Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27---36

Digital Library

[25]

Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210---221

Digital Library

[26]

Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75---84

Digital Library

[27]

Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199---210

Digital Library

[28]

Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96---129

Digital Library

[29]

Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73---94

[30]

Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix conference on computers and communications, pp 40---44.

[31]

Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput 34(10):892---901

[32]

Mercier G, Jeannot E (2011) Improving MPI applications performance on multicore clusters with rank reordering. In: Proceedings of the 18th European MPI users' group conference on recent advances in the message passing interface, pp 39---49

Digital Library

[33]

Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proceedings of the scalable high-performance computing conference, pp 486---493

[34]

pellegrini F, Roman J (1996) Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. High-performance computing and networking. Lecture notes in computer science, vol 1067, pp 493---498

Digital Library

[35]

Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement: theory and applications. Springer, Berlin

[36]

Rashti MJ, Green J, Balaji P, Afsahi A, Gropp W (2011) Multi-core and network aware MPI topology functions. In: Proceedings of the 18th European MPI users' group conference on recent advances in the message passing interface, pp 50---60

Digital Library

[37]

Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363---371

[38]

Subramoni H, Potluri S, Kandalla K, Barth B, Vienne J, Keasler J, Tomko K, Schulz K, Moody A, Panda D (2012) Design of a scalable infiniband topology service to enable network-topology-aware placement of processes. In: Proceedings of international conference on high performance computing, networking, storage and analysis, pp 1---12.

Digital Library

[39]

Tang W, Lan Z, Desai N, Buettner D, Yu Y (2011) Reducing fragmentation on torus-connected supercomputers. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 828---839

Digital Library

[40]

Tr$$\ddot{a}$$äff JL (2002) Implementing the MPI process topology mechanism. In: Proceedings of ACM/IEEE conference on supercomputing, pp 28:1---28:14

Digital Library

[41]

Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proceedings of IEEE International conference on cluster computing (CLUSTER), pp 8---16

Digital Library

[42]

Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proceedings of international conference on high performance computing, networking, storage and analysis, SC '12, pp 75:1---75:10

Digital Library

[43]

Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proceedings of ACM/IEEE conference on supercomputing, p 52

Digital Library

[44]

Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 933---944

Digital Library

[45]

Zou H, Yu Y, Tang W, Chen HWM (2014) Flexanalytics: a flexible data analytics framework for big data applications with I/O performance improvement. Big Data Res 1(0): 4---13 (Special Issue on Scalable Computing for Big Data)

Digital Library

Cited By

Gan XGuo JGuo PWu GSi JMei SLiu CLi TEl Saddik AMei TCucchiara RBertini MTobon Vallejo DAtrey PHossain M(2023)GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media AccessProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3613828(8984-8993)Online publication date: 26-Oct-2023
https://dl.acm.org/doi/10.1145/3581783.3613828
Baicheng YZhang YLimin XYi ZBing WYao S(2019)LPMSWorkshop Proceedings of the 48th International Conference on Parallel Processing10.1145/3339186.3339208(1-10)Online publication date: 5-Aug-2019
https://dl.acm.org/doi/10.1145/3339186.3339208
Georgiou YJeannot EMercier GVilliermet A(2018)Topology-aware job mappingInternational Journal of High Performance Computing Applications10.5555/3195474.319547632:1(14-27)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.5555/3195474.3195476
Show More Cited By

Hierarchical task mapping for parallel applications on supercomputers

Recommendations

Topology mapping of irregular parallel applications on torus-connected supercomputers

Supercomputers with ever increasing computing power are being built for scientific applications. As the system size scales up, so does the size of interconnect network. As a result, communication in supercomputers becomes increasingly expensive due to ...
Task mapping on supercomputers with cellular networks
Periodic hierarchical load balancing for large supercomputers

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The Journal of Supercomputing

The Journal of Supercomputing Volume 71, Issue 5

May 2015

327 pages

ISSN:0920-8542

Issue’s Table of Contents

Copyright © Copyright © 2015 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 May 2015

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Gan XGuo JGuo PWu GSi JMei SLiu CLi TEl Saddik AMei TCucchiara RBertini MTobon Vallejo DAtrey PHossain M(2023)GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media AccessProceedings of the 31st ACM International Conference on Multimedia10.1145/3581783.3613828(8984-8993)Online publication date: 26-Oct-2023
https://dl.acm.org/doi/10.1145/3581783.3613828
Baicheng YZhang YLimin XYi ZBing WYao S(2019)LPMSWorkshop Proceedings of the 48th International Conference on Parallel Processing10.1145/3339186.3339208(1-10)Online publication date: 5-Aug-2019
https://dl.acm.org/doi/10.1145/3339186.3339208
Georgiou YJeannot EMercier GVilliermet A(2018)Topology-aware job mappingInternational Journal of High Performance Computing Applications10.5555/3195474.319547632:1(14-27)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.5555/3195474.3195476
Georgiou YJeannot EMercier GVilliermet A(2017)Topology-aware resource management for HPC applicationsProceedings of the 18th International Conference on Distributed Computing and Networking10.1145/3007748.3007768(1-10)Online publication date: 5-Jan-2017
https://dl.acm.org/doi/10.1145/3007748.3007768
Wu JXiong XBerrocal EWang JLan Z(2017)Topology mapping of irregular parallel applications on torus-connected supercomputersThe Journal of Supercomputing10.1007/s11227-016-1876-773:4(1691-1714)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1007/s11227-016-1876-7

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents