DOI: 10.5555/2388996.2389091

Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes

Published: 10 November 2012

Abstract

Over the last decade, InfiniBand has become an increasingly popular interconnect for deploying modern supercomputing systems. However, no existing detection service can discover the underlying network topology in a scalable manner and expose this information to runtime libraries and users of high-performance computing systems in a convenient way. In this paper, we design a novel and scalable method to detect the InfiniBand network topology using Neighbor-Joining (NJ) techniques. To the best of our knowledge, this is the first time the neighbor-joining algorithm has been applied to the problem of detecting InfiniBand network topology. We also design a network-topology-aware MPI library that takes advantage of the network topology service. The library places the processes of an MPI job in a network-topology-aware manner, with the dual aim of increasing intra-node communication and reducing long-distance inter-node communication across the InfiniBand fabric.
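
As a rough illustration of the neighbor-joining idea named in the abstract, the Python sketch below runs the generic neighbor-joining procedure from phylogenetics on a small table of pairwise host distances and returns the edges of the inferred tree. It is not the paper's implementation. The hop-count distances, the host names `h0` to `h3`, the function name `neighbor_joining`, and the `sw` labels given to inferred internal nodes are all assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Generic neighbor joining (illustrative sketch, not the paper's code).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (child, parent, branch_length) edges.
    """
    d = dict(dist)          # working copy; grows as internal nodes are added
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each active node to every other active node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}

        # Join the pair that minimizes the neighbor-joining criterion Q.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]

        # Hypothetical label for the inferred internal node.
        u = f"sw{next_id}"
        next_id += 1

        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))

        # Distances from the new node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - dab
                )

        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Connect the last two remaining nodes.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Assumed toy input: h0/h1 share one leaf switch and h2/h3 share another,
# so same-switch pairs are 2 links apart and cross-switch pairs are 4.
hosts = ["h0", "h1", "h2", "h3"]
hops = {
    ("h0", "h1"): 2, ("h2", "h3"): 2,
    ("h0", "h2"): 4, ("h0", "h3"): 4,
    ("h1", "h2"): 4, ("h1", "h3"): 4,
}
dist = {frozenset(pair): value for pair, value in hops.items()}

edges = neighbor_joining(hosts, dist)
for child, parent, length in edges:
    print(child, "->", parent, length)
```

On this toy input the hosts that share an assumed leaf switch are grouped under the same inferred internal node, which is the kind of grouping a topology-aware placement could use to keep heavily communicating processes close together. A real service would also need a scalable way to obtain the distances, which this sketch takes as given.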


Published In

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2012
1161 pages
ISBN: 9781467308045

Publisher

IEEE Computer Society Press

Washington, DC, United States

Qualifiers

  • Research-article

Acceptance Rates

SC '12 paper acceptance rate: 100 of 461 submissions (22%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
