Topology-aware tile mapping for clusters of SMPs

Published: 03 May 2006

Abstract

We propose a technique to optimize the performance of applications using distributed dense arrays and characterized by a nearest-neighbor communication profile by exploiting the topology of SMP clusters. The topological information is used to map array tiles to processors to reduce network communication and improve utilization of shared memory for inter-process communication. The potential benefits of using the SMP-aware mapping are demonstrated through a simulation, as well as a real application solving a wind-driven ocean circulation model on an IBM SP. On 256 processors, the execution time was reduced by almost 30 percent without any changes to the original application source code. The proposed mapping approach is applicable to multiple programming models and distributed array management systems.
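The effect of mapping tiles to processors with node boundaries in mind can be illustrated with a small counting experiment. The sketch below is not the authors' algorithm; it only compares two hypothetical placements of a 16 x 16 tile grid (256 tiles, one per processor) on SMP nodes assumed to hold 16 processors each, and counts how many nearest-neighbor tile pairs end up on different nodes. The node size, the 4 x 4 block shape, and all function names are assumptions made for illustration.

```python
from itertools import product


def row_major_node(i, j, cols, procs_per_node):
    """Node owning tile (i, j) when ranks are handed out row by row."""
    return (i * cols + j) // procs_per_node


def blocked_node(i, j, cols, block_rows, block_cols):
    """Node owning tile (i, j) when each node holds a compact
    block_rows x block_cols patch of tiles."""
    blocks_per_row = cols // block_cols
    return (i // block_rows) * blocks_per_row + (j // block_cols)


def inter_node_edges(rows, cols, node_of):
    """Count nearest-neighbor tile pairs (non-periodic 4-point stencil)
    whose owning processors sit on different nodes."""
    count = 0
    for i, j in product(range(rows), range(cols)):
        if i + 1 < rows and node_of(i, j) != node_of(i + 1, j):
            count += 1
        if j + 1 < cols and node_of(i, j) != node_of(i, j + 1):
            count += 1
    return count


ROWS, COLS = 16, 16      # 256 tiles, one per processor
PROCS_PER_NODE = 16      # assumed SMP node size, not taken from the paper

naive = inter_node_edges(
    ROWS, COLS, lambda i, j: row_major_node(i, j, COLS, PROCS_PER_NODE)
)
aware = inter_node_edges(
    ROWS, COLS, lambda i, j: blocked_node(i, j, COLS, 4, 4)
)
print(f"row-major placement:   {naive} inter-node neighbor pairs")
print(f"4x4 blocked placement: {aware} inter-node neighbor pairs")
```

Under these assumptions the row-major placement puts each grid row on its own node, so all 240 vertical neighbor pairs cross the network, while the blocked placement leaves 96 crossing pairs and keeps the rest within a node, where shared memory can be used. The real benefit depends on the interconnect, the tile sizes, and the runtime system.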

Published In

CF '06: Proceedings of the 3rd conference on Computing frontiers
May 2006
430 pages
ISBN:1595933026
DOI:10.1145/1128022

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clusters of SMPs
  2. communication optimization
  3. data layout optimization
  4. topology awareness

Qualifiers

  • Article

Conference

CF06: Computing Frontiers Conference
May 3 - 5, 2006
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0
Reflects downloads up to 24 Sep 2024

Cited By

  • (2020) Performance drop at executing communication-intensive parallel algorithms. The Journal of Supercomputing. DOI: 10.1007/s11227-019-03142-8. Online publication date: 6-Jan-2020
  • (2019) Effect of MPI tasks location on cluster throughput using NAS. Cluster Computing. DOI: 10.1007/s10586-018-02898-7. Online publication date: 3-Jan-2019
  • (2019) Benchmarking LAMMPS: Sensitivity to Task Location Under CPU-Based Weak-Scaling. High Performance Computing, pages 224-238. DOI: 10.1007/978-3-030-16205-4_17. Online publication date: 31-Mar-2019
  • (2017) Benchmarking Performance: Influence of Task Location on Cluster Throughput. High Performance Computing, pages 125-138. DOI: 10.1007/978-3-319-73353-1_9. Online publication date: 28-Dec-2017
  • (2012) Optimizing Process-to-Core Mappings for Application Level Multi-dimensional MPI Communications. Proceedings of the 2012 IEEE International Conference on Cluster Computing, pages 486-494. DOI: 10.1109/CLUSTER.2012.47. Online publication date: 24-Sep-2012
  • (2011) Optimizing Process-to-Core Mappings for Two Dimensional Broadcast/Reduce on Multicore Architectures. Proceedings of the 2011 International Conference on Parallel Processing, pages 404-413. DOI: 10.1109/ICPP.2011.26. Online publication date: 13-Sep-2011
  • (2008) Performance effects of gram-schmidt orthogonalization on multi-core infiniband clusters. 2008 IEEE International Symposium on Parallel and Distributed Processing, pages 1-8. DOI: 10.1109/IPDPS.2008.4536474. Online publication date: Apr-2008
  • (2008) Cache optimization for mixed regular and irregular computations. 2008 IEEE International Symposium on Parallel and Distributed Processing, pages 1-8. DOI: 10.1109/IPDPS.2008.4536184. Online publication date: Apr-2008
  • (2008) Mapping Algorithms for Multiprocessor Tasks on Multi-Core Clusters. Proceedings of the 2008 37th International Conference on Parallel Processing, pages 141-148. DOI: 10.1109/ICPP.2008.42. Online publication date: 9-Sep-2008
  • (2008) Performance Evaluation of Clusters with ccNUMA Nodes - A Case Study. Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications, pages 320-327. DOI: 10.1109/HPCC.2008.111. Online publication date: 25-Sep-2008
