DOI: 10.1145/1062261.1062305
Article

Exploiting processor groups to extend scalability of the GA shared memory programming model

Published: 04 May 2005

Abstract

Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality, and development of multilevel parallelism based on processor groups in the context of the Global Arrays (GA) shared memory programming model. The main effort involves management of shared data, rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient (CG) benchmark and a molecular dynamics (MD) application are presented for a Linux cluster with Myrinet and illustrate the value of the proposed approach for improving scalability. While the original GA version of the CG benchmark lagged behind MPI, the processor-group version outperforms MPI in all cases except for a few points at the smallest problem size. Similarly, processor groups proved very effective in improving the scalability of the MD application.
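Although only the abstract is reproduced here, the core idea it describes, confining shared-array creation and collective operations to subsets of processors, can be sketched against the Global Arrays C interface. The example below is illustrative rather than taken from the paper: it assumes the documented GA processor-group calls (GA_Pgroup_create, GA_Pgroup_set_default, GA_Pgroup_destroy), and the two-way split, the array name "work", and the array dimensions are our own choices.

    /* Illustrative sketch (not from the paper): split the world into two
       processor groups and give each group its own shared array.
       Assumes the documented Global Arrays C interface; exact group
       semantics may vary across GA versions. */
    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);      /* GA runs on top of MPI here */
        GA_Initialize();

        int me    = GA_Nodeid();     /* rank within the world group */
        int nproc = GA_Nnodes();

        /* Enumerate the world ranks belonging to this process's half. */
        int half     = nproc / 2;
        int in_first = (me < half);
        int gsize    = in_first ? half : nproc - half;
        int list[gsize];
        for (int i = 0; i < gsize; i++)
            list[i] = in_first ? i : half + i;

        int grp = GA_Pgroup_create(list, gsize);
        GA_Pgroup_set_default(grp);  /* later creates/collectives use grp */

        /* Each group now owns an independent 1000x1000 double array;
           GA_Zero is collective over the group only, not the world. */
        int dims[2] = {1000, 1000};
        int g_a = NGA_Create(C_DBL, 2, dims, "work", NULL);
        GA_Zero(g_a);

        /* ... group-local computation, e.g. one solver instance per group ... */

        GA_Destroy(g_a);
        GA_Pgroup_set_default(GA_Pgroup_get_world());
        GA_Pgroup_destroy(grp);

        GA_Terminate();
        MPI_Finalize();
        return 0;
    }

The point of restricting the default group this way is that collective operations and synchronization then scale with the group size rather than the full machine size, which is the scalability effect the abstract reports.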





Published In

CF '05: Proceedings of the 2nd conference on Computing frontiers
May 2005
467 pages
ISBN: 1595930191
DOI: 10.1145/1062261
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. extreme scalability
  2. global arrays
  3. multi-level parallelism
  4. processor groups

Qualifiers

  • Article

Conference

CF05: Computing Frontiers Conference
May 4-6, 2005
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%


Cited By

  • (2016) EPIC: A framework to exploit parallelism in irregular codes. Concurrency and Computation: Practice and Experience 29(2). DOI: 10.1002/cpe.3842. Online publication date: 12-May-2016.
  • (2015) Multilevel Task Parallelism Exploitation on Asymmetric Sets of Tasks and When Using Third-Party Tools. Proceedings of the 2015 14th International Symposium on Parallel and Distributed Computing, 46-55. DOI: 10.1109/ISPDC.2015.13. Online publication date: 29-Jun-2015.
  • (2007) Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory. Journal of Grid Computing 5(2), 213-234. DOI: 10.1007/s10723-007-9075-7. Online publication date: 14-Apr-2007.
  • (2006) Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit. International Journal of High Performance Computing Applications 20(2), 203-231. DOI: 10.1177/1094342006064503. Online publication date: 1-May-2006.
  • (2006) ScalaBLAST. IEEE Transactions on Parallel and Distributed Systems 17(8), 740-749. DOI: 10.1109/TPDS.2006.112. Online publication date: 1-Aug-2006.
  • (2005) Multilevel Parallelism in Computational Chemistry using Common Component Architecture and Global Arrays. Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. DOI: 10.1109/SC.2005.46. Online publication date: 12-Nov-2005.
