DOI: 10.1145/1062261.1062305
Article

Exploiting processor groups to extend scalability of the GA shared memory programming model

Published: 04 May 2005

Abstract

Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality, and development of multilevel parallelism based on processor groups in the context of the Global Arrays (GA) shared memory programming model. The main effort involves management of shared data, rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient (CG) benchmark and a molecular dynamics (MD) application are presented for a Linux cluster with Myrinet and illustrate the value of the proposed approach for improving scalability. While the original GA version of the CG benchmark lagged behind MPI, the processor-group version outperforms MPI in all cases except for a few points at the smallest problem size. Similarly, processor groups proved very effective in improving the scalability of the MD application.
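Although only the abstract is reproduced here, the core idea it describes, confining shared-array creation and collective operations to subsets of processors, can be sketched against the Global Arrays C interface. The example below is illustrative rather than taken from the paper: it assumes the documented GA processor-group calls (GA_Pgroup_create, GA_Pgroup_set_default, GA_Pgroup_destroy), and the two-way split, the array name "work", and the array dimensions are our own choices.

    /* Illustrative sketch (not from the paper): split the world into two
       processor groups and give each group its own shared array.
       Assumes the documented Global Arrays C interface; exact group
       semantics may vary across GA versions. */
    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);      /* GA runs on top of MPI here */
        GA_Initialize();

        int me    = GA_Nodeid();     /* rank within the world group */
        int nproc = GA_Nnodes();

        /* Enumerate the world ranks belonging to this process's half. */
        int half     = nproc / 2;
        int in_first = (me < half);
        int gsize    = in_first ? half : nproc - half;
        int list[gsize];
        for (int i = 0; i < gsize; i++)
            list[i] = in_first ? i : half + i;

        int grp = GA_Pgroup_create(list, gsize);
        GA_Pgroup_set_default(grp);  /* later creates/collectives use grp */

        /* Each group now owns an independent 1000x1000 double array;
           GA_Zero is collective over the group only, not the world. */
        int dims[2] = {1000, 1000};
        int g_a = NGA_Create(C_DBL, 2, dims, "work", NULL);
        GA_Zero(g_a);

        /* ... group-local computation, e.g. one solver instance per group ... */

        GA_Destroy(g_a);
        GA_Pgroup_set_default(GA_Pgroup_get_world());
        GA_Pgroup_destroy(grp);

        GA_Terminate();
        MPI_Finalize();
        return 0;
    }

The point of restricting the default group this way is that collective operations and synchronization then scale with the group size rather than the full machine size, which is the scalability effect the abstract reports.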





Published In

CF '05: Proceedings of the 2nd conference on Computing frontiers
May 2005
467 pages
ISBN: 1595930191
DOI: 10.1145/1062261
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. extreme scalability
  2. global arrays
  3. multi-level parallelism
  4. processor groups

Qualifiers

  • Article

Conference

CF05: Computing Frontiers Conference
May 4-6, 2005
Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%


Cited By

  • (2016) EPIC: A framework to exploit parallelism in irregular codes. Concurrency and Computation: Practice and Experience 29(2). DOI: 10.1002/cpe.3842. Online publication date: 12-May-2016.
  • (2015) Multilevel Task Parallelism Exploitation on Asymmetric Sets of Tasks and When Using Third-Party Tools. Proceedings of the 2015 14th International Symposium on Parallel and Distributed Computing, 46-55. DOI: 10.1109/ISPDC.2015.13. Online publication date: 29-Jun-2015.
  • (2007) Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory. Journal of Grid Computing 5(2), 213-234. DOI: 10.1007/s10723-007-9075-7. Online publication date: 14-Apr-2007.
  • (2006) Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit. International Journal of High Performance Computing Applications 20(2), 203-231. DOI: 10.1177/1094342006064503. Online publication date: 1-May-2006.
  • (2006) ScalaBLAST. IEEE Transactions on Parallel and Distributed Systems 17(8), 740-749. DOI: 10.1109/TPDS.2006.112. Online publication date: 1-Aug-2006.
  • (2005) Multilevel Parallelism in Computational Chemistry using Common Component Architecture and Global Arrays. Proceedings of the 2005 ACM/IEEE Conference on Supercomputing. DOI: 10.1109/SC.2005.46. Online publication date: 12-Nov-2005.
