
Remote memory access: A case for portable, efficient and library independent parallel programming

Published: 01 August 2004

Abstract

In this work we make a strong case for remote memory access (RMA) as the effective way to program a parallel computer by proposing a framework that supports RMA in a library-independent, simple, and intuitive way. Parallel code written with our approach runs transparently under MPI-2 enabled libraries as well as bulk-synchronous parallel (BSP) libraries. The advantages of using RMA are code simplicity, reduced programming complexity, and increased efficiency. We support these claims by implementing under this framework a collection of benchmark programs: a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm. We examine their performance on a Linux-based PC cluster under three RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the equivalent message-passing code and, in the case of radix-sort, substantially more efficient. In addition, our work can serve as a comparative study of the relevant capabilities of the three libraries.
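To make the put-and-synchronize style concrete, the following is a minimal single-process sketch in Python. It is not the framework described in the abstract: the names `SimulatedRMA`, `put`, `sync`, and `all_to_all_ranks` are hypothetical, and no real communication library is called. It only models the semantics that one-sided writes issued during a step become visible at the destination after the next synchronization, which is how `bsp_put`/`bsp_sync` in BSPlib and `MPI_Put` between `MPI_Win_fence` calls in MPI-2 behave.

```python
from typing import List, Tuple


class SimulatedRMA:
    """Hypothetical single-process stand-in for p processes with put/sync semantics."""

    def __init__(self, nprocs: int, size: int) -> None:
        self.nprocs = nprocs
        # One registered memory region per simulated process.
        self.memory: List[List[int]] = [[0] * size for _ in range(nprocs)]
        # Puts issued in the current step, not yet visible at their destination.
        self.pending: List[Tuple[int, int, List[int]]] = []

    def put(self, dest: int, offset: int, values: List[int]) -> None:
        """Queue a one-sided write into dest's region; the data is copied now."""
        if not 0 <= dest < self.nprocs:
            raise ValueError("destination process out of range")
        if offset < 0 or offset + len(values) > len(self.memory[dest]):
            raise ValueError("write falls outside the registered region")
        self.pending.append((dest, offset, list(values)))

    def sync(self) -> None:
        """Barrier: deliver every queued put, ending the current step."""
        for dest, offset, values in self.pending:
            self.memory[dest][offset:offset + len(values)] = values
        self.pending.clear()


def all_to_all_ranks(nprocs: int):
    """Each process pid writes pid * 10 into slot pid of every process."""
    rma = SimulatedRMA(nprocs, nprocs)
    for pid in range(nprocs):
        for dest in range(nprocs):
            rma.put(dest, pid, [pid * 10])
    before = [row[:] for row in rma.memory]  # snapshot taken before the barrier
    rma.sync()
    return before, rma.memory
```

Calling `all_to_all_ranks(3)` returns a snapshot that is still all zeros before the barrier and regions that each hold `[0, 10, 20]` after it. The destination never posts a matching receive, which is the source of the code simplicity that the abstract attributes to RMA.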


Cited By

  • (2018) Measurement of the latency parameters of the Multi-BSP model. The Journal of Supercomputing 67:2 (565-584). doi:10.1007/s11227-013-1018-4. Online publication date: 31-Dec-2018
  • (2017) Asynchronous one-sided communications and synchronizations for a clustered manycore processor. Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia (51-60). doi:10.1145/3139315.3139318. Online publication date: 15-Oct-2017

Published In

Scientific Programming  Volume 12, Issue 3
August 2004
65 pages

Publisher

IOS Press

Netherlands


