research-article

A hybrid approach of OpenMP for clusters

Published: 25 February 2012

Abstract

We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs and compare it with that of the MPI, HPF, and UPC versions of the benchmarks. The results show that our translated programs achieve, on average, 75% of the performance of the hand-coded MPI programs.

Information

Published In

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
February 2012
352 pages
ISBN:9781450311601
DOI:10.1145/2145816
  • ACM SIGPLAN Notices, Volume 47, Issue 8
    PPOPP '12
    August 2012
    334 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2370036

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MPI
  2. OpenMP
  3. hybrid
  4. optimization
  5. runtime data flow analysis
  6. runtime environment
  7. translator

Qualifiers

  • Research-article

Conference

PPoPP '12

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Citations

Cited By

  • (2023) Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3581784.3607049. Online publication date: 12-Nov-2023
  • (2020) MENPS: A Decentralized Distributed Shared Memory Exploiting RDMA. 2020 IEEE/ACM Fourth Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM), pp. 9-16. DOI: 10.1109/IPDRM51949.2020.00006. Online publication date: Nov-2020
  • (2019) libMPNode. Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 81-90. DOI: 10.1145/3303084.3309495. Online publication date: 17-Feb-2019
  • (2019) D2P. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-22. DOI: 10.1145/3295500.3356205. Online publication date: 17-Nov-2019
  • (2019) HDArray: Parallel Array Interface for Distributed Heterogeneous Devices. Languages and Compilers for Parallel Computing, pp. 176-184. DOI: 10.1007/978-3-030-34627-0_13. Online publication date: 13-Nov-2019
  • (2017) Control replication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/3126908.3126949. Online publication date: 12-Nov-2017
  • (2017) A technique to automatically determine Ad-hoc communication patterns at runtime. Parallel Computing, 69:C, pp. 45-62. DOI: 10.1016/j.parco.2017.08.009. Online publication date: 1-Nov-2017
  • (2016) IMPACC. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pp. 189-201. DOI: 10.1145/2907294.2907302. Online publication date: 31-May-2016
  • (2015) Code Generation for Distributed-Memory Architectures. The Computer Journal, article bxv077. DOI: 10.1093/comjnl/bxv077. Online publication date: 15-Sep-2015
  • (2015) HYDRA. Revised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing, Volume 9519, pp. 140-155. DOI: 10.1007/978-3-319-29778-1_9. Online publication date: 9-Sep-2015