Towards a framework for abstracting accelerators in parallel applications: experience with cell

Authors:

David M. Kunzman,

Laxmikant V. Kalé

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Article No.: 54, Pages 1 - 12

https://doi.org/10.1145/1654059.1654114

Published: 14 November 2009

Abstract

While accelerators have become more prevalent in recent years, they are still considered hard to program. In this work, we extend a framework for parallel programming so that programmers can take advantage of the Cell processor's Synergistic Processing Elements (SPEs) as seamlessly as possible. Using this framework, the same application code can be compiled and executed on multiple platforms, including x86-based and Cell-based clusters. Furthermore, our model allows independently developed libraries to efficiently time-share one or more SPEs by interleaving work from multiple libraries. To demonstrate the framework, we present performance data for an example molecular dynamics (MD) application. When compared to a single Xeon core utilizing Streaming SIMD Extensions (SSE), the MD program achieves a speedup of 5.74 on a single Cell chip (with 8 SPEs). In comparison, a similar speedup of 5.89 is achieved using six Xeon (x86) cores.
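To put the two speedups quoted above on a common footing, the following minimal sketch divides each by the number of processing units used. The helper name `speedup_per_unit` is hypothetical, and counting only the 8 SPEs of the Cell chip (ignoring its PPE) is an assumption made here for illustration; the abstract does not report a per-unit figure.

```python
def speedup_per_unit(speedup: float, units: int) -> float:
    """Return the speedup divided by the number of processing units used."""
    if units <= 0:
        raise ValueError("units must be positive")
    return speedup / units


# Speedups quoted in the abstract; the baseline is one Xeon core using SSE.
# Assumption: only the SPEs are counted for the Cell chip.
cell = speedup_per_unit(5.74, 8)  # one Cell chip with 8 SPEs
xeon = speedup_per_unit(5.89, 6)  # six Xeon (x86) cores

print(f"Cell: {cell:.2f} of a Xeon core per SPE")
print(f"Xeon: {xeon:.2f} of a Xeon core per core")
```

Under these assumptions, each SPE contributes roughly 0.72 of a Xeon core's throughput on this workload, while the six-core Xeon run scales at about 0.98 per core.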

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Parallel programming languages

Information & Contributors

Information

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

November 2009

778 pages

ISBN:9781605587448

DOI:10.1145/1654059

Conference Chair:
Wilfred Pinfold

Copyright © 2009 ACM.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Institutes of Health

Conference

SC '09

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 20, 2009

Portland, Oregon

Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
