DOI: 10.1145/1654059.1654114

Towards a framework for abstracting accelerators in parallel applications: experience with Cell

Published: 14 November 2009

Abstract

Although accelerators have become more prevalent in recent years, they are still considered hard to program. In this work, we extend a framework for parallel programming so that programmers can take advantage of the Cell processor's Synergistic Processing Elements (SPEs) as seamlessly as possible. Using this framework, the same application code can be compiled and executed on multiple platforms, including x86-based and Cell-based clusters. Furthermore, our model allows independently developed libraries to time-share one or more SPEs efficiently by interleaving their work. To demonstrate the framework, we present performance data for an example molecular dynamics (MD) application. Relative to a single Xeon core using Streaming SIMD Extensions (SSE), the MD program achieves a speedup of 5.74 on a single Cell chip (with 8 SPEs). For comparison, six Xeon (x86) cores achieve a similar speedup of 5.89.
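
To put the two reported speedups on a common footing, the short sketch below divides each by the number of processing units involved. It uses only the figures quoted in the abstract; the helper name `speedup_per_unit` is hypothetical, and counting only the 8 SPEs (ignoring the Cell's PowerPC control core) is a simplifying assumption rather than the paper's own methodology.

```python
# Speedups quoted in the abstract, both relative to one SSE-enabled Xeon core.
CELL_SPEEDUP = 5.74   # one Cell chip, 8 SPEs
XEON_SPEEDUP = 5.89   # six Xeon (x86) cores


def speedup_per_unit(speedup: float, units: int) -> float:
    """Return speedup divided by the number of processing units (hypothetical helper)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return speedup / units


# Assumption: only the 8 SPEs are counted as units on the Cell side.
per_spe = speedup_per_unit(CELL_SPEEDUP, 8)
per_xeon_core = speedup_per_unit(XEON_SPEEDUP, 6)

print(f"per SPE:       {per_spe:.4f} single-Xeon-core equivalents")
print(f"per Xeon core: {per_xeon_core:.4f} single-Xeon-core equivalents")
```

Read this way, each SPE contributes roughly 0.72 of a baseline Xeon core for this MD workload, while the six-core Xeon run scales almost linearly at about 0.98 per core. This is only back-of-the-envelope arithmetic on the abstract's numbers and says nothing about clock rates, power, or problem size.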


Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN: 9781605587448
DOI: 10.1145/1654059

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC '09

Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
