Adaptation of Double-Precision Matrix Multiplication to the Cell Broadband Engine Architecture

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6067)

Abstract

This paper presents an approach to adapting double-precision matrix multiplication to the architecture of Cell processors. The algorithm used on a single SPE is based on the operation C = C + A*B performed on matrices of size 64 × 64; these matrices are further divided into smaller submatrices, each of which corresponds to a micro-kernel operation. Our approach is based on a performance model constructed as a function of the submatrix size. The model accounts for factors such as the size of the local store, the number of registers, the properties of double-precision operations, and the balance between pipelines. This approach allows us to take into account the properties of both the first generation of Cell processors and its successor, the PowerXCell 8i.

This adaptation is followed by an optimization phase that includes loop transformations, kernel implementation with SIMD instructions, and other transformations needed to balance the even and odd pipelines. Finally, we present hand tuning performed with the IBM Assembly Visualizer tool. The proposed adaptation and optimizations allow us to achieve about 96% of the peak performance.
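
The block decomposition described in the abstract can be illustrated with a short sketch. This is not the authors' SPE kernel: the tile dimensions `MR` and `NR` and the function names `micro_kernel` and `block_gemm` are hypothetical, and plain Python stands in for the SIMD code only to show how a 64 × 64 update C = C + A*B splits into micro-kernel calls on smaller submatrices.

```python
# Illustrative sketch only. N = 64 comes from the abstract; the tile size
# MR x NR is an assumption, not the value chosen by the paper's model.
N = 64
MR, NR = 4, 4


def micro_kernel(C, A, B, i0, j0, n=N, mr=MR, nr=NR):
    """Update one mr x nr tile of C: C[i0:i0+mr, j0:j0+nr] += A[i0:i0+mr, :] * B[:, j0:j0+nr]."""
    # Keep the tile in local accumulators, mimicking a tile held in registers.
    acc = [[C[i0 + i][j0 + j] for j in range(nr)] for i in range(mr)]
    for k in range(n):
        for i in range(mr):
            a = A[i0 + i][k]
            for j in range(nr):
                acc[i][j] += a * B[k][j0 + j]
    # Write the finished tile back to C.
    for i in range(mr):
        for j in range(nr):
            C[i0 + i][j0 + j] = acc[i][j]


def block_gemm(C, A, B, n=N, mr=MR, nr=NR):
    """Compute C = C + A*B for n x n matrices, one tile of C at a time."""
    assert n % mr == 0 and n % nr == 0, "tile size must divide the block size"
    for i0 in range(0, n, mr):
        for j0 in range(0, n, nr):
            micro_kernel(C, A, B, i0, j0, n, mr, nr)
    return C
```

In a sketch like this, a larger tile reuses each loaded element of A and B more often but needs more accumulators, which is the kind of trade-off a performance model expressed as a function of submatrix size would weigh against the register count.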

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rojek, K., Szustak, Ł. (2010). Adaptation of Double-Precision Matrix Multiplication to the Cell Broadband Engine Architecture. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_56

  • DOI: https://doi.org/10.1007/978-3-642-14390-8_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14389-2

  • Online ISBN: 978-3-642-14390-8

  • eBook Packages: Computer Science, Computer Science (R0)
