Research article · DOI: 10.1145/1583991.1584053

Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Published: 11 August 2009

Abstract

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/√n lg n), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.
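The abstract's central point, that one blocked layout serves Ax and Aᵀx equally well, can be illustrated with a toy serial sketch. All names below are illustrative, not the paper's implementation: the actual CSB format uses compact index encodings and parallel (Cilk++) traversal of blocks, neither of which this sketch reproduces. The matrix is tiled into β×β blocks, and each nonzero is stored once with block-local coordinates, so transposing the product only swaps the roles of row and column indices.

```python
import numpy as np

def csb_from_dense(A, beta):
    """Group the nonzeros of square matrix A into beta x beta blocks.

    Returns a dict keyed by (block_row, block_col); each value is a list
    of (local_row, local_col, value) triples. This is a simplified stand-in
    for the CSB layout described in the abstract.
    """
    blocks = {}
    for i, j in zip(*np.nonzero(A)):
        key = (i // beta, j // beta)
        blocks.setdefault(key, []).append((i % beta, j % beta, A[i, j]))
    return blocks

def csb_mv(blocks, x, n, beta, transpose=False):
    """Compute A @ x (or A.T @ x) from the blocked triples.

    The same stored data serves both products: only the roles of the
    row and column indices swap. CSR lacks this symmetry, which is why
    parallel A^T x is hard there.
    """
    y = np.zeros(n)
    for (bi, bj), triples in blocks.items():
        for li, lj, v in triples:
            i, j = bi * beta + li, bj * beta + lj
            if transpose:
                y[j] += v * x[i]
            else:
                y[i] += v * x[j]
    return y
```

In the real format, the per-block triples are what enable parallelism: block rows can be processed concurrently for Ax, and block columns for Aᵀx, with dense blocks subdivided recursively to bound the span.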



Published In

SPAA '09: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures
August 2009, 370 pages
ISBN: 9781605586069
DOI: 10.1145/1583991
Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. compressed sparse blocks
      2. compressed sparse columns
      3. compressed sparse rows
      4. matrix transpose
      5. matrix-vector multiplication
      6. multithreaded algorithm
      7. parallelism
      8. span
      9. sparse matrix
      10. storage format
      11. work

      Qualifiers

      • Research-article

      Conference

SPAA '09

      Acceptance Rates

      Overall Acceptance Rate 447 of 1,461 submissions, 31%

