Research article · DOI: 10.1145/1583991.1584053

Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Published: 11 August 2009

Abstract

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/√n lg n), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.
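The abstract's central point, that one blocked layout serves Ax and Aᵀx equally well, can be illustrated with a toy serial sketch. All names below are illustrative, not the paper's implementation: the actual CSB format uses compact index encodings and parallel (Cilk++) traversal of blocks, neither of which this sketch reproduces. The matrix is tiled into β×β blocks, and each nonzero is stored once with block-local coordinates, so transposing the product only swaps the roles of row and column indices.

```python
import numpy as np

def csb_from_dense(A, beta):
    """Group the nonzeros of square matrix A into beta x beta blocks.

    Returns a dict keyed by (block_row, block_col); each value is a list
    of (local_row, local_col, value) triples. This is a simplified stand-in
    for the CSB layout described in the abstract.
    """
    blocks = {}
    for i, j in zip(*np.nonzero(A)):
        key = (i // beta, j // beta)
        blocks.setdefault(key, []).append((i % beta, j % beta, A[i, j]))
    return blocks

def csb_mv(blocks, x, n, beta, transpose=False):
    """Compute A @ x (or A.T @ x) from the blocked triples.

    The same stored data serves both products: only the roles of the
    row and column indices swap. CSR lacks this symmetry, which is why
    parallel A^T x is hard there.
    """
    y = np.zeros(n)
    for (bi, bj), triples in blocks.items():
        for li, lj, v in triples:
            i, j = bi * beta + li, bj * beta + lj
            if transpose:
                y[j] += v * x[i]
            else:
                y[i] += v * x[j]
    return y
```

In the real format, the per-block triples are what enable parallelism: block rows can be processed concurrently for Ax, and block columns for Aᵀx, with dense blocks subdivided recursively to bound the span.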



Published In

SPAA '09: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures
August 2009, 370 pages
ISBN: 9781605586069
DOI: 10.1145/1583991
Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. compressed sparse blocks
      2. compressed sparse columns
      3. compressed sparse rows
      4. matrix transpose
      5. matrix-vector multiplication
      6. multithreaded algorithm
      7. parallelism
      8. span
      9. sparse matrix
      10. storage format
      11. work

      Qualifiers

      • Research-article

      Conference

SPAA '09

      Acceptance Rates

      Overall Acceptance Rate 447 of 1,461 submissions, 31%

