Article

Optimal sparse matrix dense vector multiplication in the I/O-model

Authors:

Michael A. Bender,

Gerth Stølting Brodal,

Rolf Fagerberg,

Elias VicariAuthors Info & Claims

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

Pages 61 - 70

https://doi.org/10.1145/1248377.1248391

Published: 09 June 2007 Publication History

Abstract

We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O-model. The task of SpMV is to compute y := Ax, where A is a sparse N x N matrix and x and y are vectors. Here, sparsity is expressed by the parameter k that states that A has a total of at most kN nonzeros, i.e., an average number of k nonzeros per column. The extreme choices for parameter k are well studied special cases, namely for k=1 permuting and for k=N dense matrix-vector multiplication.

We study the worst-case complexity of this computational task, i.e., what is the best possible upper bound on the number of I/Os depending on k and N only. We determine this complexity up to a constant factor for large ranges of the parameters. By our arguments, we find that most matrices with kN nonzeros require this number of I/Os, even if the program may depend on the structure of the matrix. The model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung.

We study two variants of the problem, depending on the memory layout of A.

If A is stored in column major layout, SpMV has I/O complexity Θ(min{kN<over>B(1+log_M/BN<over>max{M,k}), kN}) for k ≤ N^1-ε and any constant 1> ε > 0. If the algorithm can choose the memory layout, the I/O complexity of SpMV is Θ(min{kN<over>B(1+log_M/BN<over>kM), kN]) for k ≤ ³√N.

In the cache oblivious setting with tall cache assumption M ≥ B^1+ε, the I/O complexity is Ο(kN<over>B(1+log_M/B N<over>k)) for A in column major layout.

References

[1]

A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Comm. ACM, 31(9):1116--1127, September 1988.

Digital Library

[2]

L. Arge and P. B. Miltersen. On showing lower bounds for external-memory computational geometry problems. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 139--159. AMS Press 1999.

Digital Library

[3]

G. S. Brodal, R. Fagerberg, and G. Moruz. Cache-aware and cache-oblivious adaptive sorting. In Proc. 32nd International Colloquium on Automata, Languages, and Programming, vol. 3580 of Lecture Notes in Computer Science, pages 576--588. Springer Verlag, Berlin, 2005.

Digital Library

[4]

T. H. Cormen, T. Sundquist, and L. F. Wisniewski. Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput., 28(1):105--136, 1999.

Digital Library

[5]

J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, R. V. Antoine Petitet, R. C. Whaley, and K. Yelick. Self-adapting linear algebra algorithms and software. Proc. of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, 93(2), February 2005.

[6]

S. Filippone and M. Colajanni. PSBLAS: A library for parallel linear algebra computation on sparse matrices. ACM Trans. on Math. Software, 26(4):527--550, Dec. 2000.

Digital Library

[7]

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pages 285--297, New York, NY, Oct. 17-19 1999.

Digital Library

[8]

R. Hartshorne. Algebraic Geometry. Springer, 1977.

[9]

T. Haveliwala. Efficient computation of pagerank. Technical Report 1999-31, Database Group, Computer Science Department, Stanford University, Feb. 1999. Available at http://dbpubs.stanford.edu/pub/1999-31.

[10]

E. J. Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California, Berkeley, May 2000.

Digital Library

[11]

H. Jia-Wei and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC '81: Proc. 13th annual ACM symposium on Theory of computing, pages 326--333, New York, NY, USA, 1981. ACM Press.

Digital Library

[12]

R. Raz. Multi-linear formulas for permanent and determinant are of super-polynomial size. In Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), pages 633--641, Chicago, IL, USA, June 2004.

Digital Library

[13]

K. Remington and R. Pozo. NIST sparse BLAS user's guide. Technical report, National Institute of Standards and Technology, Gaithersburg, Maryland, 1996.

[14]

Y. Saad. Sparsekit: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994.

[15]

S. Toledo. A survey of out-of-core algorithms in numerical linear algebra. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 161--179. AMS Press 1999.

Digital Library

[16]

J. S. Vitter. External memory algorithms and data structures. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 1--38. AMS Press 1999.

Digital Library

[17]

R. Vudac, J. W. Demmel, and K. A. Yelick. The Optimized Sparse Kernel Interface (OSKI) Library: User's Guide for Version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, March 15 2006.

[18]

R. W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California, Berkeley, Fall 2003.

Digital Library

Cited By

Ding NMaris PNam HGroves TAwan MLindsey LDaley CSelvitopi OOliker LWright NWilliams S(2024)Evaluating the potential of disaggregated memory systems for HPC applicationsConcurrency and Computation: Practice and Experience10.1002/cpe.814736:19Online publication date: 31-May-2024
https://doi.org/10.1002/cpe.8147
Beamer SAsanovic KPatterson D(2017)Reducing Pagerank Communication via Propagation Blocking2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2017.112(820-831)Online publication date: May-2017
https://doi.org/10.1109/IPDPS.2017.112
Shao ZHe JLv HJin H(2017)FOGInternational Journal of Parallel Programming10.1007/s10766-016-0468-845:6(1259-1272)Online publication date: 1-Dec-2017
https://dl.acm.org/doi/10.1007/s10766-016-0468-8
Show More Cited By

Index Terms

Optimal sparse matrix dense vector multiplication in the I/O-model
1. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model
Special Title: Parallelism on Algorithms and Architectures (SPAA); Guest Editors: Cyril Gavoille, Boaz Patt-Shamir and Christian Scheideler

We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y:=Ax, where A is a sparse N×N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider ...
A two-pronged progress in structured dense matrix vector multiplication
SODA '18: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms

Matrix-vector multiplication is one of the most fundamental computing primitives. Given a matrix A ∈ F^{N × N} and a vector b ∈ F^N, it is known that in the worst case Θ(N²) operations over F are needed to compute Ab. Many types of structured matrices do ...
Solving Sparse Linear Systems Faster than Matrix Multiplication

Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special cases of graph-structured linear systems, in the general setting, the bit complexity of solving an n × n linear system Ax = b is Õ(n^ω)...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures

June 2007

376 pages

ISBN:9781595936677

DOI:10.1145/1248377

General Chair:
Phillip B. Gibbons
Intel Research, USA
,
Program Chair:
Christian Scheideler
Technische Universität München, Germany

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2007

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

SPAA07

Sponsor:

SPAA07: 19th ACM Symposium on Parallelism in Algorithms and Architectures

June 9 - 11, 2007

California, San Diego, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

22
Total Citations
View Citations
504
Total Downloads

Downloads (Last 12 months)13
Downloads (Last 6 weeks)3

Reflects downloads up to 26 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ding NMaris PNam HGroves TAwan MLindsey LDaley CSelvitopi OOliker LWright NWilliams S(2024)Evaluating the potential of disaggregated memory systems for HPC applicationsConcurrency and Computation: Practice and Experience10.1002/cpe.814736:19Online publication date: 31-May-2024
Beamer SAsanovic KPatterson D(2017)Reducing Pagerank Communication via Propagation Blocking2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2017.112(820-831)Online publication date: May-2017
Shao ZHe JLv HJin H(2017)FOGInternational Journal of Parallel Programming10.1007/s10766-016-0468-845:6(1259-1272)Online publication date: 1-Dec-2017
Zhang MWu YChen KQian XLi XZheng WKeeton KRoscoe T(2016)Exploring the hidden dimension in graph processingProceedings of the 12th USENIX conference on Operating Systems Design and Implementation10.5555/3026877.3026900(285-300)Online publication date: 2-Nov-2016
Zhang MWu YChen KMa TZheng W(2016)Measuring and optimizing distributed array programsProceedings of the VLDB Endowment10.14778/2994509.29945119:12(912-923)Online publication date: 1-Aug-2016
McCune RWeninger TMadey G(2015)Thinking Like a VertexACM Computing Surveys10.1145/281818548:2(1-39)Online publication date: 12-Oct-2015
Roy AMihailovic IZwaenepoel WKaminsky MDahlin M(2013)X-StreamProceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles10.1145/2517349.2522740(472-488)Online publication date: 3-Nov-2013
Kaminsky MDahlin M(2013)Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems PrinciplesundefinedOnline publication date: 3-Nov-2013
Avron HGupta AHollingsworth J(2012)Managing data-movement for effective shared-memory parallelization of out-of-core sparse solversProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/2388996.2389134(1-11)Online publication date: 10-Nov-2012
Avron HGupta A(2012)Managing data-movement for effective shared-memory parallelization of out-of-core sparse solversProceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC.2012.74(1-11)Online publication date: 10-Nov-2012
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents