SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms

Published: 03 January 2018

Abstract

The Sparse Matrix-Vector Multiplication (SpMV) kernel ranks among the most important and thoroughly studied linear algebra operations, as it lies at the heart of many iterative methods for the solution of sparse linear systems, and often constitutes a severe performance bottleneck. Its optimization, which is intimately associated with the data structures used to store the sparse matrix, has always been of particular interest to the applied mathematics and computer science communities and has attracted further attention since the advent of multicore architectures. In this article, we present SparseX, an open source software package for SpMV targeting multicore platforms, that employs the state-of-the-art Compressed Sparse eXtended (CSX) sparse matrix storage format to deliver high efficiency through a highly usable “BLAS-like” interface that requires limited or no tuning. Performance results indicate that our library achieves superior performance over competitive libraries on large-scale problems.
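
As a point of reference for the formats discussed in the article, below is a minimal SpMV kernel over the standard Compressed Sparse Row (CSR) format, written in C (the library's implementation language). This is an illustrative baseline only, not SparseX's CSX kernel:

    #include <stddef.h>

    /* Minimal CSR SpMV sketch: computes y = A*x.
     * row_ptr has nrows + 1 entries delimiting each row's nonzeros;
     * col_idx and values hold the column index and value of every nonzero.
     * Illustrative baseline only -- not SparseX's optimized CSX kernel. */
    void spmv_csr(size_t nrows, const size_t *row_ptr,
                  const size_t *col_idx, const double *values,
                  const double *x, double *y)
    {
        for (size_t i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += values[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }

Each nonzero is read exactly once and contributes two floating-point operations, so the kernel is typically bound by memory bandwidth; this is precisely why index-compression formats such as CSX can improve performance.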



Reviews

Zahari Zlatev

The treatment of scientific and engineering problems nearly always leads to huge computational tasks, which can cause great difficulties even when modern high-speed parallel computers are available. Matrix-vector multiplication is very often one of the most time-consuming parts of the treatment. The matrices and vectors in these tasks are in most cases very large, but the matrices are sparse (that is, many of their elements are equal to zero), and it is necessary to exploit this property. The authors of this paper describe a library of subroutines that resolves, rather successfully, the resulting problems of computational efficiency.

First, the authors briefly discuss several existing kernels for performing sparse matrix-vector multiplication and explain why improvements are highly desirable. They then outline the advantages of their compressed sparse extended (CSX) matrix storage format. A discussion of three storage formats for sparse matrices follows: a) the compressed sparse row (CSR) format, b) the blocked compressed sparse row (BCSR) format, and c) the CSX format. The paper then presents a performance analysis of sparse matrix-vector multiplication for each of these three storage formats, listing several relationships that can easily be used to evaluate the expected performance under each format.

Next, the authors present their SparseX library. The library's kernels are based on the application of CSX to sparse matrices and are used to prepare a high-performance sparse matrix-vector multiplication code (written in C/C++), which can be used in different high-level sparse solvers for systems of linear algebraic equations via iterative methods. The authors aim to a) provide simple and clear semantics; b) serve users with different levels of expertise; c) facilitate the integration of their kernels into large-scale sparse solver libraries; and d) provide transparent adaptation to the available target platform. Two levels of implementation are available in the library: a high-level interface that requires minimal user effort, and a low-level version for advanced users. Furthermore, two kinds of autotuning adapt the code both to the sparsity structure of the treated matrix and to the available hardware platform.

Performance issues related to the SparseX library are then covered. Matrices selected from the University of Florida sparse matrix collection were run on several high-performance computers, and results from a comparison of the SparseX library with two other libraries are discussed. Several figures and tables illustrate the performance of the tools developed for sparse matrix-vector multiplication. Section 4 ends with the incorporation of the kernels into solvers for systems of linear algebraic equations based on the conjugate gradient method. The authors stress that kernels based on sparse matrix-vector multiplication, and the use of iterative methods for large linear systems, are becoming more and more popular in comparison with the application of direct methods.
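
To make the reviewer's format comparison concrete, here is a minimal fixed-size BCSR kernel, an editorial sketch in C assuming 2x2 blocks rather than code from the paper (all names are illustrative):

    #include <stddef.h>

    /* Sketch of 2x2 BCSR SpMV: y = A*x, with A stored as dense 2x2 tiles.
     * brow_ptr has nblockrows + 1 entries delimiting each block row;
     * bcol_idx gives the block-column index of each tile; blocks stores
     * the tiles row-major, four doubles per tile (zero-padded as needed). */
    void spmv_bcsr_2x2(size_t nblockrows, const size_t *brow_ptr,
                       const size_t *bcol_idx, const double *blocks,
                       const double *x, double *y)
    {
        for (size_t bi = 0; bi < nblockrows; bi++) {
            double y0 = 0.0, y1 = 0.0;
            for (size_t b = brow_ptr[bi]; b < brow_ptr[bi + 1]; b++) {
                const double *blk = &blocks[4 * b];     /* one 2x2 tile */
                const double *xx  = &x[2 * bcol_idx[b]];
                y0 += blk[0] * xx[0] + blk[1] * xx[1];
                y1 += blk[2] * xx[0] + blk[3] * xx[1];
            }
            y[2 * bi]     = y0;
            y[2 * bi + 1] = y1;
        }
    }

Blocking amortizes one stored column index over four values and allows register reuse of the input vector, at the cost of explicitly storing zeros wherever the matrix structure does not fill a block; CSX goes further by detecting and encoding a variety of substructures instead of one fixed block shape.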
Some discussion of the case where the sparse matrices involved are ill-conditioned, as happens very often in large-scale scientific and engineering problems, would have been very useful and should have been included: for such matrices, the iterative methods risk converging very slowly, or the iterative process may simply diverge.
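
The reviewer's closing point is easiest to see next to the solver loop itself. Below is a minimal unpreconditioned conjugate gradient iteration in C, reusing the spmv_csr() sketch from above; it is an illustrative sketch for symmetric positive definite systems, not the paper's code, and it omits the preconditioning that ill-conditioned problems require:

    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    /* CSR SpMV from the sketch after the abstract. */
    void spmv_csr(size_t nrows, const size_t *row_ptr,
                  const size_t *col_idx, const double *values,
                  const double *x, double *y);

    /* Unpreconditioned conjugate gradient sketch: solves A*x = b for a
     * symmetric positive definite A stored in CSR form. Returns the
     * number of iterations performed, or -1 on allocation failure.
     * Illustrative only; production solvers add preconditioning, which
     * ill-conditioned systems need to converge in reasonable time. */
    int cg_csr(size_t n, const size_t *row_ptr, const size_t *col_idx,
               const double *values, const double *b, double *x,
               double tol, int max_iter)
    {
        double *r  = malloc(n * sizeof *r);   /* residual         */
        double *p  = malloc(n * sizeof *p);   /* search direction */
        double *Ap = malloc(n * sizeof *Ap);  /* A * p            */
        if (!r || !p || !Ap) { free(r); free(p); free(Ap); return -1; }

        memset(x, 0, n * sizeof *x);          /* start from x = 0 */
        memcpy(r, b, n * sizeof *r);          /* r = b - A*0 = b  */
        memcpy(p, r, n * sizeof *p);

        double rr = 0.0;
        for (size_t i = 0; i < n; i++) rr += r[i] * r[i];

        int it;
        for (it = 0; it < max_iter && sqrt(rr) > tol; it++) {
            spmv_csr(n, row_ptr, col_idx, values, p, Ap);  /* the hot spot */
            double pAp = 0.0;
            for (size_t i = 0; i < n; i++) pAp += p[i] * Ap[i];
            double alpha = rr / pAp;
            for (size_t i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
            }
            double rr_new = 0.0;
            for (size_t i = 0; i < n; i++) rr_new += r[i] * r[i];
            for (size_t i = 0; i < n; i++) p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
        free(r); free(p); free(Ap);
        return it;
    }

Every iteration is dominated by the single SpMV call, which is why the kernel performance delivered by a library such as SparseX translates directly into solver performance.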




Published In

ACM Transactions on Mathematical Software, Volume 44, Issue 3
September 2018
291 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/3175005
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 January 2018
Accepted: 01 August 2017
Revised: 01 May 2017
Received: 01 July 2015
Published in TOMS Volume 44, Issue 3


Author Tags

  1. CSX
  2. HPC
  3. SpMV
  4. SpMV library
  5. data compression
  6. high-performance computing
  7. multicore
  8. scientific applications
  9. sparse matrix-vector multiplication

Qualifiers

  • Research-article
  • Research
  • Refereed


Cited By

  • (2024) SpChar: Characterizing the sparse puzzle via decision trees. Journal of Parallel and Distributed Computing 192, 104941. DOI: 10.1016/j.jpdc.2024.104941. Online publication date: Oct-2024.
  • (2024) Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs. The Journal of Supercomputing 80, 10, 13681--13713. DOI: 10.1007/s11227-024-05949-6. Online publication date: 1-Jul-2024.
  • (2023) DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1--14. DOI: 10.1145/3581784.3607051. Online publication date: 12-Nov-2023.
  • (2023) Feature-based SpMV Performance Analysis on Contemporary Devices. In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 668--679. DOI: 10.1109/IPDPS54959.2023.00072. Online publication date: May-2023.
  • (2023) ReMCOO: An Efficient Representation of Sparse Matrix-Vector Multiplication. In 2023 IEEE Guwahati Subsection Conference (GCON), 1--6. DOI: 10.1109/GCON58516.2023.10183488. Online publication date: 23-Jun-2023.
  • (2023) HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. In 2023 IEEE International Conference on Cluster Computing (CLUSTER), 209--220. DOI: 10.1109/CLUSTER52292.2023.00025. Online publication date: 31-Oct-2023.
  • (2022) SparseP. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6, 1, 1--49. DOI: 10.1145/3508041. Online publication date: 28-Feb-2022.
  • (2021) Automated Design Space Exploration for Optimized Deployment of DNN on Arm Cortex-A CPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 11, 2293--2305. DOI: 10.1109/TCAD.2020.3046568. Online publication date: 1-Nov-2021.
  • (2021) TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 68--78. DOI: 10.1109/IPDPS49936.2021.00016. Online publication date: May-2021.
  • (2020) Programming Strategies for Irregular Algorithms on the Emu Chick. ACM Transactions on Parallel Computing 7, 4, 1--25. DOI: 10.1145/3418077. Online publication date: 21-Oct-2020.
