SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms

Published: 03 January 2018

Abstract

The Sparse Matrix-Vector Multiplication (SpMV) kernel ranks among the most important and thoroughly studied linear algebra operations, as it lies at the heart of many iterative methods for the solution of sparse linear systems, and often constitutes a severe performance bottleneck. Its optimization, which is intimately associated with the data structures used to store the sparse matrix, has always been of particular interest to the applied mathematics and computer science communities and has attracted further attention since the advent of multicore architectures. In this article, we present SparseX, an open source software package for SpMV targeting multicore platforms, that employs the state-of-the-art Compressed Sparse eXtended (CSX) sparse matrix storage format to deliver high efficiency through a highly usable “BLAS-like” interface that requires limited or no tuning. Performance results indicate that our library achieves superior performance over competitive libraries on large-scale problems.
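
As a point of reference for the formats discussed in the article, below is a minimal SpMV kernel over the standard Compressed Sparse Row (CSR) format, written in C (the library's implementation language). This is an illustrative baseline only, not SparseX's CSX kernel:

    #include <stddef.h>

    /* Minimal CSR SpMV sketch: computes y = A*x.
     * row_ptr has nrows + 1 entries delimiting each row's nonzeros;
     * col_idx and values hold the column index and value of every nonzero.
     * Illustrative baseline only -- not SparseX's optimized CSX kernel. */
    void spmv_csr(size_t nrows, const size_t *row_ptr,
                  const size_t *col_idx, const double *values,
                  const double *x, double *y)
    {
        for (size_t i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += values[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }

Each nonzero is read exactly once and contributes two floating-point operations, so the kernel is typically bound by memory bandwidth; this is precisely why index-compression formats such as CSX can improve performance.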



Reviews

Zahari Zlatev

The treatment of scientific and engineering problems nearly always leads to huge computational tasks, which can cause great difficulties even when modern high-speed parallel computers are available. Matrix-vector multiplication is very often one of the most time-consuming parts of the treatment. The matrices and vectors in these tasks are in most cases very large, but the matrices are sparse (that is, many of their elements are equal to zero), and it is necessary to exploit this property. The authors of this paper describe a library of subroutines that resolves, rather successfully, the resulting problems of computational efficiency.

First, the authors briefly discuss several existing kernels for performing sparse matrix-vector multiplication and explain why improvements are highly desirable. They then outline the advantages of their compressed sparse extended (CSX) matrix storage format. A discussion of three storage formats for sparse matrices follows: a) the compressed sparse row (CSR) format, b) the blocked compressed sparse row (BCSR) format, and c) the CSX format. The paper then presents a performance analysis of sparse matrix-vector multiplication for each of these three storage formats, listing several relationships that can easily be used to evaluate the expected performance under each format.

Next, the authors present their SparseX library. The library's kernels are based on the application of CSX to sparse matrices and are used to prepare a high-performance sparse matrix-vector multiplication code (written in C/C++), which can be used in different high-level sparse solvers for systems of linear algebraic equations via iterative methods. The authors aim to a) provide simple and clear semantics; b) serve users with different levels of expertise; c) facilitate the integration of their kernels into large-scale sparse solver libraries; and d) provide transparent adaptation to the available target platform. Two levels of implementation are available in the library: a high-level interface that requires minimal user effort, and a low-level version for advanced users. Furthermore, two kinds of autotuning adapt the code both to the sparsity structure of the treated matrix and to the available hardware platform.

Performance issues related to the SparseX library are then covered. Matrices selected from the University of Florida sparse matrix collection were run on several high-performance computers, and results from a comparison of the SparseX library with two other libraries are discussed. Several figures and tables illustrate the performance of the tools developed for sparse matrix-vector multiplication. Section 4 ends with the incorporation of the kernels into solvers for systems of linear algebraic equations based on the conjugate gradient method. The authors stress that kernels based on sparse matrix-vector multiplication, and the use of iterative methods for large linear systems, are becoming more and more popular in comparison with the application of direct methods.
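
To make the reviewer's format comparison concrete, here is a minimal fixed-size BCSR kernel, an editorial sketch in C assuming 2x2 blocks rather than code from the paper (all names are illustrative):

    #include <stddef.h>

    /* Sketch of 2x2 BCSR SpMV: y = A*x, with A stored as dense 2x2 tiles.
     * brow_ptr has nblockrows + 1 entries delimiting each block row;
     * bcol_idx gives the block-column index of each tile; blocks stores
     * the tiles row-major, four doubles per tile (zero-padded as needed). */
    void spmv_bcsr_2x2(size_t nblockrows, const size_t *brow_ptr,
                       const size_t *bcol_idx, const double *blocks,
                       const double *x, double *y)
    {
        for (size_t bi = 0; bi < nblockrows; bi++) {
            double y0 = 0.0, y1 = 0.0;
            for (size_t b = brow_ptr[bi]; b < brow_ptr[bi + 1]; b++) {
                const double *blk = &blocks[4 * b];     /* one 2x2 tile */
                const double *xx  = &x[2 * bcol_idx[b]];
                y0 += blk[0] * xx[0] + blk[1] * xx[1];
                y1 += blk[2] * xx[0] + blk[3] * xx[1];
            }
            y[2 * bi]     = y0;
            y[2 * bi + 1] = y1;
        }
    }

Blocking amortizes one stored column index over four values and allows register reuse of the input vector, at the cost of explicitly storing zeros wherever the matrix structure does not fill a block; CSX goes further by detecting and encoding a variety of substructures instead of one fixed block shape.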
Some discussion of the case where the sparse matrices involved are ill-conditioned, as happens very often in large-scale scientific and engineering problems, would have been very useful and should have been included: for such matrices, the iterative methods risk converging very slowly, or the iterative process may simply diverge.
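
The reviewer's closing point is easiest to see next to the solver loop itself. Below is a minimal unpreconditioned conjugate gradient iteration in C, reusing the spmv_csr() sketch from above; it is an illustrative sketch for symmetric positive definite systems, not the paper's code, and it omits the preconditioning that ill-conditioned problems require:

    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    /* CSR SpMV from the sketch after the abstract. */
    void spmv_csr(size_t nrows, const size_t *row_ptr,
                  const size_t *col_idx, const double *values,
                  const double *x, double *y);

    /* Unpreconditioned conjugate gradient sketch: solves A*x = b for a
     * symmetric positive definite A stored in CSR form. Returns the
     * number of iterations performed, or -1 on allocation failure.
     * Illustrative only; production solvers add preconditioning, which
     * ill-conditioned systems need to converge in reasonable time. */
    int cg_csr(size_t n, const size_t *row_ptr, const size_t *col_idx,
               const double *values, const double *b, double *x,
               double tol, int max_iter)
    {
        double *r  = malloc(n * sizeof *r);   /* residual         */
        double *p  = malloc(n * sizeof *p);   /* search direction */
        double *Ap = malloc(n * sizeof *Ap);  /* A * p            */
        if (!r || !p || !Ap) { free(r); free(p); free(Ap); return -1; }

        memset(x, 0, n * sizeof *x);          /* start from x = 0 */
        memcpy(r, b, n * sizeof *r);          /* r = b - A*0 = b  */
        memcpy(p, r, n * sizeof *p);

        double rr = 0.0;
        for (size_t i = 0; i < n; i++) rr += r[i] * r[i];

        int it;
        for (it = 0; it < max_iter && sqrt(rr) > tol; it++) {
            spmv_csr(n, row_ptr, col_idx, values, p, Ap);  /* the hot spot */
            double pAp = 0.0;
            for (size_t i = 0; i < n; i++) pAp += p[i] * Ap[i];
            double alpha = rr / pAp;
            for (size_t i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
            }
            double rr_new = 0.0;
            for (size_t i = 0; i < n; i++) rr_new += r[i] * r[i];
            for (size_t i = 0; i < n; i++) p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
        free(r); free(p); free(Ap);
        return it;
    }

Every iteration is dominated by the single SpMV call, which is why the kernel performance delivered by a library such as SparseX translates directly into solver performance.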




Published In

ACM Transactions on Mathematical Software, Volume 44, Issue 3
September 2018
291 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/3175005
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 January 2018
Accepted: 01 August 2017
Revised: 01 May 2017
Received: 01 July 2015
Published in TOMS Volume 44, Issue 3


Author Tags

  1. CSX
  2. HPC
  3. SpMV
  4. SpMV library
  5. data compression
  6. high-performance computing
  7. multicore
  8. scientific applications
  9. sparse matrix-vector multiplication

Qualifiers

  • Research-article
  • Research
  • Refereed


Cited By

  • (2024) SpChar: Characterizing the sparse puzzle via decision trees. Journal of Parallel and Distributed Computing 192, 104941. DOI: 10.1016/j.jpdc.2024.104941. Online publication date: Oct-2024.
  • (2024) Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs. The Journal of Supercomputing 80, 10, 13681--13713. DOI: 10.1007/s11227-024-05949-6. Online publication date: 1-Jul-2024.
  • (2023) DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1--14. DOI: 10.1145/3581784.3607051. Online publication date: 12-Nov-2023.
  • (2023) Feature-based SpMV Performance Analysis on Contemporary Devices. In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 668--679. DOI: 10.1109/IPDPS54959.2023.00072. Online publication date: May-2023.
  • (2023) ReMCOO: An Efficient Representation of Sparse Matrix-Vector Multiplication. In 2023 IEEE Guwahati Subsection Conference (GCON), 1--6. DOI: 10.1109/GCON58516.2023.10183488. Online publication date: 23-Jun-2023.
  • (2023) HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. In 2023 IEEE International Conference on Cluster Computing (CLUSTER), 209--220. DOI: 10.1109/CLUSTER52292.2023.00025. Online publication date: 31-Oct-2023.
  • (2022) SparseP. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6, 1, 1--49. DOI: 10.1145/3508041. Online publication date: 28-Feb-2022.
  • (2021) Automated Design Space Exploration for Optimized Deployment of DNN on Arm Cortex-A CPUs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 40, 11, 2293--2305. DOI: 10.1109/TCAD.2020.3046568. Online publication date: 1-Nov-2021.
  • (2021) TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 68--78. DOI: 10.1109/IPDPS49936.2021.00016. Online publication date: May-2021.
  • (2020) Programming Strategies for Irregular Algorithms on the Emu Chick. ACM Transactions on Parallel Computing 7, 4, 1--25. DOI: 10.1145/3418077. Online publication date: 21-Oct-2020.
