Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/3014904.3014982acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Merge-based parallel sparse matrix-vector multiplication

Published: 13 November 2016 Publication History

Abstract

We present a strictly balanced method for the parallel computation of sparse matrix-vector products (SpMV). Our algorithm operates directly upon the Compressed Sparse Row (CSR) sparse matrix format without preprocessing, inspection, reformatting, or supplemental encoding. Regardless of nonzero structure, our equitable 2D merge-based decomposition tightly bounds the workload assigned to each processing element. Furthermore, our technique is suitable for recursively partitioning CSR datasets themselves into multi-scale, distributed, NUMA, and GPU environments that are constrained by fixed-size local memories.
We evaluate our method on both CPU and GPU microarchitectures across a very large corpus of diverse sparse matrix datasets. We show that traditional CsrMV methods are inconsistent performers, often subject to order-of-magnitude performance variation across similarly-sized datasets. In comparison, our method provides predictable performance that is substantially uncorrelated to the distribution of nonzeros among rows and broadly improves upon that of current CsrMV methods.

References

[1]
J. Kepner, D. A. Bader, A. Buluç, J. R. Gilbert, T. G. Mattson, and H. Meyerhenke, "Graphs, Matrices, and the GraphBLAS: Seven Good Reasons," CoRR, vol. abs/1504.01039, 2015.
[2]
J. Kepner and J. Gilbert, Graph Algorithms in the Language of Linear Algebra. Society for Industrial and Applied Mathematics, 2011.
[3]
M. Mohri, "Semiring Frameworks and Algorithms for Shortest-distance Problems," J. Autom. Lang. Comb., vol. 7, no. 3, pp. 321--350, Jan. 2002.
[4]
R. W. Vuduc, "Automatic Performance Tuning of Sparse Matrix Kernels," Ph.D. Dissertation, University of California, Berkeley, 2003.
[5]
S. Williams, N. Bell, J. W. Choi, M. Garland, L. Oliker, and R. Vuduc, "Sparse Matrix-Vector Multiplication on Multicore and Accelerators," in Scientific Computing with Multicore and Accelerators, J. Kurzak, D. A. Bader, and J. Dongarra, Eds. Taylor & Francis, 2011, pp. 83--109.
[6]
S. Filippone, V. Cardellini, D. Barbieri, and A. Fanfarillo, "Sparse Matrix-Vector Multiplication on GPGPUs," ACM Trans. Math. Softw., To Appear.
[7]
N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, New York, NY, USA, 2009, pp. 18:1--18:11.
[8]
J. L. Greathouse and M. Daga, "Efficient Sparse Matrix-vector Multiplication on GPUs Using the csr Storage Format," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Piscataway, NJ, USA, 2014, pp. 769--780.
[9]
A. Ashari, N. Sedaghati, J. Eisenlohr, S. Parthasarathy, and P. Sadayappan, "Fast Sparse Matrix-vector Multiplication on GPUs for Graph Applications," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Piscataway, NJ, USA, 2014, pp. 781--792.
[10]
Intel Math Kernel Library (MKL) v11.3. Intel Corporation, 2015.
[11]
NVIDIA cuSPARSE v7.5. NVIDIA Corporation, 2013.
[12]
S. Odeh, O. Green, Z. Mwassi, O. Shmueli, and Y. Birk, "Merge path - Parallel Merging Made Simple," in Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Washington, DC, USA, 2012, pp. 1611--1618.
[13]
S. Baxter and D. Merrill, "Efficient Merge, Search, and Set Operations on GPUs," Mar-2013. {Online}. Available: http://on-demand.gputechconf.com/gtc/2013/presentations/S3414-Efficient-Merge-Search-Set-Operations.pdf.
[14]
N. Deo, A. Jain, and M. Medidi, "An optimal parallel algorithm for merging using multiselection," Inf. Process. Lett., vol. 50, no. 2, pp. 81--87, Apr. 1994.
[15]
A. Buluç, J. T. Fineman, M. Frigo, J. R. Gilbert, and C. E. Leiserson, "Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks," in Proc. SPAA, Calgary, Canada, 2009.
[16]
A. Jain, "pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures," Master's Thesis, University of California at Berkeley, 2008.
[17]
S. Yan, C. Li, Y. Zhang, and H. Zhou, "yaSpMV: Yet Another SpMV Framework on GPUs," in Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, 2014, pp. 107--118.
[18]
G. E. Blelloch, M. A. Heroux, and M. Zagha, "Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors," School of Computer Science, Carnegie Mellon University, CMU-CS-93--173, Aug. 1993.
[19]
D. Merrill, "Allocation-oriented Algorithm Design with Application to GPU Computing," Ph.D. Dissertation, University of Virginia, 2011.
[20]
S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens, "Scan primitives for GPU computing," in Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, Aire-la-Ville, Switzerland, Switzerland, 2007, pp. 97--106.
[21]
N. Bell and M. Garland, "Efficient Sparse Matrix-Vector Multiplication on CUDA," NVIDIA Corporation, NVIDIA Technical Report NVR-2008--004, Dec. 2008.
[22]
A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures," in High Performance Embedded Architectures and Compilers, vol. 5952, Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds. Springer Berlin Heidelberg, 2010, pp. 111--125.
[23]
W. Tang, W. Tan, R. Goh, S. Turner, and W. Wong, "A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU," Parallel and Distributed Systems, IEEE Transactions on, vol. pp, no. 99, pp. 1--1, 2014.
[24]
W. T. Tang, W. J. Tan, R. Ray, Y. W. Wong, W. Chen, S. Kuo, R. S. M. Goh, S. J. Turner, and W.-F. Wong, "Accelerating Sparse Matrix-vector Multiplication on GPUs Using Bit-representation-optimized Schemes," in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, New York, NY, USA, 2013, pp. 26:1--26:12.
[25]
B.-Y. Su and K. Keutzer, "clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs," in Proceedings of the 26th ACM International Conference on Supercomputing, New York, NY, USA, 2012, pp. 353--364.
[26]
S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, and J. Demmel, "Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms," in Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, New York, NY, USA, 2007, pp. 38:1--38:12.
[27]
S. Dalton, S. Baxter, D. Merrill, L. Olson, and M. Garland, "Optimizing Sparse Matrix Operations on GPUs Using Merge Path," in Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, 2015, pp. 407--416.
[28]
S. Baxter, Modern GPU. NVIDIA Research, 2013.
[29]
A. Buluc and J. R. Gilbert, "On the representation and multiplication of hypersparse matrices," in Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, 2008, pp. 1--11.
[30]
D. Chakrabarti, Y. Zhan, and C. Faloutsos, "R-MAT: A Recursive Model for Graph Mining," in SIAM International Conference on Data Mining, 2004.
[31]
T. Davis and Y. Hu, "University of Florida Sparse Matrix Collection." {Online}. Available: http://www.cise.ufl.edu/research/sparse/matrices/. {Accessed: 11-Jul-2011}.
[32]
NVIDIA, "CUDA." {Online}. Available: http://www.nvidia.com/object/cuda_home_new.html. {Accessed: 25-Aug-2011}.
[33]
D. Merrill, CUB v1.5.3: CUDA Unbound, a library of warp-wide, block-wide, and device-wide GPU parallel primitives. NVIDIA Research, 2015.
[34]
J. D. McCalpin, "Memory Bandwidth and Machine Balance in Current High Performance Computers," IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pp. 19--25, Dec. 1995.
[35]
X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey, "Efficient Sparse Matrix-vector Multiplication on x86-based Many-core Processors," in Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, New York, NY, USA, 2013, pp. 273--282.

Cited By

View all
  • (2024)Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/365302021:2(1-24)Online publication date: 21-May-2024
  • (2023)GTLB:A Load-Balanced SpMV Computation Method on GPUProceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications10.1145/3606043.3606057(101-107)Online publication date: 17-Jun-2023
  • (2023)PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU ApplicationsProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593705(167-179)Online publication date: 21-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2016
1034 pages
ISBN:9781467388153
  • Conference Chair:
  • John West

Sponsors

In-Cooperation

Publisher

IEEE Press

Publication History

Published: 13 November 2016

Check for updates

Author Tags

  1. GPU
  2. SpMV
  3. linear algebra
  4. many-core
  5. merge-path
  6. parallel merge
  7. sparse matrix

Qualifiers

  • Research-article

Conference

SC16
Sponsor:

Acceptance Rates

SC '16 Paper Acceptance Rate 81 of 442 submissions, 18%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)1
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector MultiplicationACM Transactions on Architecture and Code Optimization10.1145/365302021:2(1-24)Online publication date: 21-May-2024
  • (2023)GTLB:A Load-Balanced SpMV Computation Method on GPUProceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications10.1145/3606043.3606057(101-107)Online publication date: 17-Jun-2023
  • (2023)PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU ApplicationsProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593705(167-179)Online publication date: 21-Jun-2023
  • (2020)Load-balancing Sparse Matrix Vector Product Kernels on GPUsACM Transactions on Parallel Computing10.1145/33809307:1(1-26)Online publication date: 29-Mar-2020
  • (2019)Hierarchical Matrix Operations on GPUsACM Transactions on Mathematical Software10.1145/323285045:1(1-28)Online publication date: 24-Feb-2019
  • (2018)ParSyProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291739(1-15)Online publication date: 11-Nov-2018
  • (2018)HiCOOProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291682(1-15)Online publication date: 11-Nov-2018
  • (2018)TriCoreProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291675(1-12)Online publication date: 11-Nov-2018
  • (2018)BestSFACM Transactions on Architecture and Code Optimization10.1145/322622815:3(1-27)Online publication date: 4-Sep-2018
  • (2018)Implementing Push-Pull Efficiently in GraphBLASProceedings of the 47th International Conference on Parallel Processing10.1145/3225058.3225122(1-11)Online publication date: 13-Aug-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media