
Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes

Published: 17 November 2013

Abstract

Sparse matrix-vector multiplication (SpMV) is an important building block in many iterative algorithms for solving scientific and engineering problems. A main challenge of SpMV is that it is memory-bound. Although compression has previously been proposed to improve SpMV performance on CPUs, it has not been demonstrated on GPUs because many compression and decompression schemes are inherently serial. In this paper, we introduce a family of bit-representation-optimized (BRO) compression schemes for representing sparse matrices on GPUs. The proposed schemes, BRO-ELL, BRO-COO, and BRO-HYB, compress index data and speed up SpMV on GPUs by reducing memory traffic. Furthermore, we formulate a BRO-aware matrix reordering scheme as a data clustering problem and use it to increase compression ratios. Experiments show that, with the proposed schemes, SpMV on GPUs achieves average speedups of 1.5x over ELLPACK and HYB.
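The BRO formats themselves are defined in the paper body, not in this abstract. As a rough, hedged illustration of the underlying idea (column indices of a sparse matrix often fit in far fewer bits than a full 32-bit integer, so compressing them cuts memory traffic), the sketch below builds an ELLPACK-style representation in NumPy, runs SpMV on it, and compares the bit width of raw versus gap-encoded column indices. The function names `to_ell`, `ell_spmv`, and `index_bits` are illustrative, not from the paper, and this is a CPU-side model, not a GPU kernel.

```python
import numpy as np

def to_ell(dense):
    """Convert a dense matrix to ELLPACK format: per-row padded
    value and column-index arrays (padding marked with col = -1)."""
    rows, _ = dense.shape
    nz_cols = [np.flatnonzero(dense[r]) for r in range(rows)]
    width = max(len(c) for c in nz_cols)  # longest row sets the padded width
    vals = np.zeros((rows, width))
    cols = np.full((rows, width), -1, dtype=np.int64)
    for r, c in enumerate(nz_cols):
        vals[r, :len(c)] = dense[r, c]
        cols[r, :len(c)] = c
    return vals, cols

def ell_spmv(vals, cols, x):
    """y = A @ x using the ELL arrays; padded slots (col = -1) contribute 0."""
    mask = cols >= 0
    gathered = np.where(mask, x[np.clip(cols, 0, None)], 0.0)
    return (vals * gathered).sum(axis=1)

def index_bits(cols):
    """Bits per index when stored raw versus as within-row gaps between
    consecutive column indices: a crude proxy for the memory-traffic
    reduction that index compression targets."""
    raw = int(np.ceil(np.log2(max(int(cols.max()) + 1, 2))))
    gaps = []
    for row in cols:
        valid = row[row >= 0]
        if len(valid) > 1:
            gaps.extend(np.diff(valid))
    delta = int(np.ceil(np.log2(max(gaps) + 1))) if gaps else 1
    return raw, delta
```

For a banded matrix, the gaps between consecutive column indices in a row are small constants, so they need only a few bits even when the raw indices need 20 or more; that gap between raw and compressed width is the kind of redundancy the BRO schemes exploit on the GPU side.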





Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair: William Gropp
  • Program Chair: Satoshi Matsuoka

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. compression
  3. data compression
  4. matrix-vector multiplication
  5. memory bandwidth
  6. parallelism
  7. sparse matrix format

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

SC '13 Paper Acceptance Rate: 91 of 449 submissions (20%)
Overall Acceptance Rate: 1,516 of 6,373 submissions (24%)


Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Nov 2024


Cited By

  • (2024) Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs. The Journal of Supercomputing, 80:10, 13681-13713. DOI: 10.1007/s11227-024-05949-6. Online publication date: 1-Jul-2024.
  • (2024) Enhancing the Sparse Matrix Storage Using Reordering Techniques. High Performance Computing, 66-76. DOI: 10.1007/978-3-031-52186-7_5. Online publication date: 28-Jan-2024.
  • (2023) Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV. IEICE Transactions on Information and Systems, E106.D:12, 2015-2025. DOI: 10.1587/transinf.2023PAP0008. Online publication date: 1-Dec-2023.
  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys, 55:11, 1-81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023.
  • (2023) Accelerating matrix-centric graph processing on GPUs through bit-level optimizations. Journal of Parallel and Distributed Computing, 177, 53-67. DOI: 10.1016/j.jpdc.2023.02.013. Online publication date: Jul-2023.
  • (2023) A sparse matrix vector multiplication accelerator based on high-bandwidth memory. Computers and Electrical Engineering, 105:C. DOI: 10.1016/j.compeleceng.2022.108488. Online publication date: 1-Jan-2023.
  • (2023) Towards Reducing Communications in Sparse Matrix Kernels. Cloud Computing, Big Data & Emerging Topics, 17-30. DOI: 10.1007/978-3-031-40942-4_2. Online publication date: 11-Aug-2023.
  • (2023) Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression. Parallel and Distributed Computing, Applications and Technologies, 467-480. DOI: 10.1007/978-3-031-29927-8_36. Online publication date: 8-Apr-2023.
  • (2022) Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 515-525. DOI: 10.1109/IPDPS53621.2022.00056. Online publication date: May-2022.
  • (2022) A High-performance SpMV Accelerator on HBM-equipped FPGAs. 2022 IEEE 24th Int Conf on High Performance Computing & Communications (HPCC/DSS/SmartCity/DependSys), 1081-1087. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00171. Online publication date: Dec-2022.
