
Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes

Published: 17 November 2013

Abstract

Sparse matrix-vector multiplication (SpMV) is an important building block in many iterative algorithms for solving scientific and engineering problems. A main challenge of SpMV is that it is memory-bound. Although compression has previously been proposed to improve SpMV performance on CPUs, it has not been demonstrated on GPUs because many compression and decompression schemes are inherently serial. In this paper, we introduce a family of bit-representation-optimized (BRO) compression schemes for representing sparse matrices on GPUs. The proposed schemes, BRO-ELL, BRO-COO, and BRO-HYB, compress index data and speed up SpMV on GPUs by reducing memory traffic. Furthermore, we formulate a BRO-aware matrix reordering scheme as a data clustering problem and use it to increase compression ratios. Experiments show that, with the proposed schemes, SpMV on GPUs achieves average speedups of 1.5x over ELLPACK and HYB.
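The BRO formats themselves are defined in the paper body, not in this abstract. As a rough, hedged illustration of the underlying idea (column indices of a sparse matrix often fit in far fewer bits than a full 32-bit integer, so compressing them cuts memory traffic), the sketch below builds an ELLPACK-style representation in NumPy, runs SpMV on it, and compares the bit width of raw versus gap-encoded column indices. The function names `to_ell`, `ell_spmv`, and `index_bits` are illustrative, not from the paper, and this is a CPU-side model, not a GPU kernel.

```python
import numpy as np

def to_ell(dense):
    """Convert a dense matrix to ELLPACK format: per-row padded
    value and column-index arrays (padding marked with col = -1)."""
    rows, _ = dense.shape
    nz_cols = [np.flatnonzero(dense[r]) for r in range(rows)]
    width = max(len(c) for c in nz_cols)  # longest row sets the padded width
    vals = np.zeros((rows, width))
    cols = np.full((rows, width), -1, dtype=np.int64)
    for r, c in enumerate(nz_cols):
        vals[r, :len(c)] = dense[r, c]
        cols[r, :len(c)] = c
    return vals, cols

def ell_spmv(vals, cols, x):
    """y = A @ x using the ELL arrays; padded slots (col = -1) contribute 0."""
    mask = cols >= 0
    gathered = np.where(mask, x[np.clip(cols, 0, None)], 0.0)
    return (vals * gathered).sum(axis=1)

def index_bits(cols):
    """Bits per index when stored raw versus as within-row gaps between
    consecutive column indices: a crude proxy for the memory-traffic
    reduction that index compression targets."""
    raw = int(np.ceil(np.log2(max(int(cols.max()) + 1, 2))))
    gaps = []
    for row in cols:
        valid = row[row >= 0]
        if len(valid) > 1:
            gaps.extend(np.diff(valid))
    delta = int(np.ceil(np.log2(max(gaps) + 1))) if gaps else 1
    return raw, delta
```

For a banded matrix, the gaps between consecutive column indices in a row are small constants, so they need only a few bits even when the raw indices need 20 or more; that gap between raw and compressed width is the kind of redundancy the BRO schemes exploit on the GPU side.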





Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
  • General Chair: William Gropp
  • Program Chair: Satoshi Matsuoka

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. compression
  3. data compression
  4. matrix-vector multiplication
  5. memory bandwidth
  6. parallelism
  7. sparse matrix format

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

SC '13 Paper Acceptance Rate: 91 of 449 submissions (20%)
Overall Acceptance Rate: 1,516 of 6,373 submissions (24%)


Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Nov 2024


Cited By

  • (2024) Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs. The Journal of Supercomputing, 80:10, 13681-13713. DOI: 10.1007/s11227-024-05949-6. Online publication date: 1-Jul-2024.
  • (2024) Enhancing the Sparse Matrix Storage Using Reordering Techniques. High Performance Computing, 66-76. DOI: 10.1007/978-3-031-52186-7_5. Online publication date: 28-Jan-2024.
  • (2023) Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV. IEICE Transactions on Information and Systems, E106.D:12, 2015-2025. DOI: 10.1587/transinf.2023PAP0008. Online publication date: 1-Dec-2023.
  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys, 55:11, 1-81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023.
  • (2023) Accelerating matrix-centric graph processing on GPUs through bit-level optimizations. Journal of Parallel and Distributed Computing, 177, 53-67. DOI: 10.1016/j.jpdc.2023.02.013. Online publication date: Jul-2023.
  • (2023) A sparse matrix vector multiplication accelerator based on high-bandwidth memory. Computers and Electrical Engineering, 105:C. DOI: 10.1016/j.compeleceng.2022.108488. Online publication date: 1-Jan-2023.
  • (2023) Towards Reducing Communications in Sparse Matrix Kernels. Cloud Computing, Big Data & Emerging Topics, 17-30. DOI: 10.1007/978-3-031-40942-4_2. Online publication date: 11-Aug-2023.
  • (2023) Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression. Parallel and Distributed Computing, Applications and Technologies, 467-480. DOI: 10.1007/978-3-031-29927-8_36. Online publication date: 8-Apr-2023.
  • (2022) Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 515-525. DOI: 10.1109/IPDPS53621.2022.00056. Online publication date: May-2022.
  • (2022) A High-performance SpMV Accelerator on HBM-equipped FPGAs. 2022 IEEE 24th Int Conf on High Performance Computing & Communications (HPCC/DSS/SmartCity/DependSys), 1081-1087. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00171. Online publication date: Dec-2022.
