Article

Floating-point sparse matrix-vector multiply for FPGAs

Authors:

Michael deLorimier,

André DeHon

FPGA '05: Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays

Pages 75 - 85

https://doi.org/10.1145/1046192.1046203

Published: 20 February 2005

Abstract

Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors do not deliver near their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel. In fact, it is not uncommon for microprocessors to yield only 10-20% of their peak floating-point performance when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput, near-peak floating-point performance. For benchmark matrices from the Matrix Market Suite we project 1.5 double-precision Gflops/FPGA for a single Virtex II 6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA).
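
For readers unfamiliar with the kernel, the following is a minimal software sketch of SMVM, not the FPGA design described in the paper. It assumes the common compressed sparse row (CSR) layout, which the abstract does not specify; the function name `csr_matvec` and the 3x3 example matrix are hypothetical. Each stored nonzero costs one multiply and one add, and the indirect read `x[col_idx[k]]` is the kind of irregular access that tends to keep processors well below peak. The last line only reproduces the abstract's arithmetic for the 16-FPGA projection.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # one multiply and one add
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_matvec(values, col_idx, row_ptr, x)

# Two floating-point operations (multiply + add) per stored nonzero.
flops = 2 * len(values)

# Abstract's projection: 12 Gflops across 16 FPGAs, expressed in Mflops/FPGA.
per_fpga_mflops = 12_000 / 16
```

Counting two operations per nonzero is the usual convention for quoting SMVM throughput; whether the paper counts flops exactly this way is an assumption here.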

Information & Contributors

Information

Published In

FPGA '05: Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays

February 2005

288 pages

ISBN: 1595930299

DOI: 10.1145/1046192

General Chair: Herman Schmit (Tabula)
Program Chair: Steve Wilton (University of British Columbia)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

FPGA05

Sponsor:

FPGA05: ACM/SIGDA International Symposium on Field Programmable Gate Arrays 2005

February 20 - 22, 2005

Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 102
Total Downloads: 1,222
Downloads (last 12 months): 35
Downloads (last 6 weeks): 4

Reflects downloads up to 20 Nov 2024
