DOI: 10.1145/1046192.1046203

Floating-point sparse matrix-vector multiply for FPGAs

Published: 20 February 2005

Abstract

Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors, by contrast, do not come close to their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel; it is not uncommon for them to yield only 10-20% of peak when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput, near-peak floating-point performance. For benchmark matrices from the Matrix Market Suite, we project 1.5 double-precision Gflops/FPGA for a single Virtex II 6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA).
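
For readers unfamiliar with the kernel, the following is a minimal software sketch of SMVM over a matrix stored in compressed sparse row (CSR) form, the storage format named in the author tags. It is a plain reference loop, not the FPGA design described in the paper; the function name `csr_matvec`, the array names, and the 3x3 example matrix are illustrative assumptions.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float],
               col_idx: Sequence[int],
               row_ptr: Sequence[int],
               x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries of A, listed row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # One multiply and one add per stored nonzero: 2 * nnz flops in total.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical example matrix (not taken from the paper):
#   [[4, 0, 1],
#    [0, 2, 0],
#    [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_matvec(values, col_idx, row_ptr, x)
print(y)  # [7.0, 4.0, 18.0]
```

The inner loop reads `x` through `col_idx`, so its memory accesses depend on the sparsity pattern. Irregular access of this kind is a commonly cited reason for microprocessors falling well short of peak on this kernel.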


Published In

FPGA '05: Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
February 2005
288 pages
ISBN: 1595930299
DOI: 10.1145/1046192

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. compressed sparse row
  3. floating point
  4. reconfigurable architecture
  5. sparse matrix

Qualifiers

  • Article

Conference

FPGA05

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
