
DOI: 10.1145/1646461.1646463
research-article

An integrated reduction technique for a double precision accumulator

Published: 15 November 2009

Abstract

The accumulation operation, $A_{n+1} = A_n + X$, is one of the most fundamental and widely used operations in numerical mathematics and digital signal processing. Designing double-precision floating-point accumulators, however, presents a distinctive set of challenges: double-precision addition is usually deeply pipelined, and without special micro-architectural or data-scheduling techniques, the data hazard between $A_{n+1}$ and $A_n$ forces each new value of $X$ delivered to the accumulator to wait for the full latency of the adder. Several techniques have been proposed to alleviate this problem, but each carries significant overhead, restrictions on input characteristics, or both. In this paper we present a double-precision accumulator design that requires no timing overhead relative to the underlying add operation. We achieve this by integrating a coalescing reduction circuit within the low-level design of a base-converting floating-point adder. To demonstrate the design, we use it in a sparse matrix-vector multiplication architecture, which achieves a throughput of up to 3.7 GFLOPS.
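The data hazard can be made concrete with a small cycle-count model. The sketch below is illustrative only: it assumes a hypothetical adder with a fixed 14-cycle latency (`LATENCY`) that accepts one independent addition per cycle, and the function names `naive_accumulate` and `interleaved_accumulate` are introduced here for illustration. The second function uses a generic interleaving workaround (rotating inputs over as many partial sums as the adder has pipeline stages, then combining them with a tree reduction). It is not the integrated coalescing reduction circuit presented in this paper; its trailing reduction phase is an example of the kind of overhead such workarounds carry.

```python
# Illustrative cycle-count model of the accumulator data hazard.
# Assumption (not from the paper): a hypothetical adder with fixed latency
# LATENCY that accepts one new, independent addition every cycle.

LATENCY = 14  # hypothetical pipeline depth in cycles


def naive_accumulate(xs, latency=LATENCY):
    """Sequential A_{n+1} = A_n + X on a pipelined adder.

    Each add needs the previous sum A_n, so a new X can enter the adder
    only after the prior result leaves the pipeline: `latency` cycles
    per input.
    """
    acc = 0.0
    for x in xs:
        acc += x
    return acc, len(xs) * latency


def interleaved_accumulate(xs, latency=LATENCY):
    """Generic workaround (not the paper's method): rotate inputs over
    `latency` independent partial sums, then reduce them with a tree.

    Input i updates partial sum i % latency, which was last updated by
    input i - latency; that result leaves the pipeline just as input i
    is ready, so one input issues per cycle. This reassociates the sum,
    so floating-point results may differ from sequential order in the
    low-order bits.
    """
    partial = [0.0] * latency
    for i, x in enumerate(xs):
        partial[i % latency] += x
    # Streaming phase: last input issues at cycle len(xs) - 1 and
    # completes `latency` cycles later.
    cycles = len(xs) - 1 + latency if xs else 0
    # Reduction phase (conservative model): each tree level issues its
    # independent adds on consecutive cycles, then waits for the last
    # one to drain before the next level starts.
    while len(partial) > 1:
        pairs = len(partial) // 2
        nxt = [partial[2 * k] + partial[2 * k + 1] for k in range(pairs)]
        if len(partial) % 2:
            nxt.append(partial[-1])  # odd element passes through unchanged
        cycles += pairs - 1 + latency
        partial = nxt
    return partial[0], cycles


if __name__ == "__main__":
    xs = [float(i) for i in range(1000)]  # integer-valued doubles sum exactly
    print(*naive_accumulate(xs))        # 499500.0 14000
    print(*interleaved_accumulate(xs))  # 499500.0 1078
```

Under this simple model, 1,000 inputs cost 14,000 cycles sequentially versus 1,078 cycles with interleaving, 65 of which are spent in the final tree reduction. Because the interleaved version reassociates the sum, the example uses integer-valued doubles, which add exactly in any order.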


Cited By

  • (2014) A hardware-software co-design approach for implementing sparse matrix vector multiplication on FPGAs. Microprocessors & Microsystems 38:8 (873-888). DOI: 10.1016/j.micpro.2014.02.004. Online publication date: 1-Nov-2014
  • (2011) Higher-Order Abstraction in Hardware Descriptions with CλaSH. Proceedings of the 2011 14th Euromicro Conference on Digital System Design (495-502). DOI: 10.1109/DSD.2011.69. Online publication date: 31-Aug-2011


Published In

HPRCTA '09: Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications
November 2009
61 pages
ISBN: 9781605587219
DOI: 10.1145/1646461
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. IEEE 754
  2. accumulator
  3. double precision
  4. high-performance computing
  5. reconfigurable computing
  6. reduction
  7. scientific computing

Qualifiers

  • Research-article


Conference

SC '09
