research-article

An integrated reduction technique for a double precision accumulator

HPRCTA '09: Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications

Pages 11 - 18

https://doi.org/10.1145/1646461.1646463

Published: 15 November 2009

Abstract

The accumulation operation, A_{n+1} = A_n + X, is perhaps one of the most fundamental and widely used operations in numerical mathematics and digital signal processing. However, designing double-precision floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined, and without special micro-architectural or data scheduling techniques, the data hazard between A_{n+1} and A_n requires that each new value of X delivered to the accumulator wait for the full latency of the adder. Several techniques have been proposed for alleviating this problem, but each carries significant overheads and/or restrictions on input characteristics. In this paper we present a design for a double-precision accumulator that requires no timing overhead relative to the underlying add operation. We achieve this by integrating a coalescing reduction circuit within the low-level design of a base-converting floating-point adder. To demonstrate our accumulator design, we use it in a sparse matrix-vector multiplication architecture, achieving a throughput of up to 3.7 GFLOPS.
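
The data hazard described in the abstract can be illustrated with a small software model. The sketch below is not the paper's coalescing reduction circuit. It models a commonly used alternative, in which inputs are interleaved across as many partial sums as the adder has pipeline stages, so that a new input can be accepted every cycle. The pipeline depth of 4 and the names `ADDER_LATENCY`, `stalled_cycles`, and `interleaved_accumulate` are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical pipeline depth (assumption); real double-precision
# adders on FPGAs are often considerably deeper.
ADDER_LATENCY = 4


def stalled_cycles(num_inputs: int, latency: int = ADDER_LATENCY) -> int:
    """Idealized cycle count when every add waits for the previous result.

    With a single running sum, input i cannot enter the adder until the
    sum containing input i-1 has left the pipeline, so each input costs
    one full adder latency.
    """
    return num_inputs * latency


def interleaved_accumulate(
    values: Sequence[float], latency: int = ADDER_LATENCY
) -> Tuple[float, List[float]]:
    """Accumulate into `latency` rotating partial sums, then combine them.

    Input i is added to partial sum i % latency. A given partial sum is
    revisited only every `latency` inputs, which in hardware gives the
    previous add on that slot time to complete, so no stall is needed.
    """
    partials = [0.0] * latency
    for i, x in enumerate(values):
        partials[i % latency] += x

    # Final reduction of the partial sums (done serially here).
    total = 0.0
    for p in partials:
        total += p
    return total, partials


if __name__ == "__main__":
    data = [float(v) for v in range(1, 9)]  # 1.0 .. 8.0
    total, partials = interleaved_accumulate(data)
    print("partial sums:", partials)
    print("total:", total)
    print("cycles with stalls:", stalled_cycles(len(data)))
```

Interleaving changes the order in which the additions are performed, so for general floating-point inputs the rounded result may differ slightly from a strictly sequential sum. The final combination of partial sums is also extra work of the kind the abstract refers to as overhead.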

Cited By

S. Jain-Mendon, R. Sass (2014). A hardware-software co-design approach for implementing sparse matrix vector multiplication on FPGAs. Microprocessors & Microsystems, 38:8 (873-888). Online publication date: 1-Nov-2014.
https://dl.acm.org/doi/10.1016/j.micpro.2014.02.004

M. Gerards, C. Baaij, J. Kuper, M. Kooijman (2011). Higher-Order Abstraction in Hardware Descriptions with CλaSH. Proceedings of the 2011 14th Euromicro Conference on Digital System Design (495-502). Online publication date: 31-Aug-2011.
https://dl.acm.org/doi/10.1109/DSD.2011.69

Index Terms

An integrated reduction technique for a double precision accumulator
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
2. Hardware
  1. Integrated circuits
    1. Logic circuits
  2. Very large scale integration design
    1. VLSI system specification and constraints

Information & Contributors

Information

Published In

HPRCTA '09: Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications

November 2009

61 pages

ISBN:9781605587219

DOI:10.1145/1646461

General Chairs: Volodymyr Kindratenko (NCSA, University of Illinois), Tarek El-Ghazawi (The George Washington University)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Division of Computing and Communication Foundations

Conference

SC '09

Sponsor:

SIGARCH

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 15, 2009

Portland, Oregon

Bibliometrics

Total Citations: 2
Total Downloads: 137
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Nov 2024
