research-article

Architectural modifications to enhance the floating-point performance of FPGAs

Published: 01 February 2008

Abstract

With the density of field-programmable gate arrays (FPGAs) steadily increasing, FPGAs have reached the point where they can implement complex floating-point applications. However, their general-purpose nature has limited their use in scientific applications that require floating-point arithmetic, because floating-point operations still consume a large amount of FPGA resources. This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs. The first modification embeds floating-point multiply-add units in an island-style FPGA. While offering a dramatic reduction in area and improvement in clock rate, these embedded units are a significant change and may not be justified by the market. The next two modifications target a major component of IEEE-compliant floating-point computations: variable-length shifters. The first alternative to lookup tables (LUTs) for implementing the variable-length shifters is a coarse-grained approach: embedding variable-length shifters in the FPGA fabric. These shifters offer a significant reduction in area with a modest increase in clock rate and are smaller and more general than embedded floating-point units. The second alternative is a fine-grained approach: adding a 4:1 multiplexer unit inside a configurable logic block (CLB), in parallel with each 4-LUT. While this offers the smallest overall area improvement, it does offer a significant improvement in clock rate with only a trivial increase in the size of the CLB.
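To illustrate why 4:1 multiplexers are a natural building block for the variable-length shifters mentioned in the abstract, the following is a minimal behavioral sketch in Python, not the authors' circuit. It assumes a logarithmic (radix-4) shifter in which each stage consumes two bits of the shift amount, so every output bit of a stage is exactly one 4:1 mux. The function names (`mux4`, `shift_right_mux4`, `count_mux4`) and the 24-bit width (the significand width of IEEE single precision) are illustrative choices, not taken from the paper.

```python
def to_bits(value, width):
    """Unpack an integer into a list of bits, index 0 = least significant."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Pack a list of bits (index 0 = least significant) into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def mux4(sel, d0, d1, d2, d3):
    """4:1 multiplexer: pass through the data input chosen by a 2-bit select."""
    return (d0, d1, d2, d3)[sel]


def num_stages(width):
    """Number of radix-4 stages needed to cover every shift below `width`."""
    stages = 0
    while 4 ** stages < width:
        stages += 1
    return stages


def shift_right_mux4(bits, amount):
    """Logical right shift built only from 4:1 muxes.

    Stage k shifts by 0, 1, 2, or 3 times 4**k positions, selected by bits
    2k and 2k+1 of `amount`. Assumes 0 <= amount < 4**num_stages(len(bits)).
    """
    width = len(bits)
    for stage in range(num_stages(width)):
        step = 4 ** stage
        sel = (amount >> (2 * stage)) & 0b11
        padded = bits + [0] * (3 * step)  # zeros shift in from the top
        bits = [
            mux4(sel, padded[i], padded[i + step],
                 padded[i + 2 * step], padded[i + 3 * step])
            for i in range(width)
        ]
    return bits


def count_mux4(width):
    """4:1 muxes used by this sketch: one per bit per stage."""
    return width * num_stages(width)


# Example: align a 24-bit significand by 5 positions, as an adder's
# alignment step would, and compare with Python's built-in shift.
mantissa = 0xABCDEF
shifted = from_bits(shift_right_mux4(to_bits(mantissa, 24), 5))
assert shifted == mantissa >> 5
```

In this sketch a 24-bit shifter uses three stages, or 72 multiplexers. A 4:1 mux has six inputs (four data, two select), so it cannot fit in a single 4-input LUT, which suggests why shifters are costly in LUT-based fabric and why a dedicated mux beside each 4-LUT could help. The area and timing figures themselves come from the paper and are not reproduced by this model.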



Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 16, Issue 2
February 2008
104 pages

Publisher

IEEE Educational Activities Department

United States


Author Tags

  1. Field-programmable gate array (FPGA)
  2. floating-point arithmetic
  3. reconfigurable architecture


Cited By

  • (2023) Area-latency efficient floating point adder using interleaved alignment and normalization. Microprocessors & Microsystems 99:C. DOI: 10.1016/j.micpro.2023.104842. Online publication date: 1-Jun-2023
  • (2017) Hybrid Hardware/Software Floating-Point Implementations for Optimized Area and Throughput Tradeoffs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:1 (100-113). DOI: 10.1109/TVLSI.2016.2580142. Online publication date: 1-Jan-2017
  • (2015) Area Optimization of Arithmetic Units by Component Sharing for FPGAs (Abstract Only). Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (276-276). DOI: 10.1145/2684746.2689146. Online publication date: 22-Feb-2015
  • (2015) Floating-Point DSP Block Architecture for FPGAs. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (117-125). DOI: 10.1145/2684746.2689071. Online publication date: 22-Feb-2015
  • (2012) Reducing the cost of floating-point mantissa alignment and normalization in FPGAs. Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays (255-264). DOI: 10.1145/2145694.2145738. Online publication date: 22-Feb-2012
  • (2012) Real-time architecture for a robust multi-scale stereo engine on FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 20:12 (2208-2219). DOI: 10.1109/TVLSI.2011.2172007. Online publication date: 1-Dec-2012
  • (2012) Optimizing floating point units in hybrid FPGAs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 20:7 (1295-1303). DOI: 10.1109/TVLSI.2011.2153883. Online publication date: 1-Jul-2012
  • (2010) An Automated Flow for Arithmetic Component Generation in Field-Programmable Gate Arrays. ACM Transactions on Reconfigurable Technology and Systems 3:3 (1-21). DOI: 10.1145/1839480.1839483. Online publication date: 1-Sep-2010
  • (2010) Reconfigurable custom floating-point instructions (abstract only). Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays (287-287). DOI: 10.1145/1723112.1723173. Online publication date: 21-Feb-2010
  • (2010) Improving FPGA performance for carry-save arithmetic. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18:4 (578-590). DOI: 10.1109/TVLSI.2009.2014380. Online publication date: 1-Apr-2010
