research-article

Reducing the cost of floating-point mantissa alignment and normalization in FPGAs

Authors:

Yehdhih Ould Mohammed Moctar,

Hadi Parandeh-Afshar,

Guy G.F. Lemieux,

Philip Brisk

FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays

Pages 255-264

https://doi.org/10.1145/2145694.2145738

Published: 22 February 2012

Abstract

In floating-point datapaths synthesized on FPGAs, the shifters that perform mantissa alignment and normalization consume a disproportionate number of LUTs. Shifters are implemented as several rows of small multiplexers; unfortunately, multiplexer-based logic structures map poorly onto LUTs. FPGAs, meanwhile, already contain a large number of multiplexers in the programmable routing network, and these multiplexers are placed under static control of the FPGA's configuration bitstream. In this work, we modify some of the routing multiplexers in the intra-cluster routing network of an FPGA CLB so that they can implement the shifters used for floating-point mantissa alignment and normalization; this reduces the number of CLBs required for these operations by 67%. If shifting is not required, the modified routing multiplexers can be configured to operate as normal routing multiplexers, so no functionality is sacrificed. The area overhead incurred by these modifications is small, and not every routing multiplexer in the FPGA needs to be modified. Experiments show no negative impact on clock frequency or routability for benchmarks that do not use the modified, dynamically controlled multiplexers.
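To make "several rows of small multiplexers" concrete, here is a minimal behavioral sketch in Python of a logarithmic (barrel) shifter built from ceil(log2 W) rows of W two-input multiplexers, used for both mantissa alignment and normalization. It is an illustration under stated assumptions, not the authors' CLB-level design: the choice of 2:1 multiplexers, the LSB-first bit lists, the truncation of shifted-out bits (no guard, round, or sticky bits), and all function names (`log_shifter`, `align`, `normalize`, `to_bits`, `from_bits`) are hypothetical.

```python
"""Behavioral sketch of a logarithmic (barrel) shifter made of rows of 2:1
multiplexers, used for floating-point mantissa alignment and normalization.

Assumptions (illustrative, not the paper's CLB design): bit vectors are lists
with index 0 as the least-significant bit, and shifted-out bits are simply
dropped (no guard, round, or sticky bits).
"""


def mux2(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: passes a when sel == 0 and b when sel == 1."""
    return b if sel else a


def log_shifter(bits: list[int], amount: int, left: bool) -> tuple[list[int], int]:
    """Shift `bits` by `amount` with ceil(log2(width)) rows of 2:1 muxes.

    Row k shifts by 2**k when bit k of `amount` is 1. Returns the shifted
    vector and the number of 2:1 muxes the structure contains.
    """
    width = len(bits)
    rows = max(1, (width - 1).bit_length())  # ceil(log2(width)) for width >= 2
    cur = list(bits)
    muxes = 0
    for k in range(rows):
        sel = (amount >> k) & 1
        step = 1 << k
        nxt = []
        for i in range(width):
            # Left shift moves bits toward the MSB, so output i reads input i - step.
            src = i - step if left else i + step
            shifted = cur[src] if 0 <= src < width else 0
            nxt.append(mux2(sel, cur[i], shifted))
            muxes += 1
        cur = nxt
    if amount >= (1 << rows):
        # Shift amounts beyond the last row flush every bit out.
        cur = [0] * width
    return cur, muxes


def to_bits(value: int, width: int) -> list[int]:
    """Integer to LSB-first bit list of the given width."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))


def align(mantissa: list[int], exp_diff: int) -> list[int]:
    """Alignment: right-shift the smaller operand's mantissa by the exponent difference."""
    return log_shifter(mantissa, exp_diff, left=False)[0]


def leading_zeros(bits: list[int]) -> int:
    """Count zeros from the MSB down (a leading-zero counter)."""
    count = 0
    for b in reversed(bits):
        if b:
            break
        count += 1
    return count


def normalize(mantissa: list[int], exponent: int) -> tuple[list[int], int]:
    """Normalization: left-shift until the MSB is 1 and lower the exponent to match."""
    lz = leading_zeros(mantissa)
    if lz == len(mantissa):
        return list(mantissa), exponent  # all-zero result: real formats handle this separately
    shifted, _ = log_shifter(mantissa, lz, left=True)
    return shifted, exponent - lz


if __name__ == "__main__":
    print(bin(from_bits(align(to_bits(0b10110000, 8), 3))))  # 0b10110
    m, e = normalize(to_bits(0b00010110, 8), 10)
    print(bin(from_bits(m)), e)  # 0b10110000 7
    print(log_shifter(to_bits(1, 24), 0, left=True)[1])  # 120
```

In this model, a shifter for a 24-bit single-precision mantissa contains 5 rows of 24 two-input multiplexers (120 in total), and a conventional floating-point adder needs one such shifter for alignment and another for normalization. Synthesis tools may merge rows into wider multiplexers to suit the LUT size, so the LUT count will not equal the 2:1 multiplexer count shown here.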

Index Terms

1. Hardware
  1. Very large scale integration design
    1. Application-specific VLSI designs

Information

Published In

FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays

February 2012

352 pages

ISBN:9781450311557

DOI:10.1145/2145694

General Chair: Katherine Compton, University of Wisconsin-Madison

Program Chair: Brad Hutchings, Brigham Young University

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2012

Qualifiers

Research-article

Conference

FPGA '12

FPGA '12: ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 22 - 24, 2012

Monterey, California, USA

Acceptance Rates

FPGA '12 paper acceptance rate: 20 of 87 submissions (23%)

Overall acceptance rate: 125 of 627 submissions (20%)

Bibliometrics

Article Metrics

Total Citations: 7
Total Downloads: 230
Downloads (last 12 months): 12
Downloads (last 6 weeks): 1

Reflects downloads up to 17 Nov 2024.

Cited By

Sun J, Lao Y (2024). Efficient Data Extraction Circuit for Posit Number System: LDD-Based Posit Decoder. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(6), pp. 1919-1923. Online publication date: Jun-2024. https://doi.org/10.1109/TCAD.2023.3347295

Pitchai S (2023). Area-latency efficient floating point adder using interleaved alignment and normalization. Microprocessors and Microsystems 99, 104842. Online publication date: Jun-2023. https://doi.org/10.1016/j.micpro.2023.104842

de Dinechin F, Kumm M (2023). Shifters and Leading Bit Counters. In Application-Specific Arithmetic, pp. 307-327. Online publication date: 23-Aug-2023. https://doi.org/10.1007/978-3-031-42808-1_10

Ebrahimi Z, Ullah S, Kumar A, Cheng K, Yang H (2020). LeAp: Leading-One Detection-Based Softcore Approximate Multipliers with Tunable Accuracy. Proceedings of the 25th Asia and South Pacific Design Automation Conference, pp. 605-610. Online publication date: 17-Jan-2020. https://dl.acm.org/doi/10.1109/ASP-DAC47756.2020.9045171

Hung E, Wilton S (2014). Accelerating FPGA debug. ACM Transactions on Design Automation of Electronic Systems 19(2), pp. 1-23. Online publication date: 28-Mar-2014. https://dl.acm.org/doi/10.1145/2566668

Hung E, Wilton S, Hutchings B, Betz V (2013). Towards simulator-like observability for FPGAs. Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 19-28. Online publication date: 11-Feb-2013. https://dl.acm.org/doi/10.1145/2435264.2435272

Shah N, Rose J (2012). On the difficulty of pin-to-wire routing in FPGAs. 22nd International Conference on Field Programmable Logic and Applications (FPL), pp. 83-90. Online publication date: Aug-2012. https://doi.org/10.1109/FPL.2012.6339245
