research-article

An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor

Authors:

Hadi Parandeh-Afshar,

Paolo Ienne

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 2, Issue 3

Article No.: 19, Pages 1 - 42

https://doi.org/10.1145/1575774.1575778

Published: 01 September 2009

Abstract

To improve FPGA performance for arithmetic circuits that are dominated by multi-input addition operations, an FPGA logic block is proposed that can be configured as a 6:2 or 7:2 compressor. Compressors have been used successfully in the past to realize parallel multipliers in VLSI technology; however, the peculiar structure of FPGA logic blocks, coupled with the high cost of the routing network relative to ASIC technology, renders compressors ineffective when mapped onto the general logic of an FPGA. On the other hand, current FPGA logic cells have already been enhanced with carry chains to improve arithmetic functionality, for example, to realize fast ternary carry-propagate addition. The contribution of this article is a new FPGA logic cell that is specialized to help realize efficient compressor trees on FPGAs. The new FPGA logic cell has two variants that can respectively be configured as a 6:2 or a 7:2 compressor using additional carry chains that, coupled with lookup tables, provide the necessary functionality. Experiments show that the use of these modified logic cells significantly reduces the delay of compressor trees synthesized on FPGAs compared to state-of-the-art synthesis techniques, with a moderate increase in area and power consumption.
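
As a rough illustration of the carry-save idea behind compressor trees, the sketch below reduces several operands to two rows using bitwise 3:2 compressors (full adders), leaving a single carry-propagate addition for the end. It does not model the 6:2 or 7:2 cell proposed in the article; the function names `carry_save_add` and `compress_to_two` are hypothetical, and operands are assumed to be non-negative integers.

```python
def carry_save_add(a, b, c):
    """Bitwise 3:2 compressor: returns (s, carry) with s + carry == a + b + c.

    Illustrative helper only; not the logic cell described in the article.
    """
    s = a ^ b ^ c                               # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, moved up one rank
    return s, carry


def compress_to_two(operands):
    """Reduce non-negative integers to two rows whose sum equals the total."""
    rows = list(operands)
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3        # rows that form complete groups of three
        for i in range(0, full, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[full:])           # leftover rows pass to the next level
        rows = next_rows
    rows += [0] * (2 - len(rows))               # pad when fewer than two operands
    return rows[0], rows[1]


if __name__ == "__main__":
    ops = [13, 7, 9, 22, 5, 18, 3]
    s, c = compress_to_two(ops)
    # Only this final addition needs a carry-propagate adder.
    assert s + c == sum(ops)
```

Each loop iteration corresponds to one level of the tree: every group of three rows becomes two, so no carry ripples across bit positions until the final addition. Wider compressors such as the 6:2 and 7:2 variants discussed in the abstract reduce more rows per level.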

Cited By

Rasoulinezhad S, Siddhartha, Zhou H, Wang L, Boland D, Leong P, Neuendorffer S, Shannon L (2020) LUXOR. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 161-171. doi: 10.1145/3373087.3375303. Online publication date: 23-Feb-2020
https://dl.acm.org/doi/10.1145/3373087.3375303
Kim J, Lee J, Anderson J (2018) FPGA Architecture Enhancements for Efficient BNN Implementation. 2018 International Conference on Field-Programmable Technology (FPT), pp. 214-221. doi: 10.1109/FPT.2018.00039. Online publication date: Dec-2018
https://doi.org/10.1109/FPT.2018.00039
Parandeh-Afshar H, Benbihi H, Novo D, Ienne P, Compton K, Hutchings B (2012) Rethinking FPGAs. Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, pp. 119-128. doi: 10.1145/2145694.2145715. Online publication date: 22-Feb-2012
https://dl.acm.org/doi/10.1145/2145694.2145715

Index Terms

An FPGA Logic Cell and Carry Chain Configurable as a 6:2 or 7:2 Compressor
1. Hardware
  1. Very large scale integration design
    1. Application-specific VLSI designs
2. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Arbitrary-precision arithmetic
      2. Interval arithmetic

Information & Contributors

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 2, Issue 3

September 2009

121 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/1575774

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2009

Accepted: 01 June 2009

Revised: 01 February 2009

Received: 01 August 2008

Published in TRETS Volume 2, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 527
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 2

Reflects downloads up to 05 Mar 2025
