DOI:10.1145/512529.512550
Article

A compiler approach to fast hardware design space exploration in FPGA-based systems

Published: 17 May 2002

Abstract

The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory-bound implementation where resources are idle. To negotiate inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive, given the large search spaces and long synthesis times. This paper describes an automated approach to hardware design space exploration through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space and, among implementations with comparable performance, selects the smallest design. On average, we search only 0.3% of the design space.
This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.
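The space-time trade-off described above can be illustrated with a toy model. The following is a minimal Python sketch, not DEFACTO's algorithm: the cost constants and the functions `estimate`, `select`, and `explore` are hypothetical, and the pruning rule (stop at the first memory-bound or over-capacity design) is an assumption chosen for illustration. It first shows a dot-product loop unrolled by four, then picks the smallest design whose estimated cycle count is within a tolerance of the fastest feasible one.

```python
from dataclasses import dataclass

# Hypothetical cost model: every constant below is an illustrative
# assumption, not a figure from the paper or from a synthesis tool.
N = 1024                 # loop trip count
WORDS_PER_ITER = 2       # each iteration reads a[i] and b[i]
MEM_WORDS_PER_CYCLE = 8  # assumed sustainable memory bandwidth
BASE_AREA = 400          # assumed area of control and memory interface
OP_AREA = 120            # assumed area of one multiply-add datapath
CAPACITY = 2000          # assumed device capacity, same area units


def dot(a, b):
    """Original (rolled) loop: one multiply-add per iteration."""
    s = 0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s


def dot_unrolled4(a, b):
    """Loop unrolled by 4: four independent products per iteration.

    The separate partial sums also break the addition chain; this
    reassociation is exact for integers but not for floating point.
    """
    n = len(a)
    s0 = s1 = s2 = s3 = 0
    i = 0
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    s = s0 + s1 + s2 + s3
    for j in range(i, n):  # leftover iterations when n % 4 != 0
        s += a[j] * b[j]
    return s


def ceil_div(x, y):
    return -(-x // y)


@dataclass(frozen=True)
class Design:
    unroll: int
    compute: int  # cycles if memory were infinitely fast
    memory: int   # cycles needed to stream every operand once
    area: int

    @property
    def cycles(self):
        return max(self.compute, self.memory)

    @property
    def memory_bound(self):
        # Operators would wait on memory (or are exactly balanced with it).
        return self.memory >= self.compute


def estimate(unroll):
    """Toy estimator standing in for synthesis estimates (hypothetical)."""
    compute = ceil_div(N, unroll)  # `unroll` operators work in parallel
    memory = ceil_div(N * WORDS_PER_ITER, MEM_WORDS_PER_CYCLE)  # fixed
    area = BASE_AREA + unroll * OP_AREA
    return Design(unroll, compute, memory, area)


def select(designs, tolerance=0.05):
    """Smallest feasible design within `tolerance` of the fastest one."""
    feasible = [d for d in designs if d.area <= CAPACITY]
    fastest = min(d.cycles for d in feasible)
    close = [d for d in feasible if d.cycles <= fastest * (1 + tolerance)]
    return min(close, key=lambda d: (d.area, d.cycles))


def explore(max_unroll=64):
    """Pruned search over power-of-two unroll factors (assumed rule)."""
    visited = []
    u = 1
    while u <= max_unroll:
        d = estimate(u)
        if d.area > CAPACITY:
            break  # larger factors only need more area
        visited.append(d)
        if d.memory_bound:
            break  # more unrolling would only add idle operators
        u *= 2
    return select(visited), visited


if __name__ == "__main__":
    best, visited = explore()
    print(best.unroll, best.cycles, best.area)  # 4 256 880
```

With these assumed constants, unrolling by 8 reaches the same 256-cycle, memory-bound time as unrolling by 4 but needs 1360 area units instead of 880. Both an exhaustive scan over unroll factors 1 to 64 and the pruned search therefore return unroll factor 4, and the pruned search evaluates only three of the seven candidates.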


Published In

PLDI '02: Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation
June 2002
338 pages
ISBN:1581134630
DOI:10.1145/512529
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data dependence analysis
  2. design space exploration
  3. loop transformations
  4. reuse analysis

Qualifiers

  • Article

Conference

PLDI02

Acceptance Rates

PLDI '02 Paper Acceptance Rate 28 of 169 submissions, 17%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%
