Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2847263.2847282acmconferencesArticle/Chapter ViewAbstractPublication PagesfpgaConference Proceedingsconference-collections
research-article

Automatically Optimizing the Latency, Area, and Accuracy of C Programs for High-Level Synthesis

Published: 21 February 2016 Publication History

Abstract

Loops are pervasive in numerical programs, so high-level synthesis (HLS) tools use state-of-the-art scheduling techniques to pipeline them efficiently. Still, the run time performance of the resultant FPGA implementation is limited by data dependences between loop iterations. Some of these dependence constraints can be alleviated by rewriting the program according to arithmetic identities (e.g. associativity and distributivity), memory access reductions, and control flow optimisations (e.g. partial loop unrolling). HLS tools cannot safely enable such rewrites by default because they may impact the accuracy of floating-point computations and increase area usage. In this paper, we introduce the first open-source program optimizer for automatically rewriting a given program to optimize latency while controlling for accuracy and area. Our tool, SOAP3, reports a multi-dimensional Pareto frontier that the programmer can use to resolve the trade-off according to their needs. When applied to a suite of PolyBench and Livermore Loops benchmarks, our tool has generated programs that enjoy up to a 12x speedup, with a simultaneous 7x increase in accuracy, at a cost of up to 4x more LUTs.

References

[1]
W. Meeus, K. Van Beeck, T. Goedemé, J. Meel, and D. Stroobandt, "An Overview of Today's High-Level Synthesis Tools," Design Automation for Embedded Systems, vol. 16, no. 3, pp. 31--51, 2012.
[2]
Berkeley Design Technology, Inc., "An Independent Evaluation of: High-Level Synthesis Tools for Xilinx FPGAs," 2010. {Online}. Available: http://www.xilinx.com/technology/dsp/BDTI_techpaper.pdf
[3]
A. Canis, S. D. Brown, and J. H. Anderson, "Modulo SDC scheduling with recurrence minimization in high-level synthesis," in FPL, 2014.
[4]
N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2002.
[5]
Xilinx, Inc., "Vivado Design Suite User Guide--High-Level Synthesis," 2015.
[6]
X. Gao, S. Bayliss, and G. A. Constantinides, "SOAP\@: Structural optimization of arithmetic expressions for high-level synthesis," in FPT, 2013.
[7]
C. Mouilleron, "Efficient computation with structured matrices and arithmetic expressions," Ph.D. dissertation, ENS LYON, 2011.
[8]
X. Gao and G. A. Constantinides, "Numerical program optimization for high-level synthesis," in FPGA, 2015.
[9]
B. R. Rau, "Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops," in MICRO, 1994.
[10]
J. Dongarra and P. Luszczek, "Livermore Loops," in Encyclopedia of Parallel Computing. Springer US, 2011, pp. 1041--1043.
[11]
L.-N. Pouchet, "PolyBench/C--the Polyhedral Benchmark suite," http://web.cse.ohio-state.edu/ pouchet/software/polybench/.
[12]
S. Verdoolaege, "isl: An Integer Set Library for the Polyhedral Model," in ICMS, 2010.
[13]
R. Bridson, "Fast poisson disk sampling in arbitrary dimensions," in SIGGRAPH Sketches, 2007.
[14]
A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, J. H. Anderson, S. Brown, and T. Czajkowski, "LegUp: High-level Synthesis for FPGA-based Processor/Accelerator Systems," in FPGA, 2011.
[15]
G. Wang, W. Gong, and R. Kastner, "Operation Scheduling: Algorithms and Applications," in High-Level Synthesis. Springer, 2008.
[16]
P. Li, P. Zhang, L.-N. Pouchet, and J. Cong, "Resource-aware throughput optimization for high-level synthesis," in FPGA, 2015.
[17]
Xilinx, Inc., "Vivado Design Suite User Guide--Synthesis," 2015.
[18]
B. Rau, "Data Flow and Dependence Analysis for Instruction Level Parallelism," in Languages and Compilers for Parallel Computing, ser. LNCS. Springer, 1992, vol. 589, pp. 236--250.
[19]
R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM TOPLAS, vol. 13, no. 4, pp. 451--490, Oct. 1991.
[20]
D. D. Gajski and L. Ramachandran, "Introduction to high-level synthesis," IEEE Design and Test of Computers, vol. 11, no. 4, pp. 44--54, Oct. 1994.
[21]
A. Nicolau and R. Potasmann, "Incremental tree height reduction for high level synthesis," in DAC, 1991.
[22]
N. Damouche, M. Martel, and A. Chapoutot, "Intra-procedural optimization of the numerical accuracy of programs," in Formal Methods for Industrial Critical Systems, ser. LNCS. Springer, 2015, vol. 9128, pp. 31--46.
[23]
P. Panchekha, A. Sanchez-Stern, J. R. Wilcox, and Z. Tatlock, "Automatically improving accuracy for floating point expressions," in PLDI, 2015.
[24]
A. Hosangadi, F. Fallah, and R. Kastner, "Factoring and eliminating common subexpressions in polynomial expressions," in ICCAD, 2004.
[25]
A. Peymandoust and G. De Micheli, "Using symbolic algebra in algorithmic level DSP synthesis," in DAC, 2001.
[26]
D. Boland and G. Constantinides, "Revisiting the reduction circuit: A case study for simultaneous architecture and precision optimisation," in FPT, 2013.

Cited By

View all
  • (2022)A Review of Spatial Exploration of FPGA DesignComputer Science and Application10.12677/CSA.2022.12410712:04(1043-1053)Online publication date: 2022
  • (2021)Fast Resource and Timing Aware Design Optimisation for High-Level SynthesisIEEE Transactions on Computers10.1109/TC.2021.3112260(1-1)Online publication date: 2021
  • (2021)Integrating Quick Resource Estimators in Hardware Construction Framework for Design Space Exploration2021 IEEE International Workshop on Rapid System Prototyping (RSP)10.1109/RSP53691.2021.9806276(64-70)Online publication date: 14-Oct-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2016
298 pages
ISBN:9781450338561
DOI:10.1145/2847263
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 February 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. floating-point arithmetic
  2. high-level synthesis
  3. loop pipelining
  4. numerical accuracy
  5. round-off error

Qualifiers

  • Research-article

Conference

FPGA'16
Sponsor:

Acceptance Rates

FPGA '16 Paper Acceptance Rate 20 of 111 submissions, 18%;
Overall Acceptance Rate 125 of 627 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)1
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)A Review of Spatial Exploration of FPGA DesignComputer Science and Application10.12677/CSA.2022.12410712:04(1043-1053)Online publication date: 2022
  • (2021)Fast Resource and Timing Aware Design Optimisation for High-Level SynthesisIEEE Transactions on Computers10.1109/TC.2021.3112260(1-1)Online publication date: 2021
  • (2021)Integrating Quick Resource Estimators in Hardware Construction Framework for Design Space Exploration2021 IEEE International Workshop on Rapid System Prototyping (RSP)10.1109/RSP53691.2021.9806276(64-70)Online publication date: 14-Oct-2021
  • (2020)Automatic Selection and Insertion of HLS Directives Via a Source-to-Source Compiler2020 International Conference on Field-Programmable Technology (ICFPT)10.1109/ICFPT51103.2020.00039(227-232)Online publication date: Dec-2020
  • (2020)A Survey on Performance Optimization of High-Level Synthesis ToolsJournal of Computer Science and Technology10.1007/s11390-020-9414-835:3(697-720)Online publication date: 29-May-2020
  • (2019)Module-per-Object: A Human-Driven Methodology for C++-Based High-Level Synthesis Design2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM.2019.00037(218-226)Online publication date: Apr-2019
  • (2018)Software Support for Heterogeneous Computing2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)10.1109/ISVLSI.2018.00142(756-762)Online publication date: Jul-2018
  • (2017)COMBAProceedings of the 36th International Conference on Computer-Aided Design10.5555/3199700.3199757(430-437)Online publication date: 13-Nov-2017
  • (2017)Design space exploration of FPGA-based accelerators with multi-level parallelismProceedings of the Conference on Design, Automation & Test in Europe10.5555/3130379.3130649(1141-1146)Online publication date: 27-Mar-2017
  • (2017)Design Space exploration of FPGA-based accelerators with multi-level parallelismDesign, Automation & Test in Europe Conference & Exhibition (DATE), 201710.23919/DATE.2017.7927161(1141-1146)Online publication date: Mar-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media