
Research article · DOI: 10.1109/ICCAD.2017.8203844

HLScope+: Fast and accurate performance estimation for FPGA HLS

Published: 13 November 2017

Abstract

High-level synthesis (HLS) tools have vastly increased the productivity of field-programmable gate array (FPGA) programmers through design automation and abstraction. The side effect, however, is that many architectural details are hidden from programmers, so those who wish to improve the performance of their designs often have difficulty identifying the performance bottleneck. Current HLS tools do provide a performance estimate when loop counts are fixed, but they often fail to do so for programs with input-dependent execution behavior. Moreover, their external memory latency model does not accurately reflect the actual bus-based shared memory architecture. This work describes a high-level cycle estimation methodology that addresses these problems. To reduce the time overhead, we propose a cycle estimation process that is combined with HLS software simulation. We also present an automatic code instrumentation technique that accurately identifies the cause of stalls in on-board execution. Experimental results show that our framework provides cycle estimates with average error rates of 1.1% and 5.0% for compute- and DRAM-bound modules, respectively, on the ADM-PCIE-7V3 board. The proposed method is about two orders of magnitude faster than FPGA bitstream generation.


Cited By

  • (2024) HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures. ACM Transactions on Reconfigurable Technology and Systems, 18(1):1–26. DOI: 10.1145/3655627. Online: 17 Dec 2024.
  • (2022) FPGA HLS Today: Successes, Challenges, and Opportunities. ACM Transactions on Reconfigurable Technology and Systems, 15(4):1–42. DOI: 10.1145/3530775. Online: 21 Apr 2022.
  • (2022) HLS_Profiler. Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering, 187–198. DOI: 10.1145/3489525.3511684. Online: 9 Apr 2022.


Published In

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2017, 1049 pages
Publisher: IEEE Press
