Research article
DOI: 10.1109/ICCAD.2017.8203844

HLScope+: Fast and accurate performance estimation for FPGA HLS

Published: 13 November 2017

Abstract

High-level synthesis (HLS) tools have vastly increased the productivity of field-programmable gate array (FPGA) programmers through design automation and abstraction. The side effect, however, is that many architectural details are hidden from the programmer. As a result, programmers who wish to improve the performance of their design often have difficulty identifying the performance bottleneck. Current HLS tools do provide a performance estimate when loop counts are fixed, but they often fail to do so for programs with input-dependent execution behavior. Moreover, their external-memory latency model does not accurately reflect the actual bus-based shared-memory architecture. This work describes a high-level cycle estimation methodology that addresses these problems. To reduce the time overhead, we propose a cycle estimation process that is combined with the HLS software simulation. We also present an automatic code instrumentation technique that accurately identifies the cause of stalls during on-board execution. The experimental results show that our framework provides a cycle estimate with an average error rate of 1.1% and 5.0% for compute- and DRAM-bound modules, respectively, on the ADM-PCIE-7V3 board. The proposed method is about two orders of magnitude faster than FPGA bitstream generation.
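
The abstract does not spell out the estimation mechanics, but one way to read the combination of HLS software simulation with cycle estimation is: measure input-dependent loop trip counts dynamically during simulation, then combine them with the iteration latency (IL) and initiation interval (II) that the HLS tool reports statically. The following is a minimal sketch under that assumption; the structure, names, and numbers are illustrative and not taken from the paper.

    // Hypothetical sketch (not from the paper): estimate the cycle count of a
    // pipelined HLS loop by combining a dynamically measured trip count (from
    // software simulation of input-dependent code) with the iteration latency
    // (IL) and initiation interval (II) reported by the HLS scheduler.
    // All names and numbers below are illustrative assumptions.
    #include <cstdint>
    #include <cstdio>

    struct LoopProfile {
        const char *name;
        uint64_t    trip_count;   // measured during instrumented software simulation
        uint64_t    iter_latency; // IL from the HLS synthesis report
        uint64_t    ii;           // initiation interval from the HLS synthesis report
    };

    // Standard cycle model for a pipelined loop: IL + II * (trip_count - 1).
    static uint64_t estimate_cycles(const LoopProfile &lp) {
        if (lp.trip_count == 0) return 0;
        return lp.iter_latency + lp.ii * (lp.trip_count - 1);
    }

    int main() {
        // Example data only; a real flow would read trip counts emitted by the
        // instrumented simulation and IL/II values from the tool's report.
        const LoopProfile loops[] = {
            {"row_loop", 1024, 12, 1},
            {"col_loop",  256,  8, 2},
        };

        uint64_t total = 0;
        for (const LoopProfile &lp : loops) {
            uint64_t c = estimate_cycles(lp);
            std::printf("%-8s : %llu cycles\n", lp.name, (unsigned long long)c);
            total += c;
        }
        std::printf("total    : %llu cycles (loop bodies only, no memory stalls)\n",
                    (unsigned long long)total);
        return 0;
    }

A complete flow would also need a term for external-memory (DRAM) access latency, which the abstract identifies as the harder case for DRAM-bound modules; the sketch deliberately omits it.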

Cited By

  • (2022) FPGA HLS Today: Successes, Challenges, and Opportunities. ACM Transactions on Reconfigurable Technology and Systems, 15(4):1–42. DOI: 10.1145/3530775
  • (2022) HLS_Profiler. Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering, 187–198. DOI: 10.1145/3489525.3511684
  • HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures. ACM Transactions on Reconfigurable Technology and Systems. DOI: 10.1145/3655627

Published In

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
November 2017, 1049 pages

Publisher: IEEE Press
