DOI: 10.1145/3061639.3062251
research-article

FlexCL: An Analytical Performance Model for OpenCL Workloads on Flexible FPGAs

Published: 18 June 2017

Abstract

The recent adoption of the OpenCL programming model by FPGA vendors has made OpenCL workloads functionally portable to FPGAs. However, poor performance portability prevents wider adoption. To harness the power of FPGAs through the OpenCL programming model, it is advantageous to design an analytical performance model that estimates the performance of OpenCL workloads on FPGAs and provides insight into the performance bottlenecks of the OpenCL model on FPGA architectures. To this end, this paper presents FlexCL, an analytical performance model for OpenCL workloads on flexible FPGAs. FlexCL estimates overall performance by tightly coupling the off-chip global memory model and the on-chip computation model based on the communication mode. Experiments demonstrate that, relative to RTL-based implementation, the average absolute error of FlexCL is 9.5% and 8.7% for the Rodinia and PolyBench suites, respectively. Moreover, FlexCL enables rapid exploration of the design space within seconds instead of hours or days.
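
The accuracy figures in the abstract are averages of per-benchmark absolute errors. As a minimal sketch of how such a metric could be computed, assuming the error is taken relative to the measured value (the abstract does not spell out the exact definition), one might write the following. The function name and the cycle counts are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_absolute_error(estimated: Sequence[float],
                           measured: Sequence[float]) -> float:
    """Return the mean of |estimated - measured| / measured, in percent.

    Assumption: the error is normalized by the measured (reference) value,
    so every measured value must be nonzero.
    """
    if len(estimated) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical cycle counts for three kernels (illustrative only).
estimated_cycles = [1100.0, 1900.0, 3000.0]  # model estimates
measured_cycles = [1000.0, 2000.0, 3000.0]   # reference measurements

error = average_absolute_error(estimated_cycles, measured_cycles)
print(f"average absolute error: {error:.1f}%")
```

Taking the absolute value before averaging keeps overestimates and underestimates from cancelling each other out, which is why an average absolute error is usually more informative than a plain mean error.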

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
June 2017
533 pages
ISBN: 9781450349277
DOI: 10.1145/3061639

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '17

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Article Metrics

  • Downloads (Last 12 months): 33
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 12 Nov 2024

Cited By

  • (2024) Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 211-222. DOI: 10.1145/3626202.3637561. Online publication date: 1-Apr-2024
  • (2023) A Comprehensive Memory Management Framework for CPU-FPGA Heterogenous SoCs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:4, pp. 1058-1071. DOI: 10.1109/TCAD.2022.3179323. Online publication date: Apr-2023
  • (2023) Multi-objective Design Space Exploration for High-Level Synthesis via Bayesian Optimization. 2023 International Symposium of Electronics Design Automation (ISEDA), pp. 150-155. DOI: 10.1109/ISEDA59274.2023.10218665. Online publication date: 8-May-2023
  • (2022) Graph Neural Networks for High-Level Synthesis Design Space Exploration. ACM Transactions on Design Automation of Electronic Systems 28:2, pp. 1-20. DOI: 10.1145/3570925. Online publication date: 24-Dec-2022
  • (2022) ThunderGP: Resource-Efficient Graph Processing Framework on FPGAs with HLS. ACM Transactions on Reconfigurable Technology and Systems 15:4, pp. 1-31. DOI: 10.1145/3517141. Online publication date: 9-Dec-2022
  • (2022) Sherlock: A Multi-Objective Design Space Exploration Framework. ACM Transactions on Design Automation of Electronic Systems 27:4, pp. 1-20. DOI: 10.1145/3511472. Online publication date: 8-Mar-2022
  • (2022) HECTOR. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9. DOI: 10.1145/3508352.3549370. Online publication date: 30-Oct-2022
  • (2022) AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators. ACM Transactions on Design Automation of Electronic Systems 27:4, pp. 1-27. DOI: 10.1145/3494534. Online publication date: 12-Feb-2022
  • (2022) Efficient, Dynamic Multi-Task Execution on FPGA-Based Computing Systems. IEEE Transactions on Parallel and Distributed Systems 33:3, pp. 710-722. DOI: 10.1109/TPDS.2021.3101153. Online publication date: 1-Mar-2022
  • (2022) ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 741-755. DOI: 10.1109/HPCA53966.2022.00060. Online publication date: Apr-2022
