FlexCL: An Analytical Performance Model for OpenCL Workloads on Flexible FPGAs

Authors:

Wei Zhang

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017

Article No.: 27, Pages 1 - 6

https://doi.org/10.1145/3061639.3062251

Published: 18 June 2017

Abstract

The recent adoption of the OpenCL programming model by FPGA vendors has made OpenCL workloads functionally portable to FPGAs. However, poor performance portability prevents wide adoption. To harness the power of FPGAs through OpenCL, it is advantageous to have an analytical performance model that estimates the performance of OpenCL workloads on FPGAs and provides insight into the performance bottlenecks of the OpenCL model on FPGA architectures. To this end, this paper presents FlexCL, an analytical performance model for OpenCL workloads on flexible FPGAs. FlexCL estimates overall performance by tightly coupling the off-chip global memory model and the on-chip computation model according to the communication mode. Experiments demonstrate that, relative to RTL-based implementations, the average absolute error of FlexCL is 9.5% and 8.7% for the Rodinia and PolyBench suites, respectively. Moreover, FlexCL enables rapid exploration of the design space within seconds instead of hours or days.
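
As a rough illustration of the two ideas the abstract names (coupling a memory estimate with a computation estimate according to a communication mode, and reporting an average absolute error against measured results), the following Python sketch may help. It is not FlexCL's actual model: the `KernelEstimate` fields, the max/sum coupling rule, and all numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class KernelEstimate:
    """Hypothetical per-kernel inputs; none of these names come from the paper."""
    compute_cycles: float  # assumed on-chip computation latency estimate
    memory_cycles: float   # assumed off-chip global memory latency estimate
    overlapped: bool       # assumed communication mode: transfers overlap compute


def estimate_cycles(k: KernelEstimate) -> float:
    """Toy coupling rule (an assumption): overlapped phases are bounded by
    the slower one, non-overlapped phases add up."""
    if k.overlapped:
        return max(k.compute_cycles, k.memory_cycles)
    return k.compute_cycles + k.memory_cycles


def mean_absolute_error_pct(estimated, measured):
    """Average of |estimate - measured| / measured, as a percentage."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * sum(errors) / len(errors)


# Made-up numbers for illustration only; they are not results from the paper.
kernels = [
    KernelEstimate(compute_cycles=1000, memory_cycles=400, overlapped=True),
    KernelEstimate(compute_cycles=1000, memory_cycles=400, overlapped=False),
]
estimates = [estimate_cycles(k) for k in kernels]
measured = [1100.0, 1300.0]  # stand-in for cycle counts from an RTL-based run
error_pct = mean_absolute_error_pct(estimates, measured)
print(estimates, round(error_pct, 2))
```

The error metric here is the plain mean of per-kernel relative errors, which is one common reading of "average absolute error"; the paper may define it differently.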

Information & Contributors

Information

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017

June 2017

533 pages

ISBN:9781450349277

DOI:10.1145/3061639

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

DAC '17

Sponsor:

EDAC
SIGDA

DAC '17: The 54th Annual Design Automation Conference 2017

June 18 - 22, 2017

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Total Citations: 36
Total Downloads: 339
Downloads (Last 12 months): 33
Downloads (Last 6 weeks): 3

Reflects downloads up to 12 Nov 2024
