DOI: 10.1109/ISCA45697.2020.00034
research-article

SOFF: an OpenCL high-level synthesis framework for FPGAs

Published: 23 September 2020

Abstract

Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes SOFF, a high-level synthesis framework of OpenCL for FPGAs. SOFF automatically synthesizes a datapath that executes many OpenCL kernel threads in a pipelined manner. It also synthesizes an efficient memory subsystem for the datapath based on the characteristics of OpenCL kernels. In contrast to previous high-level synthesis techniques, this paper proposes a formal way to handle variable-latency instructions, complex control flows, OpenCL barriers, and atomic operations that appear in real-world OpenCL kernels. SOFF is the first OpenCL framework that correctly compiles and executes all applications in the SPEC ACCEL benchmark suite, except for three applications that require more FPGA resources than are available. In addition, SOFF achieves a speedup of 1.33 over Intel FPGA SDK for OpenCL without any explicit user annotation or source code modification.
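
To make the idea of executing many kernel threads "in a pipelined manner" more concrete, the following toy model may help. It is only an illustrative sketch and not SOFF's actual datapath or scheduling algorithm: the names `simulate_pipeline`, `sequential_cycles`, and `latency` are hypothetical, and the model assumes a linear pipeline in which every stage holds at most one work-item and a stage that is not yet finished stalls the stages behind it. It shows, under those assumptions, how a variable-latency stage limits throughput while pipelining still overlaps the work of different threads.

```python
def simulate_pipeline(num_items, num_stages, latency):
    """Count cycles for num_items work-items to drain a linear pipeline.

    latency(stage, item) returns how many cycles (>= 1) the given item
    occupies the given stage. This is a toy model (an assumption of this
    sketch), not the datapath that SOFF synthesizes.
    """
    slots = [None] * num_stages  # each slot is [item, remaining_cycles] or None
    next_item = 0
    done = 0
    cycle = 0
    while done < num_items:
        cycle += 1
        # Issue a new work-item if the first stage is free.
        if next_item < num_items and slots[0] is None:
            slots[0] = [next_item, latency(0, next_item)]
            next_item += 1
        # Walk from the last stage to the first so that a stage freed in
        # this cycle can accept the item waiting behind it.
        for s in range(num_stages - 1, -1, -1):
            if slots[s] is None:
                continue
            slots[s][1] -= 1
            if slots[s][1] > 0:
                continue  # stage is still working on this item
            if s == num_stages - 1:
                done += 1  # item leaves the pipeline
                slots[s] = None
            elif slots[s + 1] is None:
                item = slots[s][0]
                slots[s + 1] = [item, latency(s + 1, item)]
                slots[s] = None
            # Otherwise the next stage is busy: the item stalls in place.
    return cycle


def sequential_cycles(num_items, num_stages, latency):
    """Cycles needed if each work-item runs alone, one after another."""
    return sum(latency(s, i) for i in range(num_items) for s in range(num_stages))


if __name__ == "__main__":
    uniform = lambda stage, item: 1
    slow_middle = lambda stage, item: 3 if stage == 1 else 1  # e.g. a memory access
    print(simulate_pipeline(8, 4, uniform))      # 11 (depth + items - 1)
    print(simulate_pipeline(4, 3, slow_middle))  # 14
    print(sequential_cycles(4, 3, slow_middle))  # 20
```

In this model the slow stage bounds throughput at one work-item every three cycles, yet the pipelined run still finishes earlier than running the work-items back to back.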

Published In

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture
May 2020
1152 pages
ISBN: 9781728146614

Sponsors

In-Cooperation

  • IEEE

Publisher

IEEE Press

Author Tags

  1. FPGAs
  2. accelerator architectures
  3. high level synthesis
  4. parallel programming
  5. pipeline processing

Qualifiers

  • Research-article

Conference

ISCA '20

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2024) Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 211-222. DOI: 10.1145/3626202.3637561. Online publication date: 1-Apr-2024
  • (2023) ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation. ACM Transactions on Reconfigurable Technology and Systems, 17:2, pp. 1-28. DOI: 10.1145/3617837. Online publication date: 14-Sep-2023
  • (2022) mu-grind. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 346-358. DOI: 10.1145/3559009.3569671. Online publication date: 8-Oct-2022
  • (2022) Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 398-411. DOI: 10.1145/3559009.3569659. Online publication date: 8-Oct-2022
  • (2022) PLD: fast FPGA compilation to make reconfigurable acceleration compatible with modern incremental refinement software development. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 933-945. DOI: 10.1145/3503222.3507740. Online publication date: 28-Feb-2022
  • (2022) Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators. ACM Transactions on Architecture and Code Optimization, 19:2, pp. 1-26. DOI: 10.1145/3501768. Online publication date: 31-Jan-2022
  • (2021) Automatic mapping and code optimization for OpenCL kernels on FT-matrix architecture (WIP paper). Proceedings of the 22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 37-41. DOI: 10.1145/3461648.3463845. Online publication date: 22-Jun-2021
