short-paper

Public Access

HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration

Authors:

Jason CongAuthors Info & Claims

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 51 - 57

https://doi.org/10.1145/3373087.3375320

Published: 24 February 2020 Publication History

Abstract

The domain-specific language (DSL) for image processing, Halide, has generated a lot of interest because of its capability of decoupling algorithms from schedules that allow programmers to search for optimized mappings targeting CPU and GPU. Unfortunately, while the Halide community has been growing rapidly, there is currently no way to easily map the vast number of Halide programs to efficient FPGA accelerators. To tackle this challenge, we propose HeteroHalide, an end-to-end system for compiling Halide programs to FPGA accelerators. This system makes use of both algorithm and scheduling information specified in a Halide program. Compared to the existing approaches, flow provided by HeteroHalide is significantly simplified, as it only requires moderate modifications for Halide programs on the scheduling part to be applicable to FPGAs. For part of the compilation flow, and to act as the intermediate representation (IR) of HeteroHalide, we choose HeteroCL, a heterogeneous programming infrastructure which supports multiple implementation backends (such as systolic arrays and stencil implementations). By using HeteroCL, HeteroHalide can generate efficient accelerators by choosing different backends according to the application. The performance evaluation compares the accelerator generated by HeteroHalide with multi-core CPU and an existing Halide-HLS compiler. As a result, HeteroHalide achieves 4.15\texttimes speedup on average over 28 CPU cores, and 2 \textasciitilde 4\texttimes throughput improvement compared with the existing Halide-HLS compiler.

References

[1]

Alan C. Bovik. The Essential Guide to Image Processing . Academic Press, 2009.

Digital Library

[2]

Yuze Chi, Jason Cong, Peng Wei, and Peipei Zhou. SODA : Stencil with Optimized Dataflow Architecture. In ICCAD, 2018.

Digital Library

[3]

Nitin Chugh, Vinay Vasista, Suresh Purini, and Uday Bondhugula. A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs. In PACT, 2016.

Digital Library

[4]

Jason Cong, Muhuan Huang, Peichen Pan, Di Wu, and Peng Zhang. Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers. In ISLPED, 2016.

Digital Library

[5]

Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, and Zhiru Zhang. High-Level Synthesis for FPGAs: From Prototyping to Deployment. TCAD, 2011.

Digital Library

[6]

Jason Cong and Jie Wang. PolySA: Polyhedral-Based Systolic Array Auto-Compilation. In ICCAD, 2018.

Digital Library

[7]

Halide developers. Halide. https://github.com/halide/Halide, 2019.

[8]

Hayit Greenspan, Bram Van Ginneken, and Ronald M Summers. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 35(5):1153--1159, 2016.

[9]

James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, and Pat Hanrahan. Darkroom: Compiling High-Level Image Processing Code into Hardware Pipelines. In SIGGRAPH, 2014.

[10]

Shinpei Kato, Eijiro Takeuchi, Yoshio Ishiguro, Yoshiki Ninomiya, Kazuya Takeda, and Tsuyoshi Hamada. An open approach to autonomous vehicles. IEEE Micro, 35(6):60--68, 2015.

Digital Library

[11]

Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing. In FPGA, 2019.

Digital Library

[12]

Rastislav Lukac. Computational photography: methods and applications. CRC Press, 2016.

[13]

Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. Automatically Scheduling Halide Image Processing Pipelines. In SIGGRAPH, 2016.

Digital Library

[14]

Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, and Mark Horowitz. Programming heterogeneous systems from an image processing DSL. TACO, 14(3), 2017.

[15]

Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. In SIGGRAPH, 2012.

Digital Library

[16]

Oliver Reiche, M. Akif Ozkan, Richard Membarth, Jürgen Teich, and Frank Hannig. Generating FPGA-based Image Processing Accelerators with Hipacc. In ICCAD, 2017.

[17]

Falcon Computing Solutions. https://www.falconcomputing.com, 2019.

[18]

Erik Todeschini. Augmented-reality signature capture, February 2 2016. US Patent 9,251,411.

[19]

Xilinx. Xilinx xfOpenCV Library. https://github.com/Xilinx/xfopencv, 2019.

Cited By

Wijnja SAlachiotis NEvans KSchenk O(2024)SoftCache: A Software Cache for PCIe-Attached Hardware AcceleratorsProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3659914.3659917(1-11)Online publication date: 3-Jun-2024
https://dl.acm.org/doi/10.1145/3659914.3659917
Chen HZhang NXiang SZeng ZDai MZhang Z(2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656401
Wong LZhang JLi J(2024)DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLSACM Transactions on Reconfigurable Technology and Systems10.1145/365003817:3(1-32)Online publication date: 5-Mar-2024
https://dl.acm.org/doi/10.1145/3650038
Show More Cited By

Index Terms

HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration
1. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging languages and compilers

Recommendations

Rigel: flexible multi-rate image processing hardware

Image processing algorithms implemented using custom hardware or FPGAs of can be orders-of-magnitude more energy efficient and performant than software. Unfortunately, converting an algorithm by hand to a hardware description language suitable for ...
SOFF: an OpenCL high-level synthesis framework for FPGAs
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture

Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes a high-level synthesis framework ...
RIPL: A Parallel Image Processing Language for FPGAs
Special Section on FCCM 2016 and Regular Papers

Specialized FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real-time image processing. Programming challenges limit their wider use, because the implementation of FPGA ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 2020

346 pages

ISBN:9781450370998

DOI:10.1145/3373087

General Chair:
Stephen Neuendorffer
Xilinx, USA
,
Program Chair:
Lesley Shannon
Simon Fraser University, Canada

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2020

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Short-paper

Funding Sources

Intel
National Science Foundation

Conference

FPGA '20

Sponsor:

SIGDA

FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 23 - 25, 2020

CA, Seaside, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

43
Total Citations
View Citations
1,373
Total Downloads

Downloads (Last 12 months)332
Downloads (Last 6 weeks)36

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wijnja SAlachiotis NEvans KSchenk O(2024)SoftCache: A Software Cache for PCIe-Attached Hardware AcceleratorsProceedings of the Platform for Advanced Scientific Computing Conference10.1145/3659914.3659917(1-11)Online publication date: 3-Jun-2024
https://dl.acm.org/doi/10.1145/3659914.3659917
Chen HZhang NXiang SZeng ZDai MZhang Z(2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656401
Wong LZhang JLi J(2024)DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLSACM Transactions on Reconfigurable Technology and Systems10.1145/365003817:3(1-32)Online publication date: 5-Mar-2024
https://dl.acm.org/doi/10.1145/3650038
Huang BLyubomirsky SLi YHe MSmith GTambe TGaonkar ACanumalla VCheung AWei GGupta ATatlock ZMalik S(2024)Application-level Validation of Accelerator Designs Using a Formal Software/Hardware InterfaceACM Transactions on Design Automation of Electronic Systems10.1145/363905129:2(1-25)Online publication date: 14-Feb-2024
https://dl.acm.org/doi/10.1145/3639051
Honorat ADardaillon MMiomandre HNezan J(2024)Automated Buffer Sizing of Dataflow Applications in a High-level Synthesis WorkflowACM Transactions on Reconfigurable Technology and Systems10.1145/362610317:1(1-26)Online publication date: 27-Jan-2024
https://dl.acm.org/doi/10.1145/3626103
Dai TShi BLuo G(2024)Weave: Abstraction and Integration Flow for Accelerators of Generated ModulesIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.332597243:3(854-867)Online publication date: Mar-2024
https://doi.org/10.1109/TCAD.2023.3325972
Willis BShrivastava AMack JDave SChakrabarti CBrunhaver J(2024)Cyclebite: Extracting Task Graphs From Unstructured Compute-ProgramsIEEE Transactions on Computers10.1109/TC.2023.332750473:1(221-234)Online publication date: Jan-2024
https://doi.org/10.1109/TC.2023.3327504
Zhang WZhao JShen GChen QChen CGuo M(2024)An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00017(75-90)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00017
Kanetaka YTakagi HMaeda YFukushima N(2024)SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for HalideIEEE Access10.1109/ACCESS.2023.334566012(7563-7583)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2023.3345660
de França AOliveira FGomes JNedjah N(2024)Hardware designs for convolutional neural networksIntegration, the VLSI Journal10.1016/j.vlsi.2023.10207494:COnline publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1016/j.vlsi.2023.102074
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents