research-article

A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs

Authors:

Uday Bondhugula

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation

Pages 327 - 338

https://doi.org/10.1145/2967938.2967969

Published: 11 September 2016

Abstract

This paper describes an automatic approach to accelerating image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs point-wise, stencil, or more complex operations on image pixels. Recent efforts have led to the development of domain-specific languages (DSLs) and optimization frameworks for image processing pipelines. In this paper, we develop an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs. Our approach maximally exploits data reuse and the available memory bandwidth (or chip resources). When compared to Darkroom, a state-of-the-art approach for compiling high-level DSLs to FPGAs, our approach (a) leads to designs that deliver significantly higher throughput, and (b) supports a greater variety of filters. Furthermore, for some of the benchmarks, the designs we generate improve even on pre-optimized FPGA implementations provided by vendor libraries.
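The abstract's view of a pipeline (a graph of stages performing point-wise and stencil operations) and its emphasis on exploiting reuse can be illustrated with a small software model. The sketch below is a hedged illustration under stated assumptions, not the paper's compiler output and not PolyMage syntax: it chains a point-wise stage into a 3x3 box-filter stencil, then computes the stencil two ways. The first re-reads every window element; the second consumes pixels once in raster order, keeping two line buffers and a 3x3 window of registers, a buffering scheme commonly used for streaming stencils on FPGAs. The function names (`pointwise`, `blur3x3_direct`, `blur3x3_streaming`) and the choice of a box filter over the valid interior are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def pointwise(img: Image, f: Callable[[float], float]) -> Image:
    """Point-wise stage: each output pixel depends only on the same input pixel."""
    return [[f(p) for p in row] for row in img]


def blur3x3_direct(img: Image) -> Tuple[Image, int]:
    """Reference 3x3 box stencil over the valid interior; also counts input reads."""
    h, w = len(img), len(img[0])
    reads = 0
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    acc += img[y + dy][x + dx]  # overlapping windows re-read pixels
                    reads += 1
            out[y][x] = acc / 9.0
    return out, reads


def blur3x3_streaming(img: Image) -> Tuple[Image, int]:
    """Same stencil, reading each pixel once in raster order via line buffers."""
    h, w = len(img), len(img[0])
    line0 = [0.0] * w                       # row y-2 (hypothetical on-chip buffer)
    line1 = [0.0] * w                       # row y-1 (hypothetical on-chip buffer)
    window = [[0.0] * 3 for _ in range(3)]  # 3x3 window of registers
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    reads = 0
    for y in range(h):
        for x in range(w):
            p = img[y][x]                     # the only read of this input pixel
            reads += 1
            column = (line0[x], line1[x], p)  # rows y-2, y-1, y at column x
            line0[x], line1[x] = line1[x], p  # shift the line buffers down a row
            for r in range(3):                # slide the window one column right
                window[r][0], window[r][1], window[r][2] = (
                    window[r][1], window[r][2], column[r])
            if y >= 2 and x >= 2:             # window is full: emit one output
                out[y - 2][x - 2] = sum(sum(row) for row in window) / 9.0
    return out, reads


if __name__ == "__main__":
    # Two-stage pipeline: a point-wise scaling stage feeding the stencil stage.
    img = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
    scaled = pointwise(img, lambda p: 2.0 * p)
    ref, ref_reads = blur3x3_direct(scaled)
    out, stream_reads = blur3x3_streaming(scaled)
    print(out == ref, ref_reads, stream_reads)  # True 36 16
```

Both versions produce the same output, but the streaming one reads each input pixel exactly once, whereas the direct one reads interior pixels up to nine times. The price is on-chip storage for two image rows, one instance of the reuse-versus-resources trade-off the abstract alludes to.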

References

[1]

C. Alias, A. Darte, and A. Plesco. Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA. In International workshop on Polyhedral Compilation Techniques (IMPACT), 2012.

[2]

J. Auerbach, D. F. Bacon, I. Burcea, P. Cheng, S. J. Fink, R. Rabbah, and S. Shukla. A compiler and runtime for heterogeneous computing. In Design Automation Conference, pages 271--276, 2012.

[3]

D. F. Bacon, R. M. Rabbah, and S. Shukla. FPGA programming for the masses. Commun. ACM, 56(4):56--63, 2013.

[4]

Blender Foundation. Big Buck Bunny, 2008. The movie. http://www.bigbuckbunny.org/ License: CC BY 3.0 https://creativecommons.org/licenses/by/3.0/.

[5]

U. Bondhugula, J. Ramanujam, and P. Sadayappan. Automatic mapping of nested loops to FPGAs. In ACM SIGPLAN PPoPP, Mar. 2007.

[6]

J. M. Cardoso and P. C. Diniz. Compilation Techniques for Reconfigurable Architectures. Springer US, 2009.

[7]

Creative Commons Attribution 3.0 license (CC BY 3.0). https://creativecommons.org/licenses/by/3.0/.

[8]

Creative Commons Attribution-ShareAlike 3.0 license (CC BY-SA 3.0). https://creativecommons.org/licenses/by-sa/3.0/.

[9]

A. Darte, R. Schreiber, B. R. Rau, and F. Vivien. A Constructive Solution to the Juggling Problem in Processor Array Synthesis. In IPDPS, pages 815--822, 2000.

[10]

C. Dase, J. Falcon, and B. MacCleery. Motorcycle control prototyping using an FPGA-based embedded control system. Control Systems, IEEE, 26(5):17--21, 2006.

[11]

P. C. Diniz, M. W. Hall, J. Park, B. So, and H. Ziegler. Bridging the Gap between Compilation and Synthesis in the DEFACTO System. In LCPC, pages 52--70, 2001.

[12]

M. B. Gokhale, J. M. Stone, J. Arnold, and M. Kalinowski. Stream-oriented FPGA computing in the Streams-C high level language. In IEEE symposium on Field-Programmable Custom Computing Machines, pages 49--56, 2000.

[13]

Z. Guo, W. Najjar, and B. Buyukkurt. Efficient hardware code generation for FPGAs. ACM Trans. Archit. Code Optim., 5(1):6:1--6:26, May 2008.

[14]

A. Hagiescu, W.-F. Wong, D. Bacon, and R. Rabbah. A computing origami: Folding streams in FPGAs. In ACM/IEEE Design Automation Conference, pages 282--287, 2009.

[15]

J. Hegarty, J. Brunhaver, Z. DeVito, J. Ragan-Kelley, N. Cohen, S. Bell, A. Vasilyev, M. Horowitz, and P. Hanrahan. Darkroom: Compiling high-level image processing code into hardware pipelines. ACM Trans. Graph., 33(4):144:1--144:11, 2014.

[16]

The Heterogeneous Image Processing Acceleration Framework. http://hipacc-lang.org/.

[17]

J. Holewinski, L.-N. Pouchet, and P. Sadayappan. High-performance code generation for stencil computations on GPU architectures. In International conference on Supercomputing, pages 311--320, 2012.

[18]

A. Hormati, M. Kudlur, S. Mahlke, D. Bacon, and R. Rabbah. Optimus: Efficient realization of streaming applications on FPGAs. In 2008 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), pages 41--50, 2008.

[19]

B. K. P. Horn and B. G. Schunck. Determining optical flow. Artif. Intell., 17(1-3):185--203, 1981.

[20]

S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. Effective Automatic Parallelization of Stencil Computations. In ACM SIGPLAN conference on Programming Languages Design and Implementation, 2007.

[21]

MATLAB HDL Coder. The MathWorks Inc. http://in.mathworks.com/products/hdl-coder//.

[22]

R. Membarth, O. Reiche, F. Hannig, J. Teich, M. Körner, and W. Eckert. Hipacc: A domain-specific language and compiler for image processing. IEEE Trans. Parallel Distrib. Syst., 27(1):210--224, 2016.

[23]

R. T. Mullapudi, V. Vasista, and U. Bondhugula. PolyMage: Automatic optimization for image processing pipelines. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 429--443, 2015.

[24]

W. A. Najjar, W. Böhm, B. A. Draper, J. Hammes, R. Rinker, J. R. Beveridge, M. Chawathe, and C. Ross. High-level language abstraction for reconfigurable computing. Computer, 36(8):63--69, Aug. 2003.

[25]

R. S. Nikhil and Arvind. What is Bluespec? SIGDA Newsl., 39(1):1--1, Jan. 2009.

[26]

M. Owaida, N. Bellas, K. Daloukas, and C. Antonopoulos. Synthesis of platform architectures from OpenCL programs. In IEEE Field-Programmable Custom Computing Machines (FCCM), pages 186--193, May 2011.

[27]

P. R. Panda. SystemC: A modeling platform supporting multiple design abstractions. In 14th International symposium on Systems Synthesis, pages 75--80, 2001.

[28]

A. Papakonstantinou, K. Gururaj, J. A. Stratton, D. Chen, J. Cong, and W. W. Hwu. Efficient compilation of CUDA kernels for high-performance computing on FPGAs. ACM Trans. Embedded Comput. Syst., 13(2):25, 2013.

[29]

PolyMage benchmarks, 2015. https://github.com/bondhugula/polymage-benchmarks.

[30]

PolyMage: A DSL and compiler for automatic optimization of image processing pipelines, 2015. http://mcl.csa.iisc.ernet.in/polymage.html.

[31]

L. Pouchet, P. Zhang, P. Sadayappan, and J. Cong. Polyhedral-based data reuse optimization for configurable computing. In ACM/SIGDA International symposium on FPGAs, pages 29--38, 2013.

[32]

J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In ACM SIGPLAN conference on Programming Languages Design and Implementation, pages 519--530, 2013.

[33]

M. Ravishankar, J. Holewinski, and V. Grover. Forma: A DSL for image processing applications to target GPUs and multi-core CPUs. In 8th Workshop on General Purpose Processing Using GPUs, pages 109--120, 2015.

[34]

O. Reiche, M. Schmid, F. Hannig, R. Membarth, and J. Teich. Code generation from a domain-specific language for C-based HLS of hardware accelerators. In 2014 International Conference on Hardware/Software Codesign and System Synthesis, pages 17:1--17:10, 2014.

[35]

R. Schreiber, S. Aditya, S. Mahlke, V. Kathail, B. R. Rau, D. Cronquist, and M. Sivaraman. PICO-NPA: High-Level synthesis of non-programmable hardware accelerators. J. VLSI Signal Process. Syst., 31(2):127--142, 2002.

[36]

B. So, M. W. Hall, and P. C. Diniz. A compiler approach to fast hardware design space exploration in FPGA-based systems. In ACM SIGPLAN conference on Programming Languages Design and Implementation, pages 165--176, 2002.

[37]

C. B. Spear. SystemVerilog for Verification: A Guide to Learning the Testbench Language Features. Springer, 2nd edition, 2010.

[38]

Adult tortoise, 2016. Finlay Cox. http://www.pasthorizonspr.com/wp-content/uploads/2016/02/tortoise.jpg License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/.

[39]

X. Zhou, J.-P. Giacalone, M. J. Garzarán, R. H. Kuhn, Y. Ni, and D. Padua. Hierarchical overlapped tiling. In International symposium on Code Generation and Optimization, pages 207--218, 2012.

Cited By

Kanetaka Y, Takagi H, Maeda Y, Fukushima N (2024). SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide. IEEE Access 12, 7563-7583. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2023.3345660
Choudhury Z, Gulati A, Purini S (2023). FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler. ACM Transactions on Architecture and Code Optimization 20:4, 1-25. Online publication date: 25-Oct-2023.
https://dl.acm.org/doi/10.1145/3629523
Majumder K, Bondhugula U, Aamodt T, Swift M, Jerger N (2023). HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 189-201. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3623278.3624767

Index Terms

A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
1. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging languages and compilers
  2. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Programming Heterogeneous Systems from an Image Processing DSL

Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, “programming,” and ...

PolyMage: Automatic Optimization for Image Processing Pipelines
ASPLOS '15

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each ...

Compiling and Optimizing Image Processing Algorithms for FPGAs
CAMP '00: Proceedings of the Fifth IEEE International Workshop on Computer Architectures for Machine Perception (CAMP'00)

This paper presents a high-level language for expressing image processing algorithms, and an optimizing compiler that targets FPGAs. The language is called SA-C, and this paper focuses on the language features that 1) support image processing, and 2) ...

Information & Contributors

Information

Published In

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation

September 2016

474 pages

ISBN: 9781450341219

DOI: 10.1145/2967938

General Chairs: Ayal Zaks (Intel, Israel), Bilha Mendelson (Optitura, Israel)

Program Chairs: Lawrence Rauchwerger (Texas A&M University, USA), Wen-mei W. Hwu (University of Illinois at Urbana-Champaign, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IFIP WG 10.3
IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '16

PACT '16: International Conference on Parallel Architectures and Compilation

September 11 - 15, 2016

Haifa, Israel

Acceptance Rates

PACT '16 Paper Acceptance Rate: 31 of 119 submissions, 26%

Overall Acceptance Rate: 121 of 471 submissions, 26%

Bibliometrics

Article Metrics

Total Citations: 35
Total Downloads: 504
Downloads (last 12 months): 34
Downloads (last 6 weeks): 2

Reflects downloads up to 13 Nov 2024.

Citations

Xiao Y, Park D, Niu Z, Hota A, Dehon A (2023). ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation. ACM Transactions on Reconfigurable Technology and Systems 17:2, 1-28. Online publication date: 14-Sep-2023.
https://dl.acm.org/doi/10.1145/3617837
Liu Q, Setter J, Huff D, Strange M, Feng K, Horowitz M, Raina P, Kjolstad F (2023). Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators. ACM Transactions on Architecture and Code Optimization 20:2, 1-26. Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1145/3572908
Koul K, Melchert J, Sreedhar K, Truong L, Nyengele G, Zhang K, Liu Q, Setter J, Chen P, Mei Y, Strange M, Daly R, Donovick C, Carsello A, Kong T, Feng K, Huff D, Nayak A, Setaluri R, Thomas J, Bhagdikar N, Durst D, Myers Z, Tsiskaridze N, Richardson S, Bahr R, Fatahalian K, Hanrahan P, Barrett C, Horowitz M, Torng C, Kjolstad F, Raina P (2023). AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers. ACM Transactions on Embedded Computing Systems 22:2, 1-34. Online publication date: 24-Jan-2023.
https://dl.acm.org/doi/10.1145/3534933
Li A, Ning A, Wentzlaff D (2023). Duet: Creating Harmony between Processors and Embedded FPGAs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 745-758. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10070989
Deiana A, Tran N, Agar J, Blott M, Di Guglielmo G, Duarte J, Harris P, Hauck S, Liu M, Neubauer M, Ngadiuba J, Ogrenci-Memik S, Pierini M, Aarrestad T, Bähr S, Becker J, Berthold A, Bonventre R, Müller Bravo T, Diefenthaler M, Dong Z, Fritzsche N, Gholami A, Govorkova E, Guo D, Hazelwood K, Herwig C, Khan B, Kim S, Klijnsma T, Liu Y, Lo K, Nguyen T, Pezzullo G, Rasoulinezhad S, Rivera R, Scholberg K, Selig J, Sen S, Strukov D, Tang W, Thais S, Unger K, Vilalta R, von Krosigk B, Wang S, Warburton T (2022). Applications and Techniques for Fast Machine Learning in Science. Frontiers in Big Data 5. Online publication date: 12-Apr-2022.
https://doi.org/10.3389/fdata.2022.787421
Sozzo E, Conficconi D, Zeni A, Salaris M, Sciuto D, Santambrogio M (2022). Pushing the Level of Abstraction of Digital System Design: A Survey on How to Program FPGAs. ACM Computing Surveys 55:5, 1-48. Online publication date: 3-Dec-2022.
https://dl.acm.org/doi/10.1145/3532989
Xiao Y, Micallef E, Butt A, Hofmann M, Alston M, Goldsmith M, Merczynski-Hait A, DeHon A, Falsafi B, Ferdman M, Lu S, Wenisch T (2022). PLD: fast FPGA compilation to make reconfigurable acceleration compatible with modern incremental refinement software development. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 933-945. Online publication date: 28-Feb-2022.
https://dl.acm.org/doi/10.1145/3503222.3507740