DOI: 10.1145/3207719.3207723
research-article

Automatic Kernel Fusion for Image Processing DSLs

Published: 28 May 2018

Abstract

Programming image processing algorithms on hardware accelerators such as graphics processing units (GPUs) often involves a trade-off between software portability and performance portability. Domain-specific languages (DSLs) have proven to be a promising remedy: they enable optimizations and the generation of efficient code from a concise, high-level algorithm representation.
This paper presents an optimization framework for image processing DSLs in the form of a source-to-source compiler. To reduce the cost of inter-kernel communication through global memory in GPU applications, kernel fusion is investigated as the primary optimization technique for improving temporal locality. To enable automatic kernel fusion, we analyze the fusibility of each kernel in the algorithm in terms of data dependencies, resource utilization, and parallelism granularity. By combining this information with the domain-specific knowledge captured in the DSL, we propose a method that automatically fuses suitable kernels and integrate it into an open-source DSL framework. The kernel fusion technique is evaluated on two filter-based image processing applications, for which speedups of up to 1.60 are obtained on an NVIDIA GeForce 745 graphics card.
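The idea behind kernel fusion can be illustrated with a minimal sketch. It is not the paper's implementation: plain Python lists stand in for GPU global memory, each function stands in for one kernel launch, and all names (`scale`, `offset`, `run_unfused`, `run_fused`, `is_fusible`) are hypothetical. The toy `is_fusible` check covers only a simplified data-dependence condition, whereas the analysis described above also considers resource utilization and parallelism granularity.

```python
# Illustrative sketch only: lists model global memory, functions model kernels.
# All names here are hypothetical and not taken from the paper or its framework.

def scale(pixel):
    """First point operator: double the intensity."""
    return 2 * pixel


def offset(pixel):
    """Second point operator: add a constant bias."""
    return pixel + 1


def run_unfused(image):
    """Two 'kernels': the intermediate image is written out and read back."""
    intermediate = [scale(p) for p in image]   # kernel 1 writes a full buffer
    return [offset(p) for p in intermediate]   # kernel 2 reads that buffer again


def run_fused(image):
    """One 'kernel': the intermediate value stays in a local variable."""
    output = []
    for p in image:
        tmp = scale(p)              # kept in a register-like local
        output.append(offset(tmp))  # consumed immediately, no intermediate buffer
    return output


def is_fusible(producer, consumers_of):
    """Toy dependence check: fuse only if the producer's output has exactly
    one consumer, so no other kernel still needs the intermediate buffer."""
    return len(consumers_of.get(producer, [])) == 1


image = [0, 1, 2, 3]
fused_result = run_fused(image)
unfused_result = run_unfused(image)
```

Both variants compute the same output; the fused one avoids materializing the intermediate image, which is the source of the improved temporal locality on a GPU.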



Information & Contributors

Information

Published In

SCOPES '18: Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems
May 2018
120 pages
ISBN:9781450357807
DOI:10.1145/3207719
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Domain-Specific Languages
  2. GPUs
  3. Image Processing
  4. Kernel Fusion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SCOPES '18

Acceptance Rates

Overall Acceptance Rate 38 of 79 submissions, 48%


Cited By

  • (2023) Transpilers: A Systematic Mapping Review of Their Usage in Research and Industry. Applied Sciences 13:6 (3667). DOI: 10.3390/app13063667. Online publication date: 13-Mar-2023
  • (2022) EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. Proceedings of the 51st International Conference on Parallel Processing (1-11). DOI: 10.1145/3545008.3545037. Online publication date: 29-Aug-2022
  • (2022) Automated kernel fusion for GPU based on code motion. Proceedings of the 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (151-161). DOI: 10.1145/3519941.3535078. Online publication date: 14-Jun-2022
  • (2022) AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (359-373). DOI: 10.1145/3503222.3507723. Online publication date: 28-Feb-2022
  • (2020) Schedule Synthesis for Halide Pipelines on GPUs. ACM Transactions on Architecture and Code Optimization 17:3 (1-25). DOI: 10.1145/3406117. Online publication date: 3-Aug-2020
  • (2020) Efficient parallel reduction on GPUs with Hipacc. Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems (58-61). DOI: 10.1145/3378678.3391885. Online publication date: 25-May-2020
  • (2020) Unveiling kernel concurrency in multiresolution filters on GPUs with an image processing DSL. Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit (11-20). DOI: 10.1145/3366428.3380773. Online publication date: 23-Feb-2020
  • (2020) Static Scheduling of Moldable Streaming Tasks with Task Fusion for Parallel Systems with DVFS. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (1-1). DOI: 10.1109/TCAD.2020.3013054. Online publication date: 2020
  • (2020) DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator. 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (118-127). DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00041. Online publication date: Dec-2020
  • (2019) From loop fusion to kernel fusion: a domain-specific approach to locality optimization. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (242-253). DOI: 10.5555/3314872.3314901. Online publication date: 16-Feb-2019
