research-article

An effective fusion and tile size model for optimizing image processing pipelines

Authors:

Abhinav Jangda,

Uday BondhugulaAuthors Info & Claims

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 261 - 275

https://doi.org/10.1145/3178487.3178507

Published: 10 February 2018 Publication History

Abstract

Effective models for fusion of loop nests continue to remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with parallelism and locality. This paper presents a new fusion algorithm for high-performance domain-specific compilers for image processing pipelines. The fusion algorithm is driven by dynamic programming and explores spaces of fusion possibilities not covered by previous approaches, and is driven by a cost function more concrete and precise in capturing optimization criteria than prior approaches. The fusion model is particularly tailored to the transformation and optimization sequence applied by PolyMage and Halide, two recent DSLs for image processing pipelines. Our model-driven technique when implemented in PolyMage provides significant improvements (up to 4.32X) over PolyMage's approach (which uses auto-tuning to aid its model), and over Halide's automatic approach (by up to 2.46X) on two state-of-the-art shared-memory multicore architectures.

Supplementary Material

Artifacts Available (polymage-ppopp-2018-ae-v1.0.0.zip)

PolyMage PPoPP 2018 artifact

Download
38.96 MB

References

[1]

Protonu Basu, Anand Venkat, Mary W. Hall, Samuel W. Williams, Brian van Straalen, and Leonid Oliker. 2013. Compiler generation and autotuning of communication-avoiding operators for geometric multigrid. In 20th International Conference on High Performance Computing (HiPC). 452--461.

[2]

Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and Lakshminarayanan Renganarayanan. 2010. A model for fusion and code motion in an automatic parallelizing compiler. In International conference on Parallel Architectures and Compilation Techniques. 343--352.

Digital Library

[3]

Guang R. Gao, R. Olsen, Vivek Sarkar, and Radhika Thekkath. 1992. Collective Loop Fusion for Array Contraction. In Languages and Compilers for Parallel Computing, 5th International Workshop. 281--295.

Digital Library

[4]

Halide on GitHub, MIT license 2017. Halide auto-scheduler. (2017). https://github.com/halide/Halide/tree/auto_scheduler commit 89679918b42eb14d358a8e6214755de1e42ff046, Dec 11, 2017.

[5]

Google Inc. 2017. XLA (Accelerated Linear Algebra) for TensorFlow. (2017). https://www.tensorflow.org/performance/xla/.

[6]

Ken Kennedy. 2001. Fast Greedy Weighted Fusion. International Journal of Parallel Programming 29, 5 (2001), 463--491.

Digital Library

[7]

Ken Kennedy and Kathryn S. McKinley. 1993. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution. In Languages and Compilers for Parallel Computing. 301--320.

Digital Library

[8]

Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. 2007. Effective Automatic Parallelization of Stencil Computations. In ACM SIGPLAN conference on Programming Languages Design and Implementation (PLDI).

Digital Library

[9]

Nimrod Megiddo and Vivek Sarkar. 1997. Optimal Weighted Loop Fusion for Parallel Programs. In ACM Symposium on Parallel Algorithms and Architectures (SPAA '97). 282--291.

Digital Library

[10]

Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. 2016. Automatically Scheduling Halide Image Processing Pipelines. SIGGRAPH 2016/ACM Trans. Graph. 35, 4 (July 2016), 83:1--83:11.

Digital Library

[11]

Ravi Teja Mullapudi, Vinay Vasista, and Uday. Bondhugula. 2015. PolyMage: Automatic Optimization for Image Processing Pipelines. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 429--443.

Digital Library

[12]

Catherine Olschanowsky, Michelle Mills Strout, Stephen Guzik, John Loffeld, and Jeffrey Hittinger. 2014. A Study on Balancing Parallelism, Data Locality, and Recomputation in Existing PDE Solvers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 793--804.

Digital Library

[13]

PolyMage project, Apache 2.0 license 2016. PolyMage. (2016). https://bitbucket.org/udayb/polymage commit 0ff0b46456605a5579db09c6ef98cb247dd2131d, Dec 16, 2016.

[14]

Apan Qasem and Ken Kennedy. 2006. Profitable loop fusion and tiling using model-driven empirical search. In International Conference on Supercomputing (ICS). 249--258.

Digital Library

[15]

Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. 2012. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. ACM Transactions on Graphics 31, 4 (2012), 32:1--32:12.

Digital Library

[16]

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In ACM SIGPLAN conference on Programming Languages Design and Implementation. 519--530.

Digital Library

[17]

István Z. Reguly, Gihan R. Mudalige, and Mike B. Giles. 2017. Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS. CoRR abs/1704.00693 (2017). arXiv:1704.00693 http://arxiv.org/abs/1704.00693

[18]

Gerald Roth and Ken Kennedy. 1998. Loop Fusion in High Performance Fortran. In International conference on Supercomputing, ICS 1998. 125--132.

Digital Library

[19]

M. Wolf and Monica S. Lam. 1991. A data locality optimizing algorithm. In ACM SIGPLAN symposium on Programming Languages Design and Implementation. 30--44.

Digital Library

[20]

David Wonnacott. 1999. Time Skewing for Parallel Computers. In In Proceedings of the Twelfth Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, 477--480.

Digital Library

[21]

Qing Yi and Ken Kennedy. 2004. Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion. IJHPCA 18, 2 (2004), 237--253.

Digital Library

[22]

Xing Zhou, Jean-Pierre Giacalone, María Jesús Garzarán, Robert H. Kuhn, Yang Ni, and David Padua. 2012. Hierarchical Overlapped Tiling. In International symposium on Code Generation and Optimization (CGO). 207--218.

Digital Library

Cited By

Jayaweera MKong MWang YKaeli DGrosser TDubach CSteuwer MXue JOttoni GQuintão Pereira F(2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444795
Zhao JLi BNie WGeng ZZhang RGao XCheng BWu CCheng YLi ZDi PZhang KJin XFreund SYahav E(2021)AKG: automatic kernel generation for neural processing units using polyhedral transformationsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454106(1233-1248)Online publication date: 19-Jun-2021
https://dl.acm.org/doi/10.1145/3453483.3454106
Narasimhan KAcharya ABaid ABondhugula UZhou HMoreira JMueller FEtsion Y(2021)A practical tile size selection model for affine loop nestsProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3462213(27-39)Online publication date: 3-Jun-2021
https://dl.acm.org/doi/10.1145/3447818.3462213
Show More Cited By

Index Terms

An effective fusion and tile size model for optimizing image processing pipelines
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

An Effective Fusion and Tile Size Model for PolyMage

Effective models for fusion of loop nests continue to remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with ...
Automatically scheduling halide image processing pipelines

The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a ...
An effective fusion and tile size model for optimizing image processing pipelines
PPoPP '18

Effective models for fusion of loop nests continue to remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2018

442 pages

ISBN:9781450349826

DOI:10.1145/3178487

General Chair:
Andreas Krall
Vienna University of Technology, Austria
,
Program Chair:
Thomas R. Gross
ETH Zürich, Switzerland

ACM SIGPLAN Notices Volume 53, Issue 1
PPoPP '18
January 2018
426 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3200691
Editor:
Matthew Fluet
Rodchester Institude of Technology
Issue’s Table of Contents

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Publication History

Published: 10 February 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Conference

PPoPP '18

Sponsor:

PPoPP '18: 23nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 24 - 28, 2018

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

27
Total Citations
View Citations
953
Total Downloads

Downloads (Last 12 months)55
Downloads (Last 6 weeks)5

Reflects downloads up to 13 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Jayaweera MKong MWang YKaeli DGrosser TDubach CSteuwer MXue JOttoni GQuintão Pereira F(2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444795
Zhao JLi BNie WGeng ZZhang RGao XCheng BWu CCheng YLi ZDi PZhang KJin XFreund SYahav E(2021)AKG: automatic kernel generation for neural processing units using polyhedral transformationsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454106(1233-1248)Online publication date: 19-Jun-2021
https://dl.acm.org/doi/10.1145/3453483.3454106
Narasimhan KAcharya ABaid ABondhugula UZhou HMoreira JMueller FEtsion Y(2021)A practical tile size selection model for affine loop nestsProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3462213(27-39)Online publication date: 3-Jun-2021
https://dl.acm.org/doi/10.1145/3447818.3462213
Neth BScogland Tde Supinski BStrout MZhou HMoreira JMueller FEtsion Y(2021)Inter-loop optimization in RAJA using loop chainsProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3461665(1-12)Online publication date: 3-Jun-2021
https://dl.acm.org/doi/10.1145/3447818.3461665
Abdelaal KKong MZhou HMoreira JMueller FEtsion Y(2021)Tile size selection of affine programs for GPGPUs using polyhedral cross-compilationProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3460369(13-26)Online publication date: 3-Jun-2021
https://dl.acm.org/doi/10.1145/3447818.3460369
Hu SDu GLi JLi ZNi WYin YWang X(2021)Optimization of Node-clustering-based DAG partition targeting NVDLA Architecture2021 IEEE 14th International Conference on ASIC (ASICON)10.1109/ASICON52560.2021.9620327(1-4)Online publication date: 26-Oct-2021
https://doi.org/10.1109/ASICON52560.2021.9620327
Davis EOlschanowsky CVan Straalen B(2021)A Structured Grid Solver with Polyhedral+Dataflow RepresentationLanguages and Compilers for Parallel Computing10.1007/978-3-030-72789-5_10(127-146)Online publication date: 26-Mar-2021
https://doi.org/10.1007/978-3-030-72789-5_10
Qiao BÖzkan MTeich JHannig FLi Z(2020)The best of both worldsProceedings of the 57th ACM/EDAC/IEEE Design Automation Conference10.5555/3437539.3437635(1-6)Online publication date: 20-Jul-2020
https://dl.acm.org/doi/10.5555/3437539.3437635
Acharya ABondhugula UCohen A(2020)Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict GraphsACM Transactions on Architecture and Code Optimization10.1145/341651017:4(1-26)Online publication date: 30-Sep-2020
https://dl.acm.org/doi/10.1145/3416510
Jangda AGuha ASarkar VKim H(2020)Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUsProceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques10.1145/3410463.3414649(317-328)Online publication date: 30-Sep-2020
https://dl.acm.org/doi/10.1145/3410463.3414649
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents