Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3178487.3178507acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

An effective fusion and tile size model for optimizing image processing pipelines

Published: 10 February 2018 Publication History

Abstract

Effective models for fusion of loop nests continue to remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with parallelism and locality. This paper presents a new fusion algorithm for high-performance domain-specific compilers for image processing pipelines. The fusion algorithm is driven by dynamic programming and explores spaces of fusion possibilities not covered by previous approaches, and is driven by a cost function more concrete and precise in capturing optimization criteria than prior approaches. The fusion model is particularly tailored to the transformation and optimization sequence applied by PolyMage and Halide, two recent DSLs for image processing pipelines. Our model-driven technique when implemented in PolyMage provides significant improvements (up to 4.32X) over PolyMage's approach (which uses auto-tuning to aid its model), and over Halide's automatic approach (by up to 2.46X) on two state-of-the-art shared-memory multicore architectures.

Supplementary Material

Artifacts Available (polymage-ppopp-2018-ae-v1.0.0.zip)
PolyMage PPoPP 2018 artifact

References

[1]
Protonu Basu, Anand Venkat, Mary W. Hall, Samuel W. Williams, Brian van Straalen, and Leonid Oliker. 2013. Compiler generation and autotuning of communication-avoiding operators for geometric multigrid. In 20th International Conference on High Performance Computing (HiPC). 452--461.
[2]
Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and Lakshminarayanan Renganarayanan. 2010. A model for fusion and code motion in an automatic parallelizing compiler. In International conference on Parallel Architectures and Compilation Techniques. 343--352.
[3]
Guang R. Gao, R. Olsen, Vivek Sarkar, and Radhika Thekkath. 1992. Collective Loop Fusion for Array Contraction. In Languages and Compilers for Parallel Computing, 5th International Workshop. 281--295.
[4]
Halide on GitHub, MIT license 2017. Halide auto-scheduler. (2017). https://github.com/halide/Halide/tree/auto_scheduler commit 89679918b42eb14d358a8e6214755de1e42ff046, Dec 11, 2017.
[5]
Google Inc. 2017. XLA (Accelerated Linear Algebra) for TensorFlow. (2017). https://www.tensorflow.org/performance/xla/.
[6]
Ken Kennedy. 2001. Fast Greedy Weighted Fusion. International Journal of Parallel Programming 29, 5 (2001), 463--491.
[7]
Ken Kennedy and Kathryn S. McKinley. 1993. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution. In Languages and Compilers for Parallel Computing. 301--320.
[8]
Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. 2007. Effective Automatic Parallelization of Stencil Computations. In ACM SIGPLAN conference on Programming Languages Design and Implementation (PLDI).
[9]
Nimrod Megiddo and Vivek Sarkar. 1997. Optimal Weighted Loop Fusion for Parallel Programs. In ACM Symposium on Parallel Algorithms and Architectures (SPAA '97). 282--291.
[10]
Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. 2016. Automatically Scheduling Halide Image Processing Pipelines. SIGGRAPH 2016/ACM Trans. Graph. 35, 4 (July 2016), 83:1--83:11.
[11]
Ravi Teja Mullapudi, Vinay Vasista, and Uday. Bondhugula. 2015. PolyMage: Automatic Optimization for Image Processing Pipelines. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 429--443.
[12]
Catherine Olschanowsky, Michelle Mills Strout, Stephen Guzik, John Loffeld, and Jeffrey Hittinger. 2014. A Study on Balancing Parallelism, Data Locality, and Recomputation in Existing PDE Solvers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 793--804.
[13]
PolyMage project, Apache 2.0 license 2016. PolyMage. (2016). https://bitbucket.org/udayb/polymage commit 0ff0b46456605a5579db09c6ef98cb247dd2131d, Dec 16, 2016.
[14]
Apan Qasem and Ken Kennedy. 2006. Profitable loop fusion and tiling using model-driven empirical search. In International Conference on Supercomputing (ICS). 249--258.
[15]
Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. 2012. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. ACM Transactions on Graphics 31, 4 (2012), 32:1--32:12.
[16]
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In ACM SIGPLAN conference on Programming Languages Design and Implementation. 519--530.
[17]
István Z. Reguly, Gihan R. Mudalige, and Mike B. Giles. 2017. Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS. CoRR abs/1704.00693 (2017). arXiv:1704.00693 http://arxiv.org/abs/1704.00693
[18]
Gerald Roth and Ken Kennedy. 1998. Loop Fusion in High Performance Fortran. In International conference on Supercomputing, ICS 1998. 125--132.
[19]
M. Wolf and Monica S. Lam. 1991. A data locality optimizing algorithm. In ACM SIGPLAN symposium on Programming Languages Design and Implementation. 30--44.
[20]
David Wonnacott. 1999. Time Skewing for Parallel Computers. In In Proceedings of the Twelfth Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, 477--480.
[21]
Qing Yi and Ken Kennedy. 2004. Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion. IJHPCA 18, 2 (2004), 237--253.
[22]
Xing Zhou, Jean-Pierre Giacalone, María Jesús Garzarán, Robert H. Kuhn, Yang Ni, and David Padua. 2012. Hierarchical Overlapped Tiling. In International symposium on Code Generation and Optimization (CGO). 207--218.

Cited By

View all
  • (2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
  • (2021)AKG: automatic kernel generation for neural processing units using polyhedral transformationsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454106(1233-1248)Online publication date: 19-Jun-2021
  • (2021)A practical tile size selection model for affine loop nestsProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3462213(27-39)Online publication date: 3-Jun-2021
  • Show More Cited By

Index Terms

  1. An effective fusion and tile size model for optimizing image processing pipelines

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2018
    442 pages
    ISBN:9781450349826
    DOI:10.1145/3178487
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 53, Issue 1
      PPoPP '18
      January 2018
      426 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/3200691
      Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication Notes

    Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

    Publication History

    Published: 10 February 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. fusion
    2. image processing pipelines
    3. tiling

    Qualifiers

    • Research-article

    Conference

    PPoPP '18

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)55
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
    • (2021)AKG: automatic kernel generation for neural processing units using polyhedral transformationsProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454106(1233-1248)Online publication date: 19-Jun-2021
    • (2021)A practical tile size selection model for affine loop nestsProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3462213(27-39)Online publication date: 3-Jun-2021
    • (2021)Inter-loop optimization in RAJA using loop chainsProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3461665(1-12)Online publication date: 3-Jun-2021
    • (2021)Tile size selection of affine programs for GPGPUs using polyhedral cross-compilationProceedings of the 35th ACM International Conference on Supercomputing10.1145/3447818.3460369(13-26)Online publication date: 3-Jun-2021
    • (2021)Optimization of Node-clustering-based DAG partition targeting NVDLA Architecture2021 IEEE 14th International Conference on ASIC (ASICON)10.1109/ASICON52560.2021.9620327(1-4)Online publication date: 26-Oct-2021
    • (2021)A Structured Grid Solver with Polyhedral+Dataflow RepresentationLanguages and Compilers for Parallel Computing10.1007/978-3-030-72789-5_10(127-146)Online publication date: 26-Mar-2021
    • (2020)The best of both worldsProceedings of the 57th ACM/EDAC/IEEE Design Automation Conference10.5555/3437539.3437635(1-6)Online publication date: 20-Jul-2020
    • (2020)Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict GraphsACM Transactions on Architecture and Code Optimization10.1145/341651017:4(1-26)Online publication date: 30-Sep-2020
    • (2020)Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUsProceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques10.1145/3410463.3414649(317-328)Online publication date: 30-Sep-2020
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media