research-article

Open access

Declarative Loop Tactics for Domain-specific Optimization

Authors:

Lorenzo Chelini,

Oleksandr Zinenko,

Tobias Grosser,

Henk CorporaalAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 4

Article No.: 55, Pages 1 - 25

https://doi.org/10.1145/3372266

Published: 26 December 2019 Publication History

All formats PDF

Abstract

Increasingly complex hardware makes the design of effective compilers difficult. To reduce this problem, we introduce Declarative Loop Tactics, which is a novel framework of composable program transformations based on an internal tree-like program representation of a polyhedral compiler. The framework is based on a declarative C++ API built around easy-to-program matchers and builders, which provide the foundation to develop loop optimization strategies. Using our matchers and builders, we express computational patterns and core building blocks, such as loop tiling, fusion, and data-layout transformations, and compose them into algorithm-specific optimizations. Declarative Loop Tactics (Loop Tactics for short) can be applied to many domains. For two of them, stencils and linear algebra, we show how developers can express sophisticated domain-specific optimizations as a set of composable transformations or calls to optimized libraries. By allowing developers to add highly customized optimizations for a given computational pattern, we expect our approach to reduce the need for DSLs and to extend the range of optimizations that can be performed by a current general-purpose compiler.

References

[1]

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16). 265--283.

Digital Library

[2]

Varun Agrawal, Abhiroop Dabral, Tapti Palit, Yongming Shen, and Michael Ferdman. 2015. Architectural support for dynamic linking. ACM SIGARCH Comput. Archit. News, Vol. 43. ACM, 691--702.

Digital Library

[3]

Manuel Arenaz, Juan Touriño, and Ramon Doallo. 2008. XARK: An extensible framework for automatic recognition of computational kernels. ACM Trans. Prog. Lang. Syst. 30, 6 (2008), 32.

Digital Library

[4]

Lénaïc Bagnères, Oleksandr Zinenko, Stéphane Huot, and Cédric Bastoul. 2016. Opening polyhedral compiler’s black box. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’16). ACM, New York, NY, 128--138.

Digital Library

[5]

William Blume, Rudolf Eigenmann, Keith Faigin, John Grout, Jay Hoeflinger, David Padua, Paul Petersen, William Pottenger, Lawrence Rauchwerger, Peng Tu, and Stephen Weatherford. 1995. Polaris: Improving the effectiveness of parallelizing compilers. In Languages and Compilers for Parallel Computing, Keshav Pingali, Utpal Banerjee, David Gelernter, Alex Nicolau, and David Padua (Eds.). Springer Berlin, 141--154.

[6]

U. Bondhugula, S. Dash, O. Gunluk, and L. Renganarayanan. 2010. A model for fusion and code motion in an automatic parallelizing compiler. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT’10). 343--352.

[7]

Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical and fully automatic polyhedral program optimization system. In Proceedings of the ACM SIGPLAN Symposium on Programming Language Design and Implementation (PLDI’08).

[8]

Uday Bondhugula, Albert Hartono, Jagannathan Ramanujam, and Ponnuswamy Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. ACM SIGPLAN Not. 43, 6 (2008), 101--113.

Digital Library

[9]

Martin Bravenboer, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Visser. 2008. Stratego/XT 0.17. A language and toolset for program transformation. Sci. Comput. Prog. 72, 1 (2008), 52--70.

Digital Library

[10]

Chun Chen, Jacqueline Chame, and Mary Hall. 2008. CHiLL: A Framework for Composing High-level Loop Transformations. Technical Report. Citeseer. USC Computer Science.

[11]

Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI’18). 578--594.

[12]

B. Di Martino and G. Iannello. 1996. PAP recognizer: A tool for automatic recognition of parallelizable patterns. In Proceedings of the 4th Workshop on Program Comprehension (WPC’96) 164--174.

[13]

Sebastien Donadio, James Brodman, Thomas Roeder, Kamen Yotov, Denis Barthou, Albert Cohen, María Jesús Garzarán, David Padua, and Keshav Pingali. 2006. A language for the compact representation of multiple program versions. In Languages and Compilers for Parallel Computing, Eduard Ayguadé, Gerald Baumgartner, J. Ramanujam, and P. Sadayappan (Eds.). Springer Berlin, 136--151.

[14]

Paul Feautrier and Christian Lengauer. 2011. Polyhedron Model. Springer, Boston, MA, 1581--1592.

[15]

Roman Gareev, Tobias Grosser, and Michael Kruse. 2018. High-performance generalized tensor operations: A compiler-oriented approach. ACM Trans. Archit. Code Optim. 15, 3, Article 34 (Sept. 2018), 27 pages.

Digital Library

[16]

Michael P. Gerlek, Eric Stoltz, and Michael Wolfe. 1995. Beyond induction variables: Detecting and classifying sequences using a demand-driven SSA form. ACM Trans. Prog. Lang. Syst. 17, 1 (Jan. 1995), 85--122.

Digital Library

[17]

Sylvain Girbal, Nicolas Vasilache, Cédric Bastoul, Albert Cohen, David Parello, Marc Sigler, and Olivier Temam. 2006. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Int. J. Parallel Prog. 34, 3 (June 2006), 261--317.

Digital Library

[18]

Tobias Grosser, Albert Cohen, Justin Holewinski, P. Sadayappan, and Sven Verdoolaege. 2014. Hybrid hexagonal/classical tiling for GPUs. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’14). ACM, New York, NY, Article 66, 10 pages.

Digital Library

[19]

Tobias Grosser, Armin Groesslinger, and Christian Lengauer. 2012. Polly-Performing polyhedral optimizations on a low-level intermediate representation. Parallel Proc. Lett. 22, 4 (2012), 1250010.

[20]

Tobias Grosser, J. Ramanujam, Louis-Noel Pouchet, P. Sadayappan, and Sebastian Pop. 2015. Optimistic delinearization of parametrically sized arrays. In Proceedings of the 29th ACM International Conference on Supercomputing (ICS’15). ACM, New York, NY, 351--360.

Digital Library

[21]

Tobias Grosser, Hongbin Zheng, Raghesh Aloor, Andreas Simbürger, Armin Größlinger, and Louis-Noël Pouchet. 2011. Polly-polyhedral optimization in LLVM. In Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques (IMPACT’11), Vol. 2011. 1.

[22]

Tom Henretty, Kevin Stock, Louis-Noël Pouchet, Franz Franchetti, J. Ramanujam, and P. Sadayappan. 2011. Data layout transformation for stencil computations on short-vector SIMD architectures. In Proceedings of the 20th International Conference on Compiler Construction: Part of the Joint European Conferences on Theory and Practice of Software (CC’11/ETAPS’11). Springer-Verlag, 225--245. http://dl.acm.org/citation.cfm?id=1987237.1987255.

[23]

Wayne Kelly and William Pugh. 1998. A Framework for Unifying Reordering Transformations. Technical Report. University of Maryland.

[24]

Christoph W. Kessler. 1996. Pattern-driven automatic parallelization. Sci. Prog. 5, 3 (Aug. 1996), 251--274.

Digital Library

[25]

C. W. Kessler and C. H. Smith. 1999. The SPARAMAT approach to automatic comprehension of sparse matrix computations. In Proceedings of the 7th International Workshop on Program Comprehension. 200--207.

[26]

Malik Khan, Protonu Basu, Gabe Rudy, Mary Hall, Chun Chen, and Jacqueline Chame. 2013. A script-based autotuning compiler system to generate high-performance CUDA code. ACM Trans. Archit. Code Optim. 9, 4, Article 31 (Jan. 2013), 25 pages.

Digital Library

[27]

Martin Kong and Louis-Noël Pouchet. 2019. Model-driven transformations for multi-and many-core CPUs. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 469--484.

Digital Library

[28]

W. Kozaczynski, J. Ning, and A. Engberts. 1992. Program concept recognition and transformation. IEEE Trans. Softw. Eng. 18, 12 (Dec. 1992), 1065--1075.

Digital Library

[29]

Michael Kruse and Hal Finkel. 2018. A proposal for loop-transformation pragmas. In Evolving OpenMP for Evolving Architectures, Bronis R. de Supinski, Pedro Valero-Lara, Xavier Martorell, Sergi Mateo Bellido, and Jesus Labarta (Eds.). Springer International Publishing, Cham, 37--52.

[30]

Sang-Ik Lee, Troy A. Johnson, and Rudolf Eigenmann. 2004. Cetus—An extensible compiler infrastructure for source-to-source transformation. In Languages and Compilers for Parallel Computing, Lawrence Rauchwerger (Ed.). Springer Berlin, 539--553.

[31]

Tze Meng Low, Francisco D. Igual, Tyler M. Smith, and Enrique S. Quintana-Orti. 2016. Analytical modeling is enough for high-performance BLIS. ACM Trans. Math. Softw. 43, 2, Article 12 (Aug. 2016), 18 pages.

Digital Library

[32]

S. Markidis, S. W. D. Chien, E. Laure, I. B. Peng, and J. S. Vetter. 2018. NVIDIA tensor core programmability, performance precision. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW’18). 522--531.

[33]

Robert Metzger and Zhaofang Wen. 2000. Automatic Algorithm Recognition and Replacement: A New Approach to Program Optimization. The MIT Press.

[34]

Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. 2015. PolyMage: Automatic optimization for image processing pipelines. SIGARCH Comput. Archit. News 43, 1 (Mar. 2015), 429--443.

Digital Library

[35]

Shlomit S. Pinter and Ron Y. Pinter. 1994. Program optimization and parallelization using idioms. ACM Trans. Prog. Lang. Syst. 16, 3 (May 1994), 305--327.

Digital Library

[36]

Sebastian Pop, Albert Cohen, Cédric Bastoul, Sylvain Girbal, Georges-André Silber, and Nicolas Vasilache. 2006. GRAPHITE: Polyhedral analyses and optimizations for GCC. In Proceedings of the GCC Developers Summit. Citeseer, 2006.

[37]

Bill Pottenger and Rudolf Eigenmann. 1995. Idiom recognition in the Polaris parallelizing compiler. In Proceedings of the 9th International Conference on Supercomputing (ICS’95). ACM, New York, NY, 444--448.

Digital Library

[38]

William Pugh and David Wonnacott. 1994. Static analysis of upper and lower bounds on dependences and parallelism. ACM Trans. Prog. Lang. Syst. 16, 4 (1994), 1248--1278.

Digital Library

[39]

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Not., Vol. 48. ACM, 519--530.

Digital Library

[40]

Gabe Rudy. 2010. CUDA-CHiLL: A Programming Language Interface for GPGPU Optimizations and Code Generation. Ph.D. Dissertation. School of Computing, University of Utah.

[41]

Amin Shafiee Sarvestani, Erik Hansson, and Christoph Kessler. 2013. Extensible recognition of algorithmic patterns in DSP programs for automatic parallelization. Int. J. Parallel Prog. 41, 6 (1 Dec. 2013), 806--824.

[42]

Gordon D. Smith. 1985. Numerical Solution of Partial Differential Equations: Finite Difference Methods. Oxford University Press.

[43]

Toshio Suganuma, Hideaki Komatsu, and Toshio Nakatani. 1996. Detection and global optimization of reduction operations for distributed parallel machines. In Proceedings of the 10th International Conference on Supercomputing (ICS’96). ACM, New York, NY, 18--25.

Digital Library

[44]

Arvind K. Sujeeth, Kevin J. Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2014. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Trans. Embed. Comput. Syst. 13, 4s, Article 134 (Apr. 2014), 25 pages.

Digital Library

[45]

Arvind K. Sujeeth, Austin Gibbons, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Martin Odersky, and Kunle Olukotun. 2013. Forge: Generating a high performance DSL implementation from a declarative specification. In Proceedings of the 12th International Conference on Generative Programming: Concepts 8 Experiences (GPCE’13). ACM, New York, NY, 145--154.

Digital Library

[46]

Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksandar Prokopec, Vojin Jovanovic, Martin Odersky, and Kunle Olukotun. 2013. Composition and reuse with compiled domain-specific languages. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP’13), Giuseppe Castagna (Ed.). Springer Berlin, 52--78.

Digital Library

[47]

Allen Taflove and Susan C. Hagness. 2005. Computational Electrodynamics: The Finite-difference Time-domain Method. Artech House.

[48]

Thiago S. F. X. Teixeira, Corinne Ancourt, David Padua, and William Gropp. 2019. Locus: A system and a language for program optimization. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’19). IEEE Press, Piscataway, NJ, 217--228. http://dl.acm.org/citation.cfm?id=3314872.3314898.

[49]

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. CoRR abs/1802.04730 (2018). arxiv:1802.04730 http://arxiv.org/abs/1802.04730.

[50]

Sven Verdoolaege. 2010. Isl: An integer set library for the polyhedral model. In Proceedings of the 3rd International Congress Conference on Mathematical Software (ICMS’10). Springer, Berlin, 299--302. http://dl.acm.org/citation.cfm?id=1888390.1888455.

Digital Library

[51]

Sven Verdoolaege. 2011. Counting affine calculator and applications. In Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques (IMPACT’11).

[52]

Sven Verdoolaege and Tobias Grosser. 2012. Polyhedral extraction tool. In Proceedings of the 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT’12).

[53]

Sven Verdoolaege, Serge Guelton, Tobias Grosser, and Albert Cohen. 2014. Schedule trees. In Proceedings of the 4th Workshop on Polyhedral Compilation Techniques (IMPACT’14, Associated with HiPEAC’14). 9.

[54]

Sven Verdoolaege and Alexandre Isoard. 2017. Consecutivity in the isl polyhedral scheduler. (2017).

[55]

Sven Verdoolaege and Gerda Janssens. 2017. Scheduling for PPCG. Report CW 706. Department of Computer Science, KU Leuven, Leuven, Belgium.

[56]

Qing Yi. 2012. POET: A scripting language for applying parameterized source-to-source program transformations. Softw. Pract. Exper. 42, 6 (June 2012), 675--706.

Digital Library

[57]

Tomofumi Yuki, Gautam Gupta, DaeGon Kim, Tanveer Pathan, and Sanjay Rajopadhye. 2012. Alphaz: A system for design space exploration in the polyhedral model. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. Springer, 17--31.

[58]

Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen. 2018. Modeling the conflicting demands of parallelism and temporal/spatial locality in affine scheduling. In Proceedings of the 27th International Conference on Compiler Construction. ACM, 3--13.

Digital Library

Cited By

Chen ZYang W(2024)Substitution of kernel functions based on pattern matching on schedule treesWorkshop Proceedings of the 53rd International Conference on Parallel Processing10.1145/3677333.3678152(48-57)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3677333.3678152
Espindola VZago LYviquel HAraujo G(2022)Source Matching and Rewriting for MLIR Using String-Based AutomataACM Transactions on Architecture and Code Optimization10.1145/357128320:2(1-26)Online publication date: 18-Nov-2022
https://dl.acm.org/doi/10.1145/3571283
Chelini LKong MGrosser TCorporaal H(2021)LoopOptProceedings of the 24th International Workshop on Software and Compilers for Embedded Systems10.1145/3493229.3493301(11-16)Online publication date: 1-Nov-2021
https://dl.acm.org/doi/10.1145/3493229.3493301
Show More Cited By

Index Terms

Declarative Loop Tactics for Domain-specific Optimization
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Joint affine transformation and loop pipelining for mapping nested loop on CGRAs
DATE '15: Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition

Coarse-Grained Reconfigurable Architectures (CGRAs) are the promising architectures with high performance, high power- efficiency and attractions of flexibility. The computation-intensive portions of application, i.e. loops, are often implemented on ...
LoopOpt: Declarative Transformations Made Easy
SCOPES '21: Proceedings of the 24th International Workshop on Software and Compilers for Embedded Systems

Despite years of research, the optimization strategy of loop-level optimization frameworks remains fragile when addressing modern and heterogeneous architectures. Furthermore, optimizers act as an opaque operation, a black-box, to the users, forcing ...
Improving compiler scalability: optimizing large programs at small price
PLDI '15

Compiler scalability is a well known problem: reasoning about the application of useful optimizations over large program scopes consumes too much time and memory during compilation. This problem is exacerbated in polyhedral compilers that use powerful ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 16, Issue 4

December 2019

572 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3366460

Editor:
Koen De Bosschere
Ghent University, Belgium

Issue’s Table of Contents

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 December 2019

Accepted: 01 November 2019

Revised: 01 October 2019

Received: 01 August 2019

Published in TACO Volume 16, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Horizon 2020

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

11
Total Citations
View Citations
2,037
Total Downloads

Downloads (Last 12 months)280
Downloads (Last 6 weeks)35

Reflects downloads up to 25 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chen ZYang W(2024)Substitution of kernel functions based on pattern matching on schedule treesWorkshop Proceedings of the 53rd International Conference on Parallel Processing10.1145/3677333.3678152(48-57)Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1145/3677333.3678152
Espindola VZago LYviquel HAraujo G(2022)Source Matching and Rewriting for MLIR Using String-Based AutomataACM Transactions on Architecture and Code Optimization10.1145/357128320:2(1-26)Online publication date: 18-Nov-2022
https://dl.acm.org/doi/10.1145/3571283
Chelini LKong MGrosser TCorporaal H(2021)LoopOptProceedings of the 24th International Workshop on Software and Compilers for Embedded Systems10.1145/3493229.3493301(11-16)Online publication date: 1-Nov-2021
https://dl.acm.org/doi/10.1145/3493229.3493301
Han JFei XLi ZZhang Y(2021)Polyhedral-Based Compilation Framework for In-Memory Neural Network AcceleratorsACM Journal on Emerging Technologies in Computing Systems10.1145/346984718:1(1-23)Online publication date: 29-Sep-2021
https://dl.acm.org/doi/10.1145/3469847
Moses WChelini LZhao RZinenko O(2021)Polygeist: Raising C to Polyhedral MLIRProceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT52795.2021.00011(45-59)Online publication date: 26-Sep-2021
https://dl.acm.org/doi/10.1109/PACT52795.2021.00011
Chelini LDrebes AZinenko OCohen AVasilache NGrosser TCorporaal HLee J(2021)Progressive raising in multi-level IRProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370332(15-26)Online publication date: 27-Feb-2021
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370332
Weng JJain AWang JWang LWang YNowatzki TLee J(2021)UNITProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370330(77-89)Online publication date: 27-Feb-2021
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370330
Lattner CAmini MBondhugula UCohen ADavis APienaar JRiddle RShpeisman TVasilache NZinenko OLee J(2021)MLIRProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370308(2-14)Online publication date: 27-Feb-2021
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370308
Vadivel KChelini LBanaGozar ASingh GCorda SJordans RCorporaal HDi Natale GFummi F(2020)TDO-CIMProceedings of the 23rd Conference on Design, Automation and Test in Europe10.5555/3408352.3408716(1602-1605)Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.5555/3408352.3408716
Vadivel KChelini LBanaGozar ASingh GCorda SJordans RCorporaal H(2020)TDO-CIM: Transparent Detection and Offloading for Computation In-memory2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)10.23919/DATE48585.2020.9116464(1602-1605)Online publication date: Mar-2020
https://doi.org/10.23919/DATE48585.2020.9116464
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents