research-article

Open access

Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation

Authors:

Christoph Müller,

Oleksandr Zinenko,

Stephan Herhut,

Torsten Hoefler,

Tobias GrosserAuthors Info & Claims

ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4

Article No.: 51, Pages 1 - 23

https://doi.org/10.1145/3469030

Published: 03 September 2021 Publication History

All formats PDF

Abstract

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM’s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.

References

[1]

2020. CLIMA. Retrieved from https://github.com/climate-machine/CLIMA/.

[2]

2020. Consortium for Small-scale Modeling. Retrieved from http://www.cosmo-model.org/.

[3]

2020. FV3: Finite-Volume Cubed-Sphere Dynamical Core. Retrieved from https://www.gfdl.noaa.gov/fv3/.

[4]

2020. GridTools. Retrieved from https://github.com/GridTools/gridtools.

[5]

2020. GT4Py. Retrieved from https://github.com/gridtools/gt4py.

[6]

2020. RAJA. Retrieved from https://github.com/LLNL/RAJA.

[7]

Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, and Jonathan Ragan-Kelley. 2019. Learning to optimize halide with tree search and random programs. ACM Trans. Graph. 38, 4 (July 2019).

Digital Library

[8]

S. V. Adams, R. W. Ford, M. Hambley, J. M. Hobson, I. Kavc̆ic̆, C. M. Maynard, T. Melvin, E. H. Müller, S. Mullerworth, A. R. Porter, M. Rezny, B. J. Shipway, and R. Wong. 2019. LFRic: Meeting the challenges of scalability and performance portability in weather and climate models. J. Parallel Distrib. Comput. 132 (2019), 383–396.

[9]

R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser, M. Kruse, C. Reddy, S. Verdoolaege, A. Betts, A. F. Donaldson, J. Ketema, J. Absar, S. v. Haastregt, A. Kravets, A. Lokhmotov, R. David, and E. Hajiyev. 2015. PENCIL: A platform-neutral compute intermediate language for accelerator programming. In Proceedings of the International Conference on Parallel Architecture and Compilation (PACT’15). 138–149.

[10]

Michael Baldauf, Axel Seifert, Jochen Förstner, Detlev Majewski, Matthias Raschendorfer, and Thorsten Reinhardt. 2011. Operational convective-scale numerical weather prediction with the COSMO model: Description and sensitivities. Month. Weath. Rev. 139, 12 (2011), 3887–3905.

[11]

Ulysse Beaugnon, Antoine Pouille, Marc Pouzet, Jacques Pienaar, and Albert Cohen. 2017. Optimization space pruning without regrets. In Proceedings of the 26th International Conference on Compiler Construction. 34–44.

Digital Library

[12]

Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, and Torsten Hoefler. 2019. Stateful dataflow multigraphs: A data-centric model for performance portability on heterogeneous architectures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’19).

Digital Library

[13]

Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI’18). USENIX Association, 578–594. Retrieved from https://www.usenix.org/conference/osdi18/presentation/chen.

[14]

Valentin Clement, Sylvaine Ferrachat, Oliver Fuhrer, Xavier Lapillonne, Carlos E. Osuna, Robert Pincus, Jon Rood, and William Sawyer. 2018. The CLAW DSL: Abstractions for performance portable weather and climate models. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC’18). Association for Computing Machinery, New York, NY.

Digital Library

[15]

Zachary DeVito, James Hegarty, Alex Aiken, Pat Hanrahan, and Jan Vitek. 2013. Terra: A multi-stage language for high-performance computing. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). Association for Computing Machinery, New York, NY, 105–116.

Digital Library

[16]

H. C. Edwards and C. R. Trott. 2013. Kokkos: Enabling performance portability across manycore architectures. In Proceedings of the Extreme Scaling Workshop (XSW’13). 18–24.

[17]

Oliver Fuhrer, Carlos Osuna, Xavier Lapillonne, Tobias Gysi, Ben Cumming, Mauro Bianco, Andrea Arteaga, and Thomas Schulthess. 2014. Towards a performance portable, architecture agnostic implementation strategy for weather and climate models. Supercomput. Front. Innov. 1, 1 (2014).

[18]

Tobias Grosser, Albert Cohen, Justin Holewinski, P. Sadayappan, and Sven Verdoolaege. 2014. Hybrid hexagonal/classical tiling for GPUs. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’14). Association for Computing Machinery, New York, NY, 66–75.

Digital Library

[19]

Tobias Grosser, Albert Cohen, Paul H. J. Kelly, J. Ramanujam, P. Sadayappan, and Sven Verdoolaege. 2013. Split tiling for GPUs: Automatic parallelization using trapezoidal tiles. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU’13). Association for Computing Machinery, New York, NY, 24–31.

Digital Library

[20]

Tobias Grosser, Armin Groesslinger, and Christian Lengauer. 2012. Polly—Performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett. 22, 04 (2012), 1250010.

[21]

Tobias Grosser and Torsten Hoefler. 2016. Polly-ACC transparent compilation to heterogeneous hardware. In Proceedings of the International Conference on Supercomputing (ICS’16). Association for Computing Machinery, New York, NY.

Digital Library

[22]

Tobias Gysi, Tobias Grosser, and Torsten Hoefler. 2015. MODESTO: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures. In Proceedings of the 29th ACM on International Conference on Supercomputing (ICS’15). Association for Computing Machinery, New York, NY, 177–186.

Digital Library

[23]

T. Gysi, T. Grosser, and T. Hoefler. 2019. Absinthe: Learning an analytical performance model to fuse and tile stencil codes in one shot. In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT’19). 370–382.

[24]

Tobias Gysi, Carlos Osuna, Oliver Fuhrer, Mauro Bianco, and Thomas C. Schulthess. 2015. STELLA: A domain-specific tool for structured grid methods in weather and climate models. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’15). Association for Computing Machinery, New York, NY.

[25]

Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, and Christophe Dubach. 2018. High performance stencil code generation with lift. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’18). Association for Computing Machinery, New York, NY, 100–112.

[26]

Lucas M. Harris and Shian-Jiann Lin. 2013. A two-way nested global-regional dynamical core on the cubed-sphere grid. Month. Weath. Rev. 141, 1 (2013), 283–306.

[27]

Justin Holewinski, Louis-Noël Pouchet, and P. Sadayappan. 2012. High-performance code generation for stencil computations on GPU architectures. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS’12). Association for Computing Machinery, New York, NY, 311–320.

[28]

M. Kruse and H. Finkel. 2018. User-directed loop-transformations in Clang. In Proceedings of the IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC’18). 49–58.

[29]

Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO’04). IEEE, 75–86.

Digital Library

[30]

Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling compiler infrastructure for domain specific computation. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’21). 2–14.

Digital Library

[31]

Chris Leary and Todd Wang. 2017. XLA: TensorFlow, compiled. In Proceedings of the TensorFlow Dev Summit.

[32]

Roland Leißa, Klaas Boesche, Sebastian Hack, Arsène Pérard-Gayot, Richard Membarth, Philipp Slusallek, André Müller, and Bertil Schmidt. 2018. AnyDSL: A partial evaluation framework for programming high-performance libraries. Proc. ACM Program. Lang. 2, OOPSLA (Oct. 2018).

[33]

Naoya Maruyama and Takayuki Aoki. 2014. Optimizing stencil computations for NVIDIA Kepler GPUs. In Proceedings of the 1st International Workshop on High-performance Stencil Computations. 89–95.

[34]

Kazuaki Matsumura, Hamid Reza Zohouri, Mohamed Wahib, Toshio Endo, and Satoshi Matsuoka. 2020. AN5D: Automated stencil framework for high-degree temporal blocking on GPUs. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (CGO’20). Association for Computing Machinery, New York, NY, 199–211.

Digital Library

[35]

William M. McKeeman. 1965. Peephole optimization. Commun. ACM 8, 7 (1965), 443–444.

Digital Library

[36]

G. A. McMechan. 1983. Migration by extrapolation of time-dependent boundary VALUES. Geophys. Prospect. 31 (June 1983), 413–420.

[37]

Steven S. Muchnick. 1998. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA.

Digital Library

[38]

Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. 2015. PolyMage: Automatic optimization for image processing pipelines. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15). Association for Computing Machinery, New York, NY, 429–443.

Digital Library

[39]

Michel Müller and Takayuki Aoki. 2018. Hybrid Fortran: High productivity GPU porting framework applied to Japanese weather prediction model. In Accelerator Programming Using Directives, Sunita Chandrasekaran and Guido Juckeland (Eds.). Springer International Publishing, Cham, 20–41.

[40]

A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey. 2010. 3.5-D Blocking optimization for stencil computations on modern CPUs and GPUs. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. 1–13.

[41]

John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. 2008. Scalable parallel programming with CUDA. Queue 6, 2 (Mar. 2008), 40–53.

Digital Library

[42]

Carlos Osuna, Tobias Wicky, Fabian Thuering, Torsten Hoefler, and Oliver Fuhrer. 2020. Dawn: A high-level domain-specific language compiler toolchain for weather and climate applications. Supercomput. Front. Innov. 7, 2 (2020).

[43]

Dan Quinlan. 2000. ROSE: Compiler support for object-oriented frameworks. Parallel Process. Lett. 10, 02n03 (2000), 215–226.

[44]

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). Association for Computing Machinery, New York, NY, 519–530.

Digital Library

[45]

Prashant Rawat, Martin Kong, Tom Henretty, Justin Holewinski, Kevin Stock, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2015. SDSLc: A multi-target domain-specific compiler for stencil computations. In Proceedings of the 5th International Workshop on Domain-specific Languages and High-level Frameworks for High-performance Computing (WOLFHPC’15). Association for Computing Machinery, New York, NY.

[46]

Prashant Singh Rawat, Changwan Hong, Mahesh Ravishankar, Vinod Grover, Louis-Noël Pouchet, and P. Sadayappan. 2016. Effective resource management for enhancing performance of 2D and 3D stencils on GPUs. In Proceedings of the 9th Workshop on General Purpose Processing Using Graphics Processing Unit (GPGPU’16). Association for Computing Machinery, New York, NY, 92–102.

[47]

Prashant Singh Rawat, Fabrice Rastello, Aravind Sukumaran-Rajam, Louis-Noël Pouchet, Atanas Rountev, and P. Sadayappan. 2018. Register optimizations for stencils on GPUs. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’18). Association for Computing Machinery, New York, NY, USA, 168–182.

[48]

P. S. Rawat, M. Vaidya, A. Sukumaran-Rajam, M. Ravishankar, V. Grover, A. Rountev, L. Pouchet, and P. Sadayappan. 2018. Domain-specific optimization and generation of high-performance GPU code for stencil computations. Proc. IEEE 106, 11 (2018), 1902–1920.

[49]

Tiark Rompf and Martin Odersky. 2010. Lightweight modular staging: A pragmatic approach to runtime code generation and compiled DSLs. In Proceedings of the 9th International Conference on Generative Programming and Component Engineering (GPCE’10). Association for Computing Machinery, New York, NY, 127–136.

Digital Library

[50]

Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1988. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 12–27.

[51]

Mohammed Sourouri, Scott B. Baden, and Xing Cai. 2017. Panda: A compiler framework for concurrent CPU+GPU execution of 3D stencil computations on GPU-accelerated supercomputers. Int. J. Parallel Program. 45, 3 (June 2017), 711–729.

Digital Library

[52]

M. Steuwer, T. Remmelg, and C. Dubach. 2017. LIFT: A functional data-parallel IR for high-performance GPU code generation. In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’17). 74–85.

[53]

Arvind K. Sujeeth, Kevin J. Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2014. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Trans. Embed. Comput. Syst. 13, 4s (April 2014).

Digital Library

[54]

Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The Pochoir stencil compiler. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). Association for Computing Machinery, New York, NY, 117–128.

[55]

Nicolas Vasilache, Cédric Bastoul, Albert Cohen, and Sylvain Girbal. 2006. Violated dependence analysis. In Proceedings of the 20th International Conference on Supercomputing. 335–344.

Digital Library

[56]

Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, and Francky Catthoor. 2013. Polyhedral parallel code generation for CUDA. ACM Trans. Archit. Code Optim. 9, 4 (Jan. 2013).

Digital Library

[57]

M. Wahib and N. Maruyama. 2014. Scalable kernel fusion for memory-bound GPU applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 191–202.

[58]

C. Yount, J. Tobin, A. Breuer, and A. Duran. 2016. YASK—Yet Another Stencil Kernel: A framework for HPC stencil code-generation and tuning. In Proceedings of the 6th International Workshop on Domain-specific Languages and High-level Frameworks for High-performance Computing (WOLFHPC’16). 30–39.

[59]

Tuowen Zhao, Protonu Basu, Samuel Williams, Mary Hall, and Hans Johansen. 2019. Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. In Proceedings of the International Conference for High-performance Computing, Networking, Storage and Analysis (SC’19). Association for Computing Machinery, New York, NY.

Digital Library

[60]

Oleksandr Zinenko, Sven Verdoolaege, Chandan Reddy, Jun Shirako, Tobias Grosser, Vivek Sarkar, and Albert Cohen. 2018. Modeling the conflicting demands of parallelism and temporal/spatial locality in affine scheduling. In Proceedings of the 27th International Conference on Compiler Construction (CC’18). Association for Computing Machinery, New York, NY, 3–13.

Digital Library

Cited By

Brown N(2024)Domain Specific Abstractions for the Development of Fast-by-Construction Dataflow Codes on FPGAsChips10.3390/chips30400173:4(334-360)Online publication date: 4-Oct-2024
https://doi.org/10.3390/chips3040017
Bisbas GLydike ABauer EBrown NFehr MMitchell LRodriguez-Canal GJamieson MKelly PSteuwer MGrosser TTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)A shared compilation stack for distributed-memory parallelism in stencil DSLsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651344(38-56)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620666.3651344
Zhang YLi KYuan LCheng JZhang YCao TYang M(2024)LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor CoresProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00059(1-17)Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SC41406.2024.00059
Show More Cited By

Index Terms

Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation
1. Applied computing
  1. Physical sciences and engineering
    1. Earth and atmospheric sciences
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. Context specific languages
      1. Domain specific languages

Recommendations

A shared compilation stack for distributed-memory parallelism in stencil DSLs
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3

Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target ...
Macros for domain-specific languages

Macros provide a powerful means of extending languages. They have proven useful in both general-purpose and domain-specific programming contexts. This paper presents an architecture for implementing macro-extensible DSLs on top of macro-extensible host ...
Declaratively defining domain-specific language debuggers
GCPE '11

Tool support is vital to the effectiveness of domain-specific languages. With language workbenches, domain-specific languages and their tool support can be generated from a combined, high-level specification. This paper shows how such a specification ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization Volume 18, Issue 4

December 2021

497 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3476575

Editor:
David Kaeli
Northeastern University, USA

Issue’s Table of Contents

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2021

Accepted: 01 May 2021

Revised: 01 April 2021

Received: 01 December 2020

Published in TACO Volume 18, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

23
Total Citations
View Citations
3,235
Total Downloads

Downloads (Last 12 months)1,862
Downloads (Last 6 weeks)81

Reflects downloads up to 23 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Brown N(2024)Domain Specific Abstractions for the Development of Fast-by-Construction Dataflow Codes on FPGAsChips10.3390/chips30400173:4(334-360)Online publication date: 4-Oct-2024
https://doi.org/10.3390/chips3040017
Bisbas GLydike ABauer EBrown NFehr MMitchell LRodriguez-Canal GJamieson MKelly PSteuwer MGrosser TTsafrir DMusuvathi MGupta RAbu-Ghazaleh N(2024)A shared compilation stack for distributed-memory parallelism in stencil DSLsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651344(38-56)Online publication date: 27-Apr-2024
https://dl.acm.org/doi/10.1145/3620666.3651344
Zhang YLi KYuan LCheng JZhang YCao TYang M(2024)LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor CoresProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00059(1-17)Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SC41406.2024.00059
Liu XYang XMa KLiu SZhang KYang HLiu YLuan ZQian D(2024)Moirae: Generating High-Performance Composite Stencil Programs with Global OptimizationsProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00026(1-15)Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SC41406.2024.00026
Sai RMellor-Crummey JXu JAraya-Polo M(2024)Automated Code Generation of High-Order Stencils for a Dataflow ArchitectureProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00025(1-13)Online publication date: 17-Nov-2024
https://dl.acm.org/doi/10.1109/SC41406.2024.00025
Zhang WZhao JShen GChen QChen CGuo M(2024)An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00017(75-90)Online publication date: 2-Mar-2024
https://doi.org/10.1109/HPCA57654.2024.00017
Ivanov IZinenko ODomke JEndo TMoses WGrosser TDubach CSteuwer MXue JOttoni GQuintão Pereira F(2024)Retargeting and Respecializing GPU Workloads for Performance PortabilityProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444828(119-132)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444828
Brown NJamieson MLydike ABauer EGrosser T(2023)Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in FlangProceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis10.1145/3624062.3624167(904-913)Online publication date: 12-Nov-2023
https://dl.acm.org/doi/10.1145/3624062.3624167
Ben-Nun TAtes BCalotoiu AHoefler TDubach CBruening DHardekopf B(2023)Bridging Control-Centric and Data-Centric OptimizationProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3579990.3580018(173-185)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3579990.3580018
Andersson MMarkidis S(2023)A Case Study on DaCe Portability & Performance for Batched Discrete Fourier TransformsProceedings of the International Conference on High Performance Computing in Asia-Pacific Region10.1145/3578178.3578239(55-63)Online publication date: 27-Feb-2023
https://doi.org/10.1145/3578178.3578239
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents