DOI: 10.1145/2212908.2212917

Mesh independent loop fusion for unstructured mesh applications

Published: 15 May 2012

Abstract

Applications based on unstructured meshes are typically compute-intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used for their acceleration, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer that attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation.
In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis, which is often inhibited by the size of the control flow graph, often becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We subsequently show how whole-program analysis then becomes an enabler of OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile-time. We examine the limits of its application and show experimental results on a computational fluid dynamics application benchmark, assessing the performance gains due to loop fusion.
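
To make the idea of mesh-independent loop fusion concrete, the following is a minimal sketch in Python, not the OP2 API and not the algorithm from the paper. It assumes each loop is summarised by hypothetical access descriptors (dataset name, optional indirection map, access mode) and uses a deliberately conservative rule: two loops over the same set may be fused if every dataset they share, and at least one of them modifies, is accessed directly in both. The names `can_fuse`, `run_unfused`, `run_fused`, and the tiny mesh are all illustrative assumptions.

```python
# Illustrative sketch only: these names are hypothetical, not the OP2 API.
READ, WRITE, INC = "read", "write", "inc"


def can_fuse(first, second):
    """Conservative fusion test that looks only at access descriptors.

    A loop is (iteration_set, [(dataset, via_map, access), ...]), where
    via_map is None for direct access. The mesh connectivity itself is
    never inspected, so the decision can be taken at compile time.
    """
    set1, args1 = first
    set2, args2 = second
    if set1 != set2:
        return False
    for name1, map1, acc1 in args1:
        for name2, map2, acc2 in args2:
            if name1 != name2:
                continue
            if acc1 == READ and acc2 == READ:
                continue  # shared read-only data never blocks fusion
            if map1 is not None or map2 is not None:
                return False  # a modified dataset is reached indirectly
    return True


def run_unfused(edge_to_node, x):
    """Two separate edge loops communicating through the array diff."""
    diff = [0.0] * len(edge_to_node)
    res = [0.0] * len(x)
    for e, (a, b) in enumerate(edge_to_node):  # loop 1: per-edge difference
        diff[e] = x[b] - x[a]
    for e, (a, b) in enumerate(edge_to_node):  # loop 2: scatter to nodes
        res[a] += diff[e]
        res[b] -= diff[e]
    return res


def run_fused(edge_to_node, x):
    """Fused version: diff[e] becomes a scalar temporary."""
    res = [0.0] * len(x)
    for a, b in edge_to_node:
        d = x[b] - x[a]
        res[a] += d
        res[b] -= d
    return res


# Descriptors for the two loops above, plus a third loop that reads res
# through the map and therefore must wait for every increment to finish.
loop1 = ("edges", [("x", "edge_to_node", READ), ("diff", None, WRITE)])
loop2 = ("edges", [("diff", None, READ), ("res", "edge_to_node", INC)])
loop3 = ("edges", [("res", "edge_to_node", READ), ("out", None, WRITE)])

# A small made-up mesh: 4 nodes, 5 edges.
edge_to_node = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
x = [0.0, 1.0, 3.0, 6.0]
```

In this sketch `can_fuse(loop1, loop2)` holds because `diff` is only ever touched at the current edge, while `can_fuse(loop2, loop3)` fails because `res` is incremented and then read through the edge-to-node map. The rule is stricter than necessary in some cases (for example, two loops that both only increment the same dataset indirectly), which is the price of deciding without looking at the mesh.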


Cited By

  • (2024) Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 286-301. DOI: 10.1145/3617232.3624858. Online publication date: 27-Apr-2024
  • (2013) Loop Chaining. Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pages 375-384. DOI: 10.1109/IPDPSW.2013.68. Online publication date: 20-May-2013
  • (2012) Using domain-specific languages and access-execute descriptors to expand the parallel code synthesis design space. Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, pages 1-2. DOI: 10.1145/2364474.2364476. Online publication date: 15-Sep-2012

Published In

CF '12: Proceedings of the 9th conference on Computing Frontiers
May 2012
320 pages
ISBN: 9781450312158
DOI: 10.1145/2212908
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compilers
  2. loop fusion
  3. unstructured mesh applications
  4. whole program control flow analysis

Qualifiers

  • Research-article

Conference

CF'12: Computing Frontiers Conference
May 15 - 17, 2012
Cagliari, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%
