Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/369028.369051acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
Article
Free access

Transformations for imperfectly nested loops

Published: 17 November 1996 Publication History

Abstract

Loop transformations are critical for compiling high-performance code for modern computers. Existing work has focused on transformations for perfectly nested loops (that is, loops in which all assignment statements are contained within the innermost loop of a loop nest). In practice, most loop nests, such as those in matrix factorization codes, are imperfectly nested. In some programs, imperfectly nested loops can be transformed into perfectly nested loops by loop distribution, but this is not always legal. In this paper, we present an approach to transforming imperfectly nested loops directly. Our approach is an extension of the linear loop transformation framework for perfectly nested loops, and it models permutation, reversal, skewing, scaling, alignment, distribution and jamming. We also give a completion procedure which generates a complete transformation from a partial transformation.

References

[1]
C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In Principle and Practice of Parallel Programming, pages 39-50, April 1991.
[2]
E. Ayguadé and Jordi Torres. Partitioning the statement per iteration space using non-singular matrices. In 1993 ACM International Conference on Supercomputing, pages 407-415, Tokyo, jul 1993.
[3]
Uptal Banerjee. A theory of loop permutations. In Languages and compilers for parallel computing, pages 54-74, 1989.
[4]
Uptal Banerjee. Unimodular transformations of double loops. In Languages and compilers for parallel computing, pages 192-219, 1990.
[5]
W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, W. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Polaris: The next generation in parallelizing compilers. Technical Report 1375, Center for Supercomputing Research and Development (CSRD), University of Illinois Urbana-Champaign.
[6]
Paul Feautrier. Some efficient solutions to the affine scheduling problem - part i: one dimensional time. International Journal of Parallel Programming, October 1992.
[7]
Paul Feautrier. Some efficient solutions to the affine scheduling problem - part ii: multi-dimensional time. International Journal of Parallel Programming, December 1992.
[8]
Wayne Kelly, William Pugh, and Evan Rosser. Code generation for multiple mappings. In The 5th Symposium on the Frontiers of Massively Parallel Computation, pages 332-341, McLean, Virginia, feb 1995.
[9]
S. Y. Kung. VLSI Array Processors. Prentice-Hall Inc, 1988.
[10]
Wei Li and Keshav Pingali. A singular loop transformation based on non-singular matrices. International Journal of Parallel Programming, 22(2), April 1994.
[11]
William Pugh. The omega test: A fast and practical integer programming algorithm for dependence analysis. In Communications of the ACM, pages 102-114, August 1992.
[12]
J. Ramanujam. Optimal code parallelization using unimodular transformations. In Proceedings of Supercomputing, 1992.
[13]
M. E. Wolf and M. S. Lam. An algorithmic approach to compound loop transformations. In Languages and compilers for parallel computing, pages 243-273, 1990.
[14]
M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.

Cited By

View all
  • (2020)Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core SystemsACM Transactions on Parallel Computing10.1145/34180757:4(1-45)Online publication date: 25-Nov-2020
  • (2016)An approach for code generation in the Sparse Polyhedral FrameworkParallel Computing10.1016/j.parco.2016.02.00453:C(32-57)Online publication date: 1-Apr-2016
  • (2013)Reshaping cache misses to improve row-buffer locality in multicore systemsProceedings of the 22nd international conference on Parallel architectures and compilation techniques10.5555/2523721.2523754(235-244)Online publication date: 7-Oct-2013
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing
November 1996
898 pages
ISBN:0897918541

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 17 November 1996

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SC '96
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)62
  • Downloads (Last 6 weeks)9
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core SystemsACM Transactions on Parallel Computing10.1145/34180757:4(1-45)Online publication date: 25-Nov-2020
  • (2016)An approach for code generation in the Sparse Polyhedral FrameworkParallel Computing10.1016/j.parco.2016.02.00453:C(32-57)Online publication date: 1-Apr-2016
  • (2013)Reshaping cache misses to improve row-buffer locality in multicore systemsProceedings of the 22nd international conference on Parallel architectures and compilation techniques10.5555/2523721.2523754(235-244)Online publication date: 7-Oct-2013
  • (2013)Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnectProceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT.2013.6618820(309-318)Online publication date: Oct-2013
  • (2011)Optimizing data locality using array tilingProceedings of the International Conference on Computer-Aided Design10.5555/2132325.2132359(142-149)Online publication date: 7-Nov-2011
  • (2011)A data layout optimization framework for NUCA-based multicoresProceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/2155620.2155677(489-500)Online publication date: 3-Dec-2011
  • (2011)Optimizing data locality using array tilingProceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design10.1109/ICCAD.2011.6105318(142-149)Online publication date: 7-Nov-2011
  • (2007)Loop Optimization using Hierarchical Compilation and Kernel DecompositionProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2007.22(170-184)Online publication date: 11-Mar-2007
  • (2006)Iterative compilation with kernel explorationProceedings of the 19th international conference on Languages and compilers for parallel computing10.5555/1757112.1757132(173-189)Online publication date: 2-Nov-2006
  • (2004)A Polyhedral Approach to Ease the Composition of Program TransformationsEuro-Par 2004 Parallel Processing10.1007/978-3-540-27866-5_38(292-303)Online publication date: 2004
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media