Article

Free access

Transformations for imperfectly nested loops

Authors:

Induprakas Kodukula,

Keshav PingaliAuthors Info & Claims

Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing

Pages 12 - es

https://doi.org/10.1145/369028.369051

Published: 17 November 1996 Publication History

PDF eReader

Abstract

Loop transformations are critical for compiling high-performance code for modern computers. Existing work has focused on transformations for perfectly nested loops (that is, loops in which all assignment statements are contained within the innermost loop of a loop nest). In practice, most loop nests, such as those in matrix factorization codes, are imperfectly nested. In some programs, imperfectly nested loops can be transformed into perfectly nested loops by loop distribution, but this is not always legal. In this paper, we present an approach to transforming imperfectly nested loops directly. Our approach is an extension of the linear loop transformation framework for perfectly nested loops, and it models permutation, reversal, skewing, scaling, alignment, distribution and jamming. We also give a completion procedure which generates a complete transformation from a partial transformation.

References

[1]

C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In Principle and Practice of Parallel Programming, pages 39-50, April 1991.

Digital Library

Google Scholar

[2]

E. Ayguadé and Jordi Torres. Partitioning the statement per iteration space using non-singular matrices. In 1993 ACM International Conference on Supercomputing, pages 407-415, Tokyo, jul 1993.

Digital Library

Google Scholar

[3]

Uptal Banerjee. A theory of loop permutations. In Languages and compilers for parallel computing, pages 54-74, 1989.

Digital Library

Google Scholar

[4]

Uptal Banerjee. Unimodular transformations of double loops. In Languages and compilers for parallel computing, pages 192-219, 1990.

Google Scholar

[5]

W. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoeflinger, D. Padua, P. Petersen, W. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Polaris: The next generation in parallelizing compilers. Technical Report 1375, Center for Supercomputing Research and Development (CSRD), University of Illinois Urbana-Champaign.

Google Scholar

[6]

Paul Feautrier. Some efficient solutions to the affine scheduling problem - part i: one dimensional time. International Journal of Parallel Programming, October 1992.

Digital Library

Google Scholar

[7]

Paul Feautrier. Some efficient solutions to the affine scheduling problem - part ii: multi-dimensional time. International Journal of Parallel Programming, December 1992.

Google Scholar

[8]

Wayne Kelly, William Pugh, and Evan Rosser. Code generation for multiple mappings. In The 5th Symposium on the Frontiers of Massively Parallel Computation, pages 332-341, McLean, Virginia, feb 1995.

Digital Library

Google Scholar

[9]

S. Y. Kung. VLSI Array Processors. Prentice-Hall Inc, 1988.

Digital Library

Google Scholar

[10]

Wei Li and Keshav Pingali. A singular loop transformation based on non-singular matrices. International Journal of Parallel Programming, 22(2), April 1994.

Digital Library

Google Scholar

[11]

William Pugh. The omega test: A fast and practical integer programming algorithm for dependence analysis. In Communications of the ACM, pages 102-114, August 1992.

Digital Library

Google Scholar

[12]

J. Ramanujam. Optimal code parallelization using unimodular transformations. In Proceedings of Supercomputing, 1992.

Google Scholar

[13]

M. E. Wolf and M. S. Lam. An algorithmic approach to compound loop transformations. In Languages and compilers for parallel computing, pages 243-273, 1990.

Google Scholar

[14]

M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.

Digital Library

Google Scholar

Cited By

View all

Aggarwal KBondhugula U(2020)Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core SystemsACM Transactions on Parallel Computing10.1145/34180757:4(1-45)Online publication date: 25-Nov-2020
https://dl.acm.org/doi/10.1145/3418075
Strout MLaMielle ACarter LFerrante JKreaseck BOlschanowsky C(2016)An approach for code generation in the Sparse Polyhedral FrameworkParallel Computing10.1016/j.parco.2016.02.00453:C(32-57)Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1016/j.parco.2016.02.004
Ding WLiu JKandemir MIrwin MFensch CO'Boyle MSeznec ABodin F(2013)Reshaping cache misses to improve row-buffer locality in multicore systemsProceedings of the 22nd international conference on Parallel architectures and compilation techniques10.5555/2523721.2523754(235-244)Online publication date: 7-Oct-2013
https://dl.acm.org/doi/10.5555/2523721.2523754
Show More Cited By

Index Terms

Transformations for imperfectly nested loops

Recommendations

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
ICS '00: Proceedings of the 14th international conference on Supercomputing

We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product ...
Affine-by-Statement Transformations of Imperfectly Nested Loops
IPPS '96: Proceedings of the 10th International Parallel Processing Symposium

A majority of loop restructuring techniques developed so far assume that loops are perfectly nested. The unimodular approach unifies three individual transformations -- loop interchange, skewing and reversal -- but is still limited to perfect loop ...
Tiling Imperfectly-nested Loops

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing

November 1996

898 pages

ISBN:0897918541

Chairman:
Beverly Clayton

Publisher

IEEE Computer Society

United States

Publication History

Published: 17 November 1996

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

SC '96

Sponsor:

SIGARCH
IEEE-CS

SC '96: International Conference for High Performance Computing, Networking, Storage and Analysis

January 1, 1996

Pennsylvania, Pittsburgh, USA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

21
Total Citations
View Citations
248
Total Downloads

Downloads (Last 12 months)62
Downloads (Last 6 weeks)9

Reflects downloads up to 18 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Aggarwal KBondhugula U(2020)Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core SystemsACM Transactions on Parallel Computing10.1145/34180757:4(1-45)Online publication date: 25-Nov-2020
https://dl.acm.org/doi/10.1145/3418075
Strout MLaMielle ACarter LFerrante JKreaseck BOlschanowsky C(2016)An approach for code generation in the Sparse Polyhedral FrameworkParallel Computing10.1016/j.parco.2016.02.00453:C(32-57)Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1016/j.parco.2016.02.004
Ding WLiu JKandemir MIrwin MFensch CO'Boyle MSeznec ABodin F(2013)Reshaping cache misses to improve row-buffer locality in multicore systemsProceedings of the 22nd international conference on Parallel architectures and compilation techniques10.5555/2523721.2523754(235-244)Online publication date: 7-Oct-2013
https://dl.acm.org/doi/10.5555/2523721.2523754
Wei Ding Jun Liu Kandemir MIrwin M(2013)Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnectProceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT.2013.6618820(309-318)Online publication date: Oct-2013
https://doi.org/10.1109/PACT.2013.6618820
Ding WZhang YLiu JKandemir MPhillips JHu AGraeb H(2011)Optimizing data locality using array tilingProceedings of the International Conference on Computer-Aided Design10.5555/2132325.2132359(142-149)Online publication date: 7-Nov-2011
https://dl.acm.org/doi/10.5555/2132325.2132359
Zhang YDing WKandemir MLiu JJang OGaluzzi CCarro LMoshovos APrvulovic M(2011)A data layout optimization framework for NUCA-based multicoresProceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/2155620.2155677(489-500)Online publication date: 3-Dec-2011
https://dl.acm.org/doi/10.1145/2155620.2155677
Ding WZhang YLiu JKandemir M(2011)Optimizing data locality using array tilingProceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design10.1109/ICCAD.2011.6105318(142-149)Online publication date: 7-Nov-2011
https://dl.acm.org/doi/10.1109/ICCAD.2011.6105318
Barthou DDonadio SCarribault PDuchateau AJalby W(2007)Loop Optimization using Hierarchical Compilation and Kernel DecompositionProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2007.22(170-184)Online publication date: 11-Mar-2007
https://dl.acm.org/doi/10.1109/CGO.2007.22
Barthou DDonadio SDuchateau AJalby WCourtois E(2006)Iterative compilation with kernel explorationProceedings of the 19th international conference on Languages and compilers for parallel computing10.5555/1757112.1757132(173-189)Online publication date: 2-Nov-2006
https://dl.acm.org/doi/10.5555/1757112.1757132
Cohen AGirbal STemam O(2004)A Polyhedral Approach to Ease the Composition of Program TransformationsEuro-Par 2004 Parallel Processing10.1007/978-3-540-27866-5_38(292-303)Online publication date: 2004
https://doi.org/10.1007/978-3-540-27866-5_38
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Affine-by-Statement Transformations of Imperfectly Nested Loops

Tiling Imperfectly-nested Loops