Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Quantifying the Multi-Level Nature of Tiling Interactions

Published: 01 December 1998 Publication History

Abstract

Optimizations, including tiling, often target a single level of memory or parallelism, such as cache. These optimizations usually operate on a level-by-level basis, guided by a cost function parameterized by features of that single level. The benefit of optimizations guided by these one-level cost functions decreases as architectures tend towards a hierarchy of memory and of parallelism. We have identified three common architectural scenarios where a single tiling choice could be improved by using information from multiple levels in concert. For each scenario, we derive multi-level cost functions which guide the optimal choice of tile size and shape, and quantify the improvement gained. We give both analysis and simulation results to support our points.

References

[1]
1. Michael E. Wolf and Monica S. Lam, A data locality optimizing algorithm, Progr. Lang. Design Implementation (1991).
[2]
2. Steve Carr and Ken Kennedy, Compiler blockability of numerical algorithms, J. Supercomputing , pp. 114-124 (November 1992).
[3]
3. Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng, Compiler optimizations for improving data locality, Sixth Int'l. Conf. Archit. Support Progr. Lang. Oper. Syst., San Jose, California, Oct. 1994.
[4]
4. Steve Carr and Ken Kennedy, Improving the ratio of memory operations to floatingpoint operations in loops, Trans. Progr. Lang. Syst. 16(6):1768-1810 (November 1994).
[5]
5. Corinne Ancourt and François Irigoin, Scanning polyhedra with DO loops, Principles and Practice of Parallel Progr., pp. 39-50 (April 1991).
[6]
6. Michael E. Wolf and Monica S. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Trans. Parallel Distrib. Syst. 2(4):452-471 (1991).
[7]
7. Paul Feautrier, Some efficient solutions to the affine scheduling problem, Part I, one-dimensional time, IJPP 21(5):xx-xx (October 1992).
[8]
8. Wayne Kelly and William Pugh, A unifying framework for iteration reordering transformations, IEEE First Int'l. Conf. Algorithms and Architectures for Parallel Processing (April 1995).
[9]
9. Daniel Lavery and Wen-mei Hwu, Unrolling-based optimizations for modulo scheduling, 28th Int'l. Symp. Microarchit., pp. 126-141 (December 1995).
[10]
10. Stephanie Coleman and Kathryn S. McKinley, Tile size selection using cache organization and data layout, Progr. Lang. Design and Implementation (June 1995).
[11]
11. Vivek Sarkar, Guang R. Gao, and Shaohua Han, Locality analysis for distributed shared-memory multiprocessors, Lang. Compilers for Parallel Computing (1996).
[12]
12. Dennis Gannon and Ko-Yang Wang, Applying AI Techniques to Program Optimization for Parallel Computers, Chap. 12, McGraw Hill Co. (1989).
[13]
13. Michael E. Wolf, Dror Maydan, and Ding-Kai Chen, Combining loop transformations considering caches and scheduling, 29th Int'l. Symp. Microarchit. (December 1996).
[14]
14. Michael J. Wolfe, Iteration space tiling for memory hierarchies, Parallel Processing for Sci. Comput., pp. 357-361 (1987).
[15]
15. J. Ramanujam and P. Sadayappan, Tiling multidimensional iteration spaces for nonshared memory machines, Supercomputing (November 1991).
[16]
16. David A. Padua and Michael J. Wolfe, Advanced compiler optimizations for supercomputers, Commun. ACM 29(12):1184-1201 (December 1986).
[17]
17. Dennis Gannon, William Jalby, and Kyle Gallivan, Strategies for cache and local memory management by global program transformation, J. Parallel and Distrib. Comput., Vol. 5, No. 5 (October 1988).
[18]
18. François Irigoin and Rémi Triolet, Supernode partitioning, Principles of Progr. Lang., pp. 319-328 (January 1988).
[19]
19. Michael J. Wolfe, More iteration space tiling, Supercomputing, pp. 655-664 (1989).
[20]
20. Monica S. Lam, Edward E. Rothberg, and Michael E. Wolf, The cache performance and optimizations of blocked algorithms, ASPLOS-IV, Palo Alto, California (April 1991).
[21]
21. Utpal Banerjee, Unimodular transformations of double loops, in Progr. Lang. Compilers for Parallel Computing, Irvine, California (August 1990).
[22]
22. Ken Kennedy and Kathryn S. McKinley, Optimizing for parallelism and data locality, Int'l. Conf. Supercomputing (July 1992).
[23]
23. Jeanne Ferrante, Vivek Sarkar, and Wedy Thrash, On estimating and enhancing cache effectiveness, Lang. Compilers for Parallel Computing (1991).
[24]
24. Anant Agarwal, David Kranz, and Venkat Natarajan, Automatic partitioning of parallel loops and data arrays for distributed shared memory multiprocessors, Int'l. Conf. Parallel Computing (1993).
[25]
25. Vivek Sarkar and Radhika Thekkath, A general framework for iteration-reordering loop transformations, Technical Summary, Progr. Lang. Design and Implementation (1992).
[26]
26. Steve Carr, Combining optimization for cache and instruction-level parallelism, PACT '96, pp. 238-247 (1996).
[27]
27. Ken Kennedy and Kathryn S. McKinley, Maximizing loop parallelism and improving data locality via loop fusion and distribution, Lang. Compilers for Parallel Computing (1993).
[28]
28. Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, and Jim Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Int'l. Conf. Supercomputing (1997).
[29]
29. Larry Carter, Jeanne Ferrante, Susan Flynn Hummel, Bowen Alpern, and Kang Su Gatlin, Hierarchical tiling: A methodology for high performance, Technical Report CS96- 508, UCSD, Department of Computer Science and Engineering (November 1996).
[30]
30. Doug Burger and Todd Austin, The SimpleScalar architectural research tool set, Version 2.0, http://www.cs.wisc.edu/~mscalar/simplescalar.html.
[31]
31. Karin Högstedt, Larry Carter, and Jeanne Ferrante, Determining the idle time of a tiling, Principles of Progr. Lang. (1997).
[32]
32. Larry Carter, Jeanne Ferrante, and S. Flynn Hummel, Hierarchical tiling for improved superscalar performance, Int'l. Parallel Processing Symp. (April 1995).

Cited By

View all
  • (2013)Defensive loop tiling for shared cacheProceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2013.6495008(1-11)Online publication date: 23-Feb-2013
  • (2011)Practical loop transformations for tensor contraction expressions on multi-level memory hierarchiesProceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software10.5555/1987237.1987258(266-285)Online publication date: 26-Mar-2011
  • (2010)Automatic creation of tile size selection modelsProceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization10.1145/1772954.1772982(190-199)Online publication date: 24-Apr-2010
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image International Journal of Parallel Programming
International Journal of Parallel Programming  Volume 26, Issue 6
December 1998
58 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 1998

Author Tags

  1. COMPILER
  2. LOCALITY
  3. MEMORY HIERARCHY
  4. PARALLELISM
  5. TILING

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2013)Defensive loop tiling for shared cacheProceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2013.6495008(1-11)Online publication date: 23-Feb-2013
  • (2011)Practical loop transformations for tensor contraction expressions on multi-level memory hierarchiesProceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software10.5555/1987237.1987258(266-285)Online publication date: 26-Mar-2011
  • (2010)Automatic creation of tile size selection modelsProceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization10.1145/1772954.1772982(190-199)Online publication date: 24-Apr-2010
  • (2009)Compact multi-dimensional kernel extraction for register tilingProceedings of the Conference on High Performance Computing Networking, Storage and Analysis10.1145/1654059.1654105(1-12)Online publication date: 14-Nov-2009
  • (2008)Positivity, posynomials and tile size selectionProceedings of the 2008 ACM/IEEE conference on Supercomputing10.5555/1413370.1413426(1-12)Online publication date: 15-Nov-2008
  • (2007)Block size selection of parallel LU and QR on PVP-based and RISC-based supercomputersProceedings of the 2007 Asian technology information program's (ATIP's) 3rd workshop on High performance computing in China: solution approaches to impediments for high performance computing10.1145/1375783.1375809(115-125)Online publication date: 11-Nov-2007
  • (2006)Profitable loop fusion and tiling using model-driven empirical searchProceedings of the 20th annual international conference on Supercomputing10.1145/1183401.1183437(249-258)Online publication date: 28-Jun-2006
  • (2006)Efficient synthesis of out-of-core algorithms using a nonlinear optimization solverJournal of Parallel and Distributed Computing10.1016/j.jpdc.2005.06.01766:5(659-673)Online publication date: 1-May-2006
  • (2005)Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction ExpressionsProceedings of the 2005 ACM/IEEE conference on Supercomputing10.1109/SC.2005.35Online publication date: 12-Nov-2005
  • (2005)Optimizing matrix multiplication with a classifier learning systemProceedings of the 18th international conference on Languages and Compilers for Parallel Computing10.1007/978-3-540-69330-7_9(121-135)Online publication date: 20-Oct-2005
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media