Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Data-centric multi-level blocking

Published: 01 May 1997 Publication History

Abstract

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.

References

[1]
Ramesh C. Agarwal and Fred G. Gustavson. Algorithm and Architecture Aspects of Producing ESSL BLAS on POWERS.
[2]
E. Anderson, Z. Bat, C. Bischof, J. Demreel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, editors. LAPA CK Users' Guide. Sec. ond Edition. SIAM, Philadelphia, 1995.
[3]
Jennifer Anderson, Saman Amarsinghe, and Monica Lain. Data and computation transformations for multiprocessors. In A CM Symposium on Principles and Practice of Parallel Programming, Jun 1995.
[4]
U. Banerjee. Unimodular transformations of double loops. In Proceedings of the Workshop on Ad. vances in Languages and Compilers for Parallel Processing, pages 192-219, August 1990.
[5]
David Bau, Induprakas Kodukula, Vladimir Kotlyar, Keshav Pingali, and Paul Stodghil. Solving alignment using elementary linear algebra. In Proceedings of the 7th LCPC Workshop, August 1994. Also available as Cornell Computer Science Dept. tech report TR95-1478.
[6]
Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTE- GRATION, the VLSI Journal, volume 17, pages 33-51. 1994.
[7]
Steve Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Supercompating, 1992.
[8]
Steve Cart and R. B. Lehoucq. Compiler blockability of dense matrix factorizations. Technical report, Argonne National Laboratory, Oct 1996.
[9]
Steven Cart and R. B. Lehoucq. A compilerblockable algorithm for QR decomposition, 1994.
[10]
L. Carter, J. Ferrante, and S. Flynn Hummel. Hierarchical tiling for improved superscalar performance. In International Parallel Processing Sym. posiam, April 1995.
[11]
Michael Cierniak and Wet Li. Unifying data and control transformations for distributed shared memory machines. In $IGPLAN 1995 conference on Programming Languages Design and Implementation, Jun 1995.
[12]
Stephanie Coleman and Kathryn S. McKinley. Tile size selection using cache organization and data layout, in David W. Wail, editor, A CM SIGPLAN '95 Con/erence on Programming Language Design and Implementation (PLDI), volume 30(6) of A CM $IGPLAN Notices, pages 279-290, New York, NY, USA, June 1995. ACM Press.
[13]
Jim Demmel. Personal communication, Sep 1996.
[14]
Jack Dongarra and Robert Schreiber. Automatic blocking of nested loops. Technical Report UT-CS- 90-108, Department of Computer Science, University of Tennessee, May 1990.
[15]
Gene Golub and Charles Van Loan. Matrix Com. putations. The Johns Hopkins University Press, 1996.
[16]
Monica S. Lain, Edward E. Rothberg, and Michael E. Wolf. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, April 8-11, 1991. ACM SIGARCH, SIG- PLAN, SIGOPS, and the IEEE Computer Society.
[17]
W. Li and K. Pingali. Access Normalization: Loop restructuring for NUMA compilers. A CM Transactions on Computer Systems, 1993.
[18]
Kathryn S. McKinley, Steve Carr, and Chau-Wen Tseng. Improving data locality with loop transformations. In A CM Transactions on Programming Languages and Systems, volume 18, pages 424-453. july 1996.
[19]
W. Pugh. A practical algorithm for exact array dependency analysis. Comm. of the A CM, 35(8):102, August 1992.
[20]
J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for multicomputers. Journal of Parallel and Distributed Computing, 16(2):108-120, October 1992.
[21]
A. Rogers and K. Pingali. Process decomposition through locality of reference. In SIGPLAN89 conference on Programming Languages, Design and Implementation, Jun 1989.
[22]
Vivek Sarkar. Automatic selection of high order transformations in the IBM ASTI optimizer. Technical Report ADTI-96-004, Application Development Technology Institute, IBM Software Solutions Division, July 1996. Submitted to special issue of IBM Journal of Research and Development.
[23]
M.E. Wolf and M.S. Lam. A data locality optimizing algorithm. In SIGPLAN 1991 conference on Programming Languages Design and Implementation, Jun 1991.
[24]
M. Wolfe. Iteration space tiling for memory hierarchies. In Third SIAM Conference on Parallel Pro. cessing for Scientific Computing, December 1987.
[25]
M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.

Cited By

View all
  • (2023)Computationally Efficient DNN Mapping Search Heuristic using Deep Reinforcement LearningACM Transactions on Embedded Computing Systems10.1145/360911022:5s(1-21)Online publication date: 9-Sep-2023
  • (2023)Employing Polyhedral Methods to Reduce Data Movement in FPGA Stencil CodesLanguages and Compilers for Parallel Computing10.1007/978-3-031-31445-2_4(47-63)Online publication date: 10-May-2023
  • (2017)Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNsAdvanced Concepts for Intelligent Vision Systems10.1007/978-3-319-70353-4_8(89-100)Online publication date: 23-Nov-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 32, Issue 5
May 1997
365 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/258916
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
    May 1997
    365 pages
    ISBN:0897919076
    DOI:10.1145/258915
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1997
Published in SIGPLAN Volume 32, Issue 5

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)134
  • Downloads (Last 6 weeks)21
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Computationally Efficient DNN Mapping Search Heuristic using Deep Reinforcement LearningACM Transactions on Embedded Computing Systems10.1145/360911022:5s(1-21)Online publication date: 9-Sep-2023
  • (2023)Employing Polyhedral Methods to Reduce Data Movement in FPGA Stencil CodesLanguages and Compilers for Parallel Computing10.1007/978-3-031-31445-2_4(47-63)Online publication date: 10-May-2023
  • (2017)Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNsAdvanced Concepts for Intelligent Vision Systems10.1007/978-3-319-70353-4_8(89-100)Online publication date: 23-Nov-2017
  • (2015)Jagged Tiling for Intra-tile Parallelism and Fine-Grain MultithreadingLanguages and Compilers for Parallel Computing10.1007/978-3-319-17473-0_11(161-175)Online publication date: 1-May-2015
  • (2010)Improving scratchpad allocation with demand-driven data tilingProceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems10.1145/1878921.1878942(127-136)Online publication date: 24-Oct-2010
  • (2007)Towards data tiling for whole programs in scratchpad memory allocationProceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture10.5555/2392163.2392171(63-74)Online publication date: 23-Aug-2007
  • (2007)Cache-efficient numerical algorithms using graphics hardwareParallel Computing10.1016/j.parco.2007.09.00633:10-11(663-684)Online publication date: 1-Nov-2007
  • (2007)Towards Data Tiling for Whole Programs in Scratchpad Memory AllocationAdvances in Computer Systems Architecture10.1007/978-3-540-74309-5_8(63-74)Online publication date: 2007
  • (2006)A Memory Model for Scientific Algorithms on Graphics ProcessorsACM/IEEE SC 2006 Conference (SC'06)10.1109/SC.2006.2(6-6)Online publication date: Nov-2006
  • (2004)More Legal Transformations for LocalityEuro-Par 2004 Parallel Processing10.1007/978-3-540-27866-5_36(272-283)Online publication date: 2004
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media