article

Free access

Data-centric multi-level blocking

Authors:

Induprakas Kodukula,

Keshav PingaliAuthors Info & Claims

ACM SIGPLAN Notices, Volume 32, Issue 5

Pages 346 - 357

https://doi.org/10.1145/258916.258946

Published: 01 May 1997 Publication History

Abstract

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.

References

[1]

Ramesh C. Agarwal and Fred G. Gustavson. Algorithm and Architecture Aspects of Producing ESSL BLAS on POWERS.

[2]

E. Anderson, Z. Bat, C. Bischof, J. Demreel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, editors. LAPA CK Users' Guide. Sec. ond Edition. SIAM, Philadelphia, 1995.

Digital Library

[3]

Jennifer Anderson, Saman Amarsinghe, and Monica Lain. Data and computation transformations for multiprocessors. In A CM Symposium on Principles and Practice of Parallel Programming, Jun 1995.

Digital Library

[4]

U. Banerjee. Unimodular transformations of double loops. In Proceedings of the Workshop on Ad. vances in Languages and Compilers for Parallel Processing, pages 192-219, August 1990.

[5]

David Bau, Induprakas Kodukula, Vladimir Kotlyar, Keshav Pingali, and Paul Stodghil. Solving alignment using elementary linear algebra. In Proceedings of the 7th LCPC Workshop, August 1994. Also available as Cornell Computer Science Dept. tech report TR95-1478.

Digital Library

[6]

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTE- GRATION, the VLSI Journal, volume 17, pages 33-51. 1994.

Digital Library

[7]

Steve Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Supercompating, 1992.

Digital Library

[8]

Steve Cart and R. B. Lehoucq. Compiler blockability of dense matrix factorizations. Technical report, Argonne National Laboratory, Oct 1996.

[9]

Steven Cart and R. B. Lehoucq. A compilerblockable algorithm for QR decomposition, 1994.

[10]

L. Carter, J. Ferrante, and S. Flynn Hummel. Hierarchical tiling for improved superscalar performance. In International Parallel Processing Sym. posiam, April 1995.

Digital Library

[11]

Michael Cierniak and Wet Li. Unifying data and control transformations for distributed shared memory machines. In $IGPLAN 1995 conference on Programming Languages Design and Implementation, Jun 1995.

Digital Library

[12]

Stephanie Coleman and Kathryn S. McKinley. Tile size selection using cache organization and data layout, in David W. Wail, editor, A CM SIGPLAN '95 Con/erence on Programming Language Design and Implementation (PLDI), volume 30(6) of A CM $IGPLAN Notices, pages 279-290, New York, NY, USA, June 1995. ACM Press.

Digital Library

[13]

Jim Demmel. Personal communication, Sep 1996.

[14]

Jack Dongarra and Robert Schreiber. Automatic blocking of nested loops. Technical Report UT-CS- 90-108, Department of Computer Science, University of Tennessee, May 1990.

Digital Library

[15]

Gene Golub and Charles Van Loan. Matrix Com. putations. The Johns Hopkins University Press, 1996.

[16]

Monica S. Lain, Edward E. Rothberg, and Michael E. Wolf. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 63-74, Santa Clara, California, April 8-11, 1991. ACM SIGARCH, SIG- PLAN, SIGOPS, and the IEEE Computer Society.

Digital Library

[17]

W. Li and K. Pingali. Access Normalization: Loop restructuring for NUMA compilers. A CM Transactions on Computer Systems, 1993.

Digital Library

[18]

Kathryn S. McKinley, Steve Carr, and Chau-Wen Tseng. Improving data locality with loop transformations. In A CM Transactions on Programming Languages and Systems, volume 18, pages 424-453. july 1996.

Digital Library

[19]

W. Pugh. A practical algorithm for exact array dependency analysis. Comm. of the A CM, 35(8):102, August 1992.

Digital Library

[20]

J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for multicomputers. Journal of Parallel and Distributed Computing, 16(2):108-120, October 1992.

[21]

A. Rogers and K. Pingali. Process decomposition through locality of reference. In SIGPLAN89 conference on Programming Languages, Design and Implementation, Jun 1989.

Digital Library

[22]

Vivek Sarkar. Automatic selection of high order transformations in the IBM ASTI optimizer. Technical Report ADTI-96-004, Application Development Technology Institute, IBM Software Solutions Division, July 1996. Submitted to special issue of IBM Journal of Research and Development.

[23]

M.E. Wolf and M.S. Lam. A data locality optimizing algorithm. In SIGPLAN 1991 conference on Programming Languages Design and Implementation, Jun 1991.

Digital Library

[24]

M. Wolfe. Iteration space tiling for memory hierarchies. In Third SIAM Conference on Parallel Pro. cessing for Scientific Computing, December 1987.

Digital Library

[25]

M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.

Digital Library

Cited By

Bakshi SJohnsson L(2023)Computationally Efficient DNN Mapping Search Heuristic using Deep Reinforcement LearningACM Transactions on Embedded Computing Systems10.1145/360911022:5s(1-21)Online publication date: 9-Sep-2023
https://doi.org/10.1145/3609110
Mayer FBrandner JPhilippsen M(2023)Employing Polyhedral Methods to Reduce Data Movement in FPGA Stencil CodesLanguages and Compilers for Parallel Computing10.1007/978-3-031-31445-2_4(47-63)Online publication date: 10-May-2023
https://doi.org/10.1007/978-3-031-31445-2_4
Cecconi LSmets SBenini LVerhelst M(2017)Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNsAdvanced Concepts for Intelligent Vision Systems10.1007/978-3-319-70353-4_8(89-100)Online publication date: 23-Nov-2017
https://doi.org/10.1007/978-3-319-70353-4_8
Show More Cited By

Index Terms

Data-centric multi-level blocking

Recommendations

Data-centric multi-level blocking
PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our ...
Energy-efficient multi-level cell phase-change memory system with data encoding
ICCD '11: Proceedings of the 2011 IEEE 29th International Conference on Computer Design

Phase-change memory (PCM) is one of the most promising technologies among emerging non-volatile memories. Recently, the technology of multi-level cell (MLC) for PCM has been developed and a high capacity memory system can be implemented by storing ...
A data-centric approach to high-level synthesis

Moving data between various components of a system is fast becoming the performance bottleneck in digital design today. This is especially true in high-throughput, memory-intensive applications, like those in multimedia and video processing. With ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 32, Issue 5

May 1997

365 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/258916

Chairmen:
Ron K. Cytron
Washington Univ., St. Louis, MO
,
Marina Chen
Boston Univ., Boston, MA
,
Editor:
A. Michael Berman
Rowan Univ., Glassboro, NJ

Issue’s Table of Contents

PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
May 1997
365 pages
ISBN:0897919076
DOI:10.1145/258915
Chairmen:
Marina Chen
Boston Univ., Boston, MA
,
Ron K. Cytron
Washington Univ., St. Louis, MO
,
Editor:
A. Michael Berman
Rowan Univ., Glassboro, NJ

Copyright © 1997 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 1997

Published in SIGPLAN Volume 32, Issue 5

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

180
Total Citations
View Citations
894
Total Downloads

Downloads (Last 12 months)134
Downloads (Last 6 weeks)21

Reflects downloads up to 14 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Bakshi SJohnsson L(2023)Computationally Efficient DNN Mapping Search Heuristic using Deep Reinforcement LearningACM Transactions on Embedded Computing Systems10.1145/360911022:5s(1-21)Online publication date: 9-Sep-2023
https://doi.org/10.1145/3609110
Mayer FBrandner JPhilippsen M(2023)Employing Polyhedral Methods to Reduce Data Movement in FPGA Stencil CodesLanguages and Compilers for Parallel Computing10.1007/978-3-031-31445-2_4(47-63)Online publication date: 10-May-2023
https://doi.org/10.1007/978-3-031-31445-2_4
Cecconi LSmets SBenini LVerhelst M(2017)Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNsAdvanced Concepts for Intelligent Vision Systems10.1007/978-3-319-70353-4_8(89-100)Online publication date: 23-Nov-2017
https://doi.org/10.1007/978-3-319-70353-4_8
Shrestha SManzano JMarquez AFeo JGao G(2015)Jagged Tiling for Intra-tile Parallelism and Fine-Grain MultithreadingLanguages and Compilers for Parallel Computing10.1007/978-3-319-17473-0_11(161-175)Online publication date: 1-May-2015
https://doi.org/10.1007/978-3-319-17473-0_11
Yang XWang LXue JTang TRen XYe SKathail VTatge RBarua R(2010)Improving scratchpad allocation with demand-driven data tilingProceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems10.1145/1878921.1878942(127-136)Online publication date: 24-Oct-2010
https://dl.acm.org/doi/10.1145/1878921.1878942
Li LWu HFeng HXue J(2007)Towards data tiling for whole programs in scratchpad memory allocationProceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture10.5555/2392163.2392171(63-74)Online publication date: 23-Aug-2007
https://dl.acm.org/doi/10.5555/2392163.2392171
Govindaraju NManocha D(2007)Cache-efficient numerical algorithms using graphics hardwareParallel Computing10.1016/j.parco.2007.09.00633:10-11(663-684)Online publication date: 1-Nov-2007
https://dl.acm.org/doi/10.1016/j.parco.2007.09.006
Li LWu HFeng HXue J(2007)Towards Data Tiling for Whole Programs in Scratchpad Memory AllocationAdvances in Computer Systems Architecture10.1007/978-3-540-74309-5_8(63-74)Online publication date: 2007
https://doi.org/10.1007/978-3-540-74309-5_8
Govindaraju NLarsen SGray JManocha D(2006)A Memory Model for Scientific Algorithms on Graphics ProcessorsACM/IEEE SC 2006 Conference (SC'06)10.1109/SC.2006.2(6-6)Online publication date: Nov-2006
https://doi.org/10.1109/SC.2006.2
Bastoul CFeautrier P(2004)More Legal Transformations for LocalityEuro-Par 2004 Parallel Processing10.1007/978-3-540-27866-5_36(272-283)Online publication date: 2004
https://doi.org/10.1007/978-3-540-27866-5_36
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents