Locality optimizations for multi-level caches

Authors:

Gabriel Rivera,

Chau-Wen Tseng

SC '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing

Pages 2 - es

https://doi.org/10.1145/331532.331534

Published: 01 January 1999

Information

Published In

SC '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing

January 1999

1015 pages

ISBN: 1581130910

DOI: 10.1145/331532

General Chair:
Cherri Pancake

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SC '99

Sponsor:

SIGARCH
IEEE-CS

SC '99: International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 19, 1999

Portland, Oregon, USA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics

Total Citations: 29
Total Downloads: 624
Downloads (Last 12 months): 93
Downloads (Last 6 weeks): 12

Reflects downloads up to 21 Nov 2024
