
Locality optimizations for multi-level caches

Published: 01 January 1999

Published In

SC '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing
January 1999
1015 pages
ISBN:1581130910
DOI:10.1145/331532
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
