Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Cache-conscious data placement

Published: 01 October 1998 Publication History

Abstract

As the gap between memory and processor speeds continues to widen, cache eficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache pet$ormance by mapping code with temporal locality to different cache blocks in the virtual address space eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache pedormance.In this paper we present a general framework for Cache Conscious Data Placement. This is a compiler directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile driven data placement significantly reduces the data miss rate by 24% on average.

References

[1]
A. Agarwal and S. D. Pudar. Column-associative caches: A technique for reducing the miss rate of direct-mapped caches. Proceedings oj' the 20th Annual International Symposium on Computer Architecture, 21(2):179-190, May 1993.
[2]
Todd M. Austin. Hardware and Software Mechanisms for Reducing Load Latency. TR 131 I, Computer Sciences Department, UW- Madison, Madison, Wi, April 1996.
[3]
D. A. Barrett and B. G. Zorn. Using lifetime predictors to improve memory allocation performance. Proceedings of the A CM SIGPLAN '93 Conference on Programming Language Design and Implementation, 28(6): 187-196, June 1993.
[4]
S. Cart, K. S. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality. Proceedings oJ' the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 28(5):252-262, October 1994.
[5]
C.-L. Chen and C.-K. Liao. Analysis of vector access performance on skewed interleaved memory. Proceedings of the 16th Annual International Symposium on Computer Architecture, 17(3):387-394, May 1989.
[6]
T. Chilimbi and J. Larus. Using generational garbage collection to implement cache-conscious data placement. Submitted for publication., 1998.
[7]
T. Chilimbi, J. Larus, and M. Hill. Improving pointer-based codes through cache-conscious data placement. Submitted for publication., 1998.
[8]
F. Dahlgren and P. Stenstrom. On reconfigurable on-chip data caches. Proceedings of the 24th Annual International Symposium on Microarchitecture, 22(1):189-198, November 1991.
[9]
K. !. Farkas and N. P. Jouppi. Complexity/performance tradeoffs with non-blocking loads. Proceedings of the 21st Annual International Symposium on Computer Architecture, 22(2):211-222, April 1994.
[10]
Q. S. Gao. The chinese remainder theorem and the prime memory system. Proceedings of the 20th Annual International Symposium on Computer Architecture, 21(2):337-340, May 1993.
[11]
N. Gloy, T Blockwell, M.D. Smith, and B. Calder. Procedure placement using temporal ordering information. In 30th international Symposium on Microarchitecture, December 1997.
[12]
D. Grunwald, B. G. Zorn, and R. Henderson. Improving the cache locality of memory allocation. Proceedings of the ACM SIGPLAN '93 ConJerence on Programming Language Design and Implementation, 28(6):177-186, June 1993.
[13]
A.H. Hashemi, D.R. Kaeli, and B. Calder. Efficient procedure mapping using cache lince coloring. In Proceedings of the SIGPLANS'97 Conference on Programming Language Design and Implementation, pages 171-182, June 1997.
[14]
M. D. Hill and A. J. Smith. Evaluating associativity in CPU caches. IEEE Transactions on Computers, 38(12):1612-1630, December 1989.
[15]
W. W. Hwu and P. P. Chang. Achieving high instruction cache pertbrmance with an optimizing compiler. Proceedings o.f'thel6th International Symposium on Computer Architecture, pages 242-251, June 1989.
[16]
T. E. Jeremiassen and S. J. Eggers. Reducing false sharing on shared memory multiprocessors through compile-time data transformations. Proceedings qf the Symposium on Principals and Practice t~'Parallel Programming, July 1995.
[17]
N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Pro. ceedings o.f the 17th Annual International Symposium on Computer Architecture, 18(2):364-373, May 1990.
[18]
N. P. Jouppi and S. J.E. Wilton. Tradeoffs in two-level on-chip caching. Proceedings of the 21st Annual International Symposium on Computer Architecture, 22(2):34-45, April 1994.
[19]
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is unknown. Proceedings of the 1992 ACM SIGPLAN Con. fkrence on Programming Language Design and Implementation, 28(6):278-289, June 1993.
[20]
R. E. Kessler. Analysis of Multi-Megabyte Secondary CPU Cache Memories. TR 1032, Computer Sciences Department, UW-Madison, Madison, WI, July 1991.
[21]
R. E. Kessler, R. Jooss, A. Lebeck, and M. D. Hill. Inexpensive implementations of set-associativity. Proceedings of the 16th Annual International Symposium on Computer Architecture, 17(3):131-139, 1989.
[22]
A. R. Lebeck and D. A. Wood. Cache profiling and the spec benchmarks: A case study. IEEE Computer, 27(10):15-26, October 1994.
[23]
W. L. Lynch, B. K. Bray, and M. J. Flynn. Theeffect of page allocation on caches. Proceedings of the 25th Annual International Symposium on Microarchitecture, 23( 1):222-225, December 1992.
[24]
S. McFarling. Procedure merging with instruction caches. Proceedings t!f the ACM SIGPLAN '91 Confi~rence on Programming Language Design and Implementation, 26(6):71-79, June 1991.
[25]
A. B. Montz, D. Mosberger, S. W. O'Malley, L. L. Peterson, T. A. Proebsting, and J. H. Hartman. Scout: A communications-oriented operating system. Technical Report TR 94-20, Department of Computer Science, University of Arizona, Tucson, AZ, June 1994.
[26]
T. C. Mowry, M. S. Lam, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. Conjerence Proceedings oJ' the Fifth International Symposium on Architectural Support jbr Programming Languages and Operating Systems, 27(9):62-73, October 1992.
[27]
F. Mueller. Compiler support for software-based cache partitioning. ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Real-Time Systems, 30(I 1):125-133, June 1995.
[28]
K. Pettis and R. C. Hansen. Profile guided code positioning. Proceedings of the ACM SIGPLAN '90 ConJerence on Programming Language Design and Implementation, 25(6): 16-27, June 1990.
[29]
G. Rivera and C.-W. Tseng. Data transformations for eliminating conflict misses. In Proceedings oj' the SiGPLAN'98 Con l'erence on Programming Language Design and Implementation, June 1998.
[30]
M. Seidl and B. Zorn. Predicting references to dynamically allocated objects. CU-CS-826-97, University of Colorado at Boulder, January 1997.
[31]
J. E. Smith. Decoupled access/execute architectures. Proceedings of the 9th Annual International Symposium on Computer Architecture, pages 112-119, April 1982.
[32]
A. Srivastava and A. Eustace. ATOM: A system for building customized program analysis tools. In Proceedings of the Conference on Programming Language Design and Implementation, pages 196-205. ACM, 1994.
[33]
J. Torrellas, C. Xia, and R. Daigle. Optimizing instruction cache performance for operating system intensive workloads. In Proceedings oJ'the First International Symposium on High-PerJormance Computer Architecture, pages 360-369, January 1995.
[34]
Y. Wu. Ordering functions for improving memory reference locality in a shared memory multiprocessor system. Proceedings of the 25th Annual International Symposium on Microarchitecture, 23(1):218-221, December 1992.

Cited By

View all
  • (2023)The Bounded Pathwidth of Control-Flow GraphsProceedings of the ACM on Programming Languages10.1145/36228077:OOPSLA2(292-317)Online publication date: 16-Oct-2023
  • (2022)Online Application Guidance for Heterogeneous Memory SystemsACM Transactions on Architecture and Code Optimization10.1145/353385519:3(1-27)Online publication date: 6-Jul-2022
  • (2020)Reshape your layouts, not your programs: A safe language extension for better cache localityScience of Computer Programming10.1016/j.scico.2020.102481(102481)Online publication date: May-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 33, Issue 11
Nov. 1998
309 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/291006
Issue’s Table of Contents
  • cover image ACM Conferences
    ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
    October 1998
    326 pages
    ISBN:1581131070
    DOI:10.1145/291069
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 1998
Published in SIGPLAN Volume 33, Issue 11

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)233
  • Downloads (Last 6 weeks)61
Reflects downloads up to 13 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)The Bounded Pathwidth of Control-Flow GraphsProceedings of the ACM on Programming Languages10.1145/36228077:OOPSLA2(292-317)Online publication date: 16-Oct-2023
  • (2022)Online Application Guidance for Heterogeneous Memory SystemsACM Transactions on Architecture and Code Optimization10.1145/353385519:3(1-27)Online publication date: 6-Jul-2022
  • (2020)Reshape your layouts, not your programs: A safe language extension for better cache localityScience of Computer Programming10.1016/j.scico.2020.102481(102481)Online publication date: May-2020
  • (2019)ShiftsReduceACM Transactions on Architecture and Code Optimization10.1145/337248916:4(1-23)Online publication date: 26-Dec-2019
  • (2019)Evaluating the effectiveness of program data features for guiding memory managementProceedings of the International Symposium on Memory Systems10.1145/3357526.3357537(383-395)Online publication date: 30-Sep-2019
  • (2019)Fast and exact analysis for LRU cachesProceedings of the ACM on Programming Languages10.1145/32903673:POPL(1-29)Online publication date: 2-Jan-2019
  • (2019)Efficient parameterized algorithms for data packingProceedings of the ACM on Programming Languages10.1145/32903663:POPL(1-28)Online publication date: 2-Jan-2019
  • (2019)Fully abstract module compilationProceedings of the ACM on Programming Languages10.1145/32903233:POPL(1-29)Online publication date: 2-Jan-2019
  • (2019)Sound and complete bidirectional typechecking for higher-rank polymorphism with existentials and indexed typesProceedings of the ACM on Programming Languages10.1145/32903223:POPL(1-28)Online publication date: 2-Jan-2019
  • (2018)C♭: a new modular approach to implementing efficient and tunable collectionsProceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software10.1145/3276954.3276956(57-71)Online publication date: 24-Oct-2018
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media