Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Analytical modeling of cache behavior for affine programs

Published: 27 December 2017 Publication History

Abstract

Optimizing compilers implement program transformation strategies aimed at reducing data movement to or from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses, very approximate cost models are used, due to the lack of precise compile-time models for misses for hierarchical caches. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size, as well as the number of distinct cache configurations of interest to be evaluated.
This paper takes a fundamentally different approach, by focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, a closed-form solution for modeling of misses in a set associative cache hierarchy is developed. This solution can enable program transformation choice at compile time to optimize cache misses. A tool implementing the approach has been developed and used for validation of the framework.

Supplementary Material

WEBM File (analyticalmodeling.webm)

References

[1]
M. Adams. 2014. HPGMG: a benchmark for ranking high performance computing systems. (2014). https://www.hpgmg.org/
[2]
A. Agarwal, J. Hennessy, and M. Horowitz. 1989. An Analytical Cache Model. ACM Transactions on Computer Systems (1989), 184ś215.
[3]
N. Ahmed, N. Mateev, and K. Pingali. 2001. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. International Journal of Parallel Programming (2001), 493ś544.
[4]
M. Alt, C. Ferdinand, F. Martin, and R. Wilhelm. 1996. Cache behavior prediction by abstract interpretation. In International Static Analysis Symposium (SAS’96). 52ś66.
[5]
W. Bao, C. Hong, S. Chunduri, S. Krishnamoorthy, N. Pouchet, F. Rastello, and P. Sadayappan. 2016a. Static and Dynamic Frequency Scaling on Multicore CPUs. ACM Transactions on Architecture and Code Optimization (2016), 1ś26.
[6]
W. Bao, S. Krishnamoorthy, L. Pouchet, F. Rastello, and P. Sadayappan. 2016b. PolyCheck: Dynamic Veriication of Iteration Space Transformations on Aine Programs. ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’16) (2016), 539ś554.
[7]
W. Bao, P. Rawat, M. Kong, S. Krishnamoorthy, L. Pouchet, and P. Sadayappan. 2017. Eicient Cache Simulation for Aine Computations. In International Workshop on Languages and Compilers for Parallel Computing (LCPC’17).
[8]
W. Bao, S. Tavarageri, F. Ozguner, and P. Sadayappan. 2014. PWCET: Power-Aware Worst Case Execution Time Analysis. In 43rd International Conference on Parallel Processing Workshops. 439ś447.
[9]
E. Berg and E. Hagersten. 2004. StatCache: a probabilistic approach to eicient and accurate data locality analysis. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’04). 20ś27.
[10]
Kristof Beyls and Erik H. D’Hollander. 2005. Generating cache hints for improved program eiciency. Journal of Systems Architecture 51, 4 (2005), 223 ś 250.
[11]
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. 2008. A Practical Automatic Polyhedral Program Optimization System. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’08).
[12]
T. Carlson, W. Heirman, S. Eyerman, I. Hur, and L. Eeckhout. 2014. An Evaluation of High-Level Mechanistic Core Models. ACM Transactions on Architecture and Code Optimization (2014).
[13]
S. Carr, S. McKinley, and C. Tseng. 1994. Compiler Optimizations for Improving Data Locality. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’94). 252ś262.
[14]
C. Cascaval and A. Padua. 2003. Estimating cache misses and locality using stack distances. In 17th Annual International Conference on Supercomputing (ICS’03). 150ś159.
[15]
S. Chatterjee, E. Parker, J. Hanlon, and R. Lebeck. 2001. Exact Analysis of the Cache Behavior of Nested Loops. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’01). 286ś297.
[16]
J. Edler and M. Hill. 1999. Dinero IV Trace-Driven Uniprocessor Cache Simulator. http://pages.cs.wisc.edu/~markhill/ DineroIV
[17]
C. Fang, S. Can, S. Onder, and Z. Wang. 2005. Instruction based memory distance analysis and its application to optimization. In International Conference on Parallel Architectures and Compilation Techniques (PACT’05). 27ś37.
[18]
C. Fang, S. Carr, S. Önder, and Z. Wang. 2004. Reuse-distance-based miss-rate prediction on a per instruction basis. In Proc. 2004 Workshop on Memory System Performance. 60ś68.
[19]
P. Feautrier. 1992. Some eicient solutions to the aine scheduling problem, part II: multidimensional time. International Journal of Parallel Programming (1992), 389ś420.
[20]
J. Ferrante, V. Sarkar, and W. Thrash. 1991. On estimating and enhancing cache efectiveness. In International Workshop on Languages and Compilers for Parallel Computing (LCPC’91). 328ś343.
[21]
B. Fraguela, R. Doallo, and L. Zapata. 1999. Automatic analytical modeling for the estimation of cache misses. In International Conference on Parallel Architectures and Compilation Techniques (PACT’99). 221ś231.
[22]
B. Fraguela, R. Doallo, and L. Zapata. 2003. Probabilistic miss equations: Evaluating memory hierarchy performance. IEEE Trans. Comput. (2003), 321ś336.
[23]
A. Frumkin and Rob F. Van W. 2002. Tight bounds on cache use for stencil operations on rectangular grids. J. ACM (2002), 434ś453.
[24]
S. Ghosh, M. Martonosi, and S. Malik. 1998. Precise Miss Analysis for Program Transformations with Caches of Arbitrary Associativity. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’98). 228ś239.
[25]
S. Ghosh, M. Martonosi, and S. Malik. 1999. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems (1999), 703ś746.
[26]
S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. 2006. Semi-Automatic Composition of Loop Transformations. International Journal of Parallel Programming (2006), 261ś317.
[27]
S. Harper, J. Kerbyson, and R. Nudd. 1999. Analytical modeling of set-associative cache behavior. IEEE Trans. Comput. (1999), 1009ś1024.
[28]
D. Hill and J. Smith. 1989. Evaluating associativity in CPU caches. IEEE Trans. Comput. (1989), 1612ś1630.
[29]
C. Hong, W. Bao, A. Cohen, S. Krishnamoorthy, L. Pouchet, F. Rastello, J. Ramanujam, and P. Sadayappan. 2016. Efective Padding of Multidimensional Arrays to Avoid Cache Conlict Misses. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’16) (2016), 129ś144.
[30]
W. Kelly and W. Pugh. 1993. A Framework for Unifying Reordering Transformations. Technical Report.
[31]
M. Kong, R. Veras, K. Stock, F. Franchetti, L. Pouchet, and P. Sadayappan. 2013. When polyhedral transformations meet SIMD code generation. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). 127ś138.
[32]
W. Lim and S. Lam. 1997. Maximizing Parallelism and Minimizing Synchronization with Aine Transforms. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’97). 201ś214.
[33]
C. Oppen. 1978. A 2 2 2pn upper bound on the complexity of Presburger arithmetic. J. Comput. System Sci. (1978), 323ś332.
[34]
L. Pouchet. 2017a. PoCC, the Polyhedral Compiler Collection 1.4. http://pocc.sourceforge.net
[35]
L. Pouchet. 2017b. PolyBench/C 4.0. http://polybench.sourceforge.net
[36]
H. Ramaprasad and F. Mueller. 2005. Bounding worst-case data cache behavior by analytically deriving cache reference patterns. In 11th IEEE Real Time and Embedded Technology and Applications Symposium (RTAS’05). 148ś157.
[37]
G. Rivera and C. Tseng. 1998. Data transformations for eliminating conlict misses. In ACM SIGPLAN conference on Programming language design and implementation (PLDI’98). 38ś49.
[38]
V. Sarkar and N. Megiddo. 2000. An analytical model for loop tiling and its solution. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’00). IEEE, 146ś153.
[39]
J. Shirako, K. Sharma, N. Fauzia, L. Pouchet, J. Ramanujam, P Sadayappan, and V. Sarkar. 2012. Analytical bounds for optimal tile size selection. In International Conference on Compiler Construction (CC’12). Springer, 101ś121.
[40]
A. Shrivastava, J. Lee, and R. Jeyapaul. 2010. Cache vulnerability equations for protecting data in embedded processor caches from soft errors. In ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’10). 143ś152.
[41]
P. Singh, S. Stone, and F. Thiebaut. 1992. A model of workloads and its use in miss-rate prediction for fully associative caches. IEEE Trans. Comput. (1992), 811ś825.
[42]
M. Valiev, J. Bylaska, N. Govind, K. Kowalski, Tjerk P. Straatsma, Hubertus J J. Van D., D. Wang, J. Nieplocha, E. Apra, L. Windus, et al. 2010. NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations. Computer Physics Communications (2010), 1477ś1489.
[43]
X. Vera, J. Abella, A. González, and J. Llosa. 2003. Optimizing program locality through CMEs and GAs. In International Conference on Parallel Architectures and Compilation Techniques (PACT’03). 68ś78.
[44]
X. Vera, J. Abella, J. Llosa, and A. González. 2005. An Accurate Cost Model for Guiding Data Locality Transformations. ACM Transactions on Programming Languages and Systems (2005), 946ś987.
[45]
X. Vera, N. Bermudo, J. Llosa, and A. González. 2004. A fast and accurate framework to analyze and optimize cache memory behavior. ACM Transactions on Programming Languages and Systems (2004), 263ś300.
[46]
X. Vera and J. Xue. 2002. Let’s study whole-program cache behaviour analytically. In International Symposium on HighPerformance Computer Architecture (HPCA’02). 175ś186.
[47]
S. Verdoolaege. 2007. Barvinok, a library for counting the integer points in parametric and non-parametric polytopes. http://barvinok.gforge.inria.fr
[48]
S. Verdoolaege. 2010a. ISL: An integer set library for the polyhedral model. In the 3rd International Congress on Mathematical Software.
[49]
S. Verdoolaege. 2010b. ISL, the Integer Set Library. http://repo.or.cz/w/isl.git
[50]
S. Verdoolaege and T. Grosser. 2012. Polyhedral extraction tool. In 2nd International Workshop on Polyhedral Compilation Techniques.
[51]
S. Verdoolaege, R. Seghir, K. Beyls, V. Loechner, and M. Bruynooghe. 2007. Counting integer points in parametric polytopes using Barvinok’s rational functions. Algorithmica (2007), 37ś66.
[52]
W. Wang and L. Baer. 1990. Eicient Trace-driven Simulation Method for Cache Performance Analysis. In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’90). 27ś36.
[53]
J. Xue and X. Vera. 2004. Eicient and accurate analytical modeling of whole-program data cache behavior. IEEE Trans. Comput. (2004), 547ś566.
[54]
W. Zhang. 2005. Computing cache vulnerability to transient errors and its implication. In IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05). 427ś435.

Cited By

View all
  • (2024)Falcon: A Scalable Analytical Cache ModelProceedings of the ACM on Programming Languages10.1145/36564528:PLDI(1854-1878)Online publication date: 20-Jun-2024
  • (2024)HiEval: A scheduling performance estimation approach for spatial accelerators via hierarchical abstractionJournal of Systems Architecture10.1016/j.sysarc.2024.103079148(103079)Online publication date: Mar-2024
  • (2023)MC-ELMM: Multi-Chip Endurance-Limited Memory ManagementProceedings of the International Symposium on Memory Systems10.1145/3631882.3631905(1-16)Online publication date: 2-Oct-2023
  • Show More Cited By

Index Terms

  1. Analytical modeling of cache behavior for affine programs

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Programming Languages
    Proceedings of the ACM on Programming Languages  Volume 2, Issue POPL
    January 2018
    1961 pages
    EISSN:2475-1421
    DOI:10.1145/3177123
    Issue’s Table of Contents
    © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 December 2017
    Published in PACMPL Volume 2, Issue POPL

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Compiler optimization
    2. performance analysis
    3. static analysis

    Qualifiers

    • Research-article

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)177
    • Downloads (Last 6 weeks)34
    Reflects downloads up to 30 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Falcon: A Scalable Analytical Cache ModelProceedings of the ACM on Programming Languages10.1145/36564528:PLDI(1854-1878)Online publication date: 20-Jun-2024
    • (2024)HiEval: A scheduling performance estimation approach for spatial accelerators via hierarchical abstractionJournal of Systems Architecture10.1016/j.sysarc.2024.103079148(103079)Online publication date: Mar-2024
    • (2023)MC-ELMM: Multi-Chip Endurance-Limited Memory ManagementProceedings of the International Symposium on Memory Systems10.1145/3631882.3631905(1-16)Online publication date: 2-Oct-2023
    • (2023)DiRaC-I: Identifying Diverse and Rare Training Classes for Zero-Shot LearningACM Transactions on Multimedia Computing, Communications, and Applications10.1145/360314720:1(1-23)Online publication date: 31-May-2023
    • (2023)What Quality Aspects Influence the Adoption of Docker Images?ACM Transactions on Software Engineering and Methodology10.1145/360311132:6(1-30)Online publication date: 30-Sep-2023
    • (2023)Cache Programming for Scientific Loops Using LeasesACM Transactions on Architecture and Code Optimization10.1145/360009020:3(1-25)Online publication date: 19-Jul-2023
    • (2023)Parallelising Control Flow in Dynamic-scheduling High-level SynthesisACM Transactions on Reconfigurable Technology and Systems10.1145/359997316:4(1-32)Online publication date: 1-Sep-2023
    • (2023)MIS: A Multi-Identifier Management and Resolution System in the MetaverseACM Transactions on Multimedia Computing, Communications, and Applications10.1145/3597641Online publication date: 26-May-2023
    • (2023)Virtual, Augmented, and Mixed Reality for Human-robot Interaction: A Survey and Virtual Design Element TaxonomyACM Transactions on Human-Robot Interaction10.1145/359762312:4(1-39)Online publication date: 31-May-2023
    • (2023)Leveraging LLVM's ScalarEvolution for Symbolic Data Cache Analysis2023 IEEE Real-Time Systems Symposium (RTSS)10.1109/RTSS59052.2023.00029(237-250)Online publication date: 5-Dec-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media