Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3210563.3210565acmconferencesArticle/Chapter ViewAbstractPublication PagesismmConference Proceedingsconference-collections
research-article
Public Access

Prediction and bounds on shared cache demand from memory access interleaving

Published: 18 June 2018 Publication History

Abstract

Cache in multicore machines is often shared, and the cache performance depends on how memory accesses belonging to different programs interleave with one another. The full range of performance possibilities includes all possible interleavings, which are too numerous to be studied by experiments for any mix of non-trivial programs.
This paper presents a theory to characterize the effect of memory access interleaving due to parallel execution of non-data-sharing programs. The theory uses an established metric called the footprint (which can be used to calculate miss ratios in fully-associative LRU caches) to measure cache demand, and considers the full range of interleaving possibilities. The paper proves a lower bound for footprints of interleaved traces, and then formulates an upper bound in terms of the footprints of the constituent traces. It also shows the correctness of footprint composition used in a number of existing techniques, and places precise bounds on its accuracy.

References

[1]
Jacob Brock, Chencheng Ye, Chen Ding, Yechen Li, Xiaolin Wang, and Yingwei Luo. 2015. Optimal Cache Partition-Sharing. In International Conference on Parallel Processing.
[2]
Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi, and Santosh Nagarakatte. 2010. A randomized scheduler with probabilistic guarantees of finding bugs. In ASPLOS. 167–178.
[3]
Dhruba Chandra, Fei Guo, Seongbeom Kim, and Yan Solihin. 2005. Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture. In Proceedings of the International Symposium on HighPerformance Computer Architecture. 340–351.
[4]
Peter J. Denning. 1968. The working set model for program behaviour. Commun. ACM 11, 5 (1968), 323–333.
[5]
Peter J. Denning and Stuart C. Schwartz. 1972. Properties of the working set model. Commun. ACM 15, 3 (1972), 191–198.
[6]
Peter J. Denning and Donald R. Slutz. 1978. Generalized working sets for segment reference strings. Commun. ACM 21, 9 (1978), 750–759.
[7]
Chen Ding and Trishul Chilimbi. 2008. All-window profiling of concurrent executions. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Poster paper.
[8]
David Eklov, David Black-Schaffer, and Erik Hagersten. 2011. Fast modeling of shared caches in multicore systems. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers. 147–157. Best paper.
[9]
David Eklov and Erik Hagersten. 2010. StatStack: Efficient modeling of LRU caches. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. 55–65.
[10]
Alexandra Fedorova, Margo I. Seltzer, and Michael D. Smith. 2007. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. 25–38.
[11]
Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Chen Ding, and Zhenlin Wang. 2016. Kinetic Modeling of Data Eviction in Cache. In Proceedings of USENIX Annual Technical Conference. 351– 364. https://www.usenix.org/conference/atc16/technical-sessions/ presentation/hu
[12]
Yunlian Jiang, Kai Tian, and Xipeng Shen. 2010. Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers. 201–215.
[13]
Hao Li, Jialiang Chang, Zijiang Yang, and Steve Carr. 2017. Memory Distance Measurement for Concurrent Programs. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing.
[14]
Pengcheng Li, Chen Ding, and Hao Luo. 2014. Modeling Heap Data Growth Using Average Liveness. In Proceedings of the International Symposium on Memory Management.
[15]
Pengcheng Li, Hao Luo, and Chen Ding. 2016. Rethinking a heap hierarchy as a cache hierarchy: a higher-order theory of memory demand (HOTM). In Proceedings of the International Symposium on Memory Management. 111–121.
[16]
Andreas Sandberg, Andreas Sembrant, Erik Hagersten, and David Black-Schaffer. 2013. Modeling performance variation due to cache sharing. In Proceedings of the International Symposium on HighPerformance Computer Architecture. 155–166.
[17]
Xipeng Shen, Jonathan Shaw, Brian Meeker, and Chen Ding. 2007. Locality approximation using time. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 55–61.
[18]
G. Edward Suh, Srinivas Devadas, and Larry Rudolph. 2001. Analytical cache models with applications to cache partitioning. In Proceedings of the International Conference on Supercomputing. 1–12.
[19]
Richard West, Puneet Zaroo, Carl A. Waldspurger, and Xiao Zhang. 2010. Online cache modeling for commodity multicore processors. Operating Systems Review 44, 4 (2010), 19–29.
[20]
Xiaoya Xiang, Bin Bao, Chen Ding, and Yaoqing Gao. 2011. Linear-time Modeling of Program Working Set in Shared Cache. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. 350–360.
[21]
Xiaoya Xiang, Chen Ding, Hao Luo, and Bin Bao. 2013. HOTL: a higher order theory of locality. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. 343–356.
[22]
Chencheng Ye, Chen Ding, Hao Luo, Jacob Brock, Dong Chen, and Hai Jin. 2017. Cache Exclusivity and Sharing: Theory and Optimization. ACM Transactions on Architecture and Code Optimization 14, 4, 34:1– 34:26.
[23]
Liang Yuan, Chen Ding, Peter J. Denning, and Yunquan Zhang. 2018. A Measurement Theory of Locality(MTL). arXiv preprint arXiv:1802.01254 (2018).
[24]
Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. 2009. Towards practical page coloring-based multicore cache management. In Proceedings of the EuroSys Conference. 89–102.
[25]
Xiao Zhang, Rongrong Zhong, Sandhya Dwarkadas, and Kai Shen. 2012. A Flexible Framework for Throttling-Enabled Multicore Management (TEMM). In International Conference on Parallel Processing. 389–398.

Cited By

View all
  • (2019)Generation of in-bounds inputs for arrays in memory-unsafe languagesProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314890(136-148)Online publication date: 16-Feb-2019
  • (2019)A Relational Theory of LocalityACM Transactions on Architecture and Code Optimization10.1145/334110916:3(1-26)Online publication date: 20-Aug-2019
  • (2019)Generation of In-Bounds Inputs for Arrays in Memory-Unsafe Languages2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2019.8661178(136-148)Online publication date: Feb-2019
  • Show More Cited By

Index Terms

  1. Prediction and bounds on shared cache demand from memory access interleaving

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ISMM 2018: Proceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management
    June 2018
    119 pages
    ISBN:9781450358019
    DOI:10.1145/3210563
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 June 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Footprint
    2. Multicore
    3. Parallel Processing
    4. Performance Prediction
    5. Shared Cache

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    ISMM '18
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 72 of 156 submissions, 46%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)84
    • Downloads (Last 6 weeks)16
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Generation of in-bounds inputs for arrays in memory-unsafe languagesProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314890(136-148)Online publication date: 16-Feb-2019
    • (2019)A Relational Theory of LocalityACM Transactions on Architecture and Code Optimization10.1145/334110916:3(1-26)Online publication date: 20-Aug-2019
    • (2019)Generation of In-Bounds Inputs for Arrays in Memory-Unsafe Languages2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2019.8661178(136-148)Online publication date: Feb-2019
    • (2019)Footmark: A New Formulation for Working Set StatisticsLanguages and Compilers for Parallel Computing10.1007/978-3-030-34627-0_5(61-69)Online publication date: 13-Nov-2019

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media