Article

Kinetic modeling of data eviction in cache

Authors:

Zhenlin WangAuthors Info & Claims

USENIX ATC '16: Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference

Pages 351 - 364

Published: 22 June 2016 Publication History

Abstract

The reuse distance (LRU stack distance) is an essential metric for performance prediction and optimization of storage and CPU cache. Over the last four decades, there have been steady improvements in the algorithmic efficiency of reuse distance measurement. This progress is accelerating in recent years both in theory and practical implementation.

In this paper, we present a kinetic model of LRU cache memory, based on the average eviction time (AET) of the cached data. The AET model enables fast measurement and low-cost sampling. It can produce the miss ratio curve (MRC) in linear time with extremely low space costs. On both CPU and storage benchmarks, AET reduces the time and space costs compare to former techniques. Furthermore, AET is a composable model that can characterize shared cache behavior through modeling individual programs.

References

[1]

Peter J Denning. Working sets past and present. Software Engineering, IEEE Transactions on, (1):64-84, 1980.

[2]

Peter J Denning. The working set model for program behavior. Communications of the ACM, 11(5):323-333, 1968.

[3]

Peter J Denning and Donald R Slutz. Generalized working sets for segment reference strings. Communications of the ACM, 21(9):750-759, 1978.

[4]

R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM System Journal, 9(2):78-117, 1970.

[5]

Frank Olken. Efficient methods for calculating the success function of fixed-space replacement policies. Technical report, Lawrence Berkeley Lab., CA (USA), 1981.

[6]

Xiaoya Xiang, Chen Ding, Hao Luo, and Bin Bao. HOTL: a higher order theory of locality. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, pages 343-356, 2013.

[7]

David Eklov and Erik Hagersten. StatStack: Efficient modeling of LRU caches. In Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on, pages 55-65. IEEE, 2010.

[8]

Yunlian Jiang, Eddy Z Zhang, Kai Tian, and Xipeng Shen. Is reuse distance applicable to data locality analysis on chip multiprocessors? In Compiler Construction, pages 264-282. Springer, 2010.

[9]

Xipeng Shen, Jonathan Shaw, Brian Meeker, and Chen Ding. Locality approximation using time. In ACM SIGPLAN Notices, volume 42, pages 55-61. ACM, 2007.

[10]

Peter J Denning and Stuart C Schwartz. Properties of the working-set model. Communications of the ACM, 15(3):191- 198, 1972.

[11]

Jake Wires, Stephen Ingram, Zachary Drudi, Nicholas JA Harvey, Andrew Warfield, and Coho Data. Characterizing storage workloads with counter stacks. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation, pages 335-349. USENIX Association, 2014.

[12]

Carl A Waldspurger, Nohhyun Park, Alexander Garthwaite, and Irfan Ahmad. Efficient MRC construction with SHARDS. In 13th USENIX Conference on File and Storage Technologies (FAST 15), pages 95-110. USENIX Association, 2015.

[13]

Zachary Drudi, Nicholas JA Harvey, Stephen Ingram, Andrew Warfield, and Jake Wires. Approximating hit rate curves using streaming algorithms. In LIPIcs-Leibniz International Proceedings in Informatics, volume 40. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.

[14]

Peter J Denning, Craig H Martell, and Vint Cerf. Great Principles of Computing. MIT Press, 2015.

[15]

Chen Ding, Xiaoya Xiang, Bin Bao, Hao Luo, Ying-Wei Luo, and Xiao-Lin Wang. Performance metrics and models for shared cache. Journal of Computer Science and Technology, 29(4):692-712, 2014.

[16]

Dhruba Chandra, Fei Guo, Seongbeom Kim, and Yan Solihin. Predicting inter-thread cache contention on a chip multiprocessor architecture. In High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on, pages 340-351. IEEE, 2005.

[17]

G Edward Suh, Srinivas Devadas, and Larry Rudolph. Analytical cache models with applications to cache partitioning. In 25th Anniversary International Conference on Supercomputing Anniversary Volume, pages 323-334. ACM, 2014.

[18]

Xiaoya Xiang, Bin Bao, Tongxin Bai, Chen Ding, and Trishul M. Chilimbi. All-window profiling and composable models of cache sharing. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 91-102, 2011.

[19]

Xiaoya Xiang, Bin Bao, Chen Ding, and Yaoqing Gao. Linear-time modeling of program working set in shared cache. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 350-360. IEEE, 2011.

[20]

Xiaolin Wang, Yechen Li, Yingwei Luo, Xiameng Hu, Jacob Brock, Chen Ding, and Zhenlin Wang. Optimal footprint symbiosis in shared cache. In CCGRID, 2015.

[21]

Jacob Brock, Yechen Li, Chencheng Ye, and Chen Ding. Optimal cache partition-sharing : Dont ever take a fence down until you know why it was put up. robert frost. In International Conference on Parallel Processing, 2015.

[22]

Yutao Zhong and Wentao Chang. Sampling-based program locality approximation. In Proceedings of the International Symposium on Memory Management, pages 91-100, 2008.

[23]

Derek L. Schuff, Milind Kulkarni, and Vijay S. Pai. Accelerating multicore reuse distance analysis with sampling and parallelization. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, pages 53-64, 2010.

[24]

David K. Tam, Reza Azimi, Livio Soares, and Michael Stumm. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, pages 121-132, 2009.

[25]

Calin Cascaval, Evelyn Duesterwald, Peter F. Sweeney, and Robert W. Wisniewski. Multiple page size modeling and optimization. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, pages 339-349, 2005.

[26]

Kristof Beyls and Erik H. D'Hollander. Discovery of locality-improving refactoring by reuse path analysis. In Proceedings of High Performance Computing and Communications. Springer. Lecture Notes in Computer Science, volume 4208, pages 220-229, 2006.

[27]

Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37-57, 1985.

[28]

Éric Fusy, G Olivier, and Frédéric Meunier. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm. In In AofA07: Proceedings of the 2007 International Conference on Analysis of Algorithms, 2007.

[29]

Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron. Write off-loading: Practical power management for enterprise storage. ACM Transactions on Storage (TOS), 4(3):10, 2008.

[30]

Nimrod Megiddo and Dharmendra S Modha. Arc: A self-tuning, low overhead replacement cache. In FAST, volume 3, pages 115-130, 2003.

[31]

Erik Berg and Erik Hagersten. Fast data-locality profiling of native execution. In ACM SIGMETRICS Performance Evaluation Review, volume 33, pages 169-180. ACM, 2005.

[32]

Erik Berg and Erik Hagersten. Statcache: a probabilistic approach to efficient and accurate data locality analysis. In Performance Analysis of Systems and Software, 2004 IEEE International Symposium on-ISPASS, pages 20-27. IEEE, 2004.

[33]

cpu2006. https://www.spec.org/cpu2006/, 2015. [Online].

[34]

Chi-Keung Luk, Robert S. Cohn, Robert Muth, Harish Patil, Artur Klauser, P. Geoffrey Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim M. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 190-200, 2005.

[35]

Matthew Arnold and Barbara G Ryder. A framework for reducing the cost of instrumented code. ACM SIGPLAN Notices, 36(5):168-179, 2001.

[36]

Michael D Bond, Katherine E Coons, and Kathryn S McKinley. Pacer: proportional detection of data races. ACM SIGPLAN Notices, 45(6):255-268, 2010.

[37]

Donald R. Slutz and Irving L. Traiger. A note on the calculation working set size. Communications of the ACM.

[38]

George Almasi, Calin Cascaval, and David A. Padua. Calculating stack distances efficiently. In Proceedings of the ACM SIGPLAN Workshop on Memory System Performance, pages 37-43, Berlin, Germany, June 2002.

[39]

Yutao Zhong, Xipeng Shen, and Chen Ding. Program locality analysis using reuse distance. ACM Transactions on Programming Languages and Systems (TOPLAS), 31(6):20, 2009.

[40]

G Edward Suh, Srinivas Devadas, and Larry Rudolph. Analytical cache models with applications to cache partitioning. In Proceedings of the 15th international conference on Supercomputing, pages 1-12. ACM, 2001.

[41]

Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. Towards practical page coloring-based multicore cache management. In Proceedings of the 4th ACM European conference on Computer systems, pages 89-102. ACM, 2009.

[42]

Pin Zhou, Vivek Pandey, Jagadeesan Sundaresan, Anand Raghuraman, Yuanyuan Zhou, and Sanjeev Kumar. Dynamic tracking of page miss ratio curve for memory management. In ACM SIGOPS Operating Systems Review, volume 38, pages 177-188. ACM, 2004.

[43]

Yul H. Kim, Mark D. Hill, and David A. Wood. Implementing stack simulation for highly-associative memories. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems, pages 212-213, 1991.

[44]

Hjortur Bjornsson, Gregory Chockler, Trausti Saemundsson, and Ymir Vigfusson. Dynamic performance profiling of cloud caches. In Proceedings of the 4th annual Symposium on Cloud Computing, page 59. ACM, 2013.

[45]

Xiameng Hu, Xiaolin Wang, Yechen Li, Lan Zhou, Yingwei Luo, Chen Ding, Song Jiang, and Zhenlin Wang. Lama: Optimized locality-aware memory allocation for key-value cache. In Proceedings of USENIX ATC, 2015.

Cited By

Park HQiu ZGanger GAmvrosiadis GWitchel EArpaci-Dusseau ARossbach CKeeton K(2024)Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON CacheProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695972(347-368)Online publication date: 4-Nov-2024
https://dl.acm.org/doi/10.1145/3694715.3695972
Sha SLi CWang XWang ZLuo Y(2024)Hardware-Software Collaborative Tiered-Memory Management Framework for VirtualizationACM Transactions on Computer Systems10.1145/363956442:1-2(1-32)Online publication date: 15-Jan-2024
https://dl.acm.org/doi/10.1145/3639564
Guo XWang HZhou KJiang HHan YXing G(2024)FLOWS: Balanced MRC Profiling for Heterogeneous Object-Size CacheProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650078(421-440)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650078
Show More Cited By

Kinetic modeling of data eviction in cache
1. General and reference
  1. Cross-computing tools and techniques
2. Social and professional topics
  1. Professional topics
    1. Computing profession

Recommendations

Improving inclusive cache performance with two-level eviction priority
ICCD '12: Proceedings of the 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)

Inclusive cache hierarchies are widely adopted in modern processors, since they can simplify the implementation of cache coherence. However, it sacrifices some performance to guarantee inclusion. Many recent intelligent management policies are proposed ...
Modeling LRU cache with invalidation

Least Recently Used (LRU) is a very popular caching replacement policy. It is very easy to implement and offers good performance, especially when data requests are temporally correlated, as in the case of web traffic.When the data content can change ...
Data trace cache: an application specific cache architecture
Special issue: MEDEA'05

Benefits of advances in processor technology have long been held hostage to the widening processor-memory gap. Off-chip memory access latency is one of the most critical parameters limiting system performance. Caches have been used as a way of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings

USENIX ATC '16: Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference

June 2016

662 pages

ISBN:9781931971300

Program Chairs:
Ajay Gulati
Zerostack, Inc.
,
Hakim Weatherspoon
Cornell University

Sponsors

VMware
NetApp
Google Inc.
NSF
Facebook: Facebook

Publisher

USENIX Association

United States

Publication History

Published: 22 June 2016

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

32
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Park HQiu ZGanger GAmvrosiadis GWitchel EArpaci-Dusseau ARossbach CKeeton K(2024)Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON CacheProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695972(347-368)Online publication date: 4-Nov-2024
https://dl.acm.org/doi/10.1145/3694715.3695972
Sha SLi CWang XWang ZLuo Y(2024)Hardware-Software Collaborative Tiered-Memory Management Framework for VirtualizationACM Transactions on Computer Systems10.1145/363956442:1-2(1-32)Online publication date: 15-Jan-2024
https://dl.acm.org/doi/10.1145/3639564
Guo XWang HZhou KJiang HHan YXing G(2024)FLOWS: Balanced MRC Profiling for Heterogeneous Object-Size CacheProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650078(421-440)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650078
Maruf HZhong YWang HChowdhury MCidon AWaldspurger C(2023)Memtrade: Marketplace for Disaggregated Memory CloudsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35899857:2(1-27)Online publication date: 22-May-2023
https://dl.acm.org/doi/10.1145/3589985
Sha SLi CLuo YWang XWang ZFedorova ANarayanan DDi Luna GQuerzoni L(2023)vTMM: Tiered Memory Management for Virtual MachinesProceedings of the Eighteenth European Conference on Computer Systems10.1145/3552326.3587449(283-297)Online publication date: 8-May-2023
https://dl.acm.org/doi/10.1145/3552326.3587449
Gokul Vasan LF. Zulian ÉWeis CJung MWehn N(2021)Online Working Set Change Detection with Constant ComplexityProceedings of the International Symposium on Memory Systems10.1145/3488423.3519332(1-16)Online publication date: 27-Sep-2021
https://dl.acm.org/doi/10.1145/3488423.3519332
Smith WByrne DDing C(2021)Writeback Modeling: Theory and Application to Zipfian WorkloadsProceedings of the International Symposium on Memory Systems10.1145/3488423.3519331(1-14)Online publication date: 27-Sep-2021
https://dl.acm.org/doi/10.1145/3488423.3519331
Yang JWang YWang Z(2021)Efficient Modeling of Random Sampling-Based LRUProceedings of the 50th International Conference on Parallel Processing10.1145/3472456.3472514(1-11)Online publication date: 9-Aug-2021
https://dl.acm.org/doi/10.1145/3472456.3472514
Pan CWang XLuo YWang Z(2021)Penalty- and Locality-aware Memory Allocation in Redis Using Enhanced AETACM Transactions on Storage10.1145/344757317:2(1-45)Online publication date: 28-May-2021
https://dl.acm.org/doi/10.1145/3447573
Fuerst ASharma PSherwood TBerger EKozyrakis C(2021)FaasCache: keeping serverless computing alive with greedy-dual cachingProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446757(386-400)Online publication date: 19-Apr-2021
https://dl.acm.org/doi/10.1145/3445814.3446757
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Table of Contents