research-article

Public Access

Writeback Modeling: Theory and Application to Zipfian Workloads

Authors:

Chen DingAuthors Info & Claims

MEMSYS '21: Proceedings of the International Symposium on Memory Systems

Article No.: 10, Pages 1 - 14

https://doi.org/10.1145/3488423.3519331

Published: 09 May 2022 Publication History

All formats PDF

Abstract

As per-core CPU performance plateaus and data-bound applications like graph analytics and key-value stores become more prevalent, understanding memory performance is more important than ever. Many existing techniques to predict and measure cache performance on a given workload involve either static analysis or tracing, but programs like key-value stores can easily have billions of memory accesses in a trace and have access patterns driven by non-statically observable phenomena such as user behavior. Past analytical solutions focus on modeling cache hits, but the rise of non-volatile memory (NVM) like Intel’s Optane with asymmetric read/write latencies, bandwidths, and power consumption means that writes and writebacks are now critical performance considerations as well, especially in the context of large-scale software caches. We introduce two novel analytical cache writeback models that function for workloads with general frequency distributions; in addition we provide closed-form instantiations for Zipfian workloads, one of the most ubiquitous frequency distribution types in data-bound applications. The models have different use cases and asymptotic runtimes, making them suited for use in different circumstances, but both are fully analytical; cache writeback statistics are computed with no tracing or sampling required. We demonstrate that these models are extremely accurate and fast: the first model, for infinitely large level-two (L2) software cache, averaged 5.0% relative error from ground truth and achieved a minimum speedup over a state-of-the-art trace analysis technique (AET) of 515x to generate writeback information for a single cache size. The second model, which is fully general with respect to L1 and L2 sizes but slower, averaged 3.0% relative error from ground truth and achieved a minimum speedup over AET of 105x for a single cache size.

References

[1]

2018. memcached. https://memcached.org.

[2]

2020. RGSL Library. https://docs.rs/GSL/1.1.0/rgsl/index.html.

[3]

Lada A Adamic and Bernardo A Huberman. 2002. Zipf’s law and the Internet.Glottometrics 3, 1 (2002), 143–150.

[4]

Nathan Beckmann, Phillip B Gibbons, Bernhard Haeupler, and Charles McGuffey. 2020. Writeback-aware caching. In Symposium on Algorithmic Principles of Computer Systems (APOCS 2020).

[5]

Christian Berthet. 2017. Approximation of LRU Caches Miss Rate: Application to Power-law Popularities. CoRR abs/1705.10738(2017). arxiv:1705.10738http://arxiv.org/abs/1705.10738

[6]

Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. 1998. On the implications of Zipf’s law for web caching. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.

[7]

Giovanna Carofiglio, Massimo Gallo, Luca Muscariello, and Diego Perino. 2011. Modeling data transfer in content-centric networking. In 2011 23rd International Teletraffic Congress (ITC). 111–118.

[8]

D. Carra and P. Michiardi. 2014. Memory partitioning in Memcached: An experimental performance analysis. In 2014 IEEE International Conference on Communications (ICC ’14). 1154–1159.

[9]

Dong Chen, Fangzhou Liu, Chen Ding, and Sreepathi Pai. 2018. Locality Analysis through Static Parallel Sampling. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) (PLDI 2018). Association for Computing Machinery, New York, NY, USA, 557–570. https://doi.org/10.1145/3192366.3192402

Digital Library

[10]

Dong Chen, Chencheng Ye, and Chen Ding. 2016. Write Locality and Optimization for Persistent Memory. In Proceedings of the Second International Symposium on Memory Systems (Alexandria, VA, USA) (MEMSYS ’16). Association for Computing Machinery, New York, NY, USA, 77–87. https://doi.org/10.1145/2989081.2989119

Digital Library

[11]

Ariel Duarte-López, Marta Pérez-Casany, and Jordi Valero. 2020. The Zipf–Poisson-stopped-sum distribution with an application for modeling the degree sequence of social networks. Computational Statistics & Data Analysis 143 (2020), 106838. https://doi.org/10.1016/j.csda.2019.106838

Digital Library

[12]

Ronald Fagin. 1977. Asymptotic Miss Ratios over Independent References. J. Comput. Syst. Sci. 14, 2 (1977), 222–250. https://doi.org/10.1016/S0022-0000(77)80014-7

[13]

Dror Feitelson. 2014. Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press.

Digital Library

[14]

Christine Fricker, Philippe Robert, James Roberts, and Nada Sbihi. 2012. Impact of traffic mix on caching performance in a content-centric network. In 2012 Proceedings IEEE INFOCOM Workshops. 310–315. https://doi.org/10.1109/INFCOMW.2012.6193511

[15]

Frank Hady. 2020. Intel Optane Technology Delivers New Levels of Endurance. Technical Report. Intel Corp.

[16]

Xiameng Hu, Xiaolin Wang, Yechen Li, Lan Zhou, Yingwei Luo, Chen Ding, Song Jiang, and Zhenlin Wang. 2015. LAMA: Optimized Locality-aware Memory Allocation for Key-value Cache. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). USENIX Association, Santa Clara, CA, 57–69.

[17]

Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Chen Ding, and Zhenlin Wang. 2016. Kinetic Modeling of Data Eviction in Cache. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (Denver, CO, USA) (USENIX ATC ’16). USENIX Association, USA, 351–364.

Digital Library

[18]

Xiameng Hu, Xiaolin Wang, Lan Zhou, Yingwei Luo, Zhenlin Wang, Chen Ding, and Chencheng Ye. 2018. Fast Miss Ratio Curve Modeling for Storage Cache. ACM Trans. Storage 14, 2, Article 12 (April 2018), 34 pages. https://doi.org/10.1145/3185751

Digital Library

[19]

Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R. Dulloor, Jishen Zhao, and Steven Swanson. 2019. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module. CoRR abs/1903.05714(2019). arxiv:1903.05714http://arxiv.org/abs/1903.05714

[20]

Predrag R. Jelenković. 1999. Asymptotic approximation of the move-to-front search cost distribution and least-recently used caching fault probabilities. Ann. Appl. Probab. 9, 2 (05 1999), 430–464. https://doi.org/10.1214/aoap/1029962750

[21]

Zixiao Jia, Peng Zhang, Jiwei Huang, Chuang Lin, and John C. S. Lui. 2013. Modeling Hierarchical Caches in Content-Centric Networks. In 2013 22nd International Conference on Computer Communication and Networks (ICCCN). 1–7. https://doi.org/10.1109/ICCCN.2013.6614153

[22]

Krishna Kant. 2009. Estimation of Invalidation and Writeback Rates in Multiple Processor Systems. (04 2009).

[23]

Qiang Li, Yuanmei Zhang, Ashish Pandharipande, Xiaohu Ge, and Jiliang Zhang. 2019. D2D-assisted caching on truncated Zipf distribution. IEEE Access 7(2019), 13411–13421.

[24]

Lena Olson and Mark Hill. 2016. Probabilistic Directed Writebacks for Exclusive Caches. 44, 1 (jul 2016). https://doi.org/10.1145/2971331.2971334

Digital Library

[25]

Ivy Bo Peng, Maya B. Gokhale, and Eric W. Green. 2019. System evaluation of the Intel optane byte-addressable NVM. In Proceedings of the International Symposium on Memory Systems, MEMSYS 2019, Washington, DC, USA, September 30 - October 03, 2019. ACM, 304–315. https://doi.org/10.1145/3357526.3357568

Digital Library

[26]

David MW Powers. 1998. Applications and explanations of Zipf’s law. In New methods in language processing and computational natural language learning.

[27]

Elisha Rosensweig and Jim Kurose. 2013. A Network Calculus for Cache Networks. Proceedings - IEEE INFOCOM, 85–89. https://doi.org/10.1109/INFCOM.2013.6566740

[28]

R. Salkhordeh, O. Mutlu, and H. Asadi. 2019. An Analytical Model for Performance and Lifetime Estimation of Hybrid DRAM-NVM Main Memories. IEEE Trans. Comput. 68, 8 (2019), 1114–1130. https://doi.org/10.1109/TC.2019.2906597

Digital Library

[29]

D. N. Serpanos, G. Karakostas, and W. H. Wolf. 2000. Effective caching of Web objects using Zipf’s law. In 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532), Vol. 2. 727–730 vol.2. https://doi.org/10.1109/ICME.2000.871464

[30]

Daniel G. Waddington, Mark Kunitomi, Clem Dickey, Samyukta Rao, Amir Abboud, and Jantz Tran. 2019. Evaluation of intel 3D-xpoint NVDIMM technology for memory-intensive genomic workloads. In Proceedings of the International Symposium on Memory Systems, MEMSYS 2019, Washington, DC, USA, September 30 - October 03, 2019. ACM, 277–287. https://doi.org/10.1145/3357526.3357528

Digital Library

[31]

Carl A. Waldspurger, Trausti Saemundsson, Irfan Ahmad, and Nohhyun Park. 2017. Cache Modeling and Optimization using Miniature Simulations. 487–498. https://www.usenix.org/conference/atc17/technical-sessions/presentation/waldspurger

[32]

Wikipedia contributors. 2020. Independent Reference Model — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Independent_Reference_Model&oldid=959913121 [Online; accessed 7-April-2021].

[33]

Wikipedia contributors. 2021. Write amplification — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Write_amplification&oldid=1015893909 [Online; accessed 19-April-2021].

[34]

Jake Wires, Stephen Ingram, Zachary Drudi, Nicholas J. A. Harvey, and Andrew Warfield. 2014. Characterizing Storage Workloads with Counter Stacks. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). USENIX Association, Broomfield, CO, 335–349.

Digital Library

[35]

Cong Xie, Ling Yan, Wu-Jun Li, and Zhihua Zhang. 2014. Distributed Power-law Graph Computing: Theoretical and Empirical Analysis. In Nips, Vol. 27. 1673–1681.

[36]

Juncheng Yang. 2021. libCacheSim - a library for cache simulation, profiling, and analysis. Retrieved May, 2021 from https://github.com/1a1a11a/libCacheSim

[37]

Juncheng Yang, Yao Yue, and K. V. Rashmi. 2020. A large scale analysis of hundreds of in-memory cache clusters at Twitter. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 191–208. https://www.usenix.org/conference/osdi20/presentation/yang

Digital Library

[38]

Yue Yang and Jianwen Zhu. 2016. Write Skew and Zipf Distribution: Evidence and Implications. ACM Trans. Storage 12, 4, Article 21 (June 2016), 19 pages. https://doi.org/10.1145/2908557

Digital Library

[39]

Liang Yuan, Chen Ding, Wesley Smith, Peter Denning, and Yunquan Zhang. 2019. A Relational Theory of Locality. ACM Trans. Archit. Code Optim. 16, 3, Article 33 (Aug. 2019), 26 pages. https://doi.org/10.1145/3341109

Digital Library

[40]

Teng Zhang, Jianying Wang, Xuntao Cheng, Hao Xu, Gui Huang, Tieying Zhang, Dengcheng He, Feifei Li, Wei Cao, Zhongdong Huang, and Jianling Sun. 2020. FPGA-Accelerated Compactions for LSM-based Key-Value Store. In Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST ’20).

Digital Library

[41]

Yutao Zhong, Xipeng Shen, and Chen Ding. 2009. Program Locality Analysis Using Reuse Distance. ACM Trans. Program. Lang. Syst. (TOPLAS ’09) 31, 6, Article 20 (Aug. 2009), 39 pages.

Recommendations

Writeback-Aware Caching (Brief Announcement)
SPAA '19: The 31st ACM Symposium on Parallelism in Algorithms and Architectures

Motivated by emerging memory technologies and the increasing importance of energy and bandwidth, we study the Writeback-Aware Caching Problem. This problem modifies the caching problem by explicitly accounting for the cost of writing data to memory. In ...
VAIL: A Victim-Aware Cache Policy for Improving Lifetime of Hybrid Memory
PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores

Nowadays emerging Non-Volatile Memory (NVM) technologies are introduced to remedy the shortages of the current DRAM-based memory system. However, NVM has limited write endurance, which would severely restrict the performance of memory system. In order ...
A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

MEMSYS '21: Proceedings of the International Symposium on Memory Systems

September 2021

158 pages

ISBN:9781450385701

DOI:10.1145/3488423

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

MEMSYS 2021

MEMSYS 2021: The International Symposium on Memory Systems

September 27 - 30, 2021

DC, Washington DC, USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
319
Total Downloads

Downloads (Last 12 months)170
Downloads (Last 6 weeks)43

Reflects downloads up to 10 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents