research-article

Open access

Darwin: Flexible Learning-based CDN Caching

Authors:

Sanjay Shakkottai,

Ramesh K SitaramanAuthors Info & Claims

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference

Pages 981 - 999

https://doi.org/10.1145/3603269.3604863

Published: 01 September 2023 Publication History

Abstract

Cache management is critical for Content Delivery Networks (CDNs), impacting their performance and operational costs. Most production CDNs apply static, hand-tuned caching policy parameters at cache servers, such as admission frequency or size thresholds for the Hot Object Caches (HOC) of their system. However, these static policies fall short when a server is faced with unpredictable traffic pattern changes, even when policies employ multiple control parameters/knobs. Recent approaches have proposed learning-based solutions to dynamically adjust policy parameters, but they are limited in action space, caching objectives, or impose high overhead. We propose Darwin, a CDN cache management system that is robust to traffic pattern changes and can flexibly optimize different caching objectives with unrestricted action spaces. Darwin employs a three-stage pipeline involving traffic pattern feature collection, unsupervised clustering for classification, and neural bandit expert selection to choose the optimal caching policy. Through extensive simulations, experiments using an Apache Traffic Server (ATS)-based prototype, and theoretical analysis, we show that Darwin achieves significant performance gain w.r.t. different objectives such as maximizing object hit rates and minimizing disk writes, while simultaneously adapting to traffic pattern shifts. Darwin imposes negligible overhead and achieves high throughput compared to the state-of-the-art.

References

[1]

Amin, K., Kearns, M., and Syed, U. Graphical models for bandit problems. arXiv preprint arXiv:1202.3782 (2012).

[2]

Ari, I., Amer, A., Gramacy, R. B., Miller, E. L., Brandt, S. A., and Long, D. D. Acme: Adaptive caching using multiple experts. In WDAS (2002), vol. 2, pp. 143--158.

[3]

Asiso. Overview of the cdn akamai. https://www.asioso.com/en/blog/overview-of-the-cdn-akamai-b520, Feb 2021.

[4]

Atsidakou, A., Papadigenopoulos, O., Caramanis, C., Sanghavi, S., and Shakkottai, S. Asymptotically-optimal gaussian bandits with side observations. In International Conference on Machine Learning (2022), PMLR, pp. 1057--1077.

[5]

Basat, R. B., Einziger, G., Friedman, R., and Kassner, Y. Randomized admission policy for efficient top-k and frequency estimation. In IEEE INFOCOM 2017-IEEE Conference on Computer Communications (2017), IEEE, pp. 1--9.

[6]

Beckmann, N., Chen, H., and Cidon, A. LHD: improving cache hit rate by maximizing hit density. In 15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018, Renton, WA, USA, April 9-11, 2018 (2018), S. Banerjee and S. Seshan, Eds., USENIX Association, pp. 389--403.

[7]

Belady, L. A. A study of replacement algorithms for a virtual-storage computer. IBM Systems journal 5, 2 (1966), 78--101.

[8]

Belady, L. A. A study of replacement algorithms for a virtual-storage computer. In IBM Systems journal (1996), vol. 5, pp. 78--101.

[9]

Berger, D. S. Towards lightweight and robust machine learning for CDN caching. In Proceedings of the 17th ACM Workshop on Hot Topics in Networks, HotNets 2018, Redmond, WA, USA, November 15-16, 2018 (2018), ACM, pp. 134--140.

Digital Library

[10]

Berger, D. S., Sitaraman, R. K., and Harchol-Balter, M. Adaptsize: Orchestrating the hot object memory cache in a content delivery network. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17) (2017), pp. 483--498.

Digital Library

[11]

Buccapatnam, S., Eryilmaz, A., and Shroff, N. B. Stochastic bandits with side observations on networks. In The 2014 ACM international conference on Measurement and modeling of computer systems (2014), pp. 289--300.

Digital Library

[12]

Chen, F., Sitaraman, R. K., and Torres, M. End-user mapping: Next generation request routing for content delivery. In ACM SIGCOMM Computer Communication Review (2015), vol. 45, ACM, pp. 167--181.

Digital Library

[13]

Cherkasova, L., and Ciardo, G. Role of aging, frequency, and size in web cache replacement policies. In High-Performance Computing and Networking: 9th International Conference, HPCN Europe 2001 Amsterdam, The Netherlands, June 25--27, 2001 Proceedings 9 (2001), Springer, pp. 114--123.

[14]

Dilley, J., Maggs, B. M., Parikh, J., Prokop, H., Sitaraman, R. K., and Weihl, W. E. Globally distributed content delivery. IEEE Internet Computing 6, 5 (2002), 50--58.

Digital Library

[15]

Dubhashi, D. P., and Panconesi, A. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009.

[16]

Duplyakin, D., Ricci, R., Maricq, A., Wong, G., Duerig, J., Eide, E., Stoller, L., Hibler, M., Johnson, D., Webb, K., et al. The design and operation of cloudlab. In USENIX Annual Technical Conference (2019), pp. 1--14.

[17]

Einziger, G., Friedman, R., and Manes, B. Tinylfu: A highly efficient cache admission policy. ACM Transactions on Storage (ToS) 13, 4 (2017), 1--31.

Digital Library

[18]

Foundation, T. A. S. Apache traffic server. https://trafficserver.apache.org/, 2018. Accessed: 2023-01-30.

[19]

Garivier, A., and Kaufmann, E. Optimal best arm identification with fixed confidence. In Conference on Learning Theory (2016), PMLR, pp. 998--1027.

[20]

Guan, Y., Zhang, X., and Guo, Z. Caca: Learning-based content-aware cache admission for video content in edge caching. In Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 456--464.

Digital Library

[21]

Jeong, J., and Dubois, M. Cost-sensitive cache replacement algorithms. In The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings. (2003), IEEE, pp. 327--337.

[22]

Kirilin, V., Sundarrajan, A., Gorinsky, S., and Sitaraman, R. K. Rl-cache: Learning-based cache admission for content delivery. In Proceedings of the 2019 Workshop on Network Meets AI & ML (2019), pp. 57--63.

Digital Library

[23]

Lattimore, F., Lattimore, T., and Reid, M. D. Causal bandits: Learning good interventions via causal inference. In Advances in Neural Information Processing Systems (2016), pp. 1181--1189.

[24]

Lattimore, T., and Szepesvári, C. Bandit algorithms. Cambridge University Press, 2020.

[25]

Li, L., Chu, W., Langford, J., and Schapire, R. E. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (2010), pp. 661--670.

Digital Library

[26]

Li, Z., Ratliff, L., Nassif, H., Jamieson, K., and Jain, L. Instance-optimal pac algorithms for contextual bandits. arXiv preprint arXiv:2207.02357 (2022).

[27]

Maggs, B. M., and Sitaraman, R. K. Algorithmic nuggets in content delivery. ACM SIGCOMM Computer Communication Review 45, 3 (2015), 52--66.

Digital Library

[28]

McAllister, S., Berg, B., Tutuncu-Macias, J., Yang, J., Gunasekar, S., Lu, J., Berger, D. S., Beckmann, N., and Ganger, G. R. Kangaroo: Theory and practice of caching billions of tiny objects on flash. ACM Trans. Storage 18, 3 (2022), 21:1--21:33.

Digital Library

[29]

McDiarmid, C. Concentration. Probabilistic methods for algorithmic discrete mathematics (1998), 195--248.

[30]

Narayanan, A., Verma, S., Ramadan, E., Babaie, P., and Zhang, Z. Deepcache: A deep learning based framework for content caching. In Proceedings of the 2018 Workshop on Network Meets AI & ML, NetAI@SIGCOMM 2018, Budapest, Hungary, August 24, 2018 (2018), ACM, pp. 48--53.

Digital Library

[31]

Nygren, E., Sitaraman, R. K., and Sun, J. The Akamai Network: A platform for high-performance Internet applications. ACM SIGOPS Operating Systems Review 44, 3 (2010), 2--19.

Digital Library

[32]

Rizzo, L., and Vicisano, L. Replacement policies for a proxy cache. IEEE/ACM Transactions on networking 8, 2 (2000), 158--170.

Digital Library

[33]

Rodriguez, L. V., Yusuf, F. B., Lyons, S., Paz, E., Rangaswami, R., Liu, J., Zhao, M., and Narasimhan, G. Learning cache replacement with cacheus. In FAST (2021), pp. 341--354.

[34]

Sabnis, A., and Sitaraman, R. K. Tragen: a synthetic trace generator for realistic cache simulations. In Proceedings of the 21st ACM Internet Measurement Conference (2021), pp. 366--379.

Digital Library

[35]

Sazoglu, F. B., Cambazoglu, B. B., Ozcan, R., Altingovde, I. S., and Ulusoy, Ö. A financial cost metric for result caching. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (2013), pp. 873--876.

Digital Library

[36]

Schomp, K., Bhardwaj, O., Kurdoglu, E., Muhaimen, M., and Sitaraman, R. K. Akamai dns: Providing authoritative answers to the world's queries. In Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication (2020), pp. 465--478.

Digital Library

[37]

Sen, R., Shanmugam, K., Kocaoglu, M., Dimakis, A., and Shakkottai, S. Contextual bandits with latent confounders: An nmf approach. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (2017), pp. 518--527.

[38]

Sharma, N., Basu, S., Shanmugam, K., and Shakkottai, S. On under-exploration in bandits with mean bounds from confounded data. arXiv preprint arXiv:2002.08405 (2020).

[39]

Song, Z., Berger, D. S., Li, K., and Lloyd, W. Learning relaxed belady for content distribution network caching. In 17th USENIX Symposium on Networked Systems Design and Implementation (2020).

[40]

Suksomboon, K., Tarnoi, S., Ji, Y., Koibuchi, M., Fukuda, K., Abe, S., Motonori, N., Aoki, M., Urushidani, S., and Yamada, S. Popcache: Cache more or less based on content popularity for information-centric networking. In 38th Annual IEEE conference on local computer networks (2013), IEEE, pp. 236--243.

[41]

Sundarrajan, A., Feng, M., Kasbekar, M., and Sitaraman, R. K. Footprint descriptors: Theory and practice of cache provisioning in a global cdn. In Proceedings of the 13th International Conference on emerging Networking EXperiments and Technologies (2017), pp. 55--67.

Digital Library

[42]

Sundarrajan, A., Kasbekar, M., Sitaraman, R. K., and Shukla, S. Midgress-aware traffic provisioning for content delivery. In 2020 USENIX Annual Technical Conference (USENIX ATC 20) (July 2020), USENIX Association, pp. 543--557.

[43]

Valko, M., Munos, R., Kveton, B., and Kocák, T. Spectral bandits for smooth graph functions. In International Conference on Machine Learning (2014), pp. 46--54.

[44]

Vietri, G., Rodriguez, L. V., Martinez, W. A., Lyons, S., Liu, J., Rangaswami, R., Zhao, M., and Narasimhan, G. Driving cache replacement with ml-based lecar. In HotStorage (2018), pp. 928--936.

[45]

Vietri, G., Rodriguez, L. V., Martinez, W. A., Lyons, S., Liu, J., Rangaswami, R., Zhao, M., and Narasimhan, G. Driving cache replacement with ml-based lecar. In 10th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2018, Boston, MA, USA, July 9-10, 2018 (2018), A. Goel and N. Talagala, Eds., USENIX Association.

[46]

Wikipedia. Stack distance, 2023. https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric.

[47]

Wu, K.-L., Yu, P. S., and Wolf, J. L. Segment-based proxy caching of multimedia streams. In Proceedings of the 10th international conference on World Wide Web (2001), pp. 36--44.

Digital Library

[48]

Wu, Y., György, A., and Szepesvári, C. Online learning with gaussian payoffs and side observations. Advances in Neural Information Processing Systems 28 (2015).

[49]

Yan, G., Li, J., and Towsley, D. Learning from optimal caching for content delivery. In CoNEXT '21: The 17th International Conference on emerging Networking EXperiments and Technologies, Virtual Event, Munich, Germany, December 7 - 10, 2021 (2021), G. Carle and J. Ott, Eds., ACM, pp. 344--358.

Digital Library

[50]

Yang, J., Sabnis, A., Berger, D. S., Rashmi, K., and Sitaraman, R. K. C2dn: How to harness erasure codes at the edge for efficient content delivery. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22) (2022), pp. 1159--1177.

[51]

Zhang, J., and Bareinboim, E. Transfer learning in multi-armed bandit: A causal approach. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (2017), pp. 1778--1780.

[52]

Zhang, Q., Xiang, Z., Zhu, W., and Gao, L. Cost-based cache replacement and server selection for multimedia proxy across wireless internet. IEEE Transactions on Multimedia 6, 4 (2004), 587--598.

Digital Library

Cited By

Wang HZhang RLi CXue ZPeng YPang XZhang YRen SShi S(2024)Twist: A Multi-site Transmission Solution for On-demand Video StreamingProceedings of the ACM on Networking10.1145/36562972:CoNEXT2(1-19)Online publication date: 13-Jun-2024
https://dl.acm.org/doi/10.1145/3656297
Wu BBao WZhou B(2024)Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery NetworkIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.347541235:12(2449-2462)Online publication date: Dec-2024
https://doi.org/10.1109/TPDS.2024.3475412
Luo MZhou KWang HJiang ZKong HJi Y(2024)CAMS: A Cost-Aware Migration Scheme for Cloud Object Storage Systems2024 International Conference on Networking, Architecture and Storage (NAS)10.1109/NAS63802.2024.10781371(1-4)Online publication date: 9-Nov-2024
https://doi.org/10.1109/NAS63802.2024.10781371

Index Terms

Darwin: Flexible Learning-based CDN Caching
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Processors and memory architectures
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches

Recommendations

Performance Estimation and Evaluation Framework for Caching Policies in Hierarchical Caches
Abstract
The emergence of information-centric network (ICN) architectures has attracted a flurry of renewed research interest in caching policies and their performance analysis. One important feature ICNs offer that is distinct from classical ...
Reusability-aware cache memory sharing for chip multiprocessors with private L2 caches

In this paper, we propose a novel on-chip L2 cache organization for chip multiprocessors (CMPs) with private L2 caches. The proposed approach, called reusability-aware cache sharing (RACS), combines the advantages of both a private L2 cache and a shared ...
Using dead blocks as a virtual victim cache
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Caches mitigate the long memory latency that limits the performance of modern processors. However, caches can be quite inefficient. On average, a cache block in a 2MB L2 cache is dead 59% of the time, i.e., it will not be referenced again before it is ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference

September 2023

1217 pages

ISBN:9798400702365

DOI:10.1145/3603269

Chairs:
Henning Schulzrinne,
Vishal Misra,
Program Chairs:
Eddie Kohler,
David Maltz

Copyright © 2023 Owner/Author(s).

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2023

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

CNS

Conference

ACM SIGCOMM '23

Sponsor:

SIGCOMM

ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference

September 10, 2023

NY, New York, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
2,464
Total Downloads

Downloads (Last 12 months)1,413
Downloads (Last 6 weeks)242

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Wang HZhang RLi CXue ZPeng YPang XZhang YRen SShi S(2024)Twist: A Multi-site Transmission Solution for On-demand Video StreamingProceedings of the ACM on Networking10.1145/36562972:CoNEXT2(1-19)Online publication date: 13-Jun-2024
https://dl.acm.org/doi/10.1145/3656297
Wu BBao WZhou B(2024)Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery NetworkIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.347541235:12(2449-2462)Online publication date: Dec-2024
https://doi.org/10.1109/TPDS.2024.3475412
Luo MZhou KWang HJiang ZKong HJi Y(2024)CAMS: A Cost-Aware Migration Scheme for Cloud Object Storage Systems2024 International Conference on Networking, Architecture and Storage (NAS)10.1109/NAS63802.2024.10781371(1-4)Online publication date: 9-Nov-2024
https://doi.org/10.1109/NAS63802.2024.10781371

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten