Article
Title Caching for dataset-based workloads with heterogeneous file sizes
Author(s) Chuchuk, Olga (CERN ; INRIA, Sophia Antipolis) ; Neglia, Giovanni (INRIA, Sophia Antipolis) ; Schulz, Markus (CERN) ; Duellmann, Dirk (CERN)
Publication 2022
Number of pages 18
In: PoS ISGC2022 (2022) 009
In: International Symposium on Grids & Clouds (ISGC 2022), Taipei, Taiwan, 21 - 25 Mar 2022, pp.009
DOI 10.22323/1.415.0009
Subject category Computing and Computers
Abstract Caching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that stores and processes several hundred petabytes of data and serves thousands of scientists around the globe. Scientific computation differs from other applications such as video streaming in that file sizes vary from a few bytes to terabytes and logical links between files affect user access patterns. These factors profoundly influence cache performance and should therefore be carefully analyzed when selecting a caching policy to deploy or designing new ones. In this work, we study how the hierarchical organization of the LHC physics data into files and groups of files, called datasets, affects the request patterns. We then propose new caching policies that exploit dataset-specific knowledge and compare them with file-based ones. Moreover, we show that limited connectivity between the computing and storage sites leads to the delayed hits phenomenon, and we estimate the consequent reduction in the potential benefits of caching.
Copyright/License © 2022-2024 The author(s) (License: CC-BY-NC-ND-4.0)

Corresponding record in: Inspire


Record created: 2023-06-07, last modified: 2023-06-07

