002861084 001__ 2861084
002861084 003__ SzGeCERN
002861084 005__ 20230607205634.0
002861084 0247_ $$2DOI$$9SISSA$$a10.22323/1.415.0009
002861084 0248_ $$aoai:cds.cern.ch:2861084$$pcerncds:FULLTEXT$$pcerncds:CERN:FULLTEXT$$pcerncds:CERN
002861084 035__ $$9https://inspirehep.net/api/oai2d$$aoai:inspirehep.net:2158687$$d2023-06-06T10:42:00Z$$h2023-06-07T04:09:18Z$$mmarcxml
002861084 035__ $$9Inspire$$a2158687
002861084 041__ $$aeng
002861084 100__ $$aChuchuk, Olga$$uCERN$$uINRIA, Sophia Antipolis
002861084 245__ $$9SISSA$$aCaching for dataset-based workloads with heterogeneous file sizes
002861084 260__ $$c2022
002861084 300__ $$a18 p
002861084 520__ $$9SISSA$$aCaching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that stores and processes several hundred petabytes of data and serves the needs of thousands of scientists around the world.
Scientific computation differs from other applications, such as video streaming, in that file sizes vary from a few bytes to terabytes and logical links between the files affect user access patterns. These factors profoundly influence caches' performance and, therefore, should be carefully analyzed to select which caching policy to deploy or to design new ones.
In this work, we study how the hierarchical organization of the LHC physics data into files and groups of files called datasets affects the request patterns. We then propose new caching policies that exploit dataset-specific knowledge and compare them with file-based ones. Moreover, we show that limited connectivity between the computing and storage sites leads to the delayed hits phenomenon and estimate the consequent reduction in the potential benefits of caching.
002861084 540__ $$aCC-BY-NC-ND-4.0$$bSISSA$$uhttps://creativecommons.org/licenses/by-nc-nd/4.0/
002861084 542__ $$dThe author(s)$$g2022
002861084 65017 $$2SzGeCERN$$aComputing and Computers
002861084 690C_ $$aARTICLE
002861084 690C_ $$aCERN
002861084 700__ $$aNeglia, Giovanni$$uINRIA, Sophia Antipolis
002861084 700__ $$aSchulz, Markus$$uCERN
002861084 700__ $$aDuellmann, Dirk$$uCERN
002861084 773__ $$c009$$pPoS$$vISGC2022$$wC22-03-21.2$$y2022
002861084 8564_ $$82457004$$s1508623$$uhttp://cds.cern.ch/record/2861084/files/document.pdf$$yFulltext
002861084 960__ $$a13
002861084 962__ $$b2861001$$k009$$ntaipei20220321
002861084 980__ $$aARTICLE
002861084 980__ $$aConferencePaper