Keyword: cache : Search

research-article

Open Access

Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads

ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 3, Article No.: 50, Pages 1–23, https://doi.org/10.1145/3659207

The increasing demand for computing power and the emergence of heterogeneous computing architectures have driven the exploration of innovative techniques to address current limitations in both the compute and memory subsystems. One such solution is the ...

Article

Reusing Your Prepared Data: An Informed Cache for Accelerating DNN Model Training

Database Systems for Advanced Applications, Pages 463–473, https://doi.org/10.1007/978-981-97-5572-1_34

Abstract

In deep learning training, CPU-intensive data preprocessing often leads to CPU bottlenecks, so expensive GPUs cannot be fully utilized, degrading end-to-end training performance. In general, CPUs are used for preprocessing and GPUs are ...

research-article

Non-Fusion Based Coherent Cache Randomization Using Cross-Domain Accesses

ASIA CCS '24: Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, Pages 186–202, https://doi.org/10.1145/3634737.3645011

Randomization has proven to be an effective defense against conflict-based side-channel attacks in a shared cache. It improves security by assigning a unique randomization scheme to each security domain, e.g., through a different hashing function. However, ...

research-article

Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems

WWW '24: Companion Proceedings of the ACM Web Conference 2024, Pages 284–291, https://doi.org/10.1145/3589335.3648326

Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for ...

research-article

No Clash on Cache: Observations from a Multi-tenant Ecommerce Platform

ICPE '24: Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, Pages 258–266, https://doi.org/10.1145/3629526.3645039

Caching is a classic technique for improving system performance by reducing client-perceived latency and server load. However, cache management still needs to be improved and is even more difficult in multi-tenant systems. To shed light on these problems ...

research-article

CFP: A Coherence-Free Processor Design

Franklin Yang

Journal of Computer Science and Technology (JCST), Volume 39, Issue 1, Pages 99–102, https://doi.org/10.1007/s11390-023-3964-5

Abstract

This paper presents the design of a Coherence-Free Processor (CFP) that enables a scalable multiprocessor by eliminating cache coherence operations in both hardware and software. The CFP uses a coherence-free cache (CFC) that can improve the cost-...

research-article

ZGaming: Zero-Latency 3D Cloud Gaming by Image Prediction

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference, Pages 710–723, https://doi.org/10.1145/3603269.3604819

In cloud gaming, interactive latency is one of the most important factors in users' experience. Although the interactive latency can be reduced through typical network infrastructures like edge caching and congestion control, the interactive latency of ...

research-article

DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 166, Pages 1–24, https://doi.org/10.1145/3589311

Recently, Graph Neural Networks (GNNs) have achieved great success in many applications. Mini-batch training has become the de facto way to train GNNs on giant graphs. However, the mini-batch generation task is extremely expensive, which slows down the ...

short-paper

Open Access

RBGC: Repurpose the Buffer of Fixed Graphics Pipeline to Enhance GPU Cache

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023, Pages 173–177, https://doi.org/10.1145/3583781.3590305

The limited cache size of GPU in general-purpose computing hinders the execution efficiency of thousands of concurrent threads. Several techniques have been proposed to increase the cache size per thread, such as repurposing shared memory and register ...

research-article

A Study of Early Aggregation in Database Query Processing on FPGAs

FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Pages 55–65, https://doi.org/10.1145/3543622.3573194

In database query processing, aggregation is an operator by which data with a common property is grouped and expressed in a summary form. Early aggregation is a popular method for improving the performance of the aggregation operator. In this paper, we ...

research-article

Predicting reuse interval for optimized web caching: an LSTM-based machine learning approach

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 86, Pages 1–15

Caching techniques are widely used in the era of cloud computing, from applications such as Web caches to infrastructure such as Memcached and memory caches in computer architectures. Prediction of cached data can greatly help improve cache management and hit ...

research-article

Merging Similar Patterns for Hardware Prefetching

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1012–1026, https://doi.org/10.1109/MICRO56248.2022.00071

One critical challenge of designing an efficient prefetcher is to strike a balance between performance and hardware overhead. Some state-of-the-art prefetchers achieve very high performance at the price of a very large storage requirement, which makes ...

research-article

VMIFresh: Efficient and Fresh Caches for Virtual Machine Introspection

ARES '22: Proceedings of the 17th International Conference on Availability, Reliability and Security, Article No.: 1, Pages 1–9, https://doi.org/10.1145/3538969.3539002

Virtual machine introspection (VMI) is the process of extracting knowledge about the inner state of a virtual machine from the outside. Traditional passive introspection mechanisms have proved themselves ineffective in many application domains due to ...

research-article

Evolving Skyrmion Racetrack Memory as Energy-Efficient Last-Level Cache Devices

ISLPED '22: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, Article No.: 8, Pages 1–6, https://doi.org/10.1145/3531437.3539709

Skyrmion racetrack memory (SK-RM) has been regarded as a promising alternative to replace static random-access memory (SRAM) as a large-size on-chip cache device with high memory density. Different from other nonvolatile random-access memories (NVRAMs),...

research-article

Open Access

Performance Analysis and Modelling of Concurrent Multi-access Data Structures

SPAA '22: Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, Pages 333–344, https://doi.org/10.1145/3490148.3538578

The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, which leads to thread serialisation and hinders parallelism. Aiming to address this challenge, a significant amount of work in ...

rfc

Free

RFC 9211: The Cache-Status HTTP Response Header Field

M. Nottingham

To aid debugging, HTTP caches often append header fields to a response, explaining how they handled the request in an ad hoc manner. This specification defines a standard mechanism to do so that is aligned with HTTP's caching model.

research-article

Open Access

Building a Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage

ACM Transactions on Architecture and Code Optimization (TACO), Volume 19, Issue 3, Article No.: 37, Pages 1–26, https://doi.org/10.1145/3527452

The explosive growth of modern web-scale applications has made cost-effectiveness a primary design goal for their underlying databases. As a backbone of modern databases, LSM-tree based key–value stores (LSM store) face limited storage options. They are ...

research-article

REMOC: efficient request managements for on-chip memories of GPUs

CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers, Pages 1–11, https://doi.org/10.1145/3528416.3530229

The on-chip memories of GPUs, including the register file, shared memory and L1 cache, can provide high bandwidth and low latency access for the temporary storage of data. The capacity of L1 cache can be increased by using the registers/shared memory ...

research-article

MagNet: Cooperative Edge Caching by Automatic Content Congregating

WWW '22: Proceedings of the ACM Web Conference 2022, Pages 3280–3288, https://doi.org/10.1145/3485447.3512146

Nowadays, the surge of Internet contents and the need for high Quality of Experience (QoE) put the backbone network under unprecedented pressure. The emerging edge caching solutions help ease the pressure by caching contents closer to users. However, ...

research-article

Open Access

Accelerating SSSP for Power-Law Graphs

FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Pages 190–200, https://doi.org/10.1145/3490422.3502358

The single-source shortest path (SSSP) problem is one of the most important and well-studied graph problems widely used in many application domains, such as road navigation, neural image reconstruction, and social network analysis. Although we have ...
