Author: Litz, Heiner : Search

research-article

Open Access

En4S: Enabling SLOs in Serverless Storage Systems

SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 160–177, https://doi.org/10.1145/3698038.3698529

Serverless computing promises scalability and cost-efficiency by decomposing monolithic tasks into small, stateless, self-contained functions. As functions only reserve hardware resources during their lifetime, and serverless providers such as Amazon ...

opinion

Special Issue on Hot Chips 2023

IEEE Micro (IMIC), Volume 44, Issue 3, Pages 6–7, https://doi.org/10.1109/MM.2024.3396008

This special issue of IEEE Micro is devoted to selected top-pick articles presented at Hot Chips 2023. The Hot Chips Conference serves as a leading venue for presenting the technical details of innovative microchips on a wide range of topics, including ...

research-article

RPG²: Robust Profile-Guided Runtime Prefetch Generation

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 999–1013, https://doi.org/10.1145/3620665.3640396

Data cache prefetching is a well-established optimization to overcome the limits of the cache hierarchy and keep the processor pipeline fed with data. In principle, accurate, well-timed prefetches can sidestep the majority of cache misses and ...

research-article

Open Access

TMC: Near-Optimal Resource Allocation for Tiered-Memory Systems

SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, Pages 376–393, https://doi.org/10.1145/3620678.3624667

Main memory dominates data center server cost, and hence data center operators are exploring alternative technologies such as CXL-attached and persistent memory to improve cost without jeopardizing performance. Introducing multiple tiers of memory ...

research-article

Open Access

Enabling Multi-tenancy on SSDs with Accurate IO Interference Modeling

SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, Pages 216–232, https://doi.org/10.1145/3620678.3624657

Technological advancements in the past decades have substantially increased the capacity and performance of Solid State Drives (SSDs). Provisioning such high-capacity SSDs among tenants can reap multiple benefits, such as elevated performance, efficient ...

research-article

Online Code Layout Optimizations via OCOLOS

IEEE Micro (IMIC), Volume 43, Issue 4, Pages 71–79, https://doi.org/10.1109/MM.2023.3274758

The processor front end has become an increasingly important bottleneck in recent years due to growing application code footprints, particularly in data centers. Profile-guided optimizations performed by compilers represent a promising approach, as they ...

abstract

Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage

SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 19–20, https://doi.org/10.1145/3578338.3593569

Smash is a new placement and lookup method for distributed storage systems. It achieves full placement flexibility and low DRAM cost to store ID-to-location mappings, two desired features that could not be achieved simultaneously by any prior method.

Also Published in:

ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1

research-article

Open Access

Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 2, Article No.: 33, Pages 1–22, https://doi.org/10.1145/3589977

Large-scale distributed storage systems, such as object stores, usually apply hashing-based placement and lookup methods to achieve scalability and resource efficiency. However, when object locations are determined by hash values, placement becomes ...

research-article

OCOLOS: Online COde Layout OptimizationS

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 530–545, https://doi.org/10.1109/MICRO56248.2022.00045

The processor front-end has become an increasingly important bottleneck in recent years due to growing application code footprints, particularly in data centers. First-level instruction caches and branch prediction engines have not been able to keep ...

research-article

Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 19–34, https://doi.org/10.1109/MICRO56248.2022.00017

Modern data center applications experience frequent branch mispredictions, degrading performance, increasing cost, and reducing energy efficiency in data centers. Even the state-of-the-art branch predictor, TAGE-SC-L, suffers from an average branch ...

research-article

Public Access

Thermometer: profile-guided btb replacement for data center applications

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 742–756, https://doi.org/10.1145/3470496.3527430

Modern processors employ a decoupled frontend with Fetch Directed Instruction Prefetching (FDIP) to avoid frontend stalls in data center applications. However, the large branch footprint of data center applications precipitates frequent Branch Target ...

research-article

Open Access

APT-GET: profile-guided timely software prefetching

EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems, Pages 747–764, https://doi.org/10.1145/3492321.3519583

Prefetching, which predicts future memory accesses and preloads them from main memory, is a widely adopted technique to overcome the processor-memory performance gap. Unfortunately, hardware prefetchers implemented in today's processors cannot identify ...

research-article

Open Access

CRISP: critical slice prefetching

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 300–313, https://doi.org/10.1145/3503222.3507745

The high access latency of DRAM continues to be a performance challenge for contemporary microprocessor systems. Prefetching is a well-established technique to address this problem; however, existing implemented designs fail to provide any performance ...

research-article

Public Access

RAIL: Predictable, Low Tail Latency for NVMe Flash

ACM Transactions on Storage (TOS), Volume 18, Issue 1, Article No.: 5, Pages 1–21, https://doi.org/10.1145/3465406

Flash-based storage is replacing disk for an increasing number of data center applications, providing orders of magnitude higher throughput and lower average latency. However, applications also require predictable storage latency. Existing Flash devices ...

research-article

Public Access

Twig: Profile-Guided BTB Prefetching for Data Center Applications

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 816–829, https://doi.org/10.1145/3466752.3480124

Modern data center applications have deep software stacks, with instruction footprints that are orders of magnitude larger than typical instruction cache (I-cache) sizes. To efficiently prefetch instructions into the I-cache despite large application ...

research-article

Public Access

PDede: Partitioned, Deduplicated, Delta Branch Target Buffer

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 779–791, https://doi.org/10.1145/3466752.3480046

Due to large instruction footprints, contemporary data center applications suffer from frequent frontend stalls. Despite being a significant contributor to these stalls, the Branch Target Buffer (BTB) has received less attention compared to other ...

research-article

Ripple: profile-guided instruction cache replacement for data center applications

ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 734–747, https://doi.org/10.1109/ISCA52012.2021.00063

Modern data center applications exhibit deep software stacks, resulting in large instruction footprints that frequently cause instruction cache misses degrading performance, cost, and energy efficiency. Although numerous mechanisms have been proposed to ...

research-article

Open Access

Reducing write amplification in flash by death-time prediction of logical block addresses

SYSTOR '21: Proceedings of the 14th ACM International Conference on Systems and Storage, Article No.: 11, Pages 1–12, https://doi.org/10.1145/3456727.3463784

Flash-based solid state drives lack support for in-place updates, and hence deploy a flash translation layer to absorb the writes. For this purpose, SSDs implement a log-structured storage system introducing garbage collection and write-amplification ...

short-paper

Open Access

Design for computational storage simulation platform

CHEOPS '21: Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems, Article No.: 5, Pages 1–8, https://doi.org/10.1145/3439839.3459085

Data movement between storage and compute resources represents a bottleneck in data-driven applications. This performance bottleneck can be mitigated by leveraging inherent parallelism in the user application and offloading component tasks, called ...

doctoral_thesis

Algorithmic and System Innovations for Network Data Plane: Efficiency, Scalability, and Flexibility

Abstract

Owing to its reliability, scalability, and cost-effectiveness, cloud computing is attracting more and more businesses, and large-scale cloud networks now connect users, data, and machines more tightly than ever before. According ...
