- research-article, November 2024
En4S: Enabling SLOs in Serverless Storage Systems
SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 160–177, https://doi.org/10.1145/3698038.3698529
Serverless computing promises scalability and cost-efficiency by decomposing monolithic tasks into small, stateless, self-contained functions. As functions only reserve hardware resources during their lifetime, and serverless providers such as Amazon ...
- opinion, June 2024
Special Issue on Hot Chips 2023
This special issue of IEEE Micro is devoted to selected top-pick articles presented at Hot Chips 2023. The Hot Chips Conference serves as a leading venue for presenting the technical details of innovative microchips on a wide range of topics, including ...
- research-article, April 2024
RPG2: Robust Profile-Guided Runtime Prefetch Generation
- Yuxuan Zhang,
- Nathan Sobotka,
- Soyoon Park,
- Saba Jamilan,
- Tanvir Ahmed Khan,
- Baris Kasikci,
- Gilles A Pokam,
- Heiner Litz,
- Joseph Devietti
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 999–1013, https://doi.org/10.1145/3620665.3640396
Data cache prefetching is a well-established optimization to overcome the limits of the cache hierarchy and keep the processor pipeline fed with data. In principle, accurate, well-timed prefetches can sidestep the majority of cache misses and ...
- research-article, October 2023
TMC: Near-Optimal Resource Allocation for Tiered-Memory Systems
SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, Pages 376–393, https://doi.org/10.1145/3620678.3624667
Main memory dominates data center server cost, and hence data center operators are exploring alternative technologies such as CXL-attached and persistent memory to improve cost without jeopardizing performance. Introducing multiple tiers of memory ...
- research-article, October 2023
Enabling Multi-tenancy on SSDs with Accurate IO Interference Modeling
SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, Pages 216–232, https://doi.org/10.1145/3620678.3624657
Technological advancements in the past decades have substantially increased the capacity and performance of Solid State Drives (SSDs). Provisioning such high-capacity SSDs among tenants can reap multiple benefits, such as elevated performance, efficient ...
- research-article, July 2023
Online Code Layout Optimizations via OCOLOS
The processor front end has become an increasingly important bottleneck in recent years due to growing application code footprints, particularly in data centers. Profile-guided optimizations performed by compilers represent a promising approach, as they ...
- abstract, June 2023
Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage
SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 19–20, https://doi.org/10.1145/3578338.3593569
Smash is a new placement and lookup method for distributed storage systems. It achieves full placement flexibility and low DRAM cost to store ID-to-location mappings, two desired features that could not be achieved simultaneously by any prior method.
Also Published in:
ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1
- research-article, May 2023
Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 2, Article No.: 33, Pages 1–22, https://doi.org/10.1145/3589977
Large-scale distributed storage systems, such as object stores, usually apply hashing-based placement and lookup methods to achieve scalability and resource efficiency. However, when object locations are determined by hash values, placement becomes ...
- research-article, December 2023
OCOLOS: Online COde Layout OptimizationS
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 530–545, https://doi.org/10.1109/MICRO56248.2022.00045
The processor front-end has become an increasingly important bottleneck in recent years due to growing application code footprints, particularly in data centers. First-level instruction caches and branch prediction engines have not been able to keep ...
- research-article, December 2023
Whisper: Profile-Guided Branch Misprediction Elimination for Data Center Applications
- Tanvir Ahmed Khan,
- Muhammed Ugur,
- Krishnendra Nathella,
- Dam Sunwoo,
- Heiner Litz,
- Daniel A. Jiménez,
- Baris Kasikci
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 19–34, https://doi.org/10.1109/MICRO56248.2022.00017
Modern data center applications experience frequent branch mispredictions - degrading performance, increasing cost, and reducing energy efficiency in data centers. Even the state-of-the-art branch predictor, TAGE-SC-L, suffers from an average branch ...
- research-article, June 2022
Thermometer: profile-guided btb replacement for data center applications
- Shixin Song,
- Tanvir Ahmed Khan,
- Sara Mahdizadeh Shahri,
- Akshitha Sriraman,
- Niranjan K Soundararajan,
- Sreenivas Subramoney,
- Daniel A. Jiménez,
- Heiner Litz,
- Baris Kasikci
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 742–756, https://doi.org/10.1145/3470496.3527430
Modern processors employ a decoupled frontend with Fetch Directed Instruction Prefetching (FDIP) to avoid frontend stalls in data center applications. However, the large branch footprint of data center applications precipitates frequent Branch Target ...
- research-article, March 2022
APT-GET: profile-guided timely software prefetching
EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems, Pages 747–764, https://doi.org/10.1145/3492321.3519583
Prefetching which predicts future memory accesses and preloads them from main memory, is a widely-adopted technique to overcome the processor-memory performance gap. Unfortunately, hardware prefetchers implemented in today's processors cannot identify ...
- research-article, February 2022
CRISP: critical slice prefetching
ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 300–313, https://doi.org/10.1145/3503222.3507745
The high access latency of DRAM continues to be a performance challenge for contemporary microprocessor systems. Prefetching is a well-established technique to address this problem, however, existing implemented designs fail to provide any performance ...
- research-article, January 2022
RAIL: Predictable, Low Tail Latency for NVMe Flash
ACM Transactions on Storage (TOS), Volume 18, Issue 1, Article No.: 5, Pages 1–21, https://doi.org/10.1145/3465406
Flash-based storage is replacing disk for an increasing number of data center applications, providing orders of magnitude higher throughput and lower average latency. However, applications also require predictable storage latency. Existing Flash devices ...
- research-article, October 2021
Twig: Profile-Guided BTB Prefetching for Data Center Applications
- Tanvir Ahmed Khan,
- Nathan Brown,
- Akshitha Sriraman,
- Niranjan K Soundararajan,
- Rakesh Kumar,
- Joseph Devietti,
- Sreenivas Subramoney,
- Gilles A Pokam,
- Heiner Litz,
- Baris Kasikci
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 816–829, https://doi.org/10.1145/3466752.3480124
Modern data center applications have deep software stacks, with instruction footprints that are orders of magnitude larger than typical instruction cache (I-cache) sizes. To efficiently prefetch instructions into the I-cache despite large application ...
- research-article, October 2021
PDede: Partitioned, Deduplicated, Delta Branch Target Buffer
- Niranjan K Soundararajan,
- Peter Braun,
- Tanvir Ahmed Khan,
- Baris Kasikci,
- Heiner Litz,
- Sreenivas Subramoney
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 779–791, https://doi.org/10.1145/3466752.3480046
Due to large instruction footprints, contemporary data center applications suffer from frequent frontend stalls. Despite being a significant contributor to these stalls, the Branch Target Buffer (BTB) has received less attention compared to other ...
- research-article, November 2021
Ripple: profile-guided instruction cache replacement for data center applications
- Tanvir Ahmed Khan,
- Dexin Zhang,
- Akshitha Sriraman,
- Joseph Devietti,
- Gilles Pokam,
- Heiner Litz,
- Baris Kasikci
ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 734–747, https://doi.org/10.1109/ISCA52012.2021.00063
Modern data center applications exhibit deep software stacks, resulting in large instruction footprints that frequently cause instruction cache misses degrading performance, cost, and energy efficiency. Although numerous mechanisms have been proposed to ...
- research-article, June 2021
Reducing write amplification in flash by death-time prediction of logical block addresses
SYSTOR '21: Proceedings of the 14th ACM International Conference on Systems and Storage, Article No.: 11, Pages 1–12, https://doi.org/10.1145/3456727.3463784
Flash-based solid state drives lack support for in-place updates, and hence deploy a flash translation layer to absorb the writes. For this purpose, SSDs implement a log-structured storage system introducing garbage collection and write-amplification ...
- short-paper, April 2021
Design for computational storage simulation platform
CHEOPS '21: Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems, Article No.: 5, Pages 1–8, https://doi.org/10.1145/3439839.3459085
Data movement between storage and compute resources represents a bottleneck in data-driven applications. This performance bottleneck can be mitigated by leveraging inherent parallelism in the user application and offloading component tasks, called ...
- doctoral thesis, January 2021
Algorithmic and System Innovations for Network Data Plane: Efficiency, Scalability, and Flexibility
Abstract: Due to the advanced reliability, scalability, and cost-effectiveness, more and more businesses are turning to cloud computing, and large-scale cloud networks have been connecting users, data, and machines more tightly than any past time. According ...