- research-article, August 2024
FaaSRail: Employing Real Workloads to Generate Representative Load for Serverless Research
- Christos Katsakioris,
- Chloe Alverti,
- Konstantinos Nikas,
- Dimitrios Siakavaras,
- Stratos Psomadakis,
- Nectarios Koziris
HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, Pages 214–226, https://doi.org/10.1145/3625549.3658684
With the proliferation of Serverless Computing, the Function-as-a-Service (FaaS) paradigm is nowadays ubiquitous. As a result, the domain has attracted extensive research, both in industry and academia, identifying opportunities and addressing limitations ...
- research-article, May 2024
Enabling Operational Data Analytics for Datacenters through Ontologies, Monitoring, and Simulation-based Prediction
ICPE '24 Companion: Companion of the 15th ACM/SPEC International Conference on Performance Engineering, Pages 120–126, https://doi.org/10.1145/3629527.3652897
Datacenters are key components in the ICT infrastructure supporting our digital society. Datacenter operations are hampered by operational complexity and dynamics, risking to reduce or even offset the performance, energy efficiency, and other datacenter ...
- research-article, April 2024
Characterizing a Memory Allocator at Warehouse Scale
- Zhuangzhuang Zhou,
- Vaibhav Gogte,
- Nilay Vaish,
- Chris Kennelly,
- Patrick Xia,
- Svilen Kanev,
- Tipp Moseley,
- Christina Delimitrou,
- Parthasarathy Ranganathan
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 192–206, https://doi.org/10.1145/3620666.3651350
Memory allocation constitutes a substantial component of warehouse-scale computation. Optimizing the memory allocator not only reduces the datacenter tax, but also improves application performance, leading to significant cost savings.
We present the ...
- research-article, November 2023
Divided at the Edge - Measuring Performance and the Digital Divide of Cloud Edge Data Centers
Proceedings of the ACM on Networking (PACMNET), Volume 1, Issue CoNEXT3, Article No.: 16, Pages 1–23, https://doi.org/10.1145/3629138
Cloud providers are highly incentivized to reduce latency. One way they do this is by locating data centers as close to users as possible. These “cloud edge” data centers are placed in metropolitan areas and enable edge computing for residents of these ...
- research-article, September 2023
TCP's Third Eye: Leveraging eBPF for Telemetry-Powered Congestion Control
eBPF '23: Proceedings of the 1st Workshop on eBPF and Kernel Extensions, Pages 1–7, https://doi.org/10.1145/3609021.3609295
For years, congestion control algorithms have been navigating in the dark, blind to the actual state of the network. They were limited to the coarse-grained signals that are visible from the OS kernel, which are measured locally (e.g., RTT) or hints ...
- research-article, September 2023
Reinvent Cloud Software Stacks for Resource Disaggregation
Journal of Computer Science and Technology (JCST), Volume 38, Issue 5, Pages 949–969, https://doi.org/10.1007/s11390-023-3272-0
Due to the unprecedented development of low-latency interconnect technology, building large-scale disaggregated architecture is drawing more and more attention from both industry and academia. Resource disaggregation is a new way to organize the ...
- poster, September 2023
Poster: Near Non-blocking Performance with All-optical Circuit-switched Core
ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference, Pages 1117–1119, https://doi.org/10.1145/3603269.3610868
All-optical circuit-switched (OCS) core is the holy grail for the future generation datacenter architectures. However, such proposals consist of a common operational abstraction termed as round-robin circuit scheduling, which heavily suffers from a) high ...
- research-article, September 2023
Fathom: Understanding Datacenter Application Network Performance
- Mubashir Adnan Qureshi,
- Junhua Yan,
- Yuchung Cheng,
- Soheil Hassas Yeganeh,
- Yousuk Seung,
- Neal Cardwell,
- Willem De Bruijn,
- Van Jacobson,
- Jasleen Kaur,
- David Wetherall,
- Amin Vahdat
ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference, Pages 394–405, https://doi.org/10.1145/3603269.3604815
We describe our experience with Fathom, a system for identifying the network performance bottlenecks of any service running in the Google fleet. Fathom passively samples RPCs, the principal unit of work for services. It segments the overall latency into ...
- research-article, August 2023
Capybara: μSecond-Scale Live TCP Migration
APSys '23: Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, Pages 30–36, https://doi.org/10.1145/3609510.3609813
Latency-critical μs-scale data center applications are susceptible to server load spikes. The issue is particularly challenging for services using long-lived TCP connections. This paper introduces Capybara, a highly efficient and versatile live TCP ...
- research-article, August 2023
Myths and Misconceptions Around Reducing Carbon Embedded in Cloud Platforms
- Jialun Lyu,
- Jaylen Wang,
- Kali Frost,
- Chaojie Zhang,
- Celine Irvene,
- Esha Choukse,
- Rodrigo Fonseca,
- Ricardo Bianchini,
- Fiodar Kazhamiaka,
- Daniel S. Berger
HotCarbon '23: Proceedings of the 2nd Workshop on Sustainable Computer Systems, Article No.: 7, Pages 1–7, https://doi.org/10.1145/3604930.3605717
Major cloud providers have stated public plans to lower their carbon emissions. Historically, this has meant focusing on emissions from producing the electricity consumed by datacenters. While work and challenges remain on this avenue, research and ...
- abstract, June 2023
Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks
SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 3–4, https://doi.org/10.1145/3578338.3593551
The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new technology to deal ...
Also Published in:
ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1
- short-paper, June 2023
DPFS: DPU-Powered File System Virtualization
SYSTOR '23: Proceedings of the 16th ACM International Conference on Systems and Storage, Pages 1–7, https://doi.org/10.1145/3579370.3594769
As we move towards hyper-converged cloud solutions, the efficiency and overheads of distributed file systems at the cloud tenant side (i.e., client) become of paramount importance. Often, the client-side driver of a cloud file system is complex and CPU ...
DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services
WWW '23: Proceedings of the ACM Web Conference 2023, Pages 3067–3076, https://doi.org/10.1145/3543507.3583437
Traditional power reduction techniques such as DVFS or RAPL are challenging to use with web services because they significantly affect the services’ latency and throughput. Previous work suggested the use of controllers based on control theory or ...
- research-article, March 2023
Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 2, Pages 1–43, https://doi.org/10.1145/3579312
The performance of large-scale computing systems often critically depends on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as an innovative new technology to deal ...
- research-article, January 2023
Pond: CXL-Based Memory Pooling Systems for Cloud Platforms
- Huaicheng Li,
- Daniel S. Berger,
- Lisa Hsu,
- Daniel Ernst,
- Pantea Zardoshti,
- Stanko Novakovic,
- Monish Shah,
- Samir Rajadnya,
- Scott Lee,
- Ishwar Agarwal,
- Mark D. Hill,
- Marcus Fontoura,
- Ricardo Bianchini
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 574–587, https://doi.org/10.1145/3575693.3578835
Public cloud providers seek to meet stringent performance requirements and low hardware cost. A key driver of performance and cost is main memory. Memory pooling promises to improve DRAM utilization and thereby reduce costs. However, pooling is ...
- research-article, December 2022
AQUATOPE: QoS-and-Uncertainty-Aware Resource Management for Multi-stage Serverless Workflows
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, Pages 1–14, https://doi.org/10.1145/3567955.3567960
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are becoming increasingly representative of FaaS platforms. Despite their advantages in terms of fine-grained scalability and modular development, these ...
- research-article, November 2022
GreenDRL: managing green datacenters using deep reinforcement learning
SoCC '22: Proceedings of the 13th Symposium on Cloud Computing, Pages 445–460, https://doi.org/10.1145/3542929.3563501
Managing datacenters to maximize efficiency and sustainability is a complex and challenging problem. In this work, we explore the use of deep reinforcement learning (RL) to manage "green" datacenters, bringing a robust approach for designing efficient ...
- research-article, December 2023
DFX: A Low-Latency Multi-FPGA Appliance for Accelerating Transformer-Based Text Generation
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 616–630, https://doi.org/10.1109/MICRO56248.2022.00051
Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or ...
- research-article, January 2023
Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis
- Kangjin Wang,
- Ying Li,
- Cheng Wang,
- Tong Jia,
- Kingsum Chow,
- Yang Wen,
- Yaoyong Dou,
- Guoyao Xu,
- Chuanjia Hou,
- Jie Yao,
- Liping Zhang
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 47, Pages 1–11, https://doi.org/10.1145/3545008.3545026
Understanding the microarchitectural resource characteristics of datacenter jobs has become increasingly critical to guarantee the performance of jobs while improving resource utilization. Prior work studied the resource characteristics of datacenter ...
- research-article, August 2022
ABM: active buffer management in datacenters
SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference, Pages 36–52, https://doi.org/10.1145/3544216.3544252
Today's network devices share buffer across queues to avoid drops during transient congestion and absorb bursts. As the buffer-per-bandwidth-unit in datacenter decreases, the need for optimal buffer utilization becomes more pressing. Typical devices use ...