UFO: the ultimate QoS-aware CPU core management for virtualized and oversubscribed public clouds

Authors:

Zhibin Yu

NSDI'24: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation

Article No.: 84, Pages 1511 - 1530

Published: 16 April 2024

Abstract

Public clouds typically adopt (1) multi-tenancy to increase server utilization; (2) virtualization to provide isolation between tenants; and (3) resource oversubscription to further increase resource efficiency. However, prior work focuses on optimizing only one or two of these elements and fails to bring QoS-aware multi-tenancy, virtualization, and resource oversubscription together.

We identify three challenges that arise when the three elements coexist. First, the symptoms of double scheduling (the guest OS schedules threads onto vCPUs while the host independently schedules vCPUs onto physical cores) are 10× worse for latency-critical (LC) workloads, which consist of numerous sub-millisecond tasks and differ significantly from conventional batch applications. Second, when LC applications run, resource contention also occurs among threads within the same VM, calling for inner-VM core isolation. Third, in realistic public clouds the host cannot obtain any application-level performance metrics to guide resource management.
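From inside a Linux guest, one visible symptom of double scheduling is steal time: the eighth numeric field of each `cpu` line in `/proc/stat` counts ticks during which a vCPU was runnable but the host was running something else (KVM exposes this to Linux guests). The sketch below is an illustration only, not UFO's monitoring code; the function and class names are hypothetical. It parses two `/proc/stat` snapshots and computes the fraction of elapsed CPU time that was stolen. Note that steal time is a system-level signal, not an application-level performance metric, which is exactly the gap the third challenge describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTimes:
    """Counters from one 'cpu'/'cpuN' line of /proc/stat, in USER_HZ ticks."""
    total: int  # user+nice+system+idle+iowait+irq+softirq+steal
    steal: int  # ticks the vCPU was runnable but the host did not run it


def parse_proc_stat(text: str) -> dict[str, CpuTimes]:
    """Parse the 'cpu' and 'cpuN' lines of /proc/stat text into CpuTimes."""
    result: dict[str, CpuTimes] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue  # skip intr, ctxt, btime, ...
        values = [int(v) for v in fields[1:]]
        # Keep only the first 8 fields: guest/guest_nice (fields 9-10) are
        # already included in user/nice, so summing them would double count.
        # Pad with zeros in case an older kernel reports fewer fields.
        first8 = (values + [0] * 8)[:8]
        result[fields[0]] = CpuTimes(total=sum(first8), steal=first8[7])
    return result


def steal_fraction(before: str, after: str, cpu: str = "cpu") -> float:
    """Fraction of the ticks elapsed on `cpu` between two snapshots that were stolen."""
    t0 = parse_proc_stat(before)[cpu]
    t1 = parse_proc_stat(after)[cpu]
    elapsed = t1.total - t0.total
    if elapsed <= 0:
        return 0.0
    return (t1.steal - t0.steal) / elapsed
```

To use it inside a VM, read `/proc/stat` twice, for example one second apart, and pass both strings to `steal_fraction`. Use `cpu="cpu0"` and so on to inspect an individual vCPU instead of the aggregate line.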

To address these challenges, we propose UFO, a QoS-aware core manager designed specifically to support the co-location of multiple LC workloads in virtualized and oversubscribed public cloud environments. UFO addresses all three challenges by (1) coordinating guest and host CPU cores (vCPU-pCPU coordination) and (2) performing fine-grained inner-VM resource isolation, pushing core management in realistic public clouds to the extreme. Compared with the state-of-the-art core manager, UFO saves up to 50% (22% on average) of physical cores under the same co-location scenario.
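The abstract does not say how UFO implements inner-VM isolation, so the following is only a minimal sketch of the general idea under stated assumptions: inside the guest, each LC service receives a disjoint set of vCPUs and all remaining tasks share the leftovers, enforced with standard Linux CPU affinity (`os.sched_setaffinity`). The service names, the per-service vCPU demands, and the helpers `partition_vcpus` and `apply_plan` are hypothetical. This sketch does not touch the host side (vCPU-pCPU coordination), and real isolation would also need to keep interrupts and kernel threads off the dedicated vCPUs.

```python
import os


def partition_vcpus(vcpus: list[int], lc_demand: dict[str, int]) -> dict[str, set[int]]:
    """Assign each LC service its own disjoint vCPU set; leftovers go to 'shared'.

    lc_demand maps a (hypothetical) service name to the number of vCPUs it needs.
    Raises ValueError if the demands exceed the available vCPUs.
    """
    if "shared" in lc_demand:
        raise ValueError("'shared' is reserved for non-LC tasks")
    ordered = sorted(vcpus)
    if sum(lc_demand.values()) > len(ordered):
        raise ValueError("LC demand exceeds available vCPUs")
    plan: dict[str, set[int]] = {}
    next_idx = 0
    # Deterministic order (by service name) so repeated runs give the same plan.
    for name, count in sorted(lc_demand.items()):
        plan[name] = set(ordered[next_idx:next_idx + count])
        next_idx += count
    plan["shared"] = set(ordered[next_idx:])
    return plan


def apply_plan(tids_by_service: dict[str, list[int]], plan: dict[str, set[int]]) -> None:
    """Pin each thread ID to the vCPUs assigned to its service (Linux only).

    Linux applies affinity per thread, so pass every thread ID of a
    multithreaded server (e.g. the entries of /proc/<pid>/task), and put
    all non-LC tasks under 'shared' so they stay off the dedicated vCPUs.
    """
    for name, tids in tids_by_service.items():
        for tid in tids:
            os.sched_setaffinity(tid, plan[name])
```

For example, `partition_vcpus(list(range(6)), {"nginx": 2, "memcached": 1})` gives `memcached` vCPU 0, `nginx` vCPUs 1 and 2, and leaves vCPUs 3 to 5 for everything else. A static partition like this is simple but wastes dedicated vCPUs when load drops; a QoS-aware manager would instead resize each set as measured latency changes.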

Published In

NSDI'24: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation

April 2024

2062 pages

ISBN:978-1-939133-39-7

Others:
Laurent Vanbever (ETH Zürich), Irene Zhang (Microsoft Research)

Copyright © 2024 The USENIX Association.

Sponsors

Meta
FUTUREWEI
NSF
Microsoft
Google Inc.

Publisher

USENIX Association

United States

Qualifiers

Research-article
Research
Refereed limited
