DOI: 10.1145/3627703.3629557
research-article

HD-IOV: SW-HW Co-designed I/O Virtualization with Scalability and Flexibility for Hyper-Density Cloud

Published: 22 April 2024

Abstract

As the resource density of cloud servers increases, cloud providers deploy hundreds of VMs concurrently on a single server, which requires an I/O virtualization method that is high-performance, scalable, flexible, and high-density. Hardware-assisted virtualization, such as device pass-through with SR-IOV, can achieve near-native performance, but at the expense of flexibility and with a limited device count. Traditional software-based I/O virtualization systems tend to dedicate additional computing cores to achieve higher performance, but suffer from critical scalability problems, especially in high-density clouds.
This paper proposes Hyper-Density I/O Virtualization (HD-IOV), a software-hardware co-designed I/O virtualization system that aims to achieve pass-through-level performance without the scalability and flexibility limitations of previous work. The core insight of HD-IOV is to move virtualization and resource management logic out of hardware devices and into software, which reduces device complexity and enables more flexible hardware resource management. DMA transactions and interrupts are delivered directly to guest VMs without VM exits. Isolation is achieved by leveraging an existing PCIe feature that allows the IOMMU to enforce isolation at queue-pair granularity. Extensive experiments show that HD-IOV achieves performance similar to SR-IOV for both network and accelerator devices. Furthermore, HD-IOV supports up to 2.96x higher device count and 2.9x faster median device initialization time, which is critical for emerging container and lightweight VM systems.
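
The abstract gives no implementation details, so the following is only a toy model of the idea that software hands out queue pairs while an IOMMU-like table enforces isolation per queue pair. Every name here (ToyIommu, QueuePairManager, the integer "tag") is hypothetical. The tag stands in for whatever PCIe identifier the paper uses; the author tags list PASID, but treating the tag as a PASID is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIommu:
    """Hypothetical stand-in for per-tag DMA translation tables."""
    tables: dict = field(default_factory=dict)  # tag -> {guest_page: host_page}

    def map_page(self, tag, guest_page, host_page):
        self.tables.setdefault(tag, {})[guest_page] = host_page

    def translate(self, tag, guest_page):
        # A DMA request is only honored if this tag has a mapping for the page.
        try:
            return self.tables[tag][guest_page]
        except KeyError:
            raise PermissionError(
                f"DMA fault: tag {tag} has no mapping for page {guest_page:#x}"
            ) from None


class QueuePairManager:
    """Hypothetical software-side bookkeeping of queue-pair ownership."""

    def __init__(self, num_queue_pairs):
        self.free = list(range(num_queue_pairs))
        self.owner = {}  # queue pair -> (vm name, tag)
        self.next_tag = 1

    def assign(self, vm):
        # Allocation policy lives in software, not in the device.
        if not self.free:
            raise RuntimeError("no free queue pairs")
        qp = self.free.pop(0)
        tag = self.next_tag
        self.next_tag += 1
        self.owner[qp] = (vm, tag)
        return qp, tag

    def release(self, qp):
        del self.owner[qp]
        self.free.append(qp)


iommu = ToyIommu()
manager = QueuePairManager(num_queue_pairs=4)
qp_a, tag_a = manager.assign("vm-a")
qp_b, tag_b = manager.assign("vm-b")
iommu.map_page(tag_a, guest_page=0x10, host_page=0x9000)
```

In this model, a DMA request issued under tag_a for page 0x10 translates successfully, while the same request under tag_b raises PermissionError, which is the per-queue-pair isolation property the abstract describes at a high level.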

Cited By

  • (2024) Rethinking the Networking Stack for Serverless Environments: A Sidecar Approach. Proceedings of the 2024 ACM Symposium on Cloud Computing, 213-222. DOI: 10.1145/3698038.3698561. Online publication date: 20-Nov-2024

    Published In

    EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
    April 2024
    1245 pages
    ISBN:9798400704376
    DOI:10.1145/3627703
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Cloud Computing
    2. I/O Virtualization
    3. Intel QAT
    4. PASID

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EuroSys '24

    Acceptance Rates

    Overall Acceptance Rate 241 of 1,308 submissions, 18%

    Article Metrics

    • Downloads (Last 12 months): 406
    • Downloads (Last 6 weeks): 29
    Reflects downloads up to 12 Nov 2024
