DOI:10.1145/3678015.3680487

SmartNIC-Enabled Live Migration for Storage-Optimized VMs

Published: 04 September 2024

Abstract

Cloud providers offer storage-optimized VMs equipped with locally attached storage to meet the high performance requirements of cloud users. However, current cloud providers cannot enable live migration for storage-optimized VMs due to the high resource overheads. Moreover, resources should be permanently provisioned for live migration as on-demand provisioning needs to de-allocate resources from either VMs or the hypervisor, thus violating SLA. We propose a storage live migration acceleration system on SmartNICs. Our design achieves minimal resource overhead and SLA violations by proposing (1) a SmartNIC-managed live migration architecture and (2) an efficient consistency algorithm. We implement a basic prototype on an FPGA-based SmartNIC. Preliminary results show that we can migrate storage-optimized VMs with no host resource usage and minimal performance interference to RocksDB running inside the VM. This project is part of the Terminus Project [28].
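
The abstract does not spell out the consistency algorithm, so the following is only a generic illustration of the problem it has to solve: keeping the destination copy consistent while the guest keeps writing. It is a minimal sketch of textbook pre-copy migration with dirty-block tracking, not the paper's design. All names here (`Disk`, `MigratingDisk`, `migrate`, `stop_threshold`, `max_rounds`) are hypothetical, and guest writes are simulated as one batch per copy round.

```python
from typing import Iterable, List, Set, Tuple


class Disk:
    """A toy block device: a list of blocks held in memory."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [b"\x00"] * num_blocks


class MigratingDisk:
    """Wraps the source disk and records which blocks the guest writes."""

    def __init__(self, source: Disk) -> None:
        self.source = source
        self.dirty: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        # Guest write path: update the source and mark the block dirty.
        self.source.blocks[index] = data
        self.dirty.add(index)


def migrate(
    src: MigratingDisk,
    dst: Disk,
    writes_per_round: Iterable[List[Tuple[int, bytes]]],
    stop_threshold: int = 1,
    max_rounds: int = 8,
) -> int:
    """Copy src to dst with iterative pre-copy; return the number of rounds."""
    # The first pass copies every block.
    pending: Set[int] = set(range(len(src.source.blocks)))
    rounds = 0
    for rounds, writes in enumerate(writes_per_round, start=1):
        src.dirty.clear()
        for i in pending:
            dst.blocks[i] = src.source.blocks[i]
        # Simulated guest writes that land while this pass is running.
        for index, data in writes:
            src.write(index, data)
        # Blocks written during the pass must be sent again.
        pending = set(src.dirty)
        if len(pending) <= stop_threshold or rounds >= max_rounds:
            break
    # Stop-and-copy: the guest is assumed paused, so no new writes arrive.
    for i in pending:
        dst.blocks[i] = src.source.blocks[i]
    return rounds
```

In this sketch, the dirty set is what guarantees that the destination matches the source at switchover: any block overwritten after it was copied is re-sent, and the final pass runs only once the remaining dirty set is small or the round limit is reached.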

References

[1]
[n. d.]. AWS Nitro System. https://aws.amazon.com/ec2/nitro/. Accessed: 2024-7-8.
[2]
[n. d.]. Intel® Optane SSD 900P Series 280GB 12 Height PCIe x4 20nm 3D XPoint Product Specifications. https://ark.intel.com/content/www/us/en/ark/products/123628/intel-optane-ssd-900p-series-280gb-12-height-pcie-x4-20nm-3d-xpoint.html. Accessed: 2024-7-8.
[3]
[n. d.]. Kernel-based Virtual Machine (KVM). http://www.linux-kvm.org. Accessed: 2024-7-8.
[4]
[n. d.]. Microsoft Hyper-V. http://www.microsoft.com/en-us/server-cloud/solutions/virtualization.aspx. Accessed: 2024-7-8.
[5]
[n. d.]. VMware. http://www.vmware.com. Accessed: 2024-7-8.
[6]
2013. Intel® FPGAs - Intel® Arria® 10 GX FPGA. https://www.intel.com/content/www/us/en/products/details/fpga/arria/10/gx.html. Accessed: 2024-7-8.
[7]
2018. QEMU: the FAST! processor emulator. https://www.qemu.org/. Accessed: 2024-7-8.
[8]
2022. Benchmarking tools • facebook/rocksdb Wiki • GitHub. https://github.com/facebook/rocksdb/wiki/Benchmarking-tools. Accessed: 2024-7-8.
[9]
2022. Broadcom Stingray PS1100R. https://docs.broadcom.com/doc/PS1100R-PB. Accessed: 2022-1-1.
[10]
2022. RocksDB: A Persistent Key-Value Store for Fast Storage Environments. https://rocksdb.org/. Accessed: 2024-7-8.
[11]
2023. Alibaba CIPU. https://www.alibabacloud.com/blog/a-detailed-explanation-about-alibaba-cloud-cipu_599183. Accessed: 2024-7-8.
[12]
2024. Agilio CX SmartNICs. https://www.netronome.com/products/agilio-cx/. Accessed: 2024-7-8.
[13]
2024. Amazon DynamoDB. https://aws.amazon.com/dynamodb/. Accessed: 2024-7-8.
[14]
2024. Azure Disk Storage Overview - Azure Virtual Machines | Microsoft Learn. https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview. Accessed: 2024-7-8.
[15]
2024. Cloud Block Storage - Amazon EBS - AWS. https://aws.amazon.com/ebs/. Accessed: 2024-7-8.
[16]
2024. Google Cloud - Live Migration Process during Maintenance Events. https://cloud.google.com/compute/docs/instances/live-migration-process. Accessed: 2024-7-8.
[17]
2024. Maintenance and Updates - Azure Virtual Machines. https://learn.microsoft.com/en-us/azure/virtual-machines/maintenance-and-updates#live-migration. Accessed: 2024-7-8.
[18]
2024. Marvell OCTEON 10 DPU. https://www.marvell.com/products/data-processing-units.html. Accessed: 2024-7-8.
[19]
2024. Overview of Azure Boost. https://learn.microsoft.com/en-us/azure/azure-boost/overview. Accessed: 2024-7-8.
[20]
2024. Persistent Disk: Durable Block Storage | Google Cloud. https://cloud.google.com/persistent-disk. Accessed: 2024-7-8.
[21]
2024. Storage Optimized Instances - Amazon EC2. https://docs.aws.amazon.com/ec2/latest/instancetypes/so.html. Accessed: 2024-7-8.
[22]
2024. Storage Optimized Virtual Machine Sizes - Azure Virtual Machines. https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-storage. Accessed: 2024-7-8.
[23]
Samer Al-Kiswany, Dinesh Subhraveti, Prasenjit Sarkar, and Matei Ripeanu. 2011. Vmflock: Virtual Machine Co-migration for the Cloud. In Proceedings of the 20th International Symposium on High Performance Distributed Computing. 159--170.
[24]
Nadav Amit, Muli Ben-Yehuda, Dan Tsafrir, and Assaf Schuster. 2011. vIOMMU: Efficient IOMMU Emulation. In Proceedings of the 2011 USENIX Annual Technical Conference (ATC). 73--88.
[25]
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the Art of Virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP). 164--177.
[26]
Junehyuk Boo, Yujin Chung, Eunjin Baek, Seongmin Na, Changsu Kim, and Jangwoo Kim. 2023. F4T: A Fast and Flexible FPGA-based Full-stack TCP Acceleration Framework. In Proceedings of the 50th ACM/IEEE International Symposium on Computer Architecture (ISCA). 1--13.
[27]
Adrian M. Caulfield, Eric S. Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, and Doug Burger. 2016. A Cloud-scale Acceleration Architecture. In 2016 49th IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--13.
[28]
Derek Chiou, Ran Shu, Lei Qu, Peng Cheng, Yongqiang Xiong, Ram Huggahalli, Arun Kishan, Mark D. Hill, and Steve Scott. 2024. Terminus: Moving the Center of Cloud Servers from Cores to SmartNICs and Beyond. HPCA 2024 Keynote.
[29]
Inho Choi, Nimish Wadekar, Raj Joshi, Joshua Fried, Dan R.K. Ports, Irene Zhang, and Jialin Li. 2023. Capybara: μSecond-Scale Live TCP Migration. In Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys). 30--36.
[30]
Sean Choi, Muhammad Shahbaz, Balaji Prabhakar, and Mendel Rosenblum. 2019. λ-NIC: Interactive Serverless Compute on Programmable SmartNICs. arXiv preprint arXiv:1909.11958 (2019).
[31]
Jeffrey Dean and Luiz André Barroso. 2013. The Tail at Scale. Commun. ACM 56, 2 (2013), 74--80.
[32]
Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. 2021. RocksDB: Evolution of Development Priorities in a Key-value Store Serving Large-scale Applications. ACM Transactions on Storage (TOS) 17, 4 (2021), 1--32.
[33]
Daniel Firestone, Andrew Putnam, Sambhrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian Caulfield, Eric Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Silva, Madhan Sivakumar, Nisheeth Srivastava, Anshuman Verma, Qasim Zuhair, Deepak Bansal, Doug Burger, Kushagra Vaid, David A. Maltz, and Albert Greenberg. 2018. Azure Accelerated Networking: SmartNICs in the Public Cloud. In Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 51--66.
[34]
Michael R Hines and Kartik Gopalan. 2009. Post-copy based live virtual machine migration using adaptive pre-paging and dynamic self-ballooning. In Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments. 51--60.
[35]
Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, and Ryan Stutsman. 2017. Rocksteady: Fast Migration for Low-latency In-memory Storage. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). 390--405.
[36]
Yossi Kuperman, Eyal Moscovici, Joel Nider, Razya Ladelsky, Abel Gordon, and Dan Tsafrir. 2016. Paravirtual Remote I/O. In Proceedings of the 21st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 49--65.
[37]
Dongup Kwon, Junehyuk Boo, Dongryeong Kim, and Jangwoo Kim. 2020. FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 955--971.
[38]
Huaicheng Li, Mingzhe Hao, Stanko Novakovic, Vaibhav Gogte, Sriram Govindan, Dan R.K. Ports, Irene Zhang, Ricardo Bianchini, Haryadi S. Gunawi, and Anirudh Badam. 2020. Leapio: Efficient and Portable Virtual NVMe Storage on ARM SoCs. In Proceedings of the 25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 591--605.
[39]
Xiaoyu Li, Ran Shu, Yongqiang Xiong, and Fengyuan Ren. 2024. Software-based Live Migration for Containerized RDMA. In Proceedings of the 8th ACM SIGCOMM Asia-Pacific Workshop on Networking (APNet). 52--58.
[40]
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 429--444.
[41]
Katie Lim, Matthew Giordano, Theano Stavrinos, Baris Kasikci, and Thomas Anderson. 2024. Beehive: A Flexible Network Stack for Direct-Attached Accelerators. arXiv preprint arXiv:2403.14770 (2024).
[42]
Ming Liu, Simon Peter, Arvind Krishnamurthy, and Phitchaya Mangpo Phothilimthana. 2019. E3: Energy-efficient Microservices on SmartNIC-accelerated Servers. In Proceedings of the 2019 USENIX Annual Technical Conference (ATC). 363--378.
[43]
Artemiy Margaritov, Dmitrii Ustiugov, Edouard Bugnion, and Boris Grot. 2019. Prefetched Address Translation. In Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO). 1023--1036.
[44]
Ali José Mashtizadeh, Emré Celebi, Tal Garfinkel, and Min Cai. 2011. The Design and Evolution of Live Storage Migration in VMware ESX. In Proceedings of the 2011 USENIX Annual Technical Conference (ATC). 187--200.
[45]
Nirav Mehta. 2022. Introducing C3 machines with Google's custom Intel IPU | Google Cloud Blog. https://cloud.google.com/blog/products/compute/introducing-c3-machines-with-googles-custom-intel-ipu. Accessed: 2024-7-8.
[46]
Jaehong Min, Ming Liu, Tapan Chugh, Chenxingyu Zhao, Andrew Wei, In Hwan Doh, and Arvind Krishnamurthy. 2021. Gimbal: Enabling Multi-tenant Storage Disaggregation on SmartNIC JBOFs. In Proceedings of the 2021 ACM SIGCOMM Conference. 106--122.
[47]
Maksym Planeta, Jan Bierbaum, Leo Sahaya Daphne Antony, Torsten Hoefler, and Hermann Härtig. 2021. MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications. In Proceedings of the 2021 USENIX Annual Technical Conference (ATC). 47--63.
[48]
George Prekas, Marios Kogias, and Edouard Bugnion. 2017. Zygos: Achieving low tail latency for microsecond-scale networked tasks. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP). 325--341.
[49]
Adam Ruprecht, Danny Jones, Dmitry Shiraev, Greg Harmon, Maya Spivak, Michael Krebs, Miche Baker-Harvey, and Tyler Sanderson. 2018. VM Live Migration At Scale. In Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE). 45--56.
[50]
Xiang Song, Jicheng Shi, Ran Liu, Jian Yang, and Haibo Chen. 2013. Parallelizing Live Migration of Virtual Machines. In Proceedings of the 9th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE). 85--96.
[51]
Xin Xu and Bhavesh Davda. 2016. SRVM: Hypervisor Support for Live Migration with Passthrough SR-IOV Network Devices. ACM SIGPLAN Notices 51, 7 (2016), 65--77.
[52]
Fei Zhang, Guangming Liu, Xiaoming Fu, and Ramin Yahyapour. 2018. A Survey on Virtual Machine Migration: Challenges, Techniques, and Open Issues. IEEE Communications Surveys & Tutorials 20, 2 (2018), 1206--1243.
[53]
Jiechen Zhao, Iris Uwizeyimana, Karthik Ganesan, Mark C. Jeffrey, and Natalie Enright Jerger. 2022. Altocumulus: Scalable Scheduling for Nanosecond-scale Remote Procedure Calls. In Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO). 423--440.
[54]
Guanwen Zhong, Aditya Kolekar, Burin Amornpaisannon, Inho Choi, Haris Javaid, and Mario Baldi. 2023. A Primer on RecoNIC: RDMA-enabled Compute Offloading on SmartNIC. arXiv preprint arXiv:2312.06207 (2023).

Received 2nd May 2024; accepted 1st July 2024

Published In

APSys '24: Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems
September 2024
150 pages
ISBN:9798400711053
DOI:10.1145/3678015
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGAs
  2. NVMe storage
  3. SmartNICs
  4. cloud computing
  5. hypervisors
  6. live migration
  7. virtual machines

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APSys '24

Acceptance Rates

APSys '24 Paper Acceptance Rate 20 of 44 submissions, 45%;
Overall Acceptance Rate 169 of 430 submissions, 39%
