SmartHarvest: harvesting idle CPUs safely and efficiently in the cloud

Authors: Aditya Bhandari, Neeraja J. Yadwadkar, Siddhartha Sen, Sameh Elnikety, Christos Kozyrakis, Ricardo Bianchini

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems

Pages 1 - 16

https://doi.org/10.1145/3447786.3456225

Published: 21 April 2021

Abstract

We can increase the efficiency of public cloud datacenters by harvesting allocated but temporarily idling CPU cores from customer virtual machines (VMs) to run batch or analytics workloads. Even small efficiency gains translate into substantial savings, since provisioning and operating a datacenter costs hundreds of millions of dollars per year. The main challenge is to harvest idle cores with little or no impact on customer VMs, which could be running latency-sensitive services and are essentially black boxes to the cloud provider.

We introduce ElasticVM, a new VM type that can run batch workloads cheaply using mainly harvested cores. We also propose SmartHarvest, a system that dynamically manages the number of cores available to ElasticVMs in each fine-grained time window. SmartHarvest uses online learning to predict the core demand of primary, customer VMs and compute the number of cores that can be safely harvested. Our results show that SmartHarvest can harvest a significant amount of CPU resources without increasing the 99th-percentile tail latency of latency-critical primary workloads by more than 10%. Unlike static harvesting techniques that rely on offline profiling, SmartHarvest is robust to different primary workloads, batch workloads, and load changes. Finally, we show that the online learning in SmartHarvest is complementary to systems optimizations for VM management.
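As a rough illustration of the per-window decision described above (predict the primary VMs' core demand, then hand the remainder to an ElasticVM), the following minimal Python sketch may help. It is not the paper's learner: the class `ConservativeDemandPredictor`, the function `harvestable_cores`, the sliding-window maximum, and the `safety_buffer` parameter are all hypothetical names and choices introduced here for illustration only.

```python
from collections import deque


class ConservativeDemandPredictor:
    """Toy stand-in for an online core-demand predictor.

    Assumption: demand in the next window is guessed as the maximum
    peak seen over the last few windows plus a fixed safety buffer.
    This simple heuristic is NOT the online-learning model from the paper.
    """

    def __init__(self, history_windows: int = 5, safety_buffer: int = 1):
        # Keep only the most recent observations (sliding window).
        self.history = deque(maxlen=history_windows)
        self.safety_buffer = safety_buffer

    def observe(self, peak_busy_cores: int) -> None:
        """Record the peak number of busy primary-VM cores in the window just ended."""
        self.history.append(peak_busy_cores)

    def predict(self, allocated_cores: int) -> int:
        """Predict primary-VM core demand for the next window."""
        if not self.history:
            # With no evidence yet, assume the primary VMs need everything.
            return allocated_cores
        return min(allocated_cores, max(self.history) + self.safety_buffer)


def harvestable_cores(allocated_cores: int, predicted_demand: int) -> int:
    """Cores that could be lent out for the next window (never negative)."""
    return max(0, allocated_cores - predicted_demand)


if __name__ == "__main__":
    predictor = ConservativeDemandPredictor(history_windows=3, safety_buffer=1)
    for peak in [2, 3, 2]:
        predictor.observe(peak)
    demand = predictor.predict(allocated_cores=8)
    print(demand, harvestable_cores(8, demand))
```

The sketch errs on the side of the primary VMs: with no history it harvests nothing, and a single busy window immediately shrinks the harvest for the following windows.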




Published In

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems

April 2021

631 pages

ISBN: 9781450383349

DOI: 10.1145/3447786

General Chairs: Antonio Barbalace (The University of Edinburgh), Pramod Bhatotia (Technical University of Munich)

Program Chairs: Lorenzo Alvisi (Cornell University), Cristian Cadar (Imperial College London)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

EuroSys '21: Sixteenth European Conference on Computer Systems

Sponsor: SIGOPS

April 26 - 28, 2021

Online Event, United Kingdom

Acceptance Rates

EuroSys '21 paper acceptance rate: 38 of 181 submissions (21%)

Overall acceptance rate: 241 of 1,308 submissions (18%)



Bibliometrics & Citations

Article Metrics

Total Citations: 19
Total Downloads: 1,338
Downloads (last 12 months): 289
Downloads (last 6 weeks): 38

Reflects downloads up to 18 Nov 2024

Cited By

Gupta N, Narayanan I, Handa S, Chakraborti S, Thapar P, Shan B, Rao A, Liu Y, Wang P, Wu Y, Gao Q, Cheng C, You S, Huang L, Fan J, Yu K, Lin K, Mu T, Malani P, Wang H, Lu T, Zhang P. (2024) Dynamic Idle Resource Leasing To Safely Oversubscribe Capacity At Meta. Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 792-810. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3698038.3698537

Segarra C, Durev I, Pietzuch P. (2024) Is It Time To Put Cold Starts In The Deep Freeze? Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 259-268. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3698038.3698527

Zhang X, He Q, Fan H, Wu S. (2024) Faascale: Scaling MicroVM Vertically for Serverless Computing with Memory Elasticity. Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 196-212. Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3698038.3698512

Yu H, Wang H, Li J, Yuan X, Park S. (2024) Freyr +: Harvesting Idle Resources in Serverless Computing via Deep Reinforcement Learning. IEEE Transactions on Parallel and Distributed Systems 35:11, pages 2254-2269. Online publication date: Nov-2024.
https://doi.org/10.1109/TPDS.2024.3462294

Wang J, Berger D, Kazhamiaka F, Irvene C, Zhang C, Choukse E, Frost K, Fonseca R, Warrier B, Bansal C, Stern J, Bianchini R, Sriraman A. (2024) Designing Cloud Servers for Lower Carbon. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pages 452-470. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00041

Alfares N, Kesidis G, Urgaonkar B, Baarzi A, Jain A. (2024) Online VM Service Selection with Spot Cores for Dynamic Workloads. 2024 IEEE Cloud Summit, pages 54-60. Online publication date: 27-Jun-2024.
https://doi.org/10.1109/Cloud-Summit61220.2024.00016

Huang Z, Tang S, Chang Z, Tan L, Lu Q, Ouyang J, Lv W, Yao Z, Bao Y, Wang S. (2024) HAPPIES: a History-Aware Efficient Cloud Resource Overcommitment System. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 514-524. Online publication date: 6-May-2024.
https://doi.org/10.1109/CCGrid59990.2024.00064

Jacquet P, Ledoux T, Rouvoy R. (2024) SweetspotVM: Oversubscribing CPU without Sacrificing VM Performance. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 148-157. Online publication date: 6-May-2024.
https://doi.org/10.1109/CCGrid59990.2024.00026

Kur J, Xue J, Chen J, Huang J. (2023) Bridging Resource Prediction and System Management: A Case Study in Cloud Systems. 2023 19th International Conference on Network and Service Management (CNSM), pages 1-5. Online publication date: 30-Oct-2023.
https://doi.org/10.23919/CNSM59352.2023.10327893

Chugh T, Kandula S, Krishnamurthy A, Mahajan R, Menache I. (2023) Anticipatory Resource Allocation for ML Training. Proceedings of the 2023 ACM Symposium on Cloud Computing, pages 410-426. Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3620678.3624669
