DOI: 10.1145/3127479.3131614

Selecting the best VM across multiple public clouds: a data-driven performance modeling approach

Published: 24 September 2017

Abstract

Users of cloud services face a bewildering choice of VM types, and the choice of VM can have significant implications for performance and cost. In this paper, we address the fundamental problem of accurately and economically choosing the best VM for a given workload and set of user goals. To this end, we present PARIS, a data-driven system that uses a novel hybrid offline and online data collection and modeling framework to provide accurate performance estimates with minimal data collection. PARIS predicts workload performance for different user-specified metrics, and the resulting costs, for a wide range of VM types and workloads across multiple cloud providers. When compared to sophisticated baselines, including collaborative filtering and a linear interpolation model using measured workload performance on two VM types, PARIS produces significantly better estimates of performance. For instance, it reduces runtime prediction error by a factor of 4 for some workloads on both AWS and Azure. The increased accuracy translates into a 45% reduction in user cost while maintaining performance.
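
To make the selection step concrete, the following is a minimal sketch of how predicted runtimes and prices could be combined to pick a VM under a user goal. It is not the PARIS implementation: the VM names, prices, and runtime estimates are hypothetical, the runtime estimates are assumed to come from some performance model, and cost is assumed to scale linearly with runtime (per-second billing), which real providers may not follow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VMOption:
    name: str                   # hypothetical VM type label
    price_per_hour: float       # hypothetical price in USD per hour
    predicted_runtime_s: float  # runtime estimate from some performance model


def estimated_cost(vm: VMOption) -> float:
    """Estimated cost in USD, assuming billing proportional to runtime."""
    return vm.price_per_hour * vm.predicted_runtime_s / 3600.0


def select_vm(options: List[VMOption], max_runtime_s: float) -> Optional[VMOption]:
    """Return the cheapest VM whose predicted runtime meets the goal, or None."""
    feasible = [vm for vm in options if vm.predicted_runtime_s <= max_runtime_s]
    if not feasible:
        return None
    return min(feasible, key=estimated_cost)


# Illustrative inputs only; none of these numbers come from the paper.
options = [
    VMOption("small", 0.10, 7200.0),
    VMOption("medium", 0.20, 3000.0),
    VMOption("large", 0.40, 1800.0),
]
best = select_vm(options, max_runtime_s=3600.0)
```

In this sketch the cheapest VM per hour is rejected because it misses the runtime goal, and the fastest VM is rejected because it costs more overall. This shows why the accuracy of the runtime estimates directly affects the cost of the final choice.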

Published In

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
September 2017
672 pages
ISBN:9781450350280
DOI:10.1145/3127479
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. data-driven modeling
  3. performance prediction
  4. resource allocation

Qualifiers

  • Research-article

Conference

SoCC '17: ACM Symposium on Cloud Computing
September 24-27, 2017
Santa Clara, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 380
  • Downloads (last 6 weeks): 26

Reflects downloads up to 21 Nov 2024.

Citations

Cited By

  • (2024) Co-Approximator: Enabling Performance Prediction in Colocated Applications. ACM Transactions on Embedded Computing Systems 24:1 (1-28). DOI: 10.1145/3677180. Online publication date: 25-Jul-2024
  • (2024) Self-Adapting Machine Learning-based Systems via a Probabilistic Model Checking Framework. ACM Transactions on Autonomous and Adaptive Systems. DOI: 10.1145/3648682. Online publication date: 7-Mar-2024
  • (2024) Erlang: Application-Aware Autoscaling for Cloud Microservices. Proceedings of the Nineteenth European Conference on Computer Systems (888-923). DOI: 10.1145/3627703.3650084. Online publication date: 22-Apr-2024
  • (2024) LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-18). DOI: 10.1109/SC41406.2024.00022. Online publication date: 17-Nov-2024
  • (2024) Proactive Auto-Scaling for Delay-Sensitive IoT Applications Over Edge Clouds. IEEE Internet of Things Journal 11:6 (9536-9546). DOI: 10.1109/JIOT.2023.3324546. Online publication date: 15-Mar-2024
  • (2024) CNN Training Latency Prediction Using Hardware Metrics on Cloud GPUs. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (216-226). DOI: 10.1109/CCGrid59990.2024.00033. Online publication date: 6-May-2024
  • (2024) COTuner: Joint Optimization of Resource Configuration and Software Parameters for Recurring Streaming Jobs on the Cloud. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (87-96). DOI: 10.1109/CCGrid59990.2024.00019. Online publication date: 6-May-2024
  • (2024) TimeLink: enabling dynamic runtime prediction for Flink iterative jobs. The Journal of Supercomputing 80:11 (16546-16573). DOI: 10.1007/s11227-024-06085-x. Online publication date: 1-Jul-2024
  • (2023) Dynamic Optimization of Provider-Based Scheduling for HPC Workloads. 2023 International Conference on Software, Telecommunications and Computer Networks (SoftCOM) (1-6). DOI: 10.23919/SoftCOM58365.2023.10271608. Online publication date: 21-Sep-2023
  • (2023) Smartpick. Proceedings of the 24th International Middleware Conference (29-42). DOI: 10.1145/3590140.3592850. Online publication date: 27-Nov-2023
