DOI: 10.1145/3064176.3064181
research-article

Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters

Published: 23 April 2017

Abstract

Datacenters are under-utilized, primarily because of unused resources on nodes over-provisioned for latency-critical jobs. Such idle resources can run batch data analytics jobs to increase datacenter utilization, but these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations, which are usually handled by checkpointing intermediate results on the stable storage of eviction-free reserved resources. However, checkpointing has a major shortcoming: the substantial overhead of transferring data back and forth. In this work, we step away from such approaches and focus on the job structure and the relationships between the computations of the job. We carefully mark the computations that are most likely to cause a large number of recomputations upon eviction and run them reliably on reserved resources. This lets us retain the corresponding intermediate results without any additional checkpointing. We design Pado, a general data processing engine that carries out this idea with several optimizations that minimize the number of additional reserved nodes. Evaluation results show that Pado outperforms Spark 2.0.0 by up to 5.1×, and checkpoint-enabled Spark by up to 3.8×.
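
The placement idea in the abstract can be illustrated with a small sketch. This is not Pado's actual algorithm, which the abstract does not spell out. It assumes a hypothetical heuristic: walk the job's computation graph in dependency order, estimate how many transient computations would have to rerun if a computation's output were lost together with all of its transient ancestors, and place the computation on reserved resources once that estimate exceeds a budget. The names `place_computations` and `max_lost_work` and the example job are made up for illustration.

```python
from graphlib import TopologicalSorter


def place_computations(parents, max_lost_work):
    """Assign each computation to "reserved" or "transient" resources.

    parents maps a computation to the computations it reads from.
    max_lost_work is an assumed budget: the largest number of computations
    we accept rerunning after a worst-case eviction.
    """
    placement = {}
    # exposed[c]: transient computations to rerun if c's output is lost
    # along with the outputs of all of its transient ancestors.
    exposed = {}
    for node in TopologicalSorter(parents).static_order():
        rerun = {node}
        for parent in parents.get(node, ()):
            # Output held on reserved resources survives evictions,
            # so recomputation never reaches past a reserved parent.
            if placement[parent] == "transient":
                rerun |= exposed[parent]
        if len(rerun) > max_lost_work:
            placement[node] = "reserved"
            exposed[node] = set()
        else:
            placement[node] = "transient"
            exposed[node] = rerun
    return placement


# Hypothetical five-step job: two maps feed a join, then a reduce, then a report.
job = {
    "map_a": [],
    "map_b": [],
    "join": ["map_a", "map_b"],
    "reduce": ["join"],
    "report": ["reduce"],
}
placement = place_computations(job, max_lost_work=3)
```

With this budget only `reduce` is placed on reserved resources: losing it would force four computations to rerun, while `report` afterwards depends only on output that is already held reliably. The sketch captures just the intuition of bounding recomputation by placement rather than by checkpointing.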



Published In

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
April 2017
648 pages
ISBN: 9781450349383
DOI: 10.1145/3064176

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '17: Twelfth EuroSys Conference 2017
April 23 - 26, 2017
Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Cited By

  • (2024) Blaze: Holistic Caching for Iterative Data Processing. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 370-386. DOI: 10.1145/3627703.3629558. Online publication date: 22-Apr-2024
  • (2023) Resource scheduling techniques in cloud from a view of coordination: a holistic survey. Frontiers of Information Technology & Electronic Engineering 24:1, pp. 1-40. DOI: 10.1631/FITEE.2100298. Online publication date: 23-Jan-2023
  • (2022) Hermod. Proceedings of the 13th Symposium on Cloud Computing, pp. 289-305. DOI: 10.1145/3542929.3563468. Online publication date: 7-Nov-2022
  • (2022) Elastic Deep Learning in Multi-Tenant GPU Clusters. IEEE Transactions on Parallel and Distributed Systems 33:1, pp. 144-158. DOI: 10.1109/TPDS.2021.3064966. Online publication date: 1-Jan-2022
  • (2022) RISCLESS: A Reinforcement Learning Strategy to Guarantee SLA on Cloud Ephemeral and Stable Resources. 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 83-87. DOI: 10.1109/PDP55904.2022.00021. Online publication date: Mar-2022
  • (2021) Enabling Sustainable Clouds. Proceedings of the ACM Symposium on Cloud Computing, pp. 350-358. DOI: 10.1145/3472883.3487009. Online publication date: 1-Nov-2021
  • (2021) Apache Nemo: A Framework for Optimizing Distributed Data Processing. ACM Transactions on Computer Systems 38:3-4, pp. 1-31. DOI: 10.1145/3468144. Online publication date: 15-Oct-2021
  • (2021) Design of Distributed Network Mass Data Processing System based on Cloud Computing Technology. 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1317-1320. DOI: 10.1109/ICOEI51242.2021.9452963. Online publication date: 3-Jun-2021
  • (2021) Cocoa: Towards a Scalable Compute Cost-aware Data Analytics System. 2021 IEEE International Conference on Cloud Engineering (IC2E), pp. 110-117. DOI: 10.1109/IC2E52221.2021.00025. Online publication date: Oct-2021
  • (2021) Probability distribution based resource management for multitenant cloud clusters. Concurrency and Computation: Practice and Experience 35:21. DOI: 10.1002/cpe.6360. Online publication date: 13-Jun-2021
