Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3545008.3545063acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Online Resource Optimization for Elastic Stream Processing with Regret Guarantee

Published: 13 January 2023 Publication History

Abstract

Recognizing the explosion of large-scale real-time analytics needs, a plethora of stream processing systems, such as Apache Storm and Flink, have been developed to support such applications. Under these systems, a stream processing application is realized as a directed acyclic graph (DAG) of operators, where the resource configuration of each operator has a significant impact on its overall throughput and latency performance. However, there is a lack of dynamic resource allocation schemes, which are theoretically sound and practically implementable, especially under the drastically changing offered load. To address this challenge, we present Dragster1, an online-optimization-based dynamic resource allocation scheme for elastic stream processing. By combining the online optimization framework with upper confidence bound (UCB) techniques, Dragster can guarantee, in expectation, a sub-linear increase in the throughput regret w.r.t. time. To demonstrate the efficacy, we implement Dragster to improve the throughput of Flink applications over Kubernetes. Compared to the state-of-the-art algorithm Dhalion, Dragster can achieve a 1.8X-2.2X speed-up in converging to the optimal configuration. It can contribute to 20.0%-25.8% gain in tuple-processing goodput and 14.6%-15.6% cost-savings.

References

[1]
2011. Flink REST API. https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html.
[2]
2014. Kubernetes Metrics Server. https://github.com/kubernetes-sigs/metrics-server.
[3]
2015. Spark Dynamic Allocation. https://jaceklaskowski.gitbooks.io/master-ing-apache-spark/spark-dynamic-allocation.html.
[4]
2016. PyTorch autograd: automatic differentiation. https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html.
[5]
2019. Horizontal Pod Autoscaling. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.
[6]
2019. Overcommiting CPUs one sole-tenant VMs. https://cloud.google.com/compute/docs/nodes/overcommitting-cpus-sole-tenant-vms.
[7]
2019. Vertical Pod Autoscaling. https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler.
[8]
Mokhtari A, Shahrampour S, Jadbabaie A, and et al. 2016. Online optimization in dynamic environments: Improved regret rates for strongly convex problems[C]. In Decision and Control (CDC). 7195–7201.
[9]
Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. In NSDI, Vol. 2. 4–2.
[10]
Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive online scheduling in storm. In Proceedings of the 7th ACM international conference on Distributed event-based systems. 207–218.
[11]
Muhammad Bilal and Marco Canini. 2017. Towards automatic parameter tuning of stream processing systems. In Proceedings of the 2017 Symposium on Cloud Computing. 189–200.
[12]
Eric Brochu, Vlad M Cora, and Nando De Freitas. 2010. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599(2010).
[13]
Checkpoint. 2011. Flink Savepoints. https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html.
[14]
TM Cover and JA Thomas. 1991. Elements of Information Theory,(pp 33-36) John Wiley and Sons. Inc, NY (1991).
[15]
Varsha Dani, Thomas P Hayes, and Sham M Kakade. 2008. Stochastic linear optimization under bandit feedback. (2008).
[16]
Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, and Karthik Ramasamy. 2017. Dhalion: self-regulating stream processing in heron. Proceedings of the VLDB Endowment 10, 12 (2017), 1825–1836.
[17]
Pooyan Jamshidi and Giuliano Casale. 2016. An uncertainty-aware approach to optimal configuration of stream processing systems. In 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 39–48.
[18]
Vasiliki Kalavri, John Liagouris, Moritz Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. 2018. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. In OSDI. 783–798.
[19]
Andreas Krause and Cheng S Ong. 2011. Contextual gaussian process bandit optimization. In NIPS. 2447–2455.
[20]
Kuberntes. 2014. Kubernetes. https://kubernetes.io.
[21]
Min Li, Liangzhao Zeng, Shicong Meng, Jian Tan, Li Zhang, Ali R Butt, and Nicholas Fuller. 2014. Mronline: Mapreduce online performance tuning. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing. 165–176.
[22]
Yang Liu, Huanle Xu, and Wing Cheong Lau. 2020. Accordia: Adaptive Cloud Configuration Optimization for Recurring Data-Intensive Applications. In ICDCS 2020.
[23]
Mahdavi M, Jin R, and Yang T. 2012. Trading regret for efficiency: online convex optimization with long term constraints[J]. In Journal of Machine Learning Research. 2503–2538.
[24]
Haoran Qiu, Subho S Banerjee, Saurabh Jha, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. 2020. {FIRM}: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. In 14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20). 805–825.
[25]
rebalance. 2014. Yahoo Streaming benchmark. https://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html.
[26]
Spark. 2011. Spark ExecutorAllocationManager. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala.
[27]
Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. 2010. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. In ICML.
[28]
Dawei Sun, Guangyan Zhang, Songlin Yang, Weimin Zheng, Samee U Khan, and Keqin Li. 2015. Re-Stream: Real-time and energy-efficient resource scheduling in big data stream computing environments. Information Sciences 319(2015), 92–112.
[29]
Yang T, Zhang L, Jin R, and et al. 2016. Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient[C]. In ICML. 449–457.
[30]
Pete Tucker, Kristin Tufte, Vassilis Papadimos, and David Maier. 2008. NEXMark–A Benchmark for Queries over Data Streams (DRAFT). Technical Report. Technical report, OGI School of Science & Engineering at OHSU, Septembers.
[31]
Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, and Rahul Potharaju. 2020. Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo. arXiv preprint arXiv:2010.03035(2020).
[32]
Tao Ye and Shivkumar Kalyanaraman. 2003. A recursive random search algorithm for large-scale network parameter configuration. In Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. 196–205.

Cited By

View all

Index Terms

  1. Online Resource Optimization for Elastic Stream Processing with Regret Guarantee

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
    August 2022
    976 pages
    ISBN:9781450397339
    DOI:10.1145/3545008
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 January 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. cloud computing
    2. elastic stream processing
    3. gaussian-process UCB
    4. kubernetes
    5. online optimization
    6. resource allocation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '22
    ICPP '22: 51st International Conference on Parallel Processing
    August 29 - September 1, 2022
    Bordeaux, France

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)60
    • Downloads (Last 6 weeks)6
    Reflects downloads up to 19 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media