research-article

Online Resource Optimization for Elastic Stream Processing with Regret Guarantee

Authors:

Wing Cheong LauAuthors Info & Claims

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

Article No.: 46, Pages 1 - 11

https://doi.org/10.1145/3545008.3545063

Published: 13 January 2023 Publication History

Abstract

Recognizing the explosion of large-scale real-time analytics needs, a plethora of stream processing systems, such as Apache Storm and Flink, have been developed to support such applications. Under these systems, a stream processing application is realized as a directed acyclic graph (DAG) of operators, where the resource configuration of each operator has a significant impact on its overall throughput and latency performance. However, there is a lack of dynamic resource allocation schemes, which are theoretically sound and practically implementable, especially under the drastically changing offered load. To address this challenge, we present Dragster1, an online-optimization-based dynamic resource allocation scheme for elastic stream processing. By combining the online optimization framework with upper confidence bound (UCB) techniques, Dragster can guarantee, in expectation, a sub-linear increase in the throughput regret w.r.t. time. To demonstrate the efficacy, we implement Dragster to improve the throughput of Flink applications over Kubernetes. Compared to the state-of-the-art algorithm Dhalion, Dragster can achieve a 1.8X-2.2X speed-up in converging to the optimal configuration. It can contribute to 20.0%-25.8% gain in tuple-processing goodput and 14.6%-15.6% cost-savings.

References

[1]

2011. Flink REST API. https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html.

[2]

2014. Kubernetes Metrics Server. https://github.com/kubernetes-sigs/metrics-server.

[3]

2015. Spark Dynamic Allocation. https://jaceklaskowski.gitbooks.io/master-ing-apache-spark/spark-dynamic-allocation.html.

[4]

2016. PyTorch autograd: automatic differentiation. https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html.

[5]

2019. Horizontal Pod Autoscaling. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.

[6]

2019. Overcommiting CPUs one sole-tenant VMs. https://cloud.google.com/compute/docs/nodes/overcommitting-cpus-sole-tenant-vms.

[7]

2019. Vertical Pod Autoscaling. https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler.

[8]

Mokhtari A, Shahrampour S, Jadbabaie A, and et al. 2016. Online optimization in dynamic environments: Improved regret rates for strongly convex problems[C]. In Decision and Control (CDC). 7195–7201.

[9]

Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. In NSDI, Vol. 2. 4–2.

[10]

Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive online scheduling in storm. In Proceedings of the 7th ACM international conference on Distributed event-based systems. 207–218.

Digital Library

[11]

Muhammad Bilal and Marco Canini. 2017. Towards automatic parameter tuning of stream processing systems. In Proceedings of the 2017 Symposium on Cloud Computing. 189–200.

Digital Library

[12]

Eric Brochu, Vlad M Cora, and Nando De Freitas. 2010. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599(2010).

[13]

Checkpoint. 2011. Flink Savepoints. https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html.

[14]

TM Cover and JA Thomas. 1991. Elements of Information Theory,(pp 33-36) John Wiley and Sons. Inc, NY (1991).

[15]

Varsha Dani, Thomas P Hayes, and Sham M Kakade. 2008. Stochastic linear optimization under bandit feedback. (2008).

[16]

Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, and Karthik Ramasamy. 2017. Dhalion: self-regulating stream processing in heron. Proceedings of the VLDB Endowment 10, 12 (2017), 1825–1836.

Digital Library

[17]

Pooyan Jamshidi and Giuliano Casale. 2016. An uncertainty-aware approach to optimal configuration of stream processing systems. In 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 39–48.

[18]

Vasiliki Kalavri, John Liagouris, Moritz Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. 2018. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. In OSDI. 783–798.

[19]

Andreas Krause and Cheng S Ong. 2011. Contextual gaussian process bandit optimization. In NIPS. 2447–2455.

[20]

Kuberntes. 2014. Kubernetes. https://kubernetes.io.

[21]

Min Li, Liangzhao Zeng, Shicong Meng, Jian Tan, Li Zhang, Ali R Butt, and Nicholas Fuller. 2014. Mronline: Mapreduce online performance tuning. In Proceedings of the 23rd international symposium on High-performance parallel and distributed computing. 165–176.

Digital Library

[22]

Yang Liu, Huanle Xu, and Wing Cheong Lau. 2020. Accordia: Adaptive Cloud Configuration Optimization for Recurring Data-Intensive Applications. In ICDCS 2020.

[23]

Mahdavi M, Jin R, and Yang T. 2012. Trading regret for efficiency: online convex optimization with long term constraints[J]. In Journal of Machine Learning Research. 2503–2538.

[24]

Haoran Qiu, Subho S Banerjee, Saurabh Jha, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. 2020. {FIRM}: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. In 14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20). 805–825.

[25]

rebalance. 2014. Yahoo Streaming benchmark. https://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html.

[26]

Spark. 2011. Spark ExecutorAllocationManager. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala.

[27]

Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. 2010. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design. In ICML.

[28]

Dawei Sun, Guangyan Zhang, Songlin Yang, Weimin Zheng, Samee U Khan, and Keqin Li. 2015. Re-Stream: Real-time and energy-efficient resource scheduling in big data stream computing environments. Information Sciences 319(2015), 92–112.

Digital Library

[29]

Yang T, Zhang L, Jin R, and et al. 2016. Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient[C]. In ICML. 449–457.

[30]

Pete Tucker, Kristin Tufte, Vassilis Papadimos, and David Maier. 2008. NEXMark–A Benchmark for Queries over Data Streams (DRAFT). Technical Report. Technical report, OGI School of Science & Engineering at OHSU, Septembers.

[31]

Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, and Rahul Potharaju. 2020. Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo. arXiv preprint arXiv:2010.03035(2020).

[32]

Tao Ye and Shivkumar Kalyanaraman. 2003. A recursive random search algorithm for large-scale network parameter configuration. In Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems. 196–205.

Digital Library

Cited By

Guo HCao HHe JLiu XShi Y(2024)POBOPerformance Evaluation10.1016/j.peva.2023.102376162:COnline publication date: 1-Feb-2024
https://dl.acm.org/doi/10.1016/j.peva.2023.102376
Volnes EPlagemann TGoebel V(2023)To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream ProcessingIEEE Communications Surveys & Tutorials10.1109/COMST.2023.333095326:1(670-705)Online publication date: 7-Nov-2023
https://dl.acm.org/doi/10.1109/COMST.2023.3330953

Index Terms

Online Resource Optimization for Elastic Stream Processing with Regret Guarantee
1. Computer systems organization
  1. Real-time systems

Recommendations

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the minimal loss she would have achieved ...
Prompt: Dynamic Data-Partitioning for Distributed Micro-batch Stream Processing Systems
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Advances in real-world applications require high-throughput processing over large data streams. Micro-batching has been proposed to support the needs of these applications. In micro-batching, the processing and batching of the data are interleaved, ...
Distributed resource allocation for stream data processing
HPCC'06: Proceedings of the Second international conference on High Performance Computing and Communications

Data streaming applications are becoming more and more common due to the rapid development in the areas such as sensor networks, multimedia streaming, and on-line data mining, etc. These applications are often running in a decentralized, distributed ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing

August 2022

976 pages

ISBN:9781450397339

DOI:10.1145/3545008

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 January 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP '22

ICPP '22: 51st International Conference on Parallel Processing

August 29 - September 1, 2022

Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
175
Total Downloads

Downloads (Last 12 months)49
Downloads (Last 6 weeks)12

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Guo HCao HHe JLiu XShi Y(2024)POBOPerformance Evaluation10.1016/j.peva.2023.102376162:COnline publication date: 1-Feb-2024
https://dl.acm.org/doi/10.1016/j.peva.2023.102376
Volnes EPlagemann TGoebel V(2023)To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream ProcessingIEEE Communications Surveys & Tutorials10.1109/COMST.2023.333095326:1(670-705)Online publication date: 7-Nov-2023
https://dl.acm.org/doi/10.1109/COMST.2023.3330953

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Figures

Tables

Media

View Table of Conten