DOI: 10.1145/2806777.2806847
research-article

Online parameter optimization for elastic data stream processing

Published: 27 August 2015

Abstract

Elastic scaling allows data stream processing systems to scale in and out dynamically in reaction to workload changes. As a consequence, unexpected load peaks can be handled and the extent of overprovisioning can be reduced. However, the strategies used for elastic scaling of such systems need to be tuned manually by the user. This is an error-prone and cumbersome task, because it requires detailed knowledge of the underlying system and workload characteristics. In addition, the resulting quality of service for a specific scaling strategy is unknown a priori and can be measured only at runtime.
In this paper we present an elastic scaling data stream processing prototype that allows users to trade off monetary cost against the offered quality of service. To that end, we use an online parameter optimization that minimizes the monetary cost for the user. Using our prototype, a user can specify the expected quality of service as an input to the optimization, which automatically detects significant changes in the workload pattern and adjusts the elastic scaling strategy based on the current workload characteristics. Our prototype reduces the costs for three real-world use cases by 19% compared to a naive parameter setting and by 10% compared to a manually tuned system. In contrast to state-of-the-art solutions, our system provides a stable and good trade-off between monetary cost and quality of service.
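
The idea of picking scaling parameters that minimize cost under a quality-of-service bound can be illustrated with a small sketch. This is not the authors' implementation: the threshold-based policy, the plain random search, the mean-based change check, and every name and default value (`simulate`, `optimize`, `workload_changed`, `capacity_per_host`, `tolerance`, and so on) are assumptions made purely for illustration.

```python
import random


def simulate(workload, upper, lower, capacity_per_host=100.0, cost_per_host=1.0):
    """Replay a load trace under a hypothetical threshold-based scaling policy.

    Returns (total_cost, fraction_of_overloaded_steps).
    """
    hosts = 1
    cost = 0.0
    overloaded = 0
    for load in workload:
        utilization = load / (hosts * capacity_per_host)
        if utilization > 1.0:
            overloaded += 1  # demand exceeded capacity in this step
        cost += hosts * cost_per_host
        # The scaling decision takes effect in the next step.
        if utilization > upper:
            hosts += 1
        elif utilization < lower and hosts > 1:
            hosts -= 1
    return cost, overloaded / len(workload)


def optimize(workload, max_violation, trials=200, seed=0):
    """Random search for the cheapest (upper, lower) thresholds that keep the
    overloaded fraction at or below max_violation.

    Returns (cost, upper, lower), or None if no sampled setting is feasible.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        upper = rng.uniform(0.5, 1.0)
        lower = rng.uniform(0.1, upper - 0.1)  # keep lower below upper
        cost, violation = simulate(workload, upper, lower)
        if violation <= max_violation and (best is None or cost < best[0]):
            best = (cost, upper, lower)
    return best


def workload_changed(previous, recent, tolerance=0.25):
    """Flag a change when the mean load shifts by more than `tolerance`
    relative to the previous window (a deliberately simple stand-in)."""
    prev_mean = sum(previous) / len(previous)
    recent_mean = sum(recent) / len(recent)
    return abs(recent_mean - prev_mean) > tolerance * prev_mean


if __name__ == "__main__":
    trace = [50.0] * 20 + [250.0] * 20
    print(optimize(trace, max_violation=0.1))
    # Re-run the optimization only when the load pattern has shifted.
    print(workload_changed(trace[:20], trace[20:]))
```

In this sketch the optimization is re-run only when `workload_changed` reports a shift, which mirrors the abstract's description of adjusting the strategy after a significant workload change; a real system would measure quality of service on the running system rather than on a replayed trace.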


Information & Contributors

Information

Published In

SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
August 2015
446 pages
ISBN:9781450336512
DOI:10.1145/2806777
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed data stream processing
  2. elasticity
  3. load balancing
  4. parameter optimization

Qualifiers

  • Research-article

Conference

SoCC '15
SoCC '15: ACM Symposium on Cloud Computing
August 27 - 29, 2015
Hawaii, Kohala Coast

Acceptance Rates

SoCC '15 Paper Acceptance Rate 34 of 157 submissions, 22%;
Overall Acceptance Rate 169 of 722 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 19
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 10 Nov 2024

Citations

Cited By

  • (2024) Optimization enabled elastic scaling in cloud based on predicted load for resource management. Multiagent and Grid Systems 19:4 (289-311). DOI: 10.3233/MGS-230003. Online publication date: 4-Mar-2024
  • (2024) Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets. IEEE Transactions on Parallel and Distributed Systems 35:7 (1251-1267). DOI: 10.1109/TPDS.2024.3399834. Online publication date: Jul-2024
  • (2023) Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous Resources. ACM Transactions on Autonomous and Adaptive Systems 18:4 (1-44). DOI: 10.1145/3597435. Online publication date: 14-Oct-2023
  • (2023) SASPAR: Shared Adaptive Stream Partitioning. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (922-935). DOI: 10.1109/ICDE55515.2023.00076. Online publication date: Apr-2023
  • (2022) Dalton. Proceedings of the VLDB Endowment 16:3 (491-504). DOI: 10.14778/3570690.3570699. Online publication date: 1-Nov-2022
  • (2022) Forecasting Cloud Application Workloads With CloudInsight for Predictive Resource Management. IEEE Transactions on Cloud Computing 10:3 (1848-1863). DOI: 10.1109/TCC.2020.2998017. Online publication date: 1-Jul-2022
  • (2022) An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters. The Journal of Supercomputing 79:1 (461-498). DOI: 10.1007/s11227-022-04669-z. Online publication date: 11-Jul-2022
  • (2022) ATConf: auto-tuning high dimensional configuration parameters for big data processing frameworks. Cluster Computing 26:5 (2737-2755). DOI: 10.1007/s10586-022-03767-0. Online publication date: 14-Oct-2022
  • (2022) Alps: An Adaptive Load Partitioning Scaling Solution for Stream Processing System on Skewed Stream. Database and Expert Systems Applications (17-31). DOI: 10.1007/978-3-031-12426-6_2. Online publication date: 29-Jul-2022
  • (2022) JointConf: Jointly autotuning configuration parameters for modularized graph databases. Journal of Software: Evolution and Process 34:12. DOI: 10.1002/smr.2495. Online publication date: 29-Jul-2022
