DOI: 10.1145/3328905.3329505
research-article

Haren: A Framework for Ad-Hoc Thread Scheduling Policies for Data Streaming Applications

Published: 24 June 2019

Abstract

In modern Stream Processing Engines (SPEs), numerous diverse applications, which can differ in aspects such as cost, criticality or latency sensitivity, can co-exist on the same computing node. When these differences must be taken into account to control the performance of each application, custom scheduling of operators to threads is of key importance (e.g., when a smart vehicle needs to ensure that safety-critical applications always have access to computational power, while other applications are given lower, variable priorities).
Many schedulers have been proposed that allocate threads to operators to optimize specific metrics (e.g., latency), but there is still a lack of a tool that allows arbitrarily complex scheduling strategies to be seamlessly plugged on top of an SPE. We propose Haren to fill this gap. More specifically, we (1) formalize the thread scheduling problem in stream processing in a general way that allows ad-hoc scheduling policies to be defined, (2) identify the bottlenecks and the opportunities of scheduling in stream processing, (3) distill a compact interface to connect Haren with SPEs, enabling rapid testing of various scheduling policies, (4) illustrate the usability of the framework by integrating it into an actual SPE and (5) provide a thorough evaluation. As we show, Haren makes it possible to adapt the use of computational resources over time to meet the goals of a variety of scheduling policies.

Published In

DEBS '19: Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems
June 2019
291 pages
ISBN: 9781450367943
DOI: 10.1145/3328905

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Middleware
  2. Scheduling
  3. Stream processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Vetenskapsrådet
  • Stiftelsen för Strategisk Forskning

Conference

DEBS '19

Acceptance Rates

DEBS '19 Paper Acceptance Rate 13 of 47 submissions, 28%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 25 Nov 2024

Cited By

  • (2024) An Algorithm for Tunable Memory Compression of Time-Based Windows for Stream Aggregates. Euro-Par 2023: Parallel Processing Workshops. DOI: 10.1007/978-3-031-50684-0_2 (pp. 18-29). Online publication date: 16-Apr-2024
  • (2023) FORTE: an extensible framework for robustness and efficiency in data transfer pipelines. Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems. DOI: 10.1145/3583678.3596892 (pp. 139-150). Online publication date: 27-Jun-2023
  • (2022) Towards data-driven additive manufacturing processes. Proceedings of the 23rd International Middleware Conference Industrial Track. DOI: 10.1145/3564695.3564778 (pp. 43-49). Online publication date: 7-Nov-2022
  • (2022) Research Summary: Deterministic, Explainable and Efficient Stream Processing. Proceedings of the 2022 Workshop on Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems. DOI: 10.1145/3524053.3542750 (pp. 65-69). Online publication date: 25-Jul-2022
  • (2022) Resource scheduling and provisioning for processing of dynamic stream workflows under latency constraints. Future Generation Computer Systems 131. DOI: 10.1016/j.future.2022.01.020 (pp. 166-182). Online publication date: Jun-2022
  • (2021) Ananke. Proceedings of the VLDB Endowment 14:3. DOI: 10.14778/3430915.3430928 (pp. 391-403). Online publication date: 9-Dec-2021
  • (2021) Lachesis. Proceedings of the 22nd International Middleware Conference. DOI: 10.1145/3464298.3493407 (pp. 365-378). Online publication date: 6-Dec-2021
  • (2021) Klink: Progress-Aware Scheduling for Streaming Data Systems. Proceedings of the 2021 International Conference on Management of Data. DOI: 10.1145/3448016.3452794 (pp. 485-498). Online publication date: 9-Jun-2021
  • (2021) Elastic Pulsar Functions for Distributed Stream Processing. Companion of the ACM/SPEC International Conference on Performance Engineering. DOI: 10.1145/3447545.3451901 (pp. 9-16). Online publication date: 19-Apr-2021
  • (2021) Motivations and Challenges for Stream Processing in Edge Computing. Companion of the ACM/SPEC International Conference on Performance Engineering. DOI: 10.1145/3447545.3451899 (pp. 17-18). Online publication date: 19-Apr-2021
