Chi: a scalable and programmable control plane for distributed stream processing systems

Published: 01 June 2018

Abstract

Stream-processing workloads and modern shared cluster environments exhibit high variability and unpredictability. Combined with the large parameter space and the diverse set of user SLOs, this makes modern streaming systems very challenging to statically configure and tune. To address these issues, in this paper we investigate a novel control-plane design, Chi, which supports continuous monitoring and feedback, and enables dynamic re-configuration. Chi leverages the key insight of embedding control-plane messages in the data-plane channels to achieve a low-latency and flexible control plane for stream-processing systems.
Chi introduces a new reactive programming model and design mechanisms to asynchronously execute control policies, thus avoiding global synchronization. We show how this allows us to easily implement a wide spectrum of control policies targeting different use cases observed in production. Large-scale experiments using production workloads from a popular cloud provider demonstrate the flexibility and efficiency of our approach.
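The key insight above, carrying control messages in the same channels as data, can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions and not Chi's actual API: the names `Data`, `Control`, `ScaleOperator`, and `run_pipeline` are hypothetical, channels are modeled as in-process FIFO queues, and the only "reconfiguration" is changing a multiplier. Each operator applies the change when the control message reaches it and then forwards the message downstream, so no operator pauses for a global barrier.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Union


@dataclass
class Data:
    """An ordinary data-plane record."""
    value: int


@dataclass
class Control:
    """Hypothetical control message: tells each operator to use a new factor."""
    new_factor: int


Message = Union[Data, Control]


class ScaleOperator:
    """Multiplies data values by a factor that control messages can change."""

    def __init__(self, factor: int) -> None:
        self.factor = factor

    def process(self, inbox: Deque[Message]) -> List[Message]:
        out: List[Message] = []
        while inbox:
            msg = inbox.popleft()
            if isinstance(msg, Control):
                # Reconfigure locally at the point where the control message
                # sits in the stream, then pass it on to downstream operators.
                self.factor = msg.new_factor
                out.append(msg)
            else:
                out.append(Data(msg.value * self.factor))
        return out


def run_pipeline(messages: List[Message], operators: List[ScaleOperator]) -> List[int]:
    """Push one FIFO channel of messages through a chain of operators."""
    channel: Deque[Message] = deque(messages)
    for op in operators:
        channel = deque(op.process(channel))
    # Only data records are reported; control messages are dropped at the sink.
    return [m.value for m in channel if isinstance(m, Data)]
```

Running `run_pipeline([Data(1), Data(2), Control(3), Data(3)], [ScaleOperator(2), ScaleOperator(2)])` returns `[4, 8, 27]`: the records ahead of the control message are processed by both operators with the old factor, and the record behind it is processed by both with the new one. The position of the control message in the channel is what keeps the change consistent across operators in this sketch.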

Published In

Proceedings of the VLDB Endowment, Volume 11, Issue 10
June 2018
248 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
