DOI: 10.1145/3583780.3615293
tutorial
Open access

Proactive Streaming Analytics at Scale: A Journey from the State-of-the-art to a Production Platform

Published: 21 October 2023

Abstract

Proactive streaming analytics continuously extract real-time business value from massive data that stream in data centers or clouds. This requires (a) processing the data while they are still in motion; (b) scaling the processing out to multiple machines, often across various dispersed computer clusters running diverse Big Data technologies; and (c) forecasting complex business events for proactive decision-making. Combining the facilities needed for proactive streaming analytics at scale entails: (I) deep knowledge of the relevant state of the art; (II) cherry-picking cutting-edge research outcomes based on the desired features and with a view to building interoperable components; and (III) building these components and deploying them into a holistic architecture within a real-world platform. In this tutorial, we guide the audience through the whole journey from (I) to (III), delivering cutting-edge research into a commercial analytics platform, and we provide hands-on experience with that platform.
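As a generic illustration of requirement (a), and of the synopses named in the author tags, the following minimal Python sketch maintains a Count-Min sketch: a fixed-size summary that estimates event frequencies in a single pass over a stream without storing the events themselves. It is a textbook technique chosen for illustration, not code from the tutorial's platform; the class name, the blake2b-based hashing, and the defaults `width=272` and `depth=5` are assumptions. Because two sketches built with the same width, depth, and seed can be merged by adding their counters, the example also hints at requirement (b): each machine summarizes its own partition, and only the small synopses need to travel over the network.

```python
import hashlib


class CountMinSketch:
    """Hypothetical fixed-size frequency synopsis for a stream of string events.

    With non-negative updates, estimates never undercount, but hash
    collisions may inflate them.
    """

    def __init__(self, width: int = 272, depth: int = 5, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        self.seed = seed
        # depth rows of width counters; memory stays fixed regardless of stream length
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive a separate hash per row by salting blake2b with the seed and row number.
        digest = hashlib.blake2b(f"{self.seed}:{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # One pass: each arriving event touches exactly one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # The smallest counter across rows is the least collision-inflated one.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        # Sketches with identical shape and hashing combine by counter-wise addition.
        if (self.width, self.depth, self.seed) != (other.width, other.depth, other.seed):
            raise ValueError("sketches must share width, depth, and seed")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two workers each summarize their own partition of a hypothetical event stream.
left, right = CountMinSketch(), CountMinSketch()
for event in ["login", "click", "click", "purchase"]:
    left.update(event)
for event in ["click", "login", "click"]:
    right.update(event)

# A coordinator merges the synopses instead of collecting the raw events.
left.merge(right)
print(left.estimate("click"))  # at least 4; exactly 4 unless collisions inflate it
```

With width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉, each estimate exceeds the true count by at most εN with probability at least 1 − δ, where N is the total number of updates; the assumed defaults correspond to ε = δ = 0.01. Estimates never fall below the true count, which is why the final comment promises only a lower bound.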

Supplementary Material

MP4 File (tut3212-video.mp4)
The permanent web page of this tutorial: https://www.softnet.tuc.gr/en/research/journeycikm23tutorial provides additional material including the tutorial slides, open source code repositories of the developed platform, and use case related videos.

Information

Published In

ACM Conferences
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big streaming data
  2. complex event forecasting
  3. optimizer
  4. synopses

Conference

CIKM '23

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Total citations: 0
  • Total downloads: 236
  • Downloads (last 12 months): 190
  • Downloads (last 6 weeks): 14
Reflects downloads up to 17 Dec 2024.
