DEBS '20 Conference Proceedings, DOI: 10.1145/3401025.3404088
short-paper

The role of event-time order in data streaming analysis

Published: 15 July 2020

Abstract

The data streaming paradigm was introduced around the year 2000 to overcome the limitations of the traditional store-then-process paradigm found in relational databases (DBs). In contrast to DBs' "first-the-data-then-the-query" approach, data streaming applications build on the "first-the-query-then-the-data" alternative. More concretely, data streaming applications do not rely on storage to first persist data and later query it; instead, they build on continuous single-pass analysis, in which incoming streams of data are processed on the fly to produce continuous streams of outputs.
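The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration only: the function names `store_then_query` and `continuous_max`, the `(event_time, value)` tuple layout, and the choice of a running maximum as the query are all hypothetical. The first function materializes the whole data set before answering; the second is defined before any data arrives, inspects each tuple once, and emits an updated result for every input.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical tuple layout: (event_time, value).
Reading = Tuple[int, float]


def store_then_query(readings: Iterable[Reading]) -> float:
    """First-the-data-then-the-query: persist everything, then run the query once."""
    stored: List[Reading] = list(readings)  # the whole data set is materialized first
    return max(value for _, value in stored)


def continuous_max(readings: Iterable[Reading]) -> Iterator[float]:
    """First-the-query-then-the-data: the query exists before the data does.

    Each tuple is inspected exactly once and then discarded; only the running
    result is kept, and an updated output is emitted for every input.
    """
    current = float("-inf")
    for _, value in readings:
        current = max(current, value)
        yield current


if __name__ == "__main__":
    stream = [(1, 3.0), (2, 7.5), (3, 4.2), (4, 9.1)]
    print(store_then_query(stream))            # 9.1
    print(list(continuous_max(iter(stream))))  # [3.0, 7.5, 7.5, 9.1]
```

Because `continuous_max` is a generator, it works unchanged on an unbounded source such as a socket reader, whereas `store_then_query` would never return on such an input.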
In contrast with traditional batch processing, data streaming applications require the user to reason about an additional dimension in the data: event-time, that is, the time at which each event actually occurred, which can differ from the time at which it reaches the application. Numerous models have been proposed in the literature to reason about event-time, each with different guarantees and trade-offs. Since it is not always clear which of these models is appropriate for a particular application, this tutorial studies the relevant concepts and compares the available options. This study can be highly relevant to anyone working with data streaming applications, both researchers and industrial practitioners.
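To make event-time concrete, the following is a minimal sketch, not the tutorial's method, of one widely used model: tumbling event-time windows closed by a watermark. It assumes a bounded-delay disorder model (no event arrives more than `max_delay` time units behind the largest event-time seen so far) and a drop-late-events policy; the function name `windowed_sums`, the tuple layout, and the parameters `window_size` and `max_delay` are hypothetical. Other models, such as buffering and re-sorting events before processing, offer different guarantees.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, float]  # hypothetical tuple layout: (event_time, value)


def windowed_sums(events: Iterable[Event], window_size: int,
                  max_delay: int) -> Iterator[Tuple[int, float]]:
    """Sum values per tumbling event-time window, closing windows by watermark.

    Assumed disorder model: the watermark is max_event_time_seen - max_delay.
    Events whose window has already been closed are dropped (one possible
    late-data policy among several).
    """
    windows: DefaultDict[int, float] = defaultdict(float)  # window start -> sum
    watermark: Optional[int] = None
    for ts, value in events:
        start = ts - ts % window_size
        if watermark is not None and start + window_size <= watermark:
            continue  # late event: its window was already closed
        windows[start] += value
        candidate = ts - max_delay
        watermark = candidate if watermark is None else max(watermark, candidate)
        # Emit, oldest first, every window whose end the watermark has reached.
        for s in sorted(windows):
            if s + window_size > watermark:
                break
            yield s, windows.pop(s)
    for s in sorted(windows):  # end of stream: flush windows still open
        yield s, windows[s]


if __name__ == "__main__":
    # Arrival order differs from event-time order (t=3 arrives after t=12).
    arrivals = [(1, 1.0), (6, 2.0), (12, 4.0), (3, 8.0),
                (17, 16.0), (4, 32.0), (25, 64.0)]
    print(list(windowed_sums(arrivals, window_size=10, max_delay=5)))
    # [(0, 11.0), (10, 20.0), (20, 64.0)]
```

Here the event with time 3 is accepted despite arriving after time 12, because it falls within the assumed delay bound, while the event with time 4 arrives after the watermark has passed 10 and is discarded. A larger `max_delay` would keep it, at the cost of emitting results later: the trade-off between latency and completeness that the competing event-time models resolve in different ways.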

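Cited By

  • (2024) Research Summary: Enhancing Localization, Selection, and Processing of Data in Vehicular Cyber-Physical Systems. Proceedings of the 2024 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems, pp. 1-5. DOI: 10.1145/3663338.3663680. Online publication date: 17-Jun-2024.
  • (2024) Aggregates are all you need (to bridge stream processing and Complex Event Recognition). Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, pp. 66-77. DOI: 10.1145/3629104.3666032. Online publication date: 24-Jun-2024.
  • (2024) Evolutionary Computation Meets Stream Processing. Applications of Evolutionary Computation, pp. 377-393. DOI: 10.1007/978-3-031-56852-7_24. Online publication date: 3-Mar-2024.
  • (2024) An Algorithm for Tunable Memory Compression of Time-Based Windows for Stream Aggregates. Euro-Par 2023: Parallel Processing Workshops, pp. 18-29. DOI: 10.1007/978-3-031-50684-0_2. Online publication date: 16-Apr-2024.
  • (2022) STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing. IEEE Transactions on Parallel and Distributed Systems, 33(12), pp. 4221-4238. DOI: 10.1109/TPDS.2022.3181979. Online publication date: 1-Dec-2022.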

Information

Published In

DEBS '20: Proceedings of the 14th ACM International Conference on Distributed and Event-based Systems
July 2020
244 pages
ISBN:9781450380287
DOI:10.1145/3401025
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data streaming
  2. event-time
  3. stream processing engines

Qualifiers

  • Short-paper

Funding Sources

  • Stiftelsen för Strategisk Forskning
  • VINNOVA
  • Vetenskapsrådet

Conference

DEBS '20

Acceptance Rates

DEBS '20 Paper Acceptance Rate 11 of 43 submissions, 26%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 0
Reflects downloads up to 21 Nov 2024.
