Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2663165.2663334acmconferencesArticle/Chapter ViewAbstractPublication PagesmiddlewareConference Proceedingsconference-collections
research-article

Slider: incremental sliding window analytics

Published: 08 December 2014 Publication History

Abstract

Sliding window analytics is often used in distributed data-parallel computing for analyzing large streams of continuously arriving data. When pairs of consecutive windows overlap, there is a potential to update the output incrementally, more efficiently than recomputing from scratch. However, in most systems, realizing this potential requires programmers to explicitly manage the intermediate state for overlapping windows, and devise an application-specific algorithm to incrementally update the output.
In this paper, we present self-adjusting contraction trees, a set of data structures and algorithms for transparently updating the output of a sliding window computation as the window moves, while reusing, to the extent possible, results from prior computations. Self-adjusting contraction trees structure sub-computations of a data-parallel computation in the form of a shallow (logarithmic depth) balanced data dependence graph, through which input changes are efficiently propagated in asymptotically sub-linear time.
We implemented self-adjusting contraction trees in a system called Slider. The design of Slider incorporates several novel techniques, most notably: (i) a set of self balancing trees tuned for different variants of sliding window computation (append-only, fixed-width, or variable-width slides); (ii) a split processing mode, where a background pre-processing stage leverages the predictability of input changes to pave the way for a more efficient foreground processing when the window slides; and (iii) an extension of the data structures to handle multiple-job workflows such as data-flow query processing. We evaluated Slider using a variety of applications and real-world case studies. Our results show significant performance gains without requiring any changes to the existing application code used for non-incremental data processing.

References

[1]
Apache Hadoop: http://hadoop.apache.org.
[2]
Apache PigMix: http://wiki.apache.org/pig/PigMix.
[3]
Apache S4: http://incubator.apache.org/s4.
[4]
Asymptotic analysis of self-adjusting contraction trees: http://www.mpi-sws.org/~bhatotia/publications.
[5]
Storm: http://storm-project.net.
[6]
Trident: http://storm.incubator.apache.org/.
[7]
Wikipedia dataset: http://wiki.dbpedia.org.
[8]
U. A. Acar. Self-Adjusting Computation. PhD thesis, Carnegie Mellon University, 2005.
[9]
U. A. Acar, G. E. Blelloch, M. Blume, R. Harper, and K. Tangwongsan. An experimental analysis of self-adjusting computation. ACM TOPLAS, 2009.
[10]
U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The Data Locality of Work Stealing. In SPAA, 2000.
[11]
U. A. Acar, G. E. Blelloch, R. Ley-Wild, K. Tangwongsan, and D. Türkoğlu. Traceable data types for self-adjusting computation. In ACM PLDI, 2010.
[12]
P. Aditya, M. Zhao, Y. Lin, A. Haeberlen, P. Druschel, B. Maggs, and B. Wishon. Reliable Client Accounting for P2P-Infrastructure Hybrids. In NSDI, 2012.
[13]
G. Ananthanarayanan, A. Ghodsi, A. Wang, D. Borthakur, S. Shenker, and I. Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. In NSDI, 2012.
[14]
Ananthanarayanan et al. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams. In SIGMOD, 2013.
[15]
M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker. Fault-Tolerance in the Borealis Distributed Stream Processing System. In SIGMOD, 2005.
[16]
P. Bhatotia, R. Rodrigues, and A. Verma. Shredder: GPU-Accelerated Incremental Storage and Computation. In FAST, 2012.
[17]
P. Bhatotia, A. Wieder, I. E. Akkus, R. Rodrigues, and U. A. Acar. Large-scale incremental data processing with change propagation. In HotCloud, 2011.
[18]
P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquini. Incoop: MapReduce for Incremental Computations. In SoCC, 2011.
[19]
Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient Iterative Data Processing on Large Clusters. In VLDB, 2010.
[20]
C. Olston et al. Pig Latin: A Not-So-Foreign Language for Data Processing. In SIGMOD, 2008.
[21]
C. Olston et al. Nova: Continuous Pig/Hadoop Workflows. In SIGMOD, 2011.
[22]
Y.-J. Chiang and R. Tamassia. Dynamic algorithms in computational geometry. Proceedings of the IEEE, 1992.
[23]
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. In NSDI, 2010.
[24]
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, 2004.
[25]
C. Demetrescu, I. Finocchi, and G. Italiano. Handbook on Data Structures and Applications.
[26]
M. Dischinger, M. Marcon, S. Guha, K. P. Gummadi, R. Mahajan, and S. Saroiu. Glasnost: Enabling End Users to Detect Traffic Differentiation. In NSDI, 2010.
[27]
P. K. Gunda, L. Ravindranath, C. A. Thekkath, Y. Yu, and L. Zhuang. Nectar: Automatic Management of Data and Computation in Datacenters. In OSDI, 2010.
[28]
B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: Batched Stream Processing for Data Intensive Distributed Computing. In SoCC, 2010.
[29]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In EuroSys, 2007.
[30]
D. Logothetis, C. Olston, B. Reed, K. Web, and K. Yocum. Stateful bulk processing for incremental analytics. In SoCC, 2010.
[31]
M. Ali et al. Microsoft CEP server and online behavioral targeting. In VLDB, 2009.
[32]
M. Zaharia et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In NSDI, 2012.
[33]
D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A Timely Dataflow System. In SOSP, 2013.
[34]
D. Peng and F. Dabek. Large-scale Incremental Processing Using Distributed Transactions and Notifications. In OSDI, 2010.
[35]
L. Popa, M. Budiu, Y. Yu, and M. Isard. DryadInc: Reusing work in large-scale computations. In HotCloud, 2009.
[36]
W. Pugh. Skip Lists: A Probabilistic Alternative to Balanced Trees. In CACM, 1990.
[37]
G. Ramalingam and T. Reps. A Categorized Bibliography on Incremental Computation. In POPL, 1993.
[38]
T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida. On Word-of-Mouth Based Discovery of the Web. In IMC, 2011.
[39]
M. A. Shah, J. M. Hellerstein, and E. Brewer. Highly available, fault-tolerant, parallel dataflows. In SIGMOD, 2004.
[40]
Z. Qian et al. TimeStream: Reliable Stream Computation in the Cloud. In EuroSys, 2013.
[41]
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. In SOSP, 2013.
[42]
M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In OSDI, 2008.

Cited By

View all
  • (2024)GREEMProceedings of the 15th ACM Multimedia Systems Conference10.1145/3625468.3652168(264-270)Online publication date: 15-Apr-2024
  • (2024)O(1)-Time Complexity for Fixed Sliding-Window Aggregation Over Out-of-Order Data StreamsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.341956636:11(6745-6757)Online publication date: Nov-2024
  • (2023)Survey of window types for aggregation in stream processing systemsThe VLDB Journal10.1007/s00778-022-00778-6Online publication date: 17-Feb-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
Middleware '14: Proceedings of the 15th International Middleware Conference
December 2014
334 pages
ISBN:9781450327855
DOI:10.1145/2663165
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Orange
  • Conseil Régional d'Aquitaine
  • LaBRI: LaBRI
  • Raytheon BBN Technologies: Raytheon BBN Technologies
  • ACM: Association for Computing Machinery
  • Red Hat JBoss Middleware: Red Hat JBoss Middleware
  • Bordeaux: City of Bordeaux
  • USENIX Assoc: USENIX Assoc
  • GDR ASR: GDR Architecture, Systèmes et Réseaux
  • IBM: IBM
  • HP: HP
  • IFIP

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2014

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

Conference

Middleware '14
Sponsor:
  • LaBRI
  • Raytheon BBN Technologies
  • ACM
  • Red Hat JBoss Middleware
  • Bordeaux
  • USENIX Assoc
  • GDR ASR
  • IBM
  • HP

Acceptance Rates

Middleware '14 Paper Acceptance Rate 27 of 144 submissions, 19%;
Overall Acceptance Rate 203 of 948 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)36
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)GREEMProceedings of the 15th ACM Multimedia Systems Conference10.1145/3625468.3652168(264-270)Online publication date: 15-Apr-2024
  • (2024)O(1)-Time Complexity for Fixed Sliding-Window Aggregation Over Out-of-Order Data StreamsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.341956636:11(6745-6757)Online publication date: Nov-2024
  • (2023)Survey of window types for aggregation in stream processing systemsThe VLDB Journal10.1007/s00778-022-00778-6Online publication date: 17-Feb-2023
  • (2022)Window-based parallel operator execution with in-network computingProceedings of the 16th ACM International Conference on Distributed and Event-Based Systems10.1145/3524860.3539804(91-96)Online publication date: 27-Jun-2022
  • (2022)CPiX: Real-Time Analytics Over Out-of-Order Data Streams by Incremental Sliding-Window AggregationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.305489834:11(5239-5250)Online publication date: 1-Nov-2022
  • (2021)ScottyACM Transactions on Database Systems10.1145/343367546:1(1-46)Online publication date: 27-Mar-2021
  • (2021)VID-WIN: Fast Video Event Matching With Query-Aware Windowing at the Edge for the Internet of Multimedia ThingsIEEE Internet of Things Journal10.1109/JIOT.2021.30753368:13(10367-10389)Online publication date: 1-Jul-2021
  • (2020)SBASH Stack Based Allocation of Sheer Window Architecture for Real Time Stream Data ProcessingInternational Journal of Data Analytics10.4018/IJDA.20200101011:1(1-21)Online publication date: Jan-2020
  • (2020)LightSaber: Efficient Window Aggregation on Multi-core ProcessorsProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3389753(2505-2521)Online publication date: 11-Jun-2020
  • (2020)L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexesKnowledge and Information Systems10.1007/s10115-020-01444-5Online publication date: 21-Feb-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media