Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3269206.3274271acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
tutorial

Incremental Techniques for Large-Scale Dynamic Query Processing

Published: 17 October 2018 Publication History

Abstract

Many applications from various disciplines are now required to analyze fast evolving big data in real time. Various approaches for incremental processing of queries have been proposed over the years. Traditional approaches rely on updating the results of a query when updates are streamed rather than re-computing these queries, and therefore, higher execution performance is expected. However, they do not perform well for large databases that are updated at high frequencies. Therefore, new algorithms and approaches have been proposed in the literature to address these challenges by, for instance, reducing the complexity of processing updates. Moreover, many of these algorithms are now leveraging distributed streaming platforms such as Spark Streaming and Flink. In this tutorial, we briefly discuss legacy approaches for incremental query processing, and then give an overview of the new challenges introduced due to processing big data streams. We then discuss in detail the recently proposed algorithms that address some of these challenges. We emphasize the characteristics and algorithmic analysis of various proposed approaches and conclude by discussing future research directions.

References

[1]
Alexander Alexandrov et al. 2014. The Stratosphere Platform for Big Data Analytics. The VLDB Journal, Vol. 23, 6 (Dec. 2014), 939--964.
[2]
Michael Armbrust et al. 2018. Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 601--613.
[3]
Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. 2017. Answering Conjunctive Queries under Updates. In Proc. ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS). 303--318.
[4]
Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. 2018. Answering UCQs under Updates and in the Presence of Integrity Constraints. In Proc. ACM Int. Conf. on Database Theory (ICDT). 8:1--8:19.
[5]
Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquin. 2011. Incoop: MapReduce for Incremental Computations. In Proc. ACM Symposium on Cloud Computing. 7:1--7:14.
[6]
Rada Chirkova and Jun Yang. 2012. Materialized Views. Found. & Trends in DB, Vol. 4, 4 (2012), 295--405.
[7]
Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In Proc. USENIX Conf. on Operating Systems Design and Implementation (OSDI). 137--150.
[8]
Todd J. Green, Grigoris Karvounarakis, and Val Tannen. 2007. Provenance Semirings. In Proc. ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS). 31--40.
[9]
Ashish Gupta and Iderpal Singh Mumick (Eds.). 1999. Materialized Views: Techniques, Implementations, and Applications .MIT Press, Cambridge, MA, USA.
[10]
Muhammad Idris, Martín Ugarte, and Stijn Vansummeren. 2017. The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 1259--1274.
[11]
Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner. 2018. Conjunctive Queries with Inequalities Under Updates. Proc. VLDB Endow. (PVLDB), Vol. 11, 7 (2018), 733--745.
[12]
Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. 2019. Counting Triangles under Updates in Worst-Case Optimal Time. In ICDT (to appear) .
[13]
Christoph Koch. 2010. Incremental query evaluation in a ring of databases. In Proc. ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS). 87--98.
[14]
Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, and Amir Shaikhha. 2014. DBToaster: higher-order delta processing for dynamic, frequently fresh views. The VLDB Journal, Vol. 23, 2 (2014), 253--278.
[15]
Sanjeev Kulkarni et al. 2015. Twitter Heron: Stream Processing at Scale. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 239--250.
[16]
Frank McSherry, Derek Gordon Murray, Rebecca Isaacs, and Michael Isard. 2013. Differential Dataflow. In Proc. Conf. on Innovative Data Systems Research (CIDR).
[17]
Derek G. Murray, Frank McSherry, Michael Isard, Rebecca Isaacs, Paul Barham, and Martin Abadi. 2016. Incremental, Iterative Data Processing with Timely Dataflow. Commun. ACM, Vol. 59, 10 (Sept. 2016), 75--83.
[18]
Milos Nikolic, Mohammad Dashti, and Christoph Koch. 2016. How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 511--526.
[19]
Milos Nikolic, Mohammed Elseidy, and Christoph Koch. 2014. LINVIEW: incremental view maintenance for complex analytical queries. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 253--264.
[20]
Milos Nikolic and Dan Olteanu. 2018. Incremental View Maintenance with Triple Lock Factorization Benefits. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 365--380.
[21]
Christopher Olston et al. 2011. Nova: continuous pig/hadoop workflows. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 1081--1090.
[22]
Ankit Toshniwal et al. 2014. Storm@Twitter. In Proc. ACM SIGMOD Int. Conf. on Management of Data. 147--156.
[23]
Cairong Yan, Xin Yang, Ze Yu, Min Li, and Xiaolin Li. 2012. IncMR: Incremental Data Processing Based on MapReduce. In Proc. IEEE Int. Conf. on Cloud Computing. 534--541.
[24]
Matei Zaharia et al. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In USENIX Conf. on Networked Systems Design and Implementation (NSDI). 15--28.
[25]
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. In Proc. ACM Symp. on Operating Systems Principles (SOSP). 423--438.

Cited By

View all
  • (2024)Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous DataData Intelligence10.1162/dint_a_002166:2(504-530)Online publication date: 1-May-2024
  • (2024)Wrapping Rings in Lattices: An Algebraic Symbiosis of Incremental View Maintenance and Eventual ConsistencyProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653031(15-22)Online publication date: 22-Apr-2024
  • (2020)The relational data borg is learningProceedings of the VLDB Endowment10.14778/3415478.341557213:12(3502-3515)Online publication date: 14-Sep-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN:9781450360142
DOI:10.1145/3269206
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2018

Check for updates

Author Tags

  1. big data
  2. data streams
  3. dynamic query processing
  4. incremental view maintenance

Qualifiers

  • Tutorial

Funding Sources

  • Wiener-Anspach Foundation
  • European Union's Horizon 2020 research and innovation programme

Conference

CIKM '18
Sponsor:

Acceptance Rates

CIKM '18 Paper Acceptance Rate 147 of 826 submissions, 18%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous DataData Intelligence10.1162/dint_a_002166:2(504-530)Online publication date: 1-May-2024
  • (2024)Wrapping Rings in Lattices: An Algebraic Symbiosis of Incremental View Maintenance and Eventual ConsistencyProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653031(15-22)Online publication date: 22-Apr-2024
  • (2020)The relational data borg is learningProceedings of the VLDB Endowment10.14778/3415478.341557213:12(3502-3515)Online publication date: 14-Sep-2020
  • (2019)Learning Models over Relational Data: A Brief TutorialScalable Uncertainty Management10.1007/978-3-030-35514-2_32(423-432)Online publication date: 2-Dec-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media