Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3514221.3517889acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

Efficient Incrementialization of Correlated Nested Aggregate Queries using Relative Partial Aggregate Indexes (RPAI)

Published: 11 June 2022 Publication History

Abstract

Incrementalization of queries is imperative in cases where data arrives as streams and output is latency-critical and/or desired before the full data has been received. Incremental execution computes the output at a given time by reusing the previously computed outputs or maintained views rather than re-evaluating the query from scratch. There are various approaches to perform this incrementalization ranging from query-specific algorithms and data structures (e.g., DYN, AJU) to general systems (e.g., DBToaster, Materialize).
DBToaster is a state-of-the-art system that comes with an appealing theoretical background based on the idea of applying Incremental View Maintenance (IVM) recursively, maintaining a hierarchy of materialized views via delta queries. However, one key limitation of this approach is its inability to efficiently incrementalize correlated nested-aggregate queries due to an inefficient delta rule for such queries. Moreover, none of the other specialized approaches have shown efficient ways to optimize such queries either. Nonetheless, these types of queries can be found in many real-world application domains (e.g., finance), for which efficient incrementalization remains a crucial open problem. In this work, we propose an approach to incrementalize such queries based on a novel tree-based index structure called Relative Partial Aggregate Indexes (RPAI). Our approach is asymptotically faster than other systems and shows up to 1100× speedups in workloads of practical importance.

References

[1]
Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: Fault-Tolerant Stream Processing at Internet Scale. Proc. VLDB Endow., Vol. 6, 11 (2013), 1033--1044.
[2]
Arvind Arasu, Brian Babcock, Shivnath Babu, John Cieslewicz, Mayur Datar, Keith Ito, Rajeev Motwani, Utkarsh Srivastava, and Jennifer Widom. 2016. STREAM: The Stanford Data Stream Management System. In Data Stream Management. Springer, 317--336.
[3]
Rudolf Bayer and Edward M. McCreight. 1970. Organization and Maintenance of Large Ordered Indexes. In SIGFIDET Workshop. ACM, 107--141.
[4]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink#8482;: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull., Vol. 38, 4 (2015), 28--38.
[5]
Surajit Chaudhuri, Ravi Krishnamurthy, Spyros Potamianos, and Kyuseok Shim. 1995. Optimizing Queries with Materialized Views. In ICDE. IEEE Computer Society, 190--200.
[6]
Rada Chirkova and Jun Yang. 2012. Materialized Views. Found. Trends Databases, Vol. 4, 4 (2012), 295--405.
[7]
Latha S. Colby, Timothy Griffin, Leonid Libkin, Inderpal Singh Mumick, and Howard Trickey. 1996. Algorithms for Deferred View Maintenance. In SIGMOD Conference. ACM Press, 469--480. https://doi.org/10.1145/233269.233364
[8]
Mark de Berg, Otfried Cheong, Marc J. van Kreveld, and Mark H. Overmars. 2008. Computational geometry: algorithms and applications, 3rd Edition .Springer. 231--236 pages.
[9]
Gré gory M. Essertel, Ruby Y. Tahboub, James M. Decker, Kevin J. Brown, Kunle Olukotun, and Tiark Rompf. 2018. Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data. In OSDI. USENIX Association, 799--815.
[10]
Peter M. Fenwick. 1994. A New Data Structure for Cumulative Frequency Tables. Softw. Pract. Exp., Vol. 24, 3 (1994), 327--336.
[11]
Cé sar A. Galindo-Legaria and Milind Joshi. 2001. Orthogonal Optimization of Subqueries and Aggregation. In SIGMOD Conference. ACM, 571--581.
[12]
Timothy Griffin and Leonid Libkin. 1995. Incremental Maintenance of Views with Duplicates. In SIGMOD Conference. ACM Press, 328--339.
[13]
Leonidas J. Guibas and Robert Sedgewick. 1978. A Dichromatic Framework for Balanced Trees. In FOCS. IEEE Computer Society, 8--21.
[14]
Ashish Gupta and Inderpal Singh Mumick. 1995. Maintenance of Materialized Views: Problems, Techniques, and Applications. IEEE Data Eng. Bull., Vol. 18, 2 (1995), 3--18.
[15]
Ashish Gupta, Inderpal Singh Mumick, and V. S. Subrahmanian. 1993. Maintaining Views Incrementally. In SIGMOD Conference. ACM Press, 157--166.
[16]
Muhammad Idris, Mart'i n Ugarte, and Stijn Vansummeren. 2017. The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates. In SIGMOD Conference. ACM, 1259--1274.
[17]
Muhammad Idris, Mart'i n Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner. 2018. Conjunctive Queries with Inequalities Under Updates. Proc. VLDB Endow., Vol. 11, 7 (2018), 733--745.
[18]
Muhammad Idris, Mart'i n Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner. 2019. Efficient Query Processing for Dynamically Changing Datasets. SIGMOD Rec., Vol. 48, 1 (2019), 33--40.
[19]
Materialize Inc. 2021. Materialize: Event Streaming Database for Real-Time Applications. https://materialize.com/ Retrieved June 22, 2021 from
[20]
Jeyhun Karimov, Tilmann Rabl, and Volker Markl. 2019 a. AJoin: Ad-hoc Stream Joins at Scale. Proc. VLDB Endow., Vol. 13, 4 (2019), 435--448.
[21]
Jeyhun Karimov, Tilmann Rabl, and Volker Markl. 2019 b. AStream: Ad-hoc Shared Stream Processing. In SIGMOD Conference. ACM, 607--622.
[22]
Oliver Kennedy, Yanif Ahmad, and Christoph Koch. 2011. DBToaster: Agile Views for a Dynamic Data Management System. In CIDR. www.cidrdb.org, 284--295.
[23]
Christoph Koch. 2010. Incremental query evaluation in a ring of databases. In PODS. ACM, 87--98.
[24]
Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nö tzli, Daniel Lupei, and Amir Shaikhha. 2014a. DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J., Vol. 23, 2 (2014), 253--278.
[25]
Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nö tzli, Daniel Lupei, and Amir Shaikhha. 2014b. DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J., Vol. 23, 2 (2014), 253--278.
[26]
Frank McSherry, Andrea Lattuada, Malte Schwarzkopf, and Timothy Roscoe. 2020. Shared Arrangements: practical inter-query sharing for streaming dataflows. Proc. VLDB Endow., Vol. 13, 10 (2020), 1793--1806.
[27]
Frank McSherry, Derek Gordon Murray, Rebecca Isaacs, and Michael Isard. 2013. Differential Dataflow. In CIDR. www.cidrdb.org.
[28]
Derek Gordon Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Mart'i n Abadi. 2013. Naiad: a timely dataflow system. In SOSP. ACM, 439--455.
[29]
Milos Nikolic, Mohammad Dashti, and Christoph Koch. 2016. How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates. In SIGMOD Conference. ACM, 511--526.
[30]
Oracle. 2021. TreeMap (Java Platform SE 8 ). https://docs.oracle.com/javase/8/docs/api/java/util/TreeMap.html Retrieved June 22, 2021 from
[31]
Tiark Rompf and Nada Amin. 2019. A SQL to C compiler in 500 lines of code. J. Funct. Program., Vol. 29 (2019), e9.
[32]
Tiark Rompf and Martin Odersky. 2010. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In GPCE. ACM, 127--136.
[33]
Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. 2013. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL. ACM, 497--510.
[34]
Scala. 2021. TreeMap (Scala Standard Library). https://www.scala-lang.org/api/current/scala/collection/mutable/TreeMap.html Retrieved June 22, 2021 from
[35]
Robert Sedgewick. 2008. Left-Leaning Red-Black Trees. https://www.cs.princeton.edu/ rs/talks/LLRB/RedBlack.pdf
[36]
Walid Taha and Tim Sheard. 2000. MetaML and multi-stage programming with explicit annotations. Theor. Comput. Sci., Vol. 248, 1--2 (2000), 211--242.
[37]
Ruby Y. Tahboub, Gré gory M. Essertel, and Tiark Rompf. 2018. How to Architect a Query Compiler, Revisited. In SIGMOD Conference. ACM, 307--322.
[38]
Kian-Lee Tan, Cheng Hian Goh, and Beng Chin Ooi. 2000. Progressive evaluation of nested aggregate queries. VLDB J., Vol. 9, 3 (2000), 261--278.
[39]
Dixin Tang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, and Michael J. Franklin. 2019. Intermittent Query Processing. Proc. VLDB Endow., Vol. 12, 11 (2019), 1427--1441. https://doi.org/10.14778/3342263.3342278
[40]
Dixin Tang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, and Michael J. Franklin. 2020. Thrifty Query Execution via Incrementability. In SIGMOD Conference. ACM, 1241--1256. https://doi.org/10.1145/3318464.3389756
[41]
Dixin Tang, Zechao Shang, William W. Ma, Aaron J. Elmore, and Sanjay Krishnan. 2021. Resource-efficient Shared Query Execution via Exploiting Time Slackness. In SIGMOD Conference. ACM, 1797--1810. https://doi.org/10.1145/3448016.3457282
[42]
Qichen Wang and Ke Yi. 2020. Maintaining Acyclic Foreign-Key Joins under Updates. In SIGMOD Conference. ACM, 1225--1239.
[43]
Zuozhi Wang, Kai Zeng, Botong Huang, Wei Chen, Xiaozong Cui, Bo Wang, Ji Liu, Liya Fan, Dachuan Qu, Zhenyu Hou, Tao Guan, Chen Li, and Jingren Zhou. 2020 a. Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing. In SIGMOD Conference. ACM, 2797--2800. https://doi.org/10.1145/3318464.3384708
[44]
Zuozhi Wang, Kai Zeng, Botong Huang, Wei Chen, Xiaozong Cui, Bo Wang, Ji Liu, Liya Fan, Dachuan Qu, Zhenyu Hou, Tao Guan, Chen Li, and Jingren Zhou. 2020 b. Tempura: A General Cost-Based Optimizer Framework for Incremental Data Processing. Proc. VLDB Endow., Vol. 14, 1 (2020), 14--27. https://doi.org/10.14778/3421424.3421427
[45]
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized streams: fault-tolerant streaming computation at scale. In SOSP. ACM, 423--438.
[46]
Kai Zeng, Sameer Agarwal, and Ion Stoica. 2016. iOLAP: Managing Uncertainty for Efficient Incremental OLAP. In SIGMOD Conference. ACM, 1347--1361.
[47]
Jingren Zhou, Per-Åke Larson, and Hicham G. Elmongui. 2007. Lazy Maintenance of Materialized Views. In VLDB. ACM, 231--242. https://doi.org/doi/10.5555/1325851.1325881

Cited By

View all
  • (2024)From Batch to Stream: Automatic Generation of Online AlgorithmsProceedings of the ACM on Programming Languages10.1145/36564188:PLDI(1014-1039)Online publication date: 20-Jun-2024
  • (2024)Incremental Computation: What Is the Essence? (Invited Contribution)Proceedings of the 2024 ACM SIGPLAN International Workshop on Partial Evaluation and Program Manipulation10.1145/3635800.3637447(39-52)Online publication date: 11-Jan-2024
  • (2024)Rhyme: A Data-Centric Multi-paradigm Query Language Based on Functional Logic MetaprogrammingFunctional and Logic Programming10.1007/978-981-97-2300-3_14(273-288)Online publication date: 15-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Check for updates

Author Tags

  1. aggregate indexes
  2. incremental processing
  3. nested aggregate queries
  4. parent-relative trees

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)185
  • Downloads (Last 6 weeks)19
Reflects downloads up to 21 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)From Batch to Stream: Automatic Generation of Online AlgorithmsProceedings of the ACM on Programming Languages10.1145/36564188:PLDI(1014-1039)Online publication date: 20-Jun-2024
  • (2024)Incremental Computation: What Is the Essence? (Invited Contribution)Proceedings of the 2024 ACM SIGPLAN International Workshop on Partial Evaluation and Program Manipulation10.1145/3635800.3637447(39-52)Online publication date: 11-Jan-2024
  • (2024)Rhyme: A Data-Centric Multi-paradigm Query Language Based on Functional Logic MetaprogrammingFunctional and Logic Programming10.1007/978-981-97-2300-3_14(273-288)Online publication date: 15-May-2024
  • (2024)Rhyme: A Data-Centric Expressive Query Language for Nested Data StructuresPractical Aspects of Declarative Languages10.1007/978-3-031-52038-9_5(64-81)Online publication date: 15-Jan-2024
  • (2023)DBSP: Automatic Incremental View Maintenance for Rich Query LanguagesProceedings of the VLDB Endowment10.14778/3587136.358713716:7(1601-1614)Online publication date: 1-Mar-2023
  • (2023)Foreign Keys Open the Door for Faster Incremental View MaintenanceProceedings of the ACM on Management of Data10.1145/35887201:1(1-25)Online publication date: 30-May-2023

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media