Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

Published: 08 June 2023 Publication History

Abstract

We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins: (i) optimistic estimators, which were defined in the context of graph database management systems, that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We show that optimistic estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains subqueries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.

References

[1]
M. Abo Khamis, H. Q. Ngo, and D. Suciu. Computing Join Queries with Functional Dependencies. In PODS, 2016.
[2]
A. Aboulnaga, A. R. Alameldeen, and J. F. Naughton. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In VLDB, 2001.
[3]
A. Aboulnaga and S. Chaudhuri. Self-Tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD, 1999.
[4]
G. Alu¸c, O. Hartig, M. T. ¨Ozsu, and K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In ISWC, 2014.
[5]
A. Atserias, M. Grohe, and D. Marx. Size Bounds and Query Plans for Relational Joins. SICOMP, 42(4), 2013.
[6]
W. Cai, M. Balazinska, and D. Suciu. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD, 2019.
[7]
CEG Evaluation Source Code. https://github.com/cetechreport/CEExperiments, 2022.
[8]
J. Chen, Y. Huang, W. Mushi, S. Semih, and S. Ken. Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs https://cs.uwaterloo.ca/~ssalihog/papers/ceglong. pdf. Technical report, January 2022.
[9]
J. Chen, Y. Huang, M. Wang, S. Salihoglu, and K. Salem. Accurate Summary-Based Cardinality Estimation through the Lens of Cardinality Estimation Graphs. PVLDB, 15(8), 2022.
[10]
Y. Chen and K. Yi. Random Sampling and Size Estimation Over Cyclic Joins. In ICDT, 2020.
[11]
DBLP 2012--11--28 Dump. https://dblp.org/, 2012.
[12]
Epinions. https: //snap.stanford.edu/data/soc-Epinions1.html, 2003.
[13]
L. Getoor, B. Taskar, and D. Koller. Selectivity Estimation Using Probabilistic Models. In SIGMOD, 2001.
[14]
Haas, Peter J. and Naughton, Jeffrey F. and Seshadri, S. and Swami, Arun N. Selectivity and Cost Estimation for Joins Based on Random Sampling. JCSS, 52(3), 1996.
[15]
Hetionet v1.0. https://het.io/, 2015.
[16]
M. Joglekar and C. R´e. It's All a Matter of Degree - Using Degree Information to Optimize Multiway Joins. TOCS, 62(4), 2018.
[17]
K. Kim, H. Kim, G. Fletcher, and W.-S. Han. Combining sampling and synopses with worst-case optimal runtime and quality guarantees for graph pattern cardinality estimation. In SIGMOD, 2021.
[18]
V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How Good Are Query Optimizers, Really? PVLDB, 9(3), 2015.
[19]
V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR, 2017.
[20]
V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ, 27(5), 2018.
[21]
F. Li, B. Wu, K. Yi, and Z. Zhao. Wander Join: Online Aggregation via Random Walks. In SIGMOD, 2016.
[22]
A. Maduko, K. Anyanwu, A. Sheth, and P. Schliekelman. Graph Summaries for Subgraph Frequency Estimation. In ESWC, 2008.
[23]
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. In SIGMOD, 1998.
[24]
A. Mhedhbi and S. Salihoglu. Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. PVLDB, 12(11), 2019.
[25]
M. Muralikrishna and D. J. DeWitt. Equi-Depth Histograms for Estimating Selectivity Factors for Multi-Dimensional Queries. In SIGMOD, 1988.
[26]
P. Negi, R. Marcus, H. Mao, N. Tatbul, T. Kraska, and M. Alizadeh. Cost-guided cardinality estimation: Focus where it matters. In ICDEW, 2020.
[27]
T. Neumann and G. Moerkotte. Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins. In ICDE, 2011.
[28]
Y. Park, S. Ko, S. S. Bhowmick, K. Kim, K. Hong, and W.-S. Han. G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching. In SIGMOD, 2020.
[29]
N. Polyzotis and M. Garofalakis. Statistical Synopses for Graph-Structured XML Databases. In SIGMOD, 2002.
[30]
V. Poosala and Y. E. Ioannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. In VLDB, 1997.
[31]
G. Stefanoni, B. Motik, and E. V. Kostylev. Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation. In WWW, 2018.
[32]
W. Sun, Y. Ling, N. Rishe, and Y. Deng. An Instant and Accurate Size Estimation Method for Joins and Selections in a Retrieval-Intensive Environment. In SIGMOD, 1993.
[33]
D. Vengerov, A. C. Menck, M. Zait, and S. P. Chakkappen. Join Size Estimation Subject to Filter Conditions. PVLDB, 8(12), 2015.
[34]
W. Wang, H. Jiang, H. Lu, and J. X. Yu. Bloom Histogram: Path Selectivity Estimation for XML Data with Updates. In VLDB, 2004.
[35]
WatDiv v.0.6. https://dsg.uwaterloo.ca/watdiv/, 2014.
[36]
L. Woltmann, C. Hartmann, M. Thiele, D. Habich, and W. Lehner. Cardinality estimation with local deep learning models. In aiDM, 2019.
[37]
L. Woltmann, D. Olwig, C. Hartmann, D. Habich, and W. Lehner. PostCENN: PostgreSQL with Machine Learning Models for Cardinality Estimation. PVLDB, 14(12), 2021.
[38]
W. Wu, J. F. Naughton, and H. Singh. Sampling-Based Query Re-Optimization. In SIGMOD, 2016.
[39]
Y. Wu, J. M. Patel, and H. V. Jagadish. Estimating Answer Sizes for XML Queries. In EDBT, 2002.
[40]
YAGO 1. https://yago-knowledge.org/downloads/yago-1, 2008.
[41]
N. Zhang, M. T. Ozsu, A. Aboulnaga, and I. F. Ilyas. XSEED: Accurate and Fast Cardinality Estimation for XPath Queries. In ICDE, 2006

Cited By

View all
  • (2024)LearnSC: An Efficient and Unified Learning-Based Framework for Subgraph Counting Problem2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00206(2625-2638)Online publication date: 13-May-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 52, Issue 1
March 2023
118 pages
ISSN:0163-5808
DOI:10.1145/3604437
Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2023
Published in SIGMOD Volume 52, Issue 1

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)87
  • Downloads (Last 6 weeks)10
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LearnSC: An Efficient and Unified Learning-Based Framework for Subgraph Counting Problem2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00206(2625-2638)Online publication date: 13-May-2024

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media