article

Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

Authors:

Semih Salihoglu,

Kenneth SalemAuthors Info & Claims

ACM SIGMOD Record, Volume 52, Issue 1

Pages 94 - 102

https://doi.org/10.1145/3604437.3604458

Published: 08 June 2023 Publication History

Abstract

We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins: (i) optimistic estimators, which were defined in the context of graph database management systems, that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We show that optimistic estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains subqueries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.

References

[1]

M. Abo Khamis, H. Q. Ngo, and D. Suciu. Computing Join Queries with Functional Dependencies. In PODS, 2016.

Digital Library

[2]

A. Aboulnaga, A. R. Alameldeen, and J. F. Naughton. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In VLDB, 2001.

Digital Library

[3]

A. Aboulnaga and S. Chaudhuri. Self-Tuning Histograms: Building Histograms Without Looking at Data. In SIGMOD, 1999.

Digital Library

[4]

G. Alu¸c, O. Hartig, M. T. ¨Ozsu, and K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In ISWC, 2014.

[5]

A. Atserias, M. Grohe, and D. Marx. Size Bounds and Query Plans for Relational Joins. SICOMP, 42(4), 2013.

[6]

W. Cai, M. Balazinska, and D. Suciu. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In SIGMOD, 2019.

Digital Library

[7]

CEG Evaluation Source Code. https://github.com/cetechreport/CEExperiments, 2022.

[8]

J. Chen, Y. Huang, W. Mushi, S. Semih, and S. Ken. Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs https://cs.uwaterloo.ca/~ssalihog/papers/ceglong. pdf. Technical report, January 2022.

[9]

J. Chen, Y. Huang, M. Wang, S. Salihoglu, and K. Salem. Accurate Summary-Based Cardinality Estimation through the Lens of Cardinality Estimation Graphs. PVLDB, 15(8), 2022.

[10]

Y. Chen and K. Yi. Random Sampling and Size Estimation Over Cyclic Joins. In ICDT, 2020.

[11]

DBLP 2012--11--28 Dump. https://dblp.org/, 2012.

[12]

Epinions. https: //snap.stanford.edu/data/soc-Epinions1.html, 2003.

[13]

L. Getoor, B. Taskar, and D. Koller. Selectivity Estimation Using Probabilistic Models. In SIGMOD, 2001.

Digital Library

[14]

Haas, Peter J. and Naughton, Jeffrey F. and Seshadri, S. and Swami, Arun N. Selectivity and Cost Estimation for Joins Based on Random Sampling. JCSS, 52(3), 1996.

[15]

Hetionet v1.0. https://het.io/, 2015.

[16]

M. Joglekar and C. R´e. It's All a Matter of Degree - Using Degree Information to Optimize Multiway Joins. TOCS, 62(4), 2018.

[17]

K. Kim, H. Kim, G. Fletcher, and W.-S. Han. Combining sampling and synopses with worst-case optimal runtime and quality guarantees for graph pattern cardinality estimation. In SIGMOD, 2021.

Digital Library

[18]

V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. How Good Are Query Optimizers, Really? PVLDB, 9(3), 2015.

[19]

V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR, 2017.

[20]

V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann. Query Optimization Through the Looking Glass, and What We Found Running the Join Order Benchmark. VLDBJ, 27(5), 2018.

[21]

F. Li, B. Wu, K. Yi, and Z. Zhao. Wander Join: Online Aggregation via Random Walks. In SIGMOD, 2016.

Digital Library

[22]

A. Maduko, K. Anyanwu, A. Sheth, and P. Schliekelman. Graph Summaries for Subgraph Frequency Estimation. In ESWC, 2008.

[23]

Y. Matias, J. S. Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. In SIGMOD, 1998.

Digital Library

[24]

A. Mhedhbi and S. Salihoglu. Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins. PVLDB, 12(11), 2019.

[25]

M. Muralikrishna and D. J. DeWitt. Equi-Depth Histograms for Estimating Selectivity Factors for Multi-Dimensional Queries. In SIGMOD, 1988.

[26]

P. Negi, R. Marcus, H. Mao, N. Tatbul, T. Kraska, and M. Alizadeh. Cost-guided cardinality estimation: Focus where it matters. In ICDEW, 2020.

[27]

T. Neumann and G. Moerkotte. Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins. In ICDE, 2011.

Digital Library

[28]

Y. Park, S. Ko, S. S. Bhowmick, K. Kim, K. Hong, and W.-S. Han. G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching. In SIGMOD, 2020.

Digital Library

[29]

N. Polyzotis and M. Garofalakis. Statistical Synopses for Graph-Structured XML Databases. In SIGMOD, 2002.

Digital Library

[30]

V. Poosala and Y. E. Ioannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. In VLDB, 1997.

Digital Library

[31]

G. Stefanoni, B. Motik, and E. V. Kostylev. Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation. In WWW, 2018.

[32]

W. Sun, Y. Ling, N. Rishe, and Y. Deng. An Instant and Accurate Size Estimation Method for Joins and Selections in a Retrieval-Intensive Environment. In SIGMOD, 1993.

Digital Library

[33]

D. Vengerov, A. C. Menck, M. Zait, and S. P. Chakkappen. Join Size Estimation Subject to Filter Conditions. PVLDB, 8(12), 2015.

[34]

W. Wang, H. Jiang, H. Lu, and J. X. Yu. Bloom Histogram: Path Selectivity Estimation for XML Data with Updates. In VLDB, 2004.

Digital Library

[35]

WatDiv v.0.6. https://dsg.uwaterloo.ca/watdiv/, 2014.

[36]

L. Woltmann, C. Hartmann, M. Thiele, D. Habich, and W. Lehner. Cardinality estimation with local deep learning models. In aiDM, 2019.

[37]

L. Woltmann, D. Olwig, C. Hartmann, D. Habich, and W. Lehner. PostCENN: PostgreSQL with Machine Learning Models for Cardinality Estimation. PVLDB, 14(12), 2021.

Digital Library

[38]

W. Wu, J. F. Naughton, and H. Singh. Sampling-Based Query Re-Optimization. In SIGMOD, 2016.

Digital Library

[39]

Y. Wu, J. M. Patel, and H. V. Jagadish. Estimating Answer Sizes for XML Queries. In EDBT, 2002.

Digital Library

[40]

YAGO 1. https://yago-knowledge.org/downloads/yago-1, 2008.

[41]

N. Zhang, M. T. Ozsu, A. Aboulnaga, and I. F. Ilyas. XSEED: Accurate and Fast Cardinality Estimation for XPath Queries. In ICDE, 2006

Digital Library

Cited By

Hou WZhao XTang B(2024)LearnSC: An Efficient and Unified Learning-Based Framework for Subgraph Counting Problem2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00206(2625-2638)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00206

Recommendations

Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs

This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that ...
Pessimistic Cardinality Estimation

Cardinality Estimation is to estimate the size of the output of a query without computing it, by using only statistics on the input relations. Existing estimators try to return an unbiased estimate of the cardinality: this is notoriously difficult. A new ...
Cardinality Estimation Techniques in Relational Database Systems

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record

ACM SIGMOD Record Volume 52, Issue 1

March 2023

118 pages

ISSN:0163-5808

DOI:10.1145/3604437

Editors:
Rada Chirkova
North Carolina State University
,
Vanessa Braganholo
Universidade Federal Fluminense
,
Wim Martens
University of Bayreuth
,
Manos Athanassoulis
DBrainstorming
,
Marcelo Arenas
Research Highlights
,
Marianne Winslett
University of Illinois
,
Susan B. Davidson
The Future of Data(base) Education
,
Lyublena Antova
Datometry
,
Aaron J. Elmore
University of Chicago
,
Kyriakos Mouratidis
Singapore Management University
,
Dan Olteanu
University of Oxford
,
Immanuel Trummer
Cornell University
,
Yannis Velegrakis
Utrecht University
,
Renata Borovica-Gajic
Surveys
,
Tamer Özsu
University of Waterloo
,
Pınar Tözün
IT University of Copenhagen
,
Wook-Shin Han
Research and Vision columns
,
Kenneth Ross
Research Highlights
,
Alfons Kemper
Technical University of Munich
,
Samuel Madden
MIT

Issue’s Table of Contents

Copyright © 2023 Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 June 2023

Published in SIGMOD Volume 52, Issue 1

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
151
Total Downloads

Downloads (Last 12 months)73
Downloads (Last 6 weeks)6

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Hou WZhao XTang B(2024)LearnSC: An Efficient and Unified Learning-Based Framework for Subgraph Counting Problem2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00206(2625-2638)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00206

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents