Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2213556.2213565acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Worst-case optimal join algorithms: [extended abstract]

Published: 21 May 2012 Publication History

Abstract

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running time achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this paper that any project-join plan is polynomially slower than the optimal bound for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. In addition, we show that this bound is equivalent to a geometric inequality by Bollobás and Thomason, one of whose special cases is the famous Loomis-Whitney inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.

References

[1]
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
[2]
N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. In PODS, pages 10--20, 1999.
[3]
N. Alon, I. Newman, A. Shen, G. Tardos, and N. K. Vereshchagin. Partitioning multi-dimensional sets in a small number of "uniform" parts. Eur. J. Comb., 28(1):134--144, 2007.
[4]
N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.
[5]
A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748. IEEE, 2008.
[6]
R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272, 2000.
[7]
S. Babu, P. Bizarro, and D. J. DeWitt. Proactive reoptimization. In SIGMOD Conference, pages 107--118, 2005.
[8]
B. Bollobás and A. Thomason. Projections of bodies and hereditary properties of hypergraphs. Bull. London Math. Soc., 27(5), 1995.
[9]
F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A, 43(1):23--37, 1986.
[10]
A. Deligiannakis, M. N. Garofalakis, and N. Roussopoulos. Extended wavelets for multiple measures. TODS, 32(2):10, 2007.
[11]
J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716--752, 2002.
[12]
A. C. Gilbert, H. Q. Ngo, E. Porat, A. Rudra, and M. J. Strauss. Efficiently decodable l 2/l 2 for each compressed sensing with tiny failure probability, November 2011. Manuscript.
[13]
G. Gottlob, S. T. Lee, and G. Valiant. Size and treewidth bounds for conjunctive queries. In PODS, pages 45--54, 2009.
[14]
G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions: A survey. In MFCS, 2001.
[15]
G. Gottlob, Z. Miklós, and T. Schwentick. Generalized hypertree decompositions: np-hardness and tractable variants. In PODS, 2007.
[16]
G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73--170, June 1993.
[17]
M. Grohe and D. Marx. Constraint solving via fractional edge covers. In SODA, pages 289--298, 2006.
[18]
K. Gyarmati, M. Matolcsi, and I. Z. Ruzsa. A superadditivity and submultiplicativity property for cardinalities of sumsets. Combinatorica, 30(2):163--174, 2010.
[19]
T. S. Han. Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36(2):133--156, 1978.
[20]
Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003.
[21]
Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, 1991.
[22]
D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004.
[23]
H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. In VLDB, 1998.
[24]
A. C. König and G. Weikum. Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. In VLDB, 1999.
[25]
A. R. Lehman and E. Lehman. Network coding: does the model need tuning? In SODA, pages 499--504, 2005.
[26]
L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bull. Amer. Math. Soc, 55:961--962, 1949.
[27]
R. Lyons. Probability on trees and networks, jun 2011. with Yuval Peres url: http://php.indiana.edu/ rdlyons/prbtree/prbtree.html.
[28]
V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, pages 373--384, 2005.
[29]
D. Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC, pages 735--744, 2010.
[30]
H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms, 2012. arXiv:1203.1952 {cs.DB}.
[31]
H. Q. Ngo, E. Porat, and A. Rudra. Personal Communciation.
[32]
A. Pagh and R. Pagh. Scalable computation of acyclic joins. In PODS, pages 225--232, 2006.
[33]
V. Poosala, Y. Ioannidis, P. Haas, and E. J. Shekita. Improved histograms for selectivity estimation of range predicates. In SIGMOD, pages 294--305, 1996.
[34]
A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.
[35]
U. Srivastava, P. J. Haas, V. Markl,M. Kutsch, and T. M. Tran. Isomer: Consistent histogram construction using query feedback. In ICDE, page 39, 2006.
[36]
K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11):852--863, 2011.
[37]
L. G. Valiant and V. V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85--93, 1986.
[38]
D. E. Willard. Applications of range query theory to relational data base join and selection operations. J. Comput. Syst. Sci., 52(1), 1996.
[39]
Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD, 2008.

Cited By

View all
  • (2024)Robust Join Processing with Diamond Hardened JoinsProceedings of the VLDB Endowment10.14778/3681954.368199517:11(3215-3228)Online publication date: 1-Jul-2024
  • (2024)Space & Time Efficient Leapfrog TriejoinProceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)10.1145/3661304.3661898(1-9)Online publication date: 14-Jun-2024
  • (2024)Tackling Challenges in Implementing Large-Scale Graph DatabasesCommunications of the ACM10.1145/3653314Online publication date: 16-Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
May 2012
332 pages
ISBN:9781450312486
DOI:10.1145/2213556
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 May 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bollob?s-thomason inequality
  2. fractional cover bound
  3. join algorithms
  4. loomis-whitney inequality

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12
Sponsor:

Acceptance Rates

PODS '12 Paper Acceptance Rate 26 of 101 submissions, 26%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)43
  • Downloads (Last 6 weeks)2
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Robust Join Processing with Diamond Hardened JoinsProceedings of the VLDB Endowment10.14778/3681954.368199517:11(3215-3228)Online publication date: 1-Jul-2024
  • (2024)Space & Time Efficient Leapfrog TriejoinProceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)10.1145/3661304.3661898(1-9)Online publication date: 14-Jun-2024
  • (2024)Tackling Challenges in Implementing Large-Scale Graph DatabasesCommunications of the ACM10.1145/3653314Online publication date: 16-Jul-2024
  • (2024)The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra SpaceACM Transactions on Database Systems10.1145/364482449:2(1-45)Online publication date: 23-Mar-2024
  • (2024)A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and InteractionProceedings of the ACM on Management of Data10.1145/36393152:1(1-29)Online publication date: 26-Mar-2024
  • (2024)Recent Increments in Incremental View MaintenanceCompanion of the 43rd Symposium on Principles of Database Systems10.1145/3635138.3654763(8-17)Online publication date: 9-Jun-2024
  • (2024)Parallel Acyclic Joins: Optimal Algorithms and Cyclicity SeparationJournal of the ACM10.1145/363351271:1(1-44)Online publication date: 11-Feb-2024
  • (2024)To Tile or not to Tile, That is the Question2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00096(449-458)Online publication date: 27-May-2024
  • (2023)Free Join: Unifying Worst-Case Optimal and Traditional JoinsProceedings of the ACM on Management of Data10.1145/35892951:2(1-23)Online publication date: 20-Jun-2023
  • (2023)On Join Sampling and the Hardness of Combinatorial Output-Sensitive Join AlgorithmsProceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3584372.3588666(99-111)Online publication date: 18-Jun-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media