Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Evaluating Datalog over Semirings: A Grounding-based Approach

Published: 14 May 2024 Publication History

Abstract

Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring σ, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over σ: first ground the program into an equivalent system of polynomial equations (i.e. grounding) and then find the least fixpoint of the grounding over σ. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, efficient algorithms for fixpoint evaluation are introduced over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.

References

[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of databases. Vol. 8. Addison-Wesley Reading.
[2]
Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. 2016. FAQ: Questions Asked Frequently. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS '16). New York, NY, USA.
[3]
Foto Afrati and Christos H Papadimitriou. 1993. The parallel complexity of simple logic programs. Journal of the ACM (JACM), Vol. 40, 4 (1993), 891--916.
[4]
Lars Ole Andersen. 1994. Program analysis and specialization for the C programming language. Ph.,D. Dissertation. Citeseer.
[5]
Francois Bancilhon, David Maier, Yehoshua Sagiv, and Jeffrey D Ullman. 1985. Magic sets and other strange ways to implement logic programs. In Proceedings of the fifth ACM SIGACT-SIGMOD symposium on Principles of database systems. 1--15.
[6]
Pablo Barceló Baeza. 2013. Querying graph databases. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems. 175--188.
[7]
Nofar Carmeli, Batya Kenig, Benny Kimelfeld, and Markus Krö ll. 2021. Efficiently enumerating minimal triangulations. Discret. Appl. Math., Vol. 303 (2021), 216--236.
[8]
Nofar Carmeli and Markus Kröll. 2021. On the Enumeration Complexity of Unions of Conjunctive Queries. ACM Transactions on Database Systems (TODS), Vol. 46, 2 (2021), 1--41.
[9]
Katrin Casel and Markus L. Schmid. 2021. Fine-Grained Complexity of Regular Path Queries. In ICDT (LIPIcs, Vol. 186). Schloss Dagstuhl - Leibniz-Zentrum fü r Informatik, 19:1--19:20.
[10]
Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2022. Introduction to algorithms. MIT press.
[11]
Stavros Cosmadakis. 1999. Inherent complexity of recursive queries. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 148--154.
[12]
Bruno Courcelle. 1990. Graph rewriting: An algebraic and logic approach. In Formal Models and Semantics. Elsevier, 193--242.
[13]
Brian A Davey and Hilary A Priestley. 2002. Introduction to lattices and order. Cambridge university press.
[14]
Daniel Deutch, Tova Milo, Sudeepa Roy, and Val Tannen. 2014. Circuits for Datalog Provenance. In ICDT, Vol. 3. Citeseer, 2014.
[15]
Edsger W Dijkstra. 2022. A note on two problems in connexion with graphs. In Edsger Wybe Dijkstra: His Life, Work, and Legacy. 287--290.
[16]
Javier Esparza, Stefan Kiefer, and Michael Luttenberger. 2010. Newtonian Program Analysis. J. ACM, Vol. 57, 6, Article 33 (nov 2010), bibinfonumpages47 pages.
[17]
Robert W Floyd. 1962. Algorithm 97: shortest path. Commun. ACM, Vol. 5, 6 (1962), 345.
[18]
Jörg Flum, Markus Frick, and Martin Grohe. 2002. Query evaluation via tree-decompositions. Journal of the ACM (JACM), Vol. 49, 6 (2002), 716--752.
[19]
J. Nathan Foster, Todd J. Green, and Val Tannen. 2008. Annotated XML: Queries and Provenance. In PODS. ACM, 271--280.
[20]
Michael R Garey. 1997. Computers and intractability: A guide to the theory of np-completeness, freeman. Fundamental (1997).
[21]
Georg Gottlob, Erich Gr"adel, and Helmut Veith. 2002. Datalog LITE: A deductive query language with linear time model checking. ACM Transactions on Computational Logic (TOCL), Vol. 3, 1 (2002), 42--79.
[22]
Georg Gottlob and Christoph Koch. 2004. Monadic datalog and the expressive power of languages for Web information extraction. J. ACM, Vol. 51, 1 (2004), 74--113.
[23]
Todd J. Green, Shan Shan Huang, Boon Thau Loo, and Wenchao Zhou. 2013. Datalog and Recursive Query Processing. Found. Trends Databases, Vol. 5, 2 (2013), 105--195.
[24]
Todd J Green, Grigoris Karvounarakis, and Val Tannen. 2007. Provenance semirings. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 31--40.
[25]
Sungjin Im, Benjamin Moseley, Hung Ngo, and Kirk Pruhs. 2023. On the Convergence Rate of Linear Datalogo over Stable Semirings. arXiv preprint arXiv:2311.17664 (2023).
[26]
Manas Joglekar and Christopher Ré. 2018. It's All a Matter of Degree - Using Degree Information to Optimize Multiway Joins. Theory Comput. Syst., Vol. 62, 4 (2018), 810--853.
[27]
Stasys Jukna. 2015. Lower Bounds for Tropical Circuits and Dynamic Programs. Theory Comput. Syst., Vol. 57, 1 (2015), 160--194.
[28]
Mahmoud Abo Khamis, Hung Q. Ngo, Reinhard Pichler, Dan Suciu, and Yisu Remy Wang. 2022. Convergence of Datalog over (Pre-) Semirings. In PODS. ACM, 105--117.
[29]
Mahmoud Abo Khamis, Hung Q. Ngo, and Dan Suciu. 2017. What Do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?. In PODS. ACM, 429--444.
[30]
Donald E Knuth. 1977. A generalization of Dijkstra's algorithm. Inform. Process. Lett., Vol. 6, 1 (1977), 1--5.
[31]
Yuanbo Li, Qirun Zhang, and Thomas Reps. 2020. Fast graph simplification for interleaved Dyck-reachability. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 780--793.
[32]
Yuanbo Li, Qirun Zhang, and Thomas Reps. 2021. On the complexity of bidirected interleaved Dyck-reachability. Proceedings of the ACM on Programming Languages, Vol. 5, POPL (2021), 1--28.
[33]
Andrea Lincoln, Virginia Vassilevska Williams, and R. Ryan Williams. 2018. Tight Hardness for Shortest Cycles and Paths in Sparse Graphs. In SODA. SIAM, 1236--1252.
[34]
Carsten Lutz and Marcin Przybylko. 2022. Efficiently enumerating answers to ontology-mediated queries. In Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 277--289.
[35]
Dá niel Marx. 2010. Can You Beat Treewidth? Theory Comput., Vol. 6, 1 (2010), 85--112.
[36]
Anders Alnor Mathiasen and Andreas Pavlogiannis. 2021. The fine-grained and parallel complexity of andersen's pointer analysis. Proc. ACM Program. Lang., Vol. 5, POPL (2021), 1--29.
[37]
Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2018. Worst-case Optimal Join Algorithms. J. ACM, Vol. 65, 3 (2018), 16:1--16:40.
[38]
Hung Q Ngo, Christopher Ré, and Atri Rudra. 2014. Skew Strikes Back: New Developments in the Theory of Join Algorithms. SIGMOD Rec. (2014).
[39]
Yann Ramusat, Silviu Maniu, and Pierre Senellart. 2021a. A Practical Dynamic Programming Approach to Datalog Provenance Computation. CoRR, Vol. abs/2112.01132 (2021).
[40]
Yann Ramusat, Silviu Maniu, and Pierre Senellart. 2021b. Provenance-Based Algorithms for Rich Queries over Graph Databases. In EDBT. OpenProceedings.org, 73--84.
[41]
Thomas W. Reps. 1998. Program analysis via graph reachability. Inf. Softw. Technol., Vol. 40, 11--12 (1998), 701--726.
[42]
Bernhard Scholz, Herbert Jordan, Pavle Subotic, and Till Westmann. 2016. On fast large-scale program analysis in Datalog. In CC. ACM, 196--206.
[43]
Jiwon Seo, Jongsoo Park, Jaeho Shin, and Monica S. Lam. 2013. Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis. Proc. VLDB Endow., Vol. 6, 14 (2013), 1906--1917. https://doi.org/10.14778/2556549.2556572
[44]
Yannis Smaragdakis and George Balatsouras. 2015. Pointer Analysis. Found. Trends Program. Lang., Vol. 2, 1 (2015), 1--69.
[45]
Jeffrey D. Ullman and Allen Van Gelder. 1988. Parallel Complexity of Logical Query Programs. Algorithmica, Vol. 3 (1988), 5--42.
[46]
Leslie Valiant. 1974. General context-free recognition in less than cubic time. (1974).
[47]
Moshe Y Vardi. 1982. The complexity of relational query languages. In Proceedings of the fourteenth annual ACM symposium on Theory of computing. 137--146.
[48]
Yilei Wang and Ke Yi. 2021. Secure Yannakakis: Join-Aggregate Queries over Private Data. In SIGMOD Conference. ACM, 1969--1981.
[49]
Stephen Warshall. 1962. A theorem on boolean matrices. Journal of the ACM (JACM), Vol. 9, 1 (1962), 11--12.
[50]
Mihalis Yannakakis. 1990. Graph-Theoretic Methods in Database Theory. In PODS. ACM Press, 230--242.
[51]
Clement Tak Yu and Meral Z Ozsoyoglu. 1979. An algorithm for tree-query membership of a distributed query. In COMPSAC 79. Proceedings. Computer Software and The IEEE Computer Society's Third International Applications Conference, 1979. IEEE, 306--312.
[52]
Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, and Val Tannen. 2024. Evaluating Datalog over Semirings: A Grounding-based Approach. arXiv preprint arXiv:2403.12436 (2024).

Cited By

View all
  • (2024)Output-sensitive Conjunctive Query EvaluationProceedings of the ACM on Management of Data10.1145/36958382:5(1-24)Online publication date: 7-Nov-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 2, Issue 2
PODS
May 2024
852 pages
EISSN:2836-6573
DOI:10.1145/3665155
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 May 2024
Published in PACMMOD Volume 2, Issue 2

Author Tags

  1. datalog
  2. evaluation
  3. grounding
  4. semirings

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)247
  • Downloads (Last 6 weeks)48
Reflects downloads up to 28 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Output-sensitive Conjunctive Query EvaluationProceedings of the ACM on Management of Data10.1145/36958382:5(1-24)Online publication date: 7-Nov-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media