Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Public Access

HoTTSQL: proving query rewrites with univalent SQL semantics

Published: 14 June 2017 Publication History

Abstract

Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenges comes from the intricacies and rich features of query languages, which makes reasoning about rewrite rules difficult. In this paper, we propose a machine-checkable denotational semantics for SQL, the de facto language for relational database, for rigorously validating rewrite rules. Unlike previously proposed semantics that are either non-mechanized or only cover a small amount of SQL language features, our semantics covers all major features of SQL, including bags, correlated subqueries, aggregation, and indexes. Our mechanized semantics, called HoTT SQL, is based on K-Relations and homotopy type theory, where we denote relations as mathematical functions from tuples to univalent types. We have implemented HoTTSQL in Coq, which takes only fewer than 300 lines of code and have proved a wide range of SQL rewrite rules, including those from database research literature (e.g., magic set rewrites) and real-world query optimizers (e.g., subquery elimination). Several of these rewrite rules have never been previously proven correct. In addition, while query equivalence is generally undecidable, we have implemented an automated decision procedure using HoTTSQL for conjunctive queries: a well studied decidable fragment of SQL that encompasses many real-world queries.

References

[1]
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
[2]
F. Bancilhon, D. Maier, Y. Sagiv, and J. D. Ullman. Magic sets and other strange ways to implement logic programs. In PODS, pages 1–15, 1986.
[3]
B. Barras, B. Grégoire, A. Mahboubi, and L. Théry. Coq reference manual chapter 25: The ring and field tactic families. https://coq.inria.fr/refman/Reference-Manual028. html.
[4]
V. Benzaken, E. Contejean, and S. Dumbrava. A coq formalization of the relational data model. In ESOP, pages 189–208, 2014.
[5]
D. Bruijn and N. Govert. Lambda calculus notation with nameless dummies: A tool for automatic formula manipulation, with application to the church-rosser theorem. Indagationes Mathematicae, 34:381–392, 1972.
[6]
P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, 1994.
[7]
P. Buneman, S. A. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theor. Comput. Sci., 149(1):3–48, 1995.
[8]
A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77–90. ACM, 1977.
[9]
S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1):65–74, 1997.
[10]
S. Chaudhuri and M. Y. Vardi. Optimization of Real conjunctive queries. In Proceedings of the Twelfth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems, May 25-28, 1993, Washington, DC, USA, pages 59–70, 1993.
[11]
S. Chaudhuri and M. Y. Vardi. On the complexity of equivalence between recursive and nonrecursive datalog programs. In PODS, pages 107–116. ACM Press, 1994.
[12]
H. Chen, D. Ziegler, T. Chajed, A. Chlipala, M. F. Kaashoek, and N. Zeldovich. Using crash hoare logic for certifying the FSCQ file system. In SOSP, pages 18–37, 2015.
[13]
M. Cherniack and S. B. Zdonik. Rule languages and internal algebras for rule-based optimizers. In SIGMOD Conference, pages 401–412. ACM Press, 1996.
[14]
M. Cherniack and S. B. Zdonik. Changing the rules: Transformations for rule-based optimizers. In SIGMOD Conference, pages 61–72. ACM Press, 1998.
[15]
A. Chlipala. The bedrock structured programming system: combining generative metaprogramming and hoare logic in an extensible program verifier. In ICFP, pages 391–402. ACM, 2013.
[16]
S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. CoRR, abs/1607.04822, 2016. 04822.
[17]
S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR. www.cidrdb.org, 2017.
[18]
E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970.
[19]
E. F. Codd. Relational completeness of data base sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California, 1972.
[20]
S. Dar, M. J. Franklin, B. T. J´onsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. In VLDB, pages 330–341. Morgan Kaufmann, 1996.
[21]
C. J. Date. A Guide to the SQL Standard, Second Edition. Addison-Wesley, 1989. ISBN 978-0-201-50209-1.
[22]
R. A. Ganski and H. K. T. Wong. Optimization of nested SQL queries revisited. In SIGMOD Conference, pages 23–33. ACM Press, 1987.
[23]
H. Garcia-Molina, J. D. Ullman, and J. Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009. ISBN 978-0-13-187325-4.
[24]
G. Geck, B. Ketsman, F. Neven, and T. Schwentick. Parallelcorrectness and containment for conjunctive queries with union and negation. In ICDT, volume 48 of LIPIcs, pages 9:1– 9:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
[25]
G. Graefe. The cascades framework for query optimization. IEEE Data Eng. Bull., 18(3):19–29, 1995.
[26]
G. Graefe and D. J. DeWitt. The EXODUS optimizer generator. In SIGMOD Conference, pages 160–172. ACM Press, 1987.
[27]
G. Graefe and W. J. McKenna. The volcano optimizer generator: Extensibility and efficient search. In ICDE, pages 209– 218. IEEE Computer Society, 1993.
[28]
T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31–40, 2007.
[29]
J. Gross, M. Shulman, A. Bauer, P. L. Lumsdaine, A. Mahboubi, and B. Spitters. The hott libary in coq. https: //github.com/HoTT/HoTT.
[30]
J. V. Guttag and J. J. Horning. Larch: Languages and Tools for Formal Specification. Springer-Verlag New York, Inc., New York, NY, USA, 1993. ISBN 0-387-94006-5.
[31]
L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh. Extensible query processing in starburst. In SIGMOD Conference, pages 377–388. ACM Press, 1989.
[32]
C. Hawblitzel, J. Howell, J. R. Lorch, A. Narayan, B. Parno, D. Zhang, and B. Zill. Ironclad apps: End-to-end security via automated full-system verification. In OSDI, pages 165–181. USENIX Association, 2014.
[33]
Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. ACM Trans. Database Syst., 20(3):288–324, 1995.
[34]
ISO/IEC. Iso/iec 9075-1:2011. https://www.iso.org/obp/ ui/#iso:std:iso-iec:9075:-1:ed-4:v1:en. Online; accessed 9-May-2016.
[35]
T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for REAL conjunctive queries with inequalities. In PODS, pages 80–89. ACM, 2006.
[36]
M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: questions asked frequently. In PODS, pages 13–28, 2016.
[37]
G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. sel4: formal verification of an OS kernel. In SOSP, pages 207–220. ACM, 2009.
[38]
X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, 2009.
[39]
A. Y. Levy, I. S. Mumick, and Y. Sagiv. Query optimization by predicate move-around. In VLDB, pages 96–107. Morgan Kaufmann, 1994.
[40]
G. Malecha and R. Wisnesky. Using dependent types and tactics to enable semantic optimization of language-integrated queries. In DBPL, pages 49–58. ACM, 2015.
[41]
J. G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In POPL, pages 237–248, 2010.
[42]
I. S. Mumick, S. J. Finkelstein, H. Pirahesh, and R. Ramakrishnan. Magic is relevant. In SIGMOD Conference, pages 247–258, 1990.
[43]
M. Muralikrishna. Improved unnesting algorithms for join aggregate SQL queries. In VLDB, pages 91–102. Morgan Kaufmann, 1992.
[44]
M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of SQL queries. ACM Trans. Database Syst., 16(3):513–534, 1991.
[45]
G. Nelson and D. C. Oppen. Fast decision procedures based on congruence closure. J. ACM, 27(2):356–364, 1980.
[46]
H. Pirahesh, J. M. Hellerstein, and W. Hasan. Extensible/rule based query rewrite optimization in starburst. In SIGMOD Conference, pages 39–48. ACM Press, 1992.
[47]
J. Rohmer, R. Lescoeur, and J. Kerisit. The alexander method - A technique for the processing of recursive axioms in deductive databases. New Generation Comput., 4(3):273–285, 1986.
[48]
Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. J. ACM, 27(4):633–655, 1980.
[49]
D. Schmitt. Bug #5673: Optimizer creates strange execution plan leading to wrong results. https: //www.postgresql.org/message-id/201009231503.
[50]
[email protected]. Online; accessed 1-July-2016.
[51]
P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. Y. C. Leung, R. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In SIGMOD Conference, pages 435–446, 1996.
[52]
M. Sulik. Bug #70038: Wrong select count distinct with a field included in two-column unique key. http://bugs.mysql. com/bug.php?id=70038. Online; accessed 1-July-2016.
[53]
The Univalent Foundations Program. Homotopy Type Theory: Univalent Foundations of Mathematics. https: //homotopytypetheory.org/book, Institute for Advanced Study, 2013.
[54]
B. A. Trakhtenbrot. Impossibility of an algorithm for the decision problem in finite classes. Dok. Akad. Nauk USSR, 70(1):569–572, 1950.
[55]
Transaction Processing Performance Council (TPC). Tpc benchmark h revision 2.17.1. http://www.tpc.org/tpc documents current versions/pdf/tpc-h v2.17.1.pdf.
[56]
O. G. Tsatalos, M. H. Solomon, and Y. E. Ioannidis. The GMAP: A versatile tool for physical data independence. In VLDB, pages 367–378. Morgan Kaufmann, 1994.
[57]
J. D. Ullman. Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989.
[58]
J. D. Ullman. Information integration using logical views. In ICDT, volume 1186 of Lecture Notes in Computer Science, pages 19–40. Springer, 1997.
[59]
R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In PODS, pages 331– 345, 1992.
[60]
M. Veanes, P. Grigorenko, P. de Halleux, and N. Tillmann. Symbolic query exploration. In ICFEM, pages 49–68, 2009.
[61]
M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In LPAR (Dakar), volume 6355 of Lecture Notes in Computer Science, pages 425–446. Springer, 2010.
[62]
D. Vytiniotis, S. L. P. Jones, K. Claessen, and D. Rosén. HALO: haskell to logic through denotational semantics. In POPL, pages 431–442, 2013.
[63]
K. Weitz, D. Woos, E. Torlak, M. D. Ernst, A. Krishnamurthy, and Z. Tatlock. Bagpipe: Verified BGP configuration checking. Technical Report UW-CSE-16-01-01, University of Washington Department of Computer Science and Engineering, Seattle, WA, USA, Jan. 2016.
[64]
J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In PLDI, pages 357–368. ACM, 2015.

Cited By

View all
  • (2024)QED: A Powerful Query Equivalence Decider for SQLProceedings of the VLDB Endowment10.14778/3681954.368202417:11(3602-3614)Online publication date: 1-Jul-2024
  • (2024)Qr-Hint: Actionable Hints Towards Correcting Wrong SQL QueriesProceedings of the ACM on Management of Data10.1145/36549952:3(1-27)Online publication date: 30-May-2024
  • (2023)Towards Auto-Generated Data SystemsProceedings of the VLDB Endowment10.14778/3611540.361163516:12(4116-4129)Online publication date: 12-Sep-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 52, Issue 6
PLDI '17
June 2017
708 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3140587
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2017
    708 pages
    ISBN:9781450349888
    DOI:10.1145/3062341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017
Published in SIGPLAN Volume 52, Issue 6

Check for updates

Author Tags

  1. Equivalence
  2. Formal Semantics
  3. Homotopy Types
  4. SQL

Qualifiers

  • Article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)277
  • Downloads (Last 6 weeks)64
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)QED: A Powerful Query Equivalence Decider for SQLProceedings of the VLDB Endowment10.14778/3681954.368202417:11(3602-3614)Online publication date: 1-Jul-2024
  • (2024)Qr-Hint: Actionable Hints Towards Correcting Wrong SQL QueriesProceedings of the ACM on Management of Data10.1145/36549952:3(1-27)Online publication date: 30-May-2024
  • (2023)Towards Auto-Generated Data SystemsProceedings of the VLDB Endowment10.14778/3611540.361163516:12(4116-4129)Online publication date: 12-Sep-2023
  • (2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 24-Aug-2023
  • (2023)Leveraging Application Data Constraints to Optimize Database-Backed Web ApplicationsProceedings of the VLDB Endowment10.14778/3583140.358314116:6(1208-1221)Online publication date: 20-Apr-2023
  • (2023)Proving Query Equivalence Using Linear Integer ArithmeticProceedings of the ACM on Management of Data10.1145/36267681:4(1-26)Online publication date: 12-Dec-2023
  • (2023)GEqO: ML-Accelerated Semantic Equivalence DetectionProceedings of the ACM on Management of Data10.1145/36267101:4(1-25)Online publication date: 12-Dec-2023
  • (2023)Fine-Tuning Data Structures for Query ProcessingProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3579990.3580016(149-161)Online publication date: 17-Feb-2023
  • (2023)Graph-Attention-Network-Based Cost Estimation Model in Materialized View Environment2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS60453.2023.00198(1388-1396)Online publication date: 17-Dec-2023
  • (2022)Functional collection programming with semi-ring dictionariesProceedings of the ACM on Programming Languages10.1145/35273336:OOPSLA1(1-33)Online publication date: 29-Apr-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media