Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints

Published: 29 April 2024 Publication History

Abstract

The task of SQL query equivalence checking is important in various real-world applications (including query rewriting and automated grading) that involve complex queries with integrity constraints; yet, state-of-the-art techniques are very limited in their capability of reasoning about complex features (e.g., those that involve sorting, case statement, rich integrity constraints, etc.) in real-life queries. To the best of our knowledge, we propose the first SMT-based approach and its implementation, VeriEQL, capable of proving and disproving bounded equivalence of complex SQL queries. VeriEQL is based on a new logical encoding that models query semantics over symbolic tuples using the theory of integers with uninterpreted functions. It is simple yet highly practical -- our comprehensive evaluation on over 20,000 benchmarks shows that VeriEQL outperforms all state-of-the-art techniques by more than one order of magnitude in terms of the number of benchmarks that can be proved or disproved. VeriEQL can also generate counterexamples that facilitate many downstream tasks (such as finding serious bugs in systems like MySQL and Apache Calcite).

References

[1]
Alfred V. Aho, Yehoshua Sagiv, and Jeffrey D. Ullman. 1979. Equivalences among Relational Expressions. SIAM J. Comput., 8, 2 (1979), 218–246. https://doi.org/10.1137/0208017
[2]
Gilles Barthe, Juan Manuel Crespo, and César Kunz. 2011. Relational Verification Using Product Programs. In Proceedings of the International Symposium on Formal Methods (FM) (Lecture Notes in Computer Science, Vol. 6664). 200–214. https://doi.org/10.1007/978-3-642-21437-0_17
[3]
Nick Benton. 2004. Simple relational correctness proofs for static analyses and program transformations. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 14–25. https://doi.org/10.1145/964001.964003
[4]
Véronique Benzaken and Evelyne Contejean. 2019. A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra. In Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs. 249–261. https://doi.org/10.1145/3293880.3294107
[5]
Calcite. 2023. Calcite Optimization Rules Test Suite. https://github.com/apache/calcite/blob/main/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java
[6]
Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella, and Arie van Deursen. 2018. Search-based test data generation for SQL queries. In Proceedings of the 40th international conference on software engineering. 1220–1230. https://doi.org/10.1145/3180155.3180202
[7]
Ashok K. Chandra and Philip M. Merlin. 1977. Optimal Implementation of Conjunctive Queries in Relational Data Bases. In Proceedings of the ACM Symposium on Theory of Computing (STOC). ACM, 77–90. https://doi.org/10.1145/800105.803397
[8]
Bikash Chandra, Ananyo Banerjee, Udbhas Hazra, Mathew Joseph, and S Sudarshan. 2019. Automated grading of sql queries. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1630–1633. https://doi.org/10.1109/ICDE.2019.00159
[9]
Bikash Chandra, Bhupesh Chawda, Biplab Kar, KV Maheshwara Reddy, Shetal Shah, and S Sudarshan. 2015. Data generation for testing and grading SQL queries. The VLDB Journal, 24, 6 (2015), 731–755. https://doi.org/10.1007/s00778-015-0395-0
[10]
James Cheney and Wilmer Ricciotti. 2021. Comprehending nulls. In Proceedings of the International Symposium on Database Programming Languages (DBPL). ACM, 3–6. https://doi.org/10.1145/3475726.3475730
[11]
Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Optimizing database-backed applications with query synthesis. In ACM SIGPLAN Conference on Programming Language Design and Implementation, (PLDI). ACM, 3–14. https://doi.org/10.1145/2491956.2462180
[12]
Shumo Chu, Daniel Li, Chenglong Wang, Alvin Cheung, and Dan Suciu. 2017. Demonstration of the cosette automated sql prover. In Proceedings of the 2017 ACM International Conference on Management of Data. 1591–1594. https://doi.org/10.1145/3035918.3058728
[13]
Shumo Chu, Brendan Murphy, Jared Roesch, Alvin Cheung, and Dan Suciu. 2018. Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries. Proc. VLDB Endow., 11, 11 (2018), 1482–1495. https://doi.org/10.14778/3236187.3236200
[14]
Shumo Chu, Chenglong Wang, Konstantin Weitz, and Alvin Cheung. 2017. Cosette: An Automated Prover for SQL. In CIDR.
[15]
Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. 2017. HoTTSQL: Proving query rewrites with univalent SQL semantics. ACM SIGPLAN Notices, 52, 6 (2017), 510–524. https://doi.org/10.1145/3062341.3062348
[16]
Edmund Clarke, Daniel Kroening, and Flavio Lerda. 2004. A tool for checking ANSI-C programs. In Tools and Algorithms for the Construction and Analysis of Systems: 10th International Conference, TACAS 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29-April 2, 2004. Proceedings 10. 168–176. https://doi.org/10.1007/978-3-540-24730-2_15
[17]
Fabien Coelho. 2013. DataFiller – generate random data from database schema. https://www.cri.ensmp.fr/people/coelho/datafiller.html
[18]
Lucas Cordeiro, Pascal Kesseli, Daniel Kroening, Peter Schrammel, and Marek Trtik. 2018. JBMC: A bounded model checking tool for verifying Java bytecode. In Computer Aided Verification: 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part I. 183–190. https://doi.org/10.1007/978-3-319-96145-3_10
[19]
Cosette. 2018. Cosette website. https://cosette.cs.washington.edu/
[20]
Leonardo Mendonça de Moura and Nikolaj S. Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (Lecture Notes in Computer Science, Vol. 4963). 337–340. https://doi.org/10.1007/978-3-540-78800-3_24
[21]
Benjamin Delaware, Clément Pit-Claudel, Jason Gross, and Adam Chlipala. 2015. Fiat: Deductive Synthesis of Abstract Data Types in a Proof Assistant. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 689–700. https://doi.org/10.1145/2676726.2677006
[22]
Alin Deutsch, Monica Marcus, Liying Sui, Victor Vianu, and Dayou Zhou. 2005. A Verifier for Interactive, Data-Driven Web Applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 539–550. https://doi.org/10.1145/1066157.1066219
[23]
Alin Deutsch, Liying Sui, and Victor Vianu. 2007. Specification and verification of data-driven Web applications. J. Comput. Syst. Sci., 73, 3 (2007), 442–474. https://doi.org/10.1016/j.jcss.2006.10.006
[24]
John K. Feser, Sam Madden, Nan Tang, and Armando Solar-Lezama. 2020. Deductive optimization of relational data storage. Proc. ACM Program. Lang., 4, OOPSLA (2020), 170:1–170:30. https://doi.org/10.1145/3428238
[25]
Milos Gligoric and Rupak Majumdar. 2013. Model checking database applications. In Tools and Algorithms for the Construction and Analysis of Systems: 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings 19. 549–564. https://doi.org/10.1007/978-3-642-36742-7_40
[26]
Goetz Graefe. 1995. The cascades framework for query optimization. IEEE Data Eng. Bull., 18, 3 (1995), 19–29.
[27]
Todd J. Green. 2011. Containment of Conjunctive Queries on Annotated Relations. Theory Comput. Syst., 49, 2 (2011), 429–459. https://doi.org/10.1007/s00224-011-9327-6
[28]
Paolo Guagliardo and Leonid Libkin. 2017. A Formal Semantics of SQL Queries, Its Validation, and Applications. Proc. VLDB Endow., 11, 1 (2017), sep, 27–39. issn:2150-8097 https://doi.org/10.14778/3151113.3151116
[29]
Yang He, Pinhan Zhao, Xinyu Wang, and Yuepeng Wang. 2024. Artifact Evaluation VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints. https://doi.org/10.5281/zenodo.10795614
[30]
Yang He, Pinhan Zhao, Xinyu Wang, and Yuepeng Wang. 2024. VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints. arxiv:2403.03193.
[31]
Akash Lal, Shaz Qadeer, and Shuvendu K Lahiri. 2012. A solver for reachability modulo theories. In Computer Aided Verification: 24th International Conference, CAV 2012, Berkeley, CA, USA, July 7-13, 2012 Proceedings 24. 427–443. https://doi.org/10.1007/978-3-642-31424-7_32
[32]
LeetCode. 2023. LeetCode website. https://leetcode.com/
[33]
Zhengjie Miao, Sudeepa Roy, and Jun Yang. 2019. Explaining wrong queries using small examples. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD). 503–520. https://doi.org/10.1145/3299869.3319866
[34]
George C. Necula. 2000. Translation validation for an optimizing compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 83–94. https://doi.org/10.1145/349299.349314
[35]
Mauro Negri, Giuseppe Pelagatti, and Licia Sbattella. 1991. Formal semantics of SQL queries. ACM Transactions on Database Systems (TODS), 16, 3 (1991), 513–534. https://doi.org/10.1145/111197.111212
[36]
Amir Pnueli, Michael Siegel, and Eli Singerman. 1998. Translation Validation. In Proceedings of International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS) (Lecture Notes in Computer Science, Vol. 1384). 151–166. https://doi.org/10.1007/BFb0054170
[37]
Wilmer Ricciotti and James Cheney. 2019. Mixing set and bag semantics. In Proceedings of the ACM SIGPLAN International Symposium on Database Programming Languages (DBPL). ACM, 70–73. https://doi.org/10.1145/3315507.3330202
[38]
Wilmer Ricciotti and James Cheney. 2022. A Formalization of SQL with Nulls. J. Autom. Reason., 66, 4 (2022), 989–1030. https://doi.org/10.1007/s10817-022-09632-4
[39]
Praveen Seshadri, Hamid Pirahesh, and TY Cliff Leung. 1996. Complex query decorrelation. In Proceedings of the Twelfth International Conference on Data Engineering. 450–458. https://doi.org/10.1109/ICDE.1996.492194
[40]
Shetal Shah, S Sudarshan, Suhas Kajbaje, Sandeep Patidar, Bhanu Pratap Gupta, and Devang Vira. 2011. Generating test data for killing SQL mutants: A constraint-based approach. In 2011 IEEE 27th International Conference on Data Engineering. 1175–1186. https://doi.org/10.1109/ICDE.2011.5767876
[41]
Marcelo Sousa and Isil Dillig. 2016. Cartesian hoare logic for verifying k-safety properties. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 57–69. https://doi.org/10.1145/2908080.2908092
[42]
Michael Stepp, Ross Tate, and Sorin Lerner. 2011. Equality-Based Translation Validator for LLVM. In Proceedings of the International Conference on Computer Aided Verification (CAV) (Lecture Notes in Computer Science, Vol. 6806). 737–742. https://doi.org/10.1007/978-3-642-22110-1_59
[43]
Emina Torlak and Rastislav Bodík. 2014. A lightweight symbolic virtual machine for solver-aided host languages. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 530–541. https://doi.org/10.1145/2594291.2594340
[44]
Javier Tuya, María José Suárez-Cabal, and Claudio De La Riva. 2010. Full predicate coverage for testing SQL database queries. Software Testing, Verification and Reliability, 20, 3 (2010), 237–288. https://doi.org/10.1002/stvr.424
[45]
Margus Veanes, Nikolai Tillmann, and Jonathan de Halleux. 2010. Qex: Symbolic SQL query explorer. In International Conference on Logic for Programming Artificial Intelligence and Reasoning. 425–446. https://doi.org/10.1007/978-3-642-17511-4_24
[46]
Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2018. Speeding up symbolic reasoning for relational queries. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018), 1–25. https://doi.org/10.1145/3276527
[47]
Yuepeng Wang, Isil Dillig, Shuvendu K. Lahiri, and William R. Cook. 2018. Verifying Equivalence of Database-Driven Applications. Proc. ACM Program. Lang., 2, POPL (2018), Article 56, 29 pages. https://doi.org/10.1145/3158144
[48]
Yuepeng Wang, James Dong, Rushi Shah, and Isil Dillig. 2019. Synthesizing database programs for schema refactoring. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 286–300. https://doi.org/10.1145/3314221.3314588
[49]
Yuepeng Wang, Rushi Shah, Abby Criswell, Rong Pan, and Isil Dillig. 2020. Data Migration using Datalog Program Synthesis. Proc. VLDB Endow., 13, 7 (2020), 1006–1019. https://doi.org/10.14778/3384345.3384350
[50]
Anna Zaks and Amir Pnueli. 2008. CoVaC: Compiler Validation by Program Analysis of the Cross-Product. In Proceedings of the International Symposium on Formal Methods (FM) (Lecture Notes in Computer Science, Vol. 5014). 35–51. https://doi.org/10.1007/978-3-540-68237-0_5
[51]
Qi Zhou, Joy Arulraj, Shamkant Navathe, William Harris, and Dong Xu. 2019. Automated verification of query equivalence using satisfiability modulo theories. Proceedings of the VLDB Endowment, 12, 11 (2019), 1276–1288. https://doi.org/10.14778/3342263.3342267
[52]
Qi Zhou, Joy Arulraj, Shamkant B Navathe, William Harris, and Jinpeng Wu. 2022. SPES: A Symbolic Approach to Proving Query Equivalence Under Bag Semantics. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). 2735–2748. https://doi.org/10.1109/ICDE53745.2022.00250

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 8, Issue OOPSLA1
April 2024
1492 pages
EISSN:2475-1421
DOI:10.1145/3554316
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 April 2024
Published in PACMPL Volume 8, Issue OOPSLA1

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Equivalence Checking
  2. Program Verification
  3. Relational Databases

Qualifiers

  • Research-article

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada
  • NSF (National Science Foundation)

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 325
    Total Downloads
  • Downloads (Last 12 months)325
  • Downloads (Last 6 weeks)109
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media