research-article

Open access

VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints

Authors:

Yuepeng WangAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA1

Article No.: 132, Pages 1071 - 1099

https://doi.org/10.1145/3649849

Published: 29 April 2024 Publication History

Abstract

The task of SQL query equivalence checking is important in various real-world applications (including query rewriting and automated grading) that involve complex queries with integrity constraints; yet, state-of-the-art techniques are very limited in their capability of reasoning about complex features (e.g., those that involve sorting, case statement, rich integrity constraints, etc.) in real-life queries. To the best of our knowledge, we propose the first SMT-based approach and its implementation, VeriEQL, capable of proving and disproving bounded equivalence of complex SQL queries. VeriEQL is based on a new logical encoding that models query semantics over symbolic tuples using the theory of integers with uninterpreted functions. It is simple yet highly practical -- our comprehensive evaluation on over 20,000 benchmarks shows that VeriEQL outperforms all state-of-the-art techniques by more than one order of magnitude in terms of the number of benchmarks that can be proved or disproved. VeriEQL can also generate counterexamples that facilitate many downstream tasks (such as finding serious bugs in systems like MySQL and Apache Calcite).

References

[1]

Alfred V. Aho, Yehoshua Sagiv, and Jeffrey D. Ullman. 1979. Equivalences among Relational Expressions. SIAM J. Comput., 8, 2 (1979), 218–246. https://doi.org/10.1137/0208017

Digital Library

[2]

Gilles Barthe, Juan Manuel Crespo, and César Kunz. 2011. Relational Verification Using Product Programs. In Proceedings of the International Symposium on Formal Methods (FM) (Lecture Notes in Computer Science, Vol. 6664). 200–214. https://doi.org/10.1007/978-3-642-21437-0_17

[3]

Nick Benton. 2004. Simple relational correctness proofs for static analyses and program transformations. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 14–25. https://doi.org/10.1145/964001.964003

Digital Library

[4]

Véronique Benzaken and Evelyne Contejean. 2019. A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra. In Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs. 249–261. https://doi.org/10.1145/3293880.3294107

Digital Library

[5]

Calcite. 2023. Calcite Optimization Rules Test Suite. https://github.com/apache/calcite/blob/main/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java

[6]

Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella, and Arie van Deursen. 2018. Search-based test data generation for SQL queries. In Proceedings of the 40th international conference on software engineering. 1220–1230. https://doi.org/10.1145/3180155.3180202

Digital Library

[7]

Ashok K. Chandra and Philip M. Merlin. 1977. Optimal Implementation of Conjunctive Queries in Relational Data Bases. In Proceedings of the ACM Symposium on Theory of Computing (STOC). ACM, 77–90. https://doi.org/10.1145/800105.803397

Digital Library

[8]

Bikash Chandra, Ananyo Banerjee, Udbhas Hazra, Mathew Joseph, and S Sudarshan. 2019. Automated grading of sql queries. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1630–1633. https://doi.org/10.1109/ICDE.2019.00159

[9]

Bikash Chandra, Bhupesh Chawda, Biplab Kar, KV Maheshwara Reddy, Shetal Shah, and S Sudarshan. 2015. Data generation for testing and grading SQL queries. The VLDB Journal, 24, 6 (2015), 731–755. https://doi.org/10.1007/s00778-015-0395-0

Digital Library

[10]

James Cheney and Wilmer Ricciotti. 2021. Comprehending nulls. In Proceedings of the International Symposium on Database Programming Languages (DBPL). ACM, 3–6. https://doi.org/10.1145/3475726.3475730

Digital Library

[11]

Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Optimizing database-backed applications with query synthesis. In ACM SIGPLAN Conference on Programming Language Design and Implementation, (PLDI). ACM, 3–14. https://doi.org/10.1145/2491956.2462180

Digital Library

[12]

Shumo Chu, Daniel Li, Chenglong Wang, Alvin Cheung, and Dan Suciu. 2017. Demonstration of the cosette automated sql prover. In Proceedings of the 2017 ACM International Conference on Management of Data. 1591–1594. https://doi.org/10.1145/3035918.3058728

Digital Library

[13]

Shumo Chu, Brendan Murphy, Jared Roesch, Alvin Cheung, and Dan Suciu. 2018. Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries. Proc. VLDB Endow., 11, 11 (2018), 1482–1495. https://doi.org/10.14778/3236187.3236200

Digital Library

[14]

Shumo Chu, Chenglong Wang, Konstantin Weitz, and Alvin Cheung. 2017. Cosette: An Automated Prover for SQL. In CIDR.

[15]

Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. 2017. HoTTSQL: Proving query rewrites with univalent SQL semantics. ACM SIGPLAN Notices, 52, 6 (2017), 510–524. https://doi.org/10.1145/3062341.3062348

Digital Library

[16]

Edmund Clarke, Daniel Kroening, and Flavio Lerda. 2004. A tool for checking ANSI-C programs. In Tools and Algorithms for the Construction and Analysis of Systems: 10th International Conference, TACAS 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29-April 2, 2004. Proceedings 10. 168–176. https://doi.org/10.1007/978-3-540-24730-2_15

[17]

Fabien Coelho. 2013. DataFiller – generate random data from database schema. https://www.cri.ensmp.fr/people/coelho/datafiller.html

[18]

Lucas Cordeiro, Pascal Kesseli, Daniel Kroening, Peter Schrammel, and Marek Trtik. 2018. JBMC: A bounded model checking tool for verifying Java bytecode. In Computer Aided Verification: 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part I. 183–190. https://doi.org/10.1007/978-3-319-96145-3_10

[19]

Cosette. 2018. Cosette website. https://cosette.cs.washington.edu/

[20]

Leonardo Mendonça de Moura and Nikolaj S. Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS) (Lecture Notes in Computer Science, Vol. 4963). 337–340. https://doi.org/10.1007/978-3-540-78800-3_24

[21]

Benjamin Delaware, Clément Pit-Claudel, Jason Gross, and Adam Chlipala. 2015. Fiat: Deductive Synthesis of Abstract Data Types in a Proof Assistant. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). ACM, 689–700. https://doi.org/10.1145/2676726.2677006

Digital Library

[22]

Alin Deutsch, Monica Marcus, Liying Sui, Victor Vianu, and Dayou Zhou. 2005. A Verifier for Interactive, Data-Driven Web Applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 539–550. https://doi.org/10.1145/1066157.1066219

Digital Library

[23]

Alin Deutsch, Liying Sui, and Victor Vianu. 2007. Specification and verification of data-driven Web applications. J. Comput. Syst. Sci., 73, 3 (2007), 442–474. https://doi.org/10.1016/j.jcss.2006.10.006

Digital Library

[24]

John K. Feser, Sam Madden, Nan Tang, and Armando Solar-Lezama. 2020. Deductive optimization of relational data storage. Proc. ACM Program. Lang., 4, OOPSLA (2020), 170:1–170:30. https://doi.org/10.1145/3428238

Digital Library

[25]

Milos Gligoric and Rupak Majumdar. 2013. Model checking database applications. In Tools and Algorithms for the Construction and Analysis of Systems: 19th International Conference, TACAS 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings 19. 549–564. https://doi.org/10.1007/978-3-642-36742-7_40

Digital Library

[26]

Goetz Graefe. 1995. The cascades framework for query optimization. IEEE Data Eng. Bull., 18, 3 (1995), 19–29.

[27]

Todd J. Green. 2011. Containment of Conjunctive Queries on Annotated Relations. Theory Comput. Syst., 49, 2 (2011), 429–459. https://doi.org/10.1007/s00224-011-9327-6

Digital Library

[28]

Paolo Guagliardo and Leonid Libkin. 2017. A Formal Semantics of SQL Queries, Its Validation, and Applications. Proc. VLDB Endow., 11, 1 (2017), sep, 27–39. issn:2150-8097 https://doi.org/10.14778/3151113.3151116

Digital Library

[29]

Yang He, Pinhan Zhao, Xinyu Wang, and Yuepeng Wang. 2024. Artifact Evaluation VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints. https://doi.org/10.5281/zenodo.10795614

[30]

Yang He, Pinhan Zhao, Xinyu Wang, and Yuepeng Wang. 2024. VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints. arxiv:2403.03193.

[31]

Akash Lal, Shaz Qadeer, and Shuvendu K Lahiri. 2012. A solver for reachability modulo theories. In Computer Aided Verification: 24th International Conference, CAV 2012, Berkeley, CA, USA, July 7-13, 2012 Proceedings 24. 427–443. https://doi.org/10.1007/978-3-642-31424-7_32

Digital Library

[32]

LeetCode. 2023. LeetCode website. https://leetcode.com/

[33]

Zhengjie Miao, Sudeepa Roy, and Jun Yang. 2019. Explaining wrong queries using small examples. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD). 503–520. https://doi.org/10.1145/3299869.3319866

Digital Library

[34]

George C. Necula. 2000. Translation validation for an optimizing compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 83–94. https://doi.org/10.1145/349299.349314

Digital Library

[35]

Mauro Negri, Giuseppe Pelagatti, and Licia Sbattella. 1991. Formal semantics of SQL queries. ACM Transactions on Database Systems (TODS), 16, 3 (1991), 513–534. https://doi.org/10.1145/111197.111212

Digital Library

[36]

Amir Pnueli, Michael Siegel, and Eli Singerman. 1998. Translation Validation. In Proceedings of International Conference on Tools and Algorithms for Construction and Analysis of Systems (TACAS) (Lecture Notes in Computer Science, Vol. 1384). 151–166. https://doi.org/10.1007/BFb0054170

[37]

Wilmer Ricciotti and James Cheney. 2019. Mixing set and bag semantics. In Proceedings of the ACM SIGPLAN International Symposium on Database Programming Languages (DBPL). ACM, 70–73. https://doi.org/10.1145/3315507.3330202

Digital Library

[38]

Wilmer Ricciotti and James Cheney. 2022. A Formalization of SQL with Nulls. J. Autom. Reason., 66, 4 (2022), 989–1030. https://doi.org/10.1007/s10817-022-09632-4

Digital Library

[39]

Praveen Seshadri, Hamid Pirahesh, and TY Cliff Leung. 1996. Complex query decorrelation. In Proceedings of the Twelfth International Conference on Data Engineering. 450–458. https://doi.org/10.1109/ICDE.1996.492194

[40]

Shetal Shah, S Sudarshan, Suhas Kajbaje, Sandeep Patidar, Bhanu Pratap Gupta, and Devang Vira. 2011. Generating test data for killing SQL mutants: A constraint-based approach. In 2011 IEEE 27th International Conference on Data Engineering. 1175–1186. https://doi.org/10.1109/ICDE.2011.5767876

Digital Library

[41]

Marcelo Sousa and Isil Dillig. 2016. Cartesian hoare logic for verifying k-safety properties. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 57–69. https://doi.org/10.1145/2908080.2908092

Digital Library

[42]

Michael Stepp, Ross Tate, and Sorin Lerner. 2011. Equality-Based Translation Validator for LLVM. In Proceedings of the International Conference on Computer Aided Verification (CAV) (Lecture Notes in Computer Science, Vol. 6806). 737–742. https://doi.org/10.1007/978-3-642-22110-1_59

[43]

Emina Torlak and Rastislav Bodík. 2014. A lightweight symbolic virtual machine for solver-aided host languages. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 530–541. https://doi.org/10.1145/2594291.2594340

Digital Library

[44]

Javier Tuya, María José Suárez-Cabal, and Claudio De La Riva. 2010. Full predicate coverage for testing SQL database queries. Software Testing, Verification and Reliability, 20, 3 (2010), 237–288. https://doi.org/10.1002/stvr.424

[45]

Margus Veanes, Nikolai Tillmann, and Jonathan de Halleux. 2010. Qex: Symbolic SQL query explorer. In International Conference on Logic for Programming Artificial Intelligence and Reasoning. 425–446. https://doi.org/10.1007/978-3-642-17511-4_24

[46]

Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2018. Speeding up symbolic reasoning for relational queries. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018), 1–25. https://doi.org/10.1145/3276527

Digital Library

[47]

Yuepeng Wang, Isil Dillig, Shuvendu K. Lahiri, and William R. Cook. 2018. Verifying Equivalence of Database-Driven Applications. Proc. ACM Program. Lang., 2, POPL (2018), Article 56, 29 pages. https://doi.org/10.1145/3158144

Digital Library

[48]

Yuepeng Wang, James Dong, Rushi Shah, and Isil Dillig. 2019. Synthesizing database programs for schema refactoring. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 286–300. https://doi.org/10.1145/3314221.3314588

Digital Library

[49]

Yuepeng Wang, Rushi Shah, Abby Criswell, Rong Pan, and Isil Dillig. 2020. Data Migration using Datalog Program Synthesis. Proc. VLDB Endow., 13, 7 (2020), 1006–1019. https://doi.org/10.14778/3384345.3384350

Digital Library

[50]

Anna Zaks and Amir Pnueli. 2008. CoVaC: Compiler Validation by Program Analysis of the Cross-Product. In Proceedings of the International Symposium on Formal Methods (FM) (Lecture Notes in Computer Science, Vol. 5014). 35–51. https://doi.org/10.1007/978-3-540-68237-0_5

Digital Library

[51]

Qi Zhou, Joy Arulraj, Shamkant Navathe, William Harris, and Dong Xu. 2019. Automated verification of query equivalence using satisfiability modulo theories. Proceedings of the VLDB Endowment, 12, 11 (2019), 1276–1288. https://doi.org/10.14778/3342263.3342267

Digital Library

[52]

Qi Zhou, Joy Arulraj, Shamkant B Navathe, William Harris, and Jinpeng Wu. 2022. SPES: A Symbolic Approach to Proving Query Equivalence Under Bag Semantics. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). 2735–2748. https://doi.org/10.1109/ICDE53745.2022.00250

Index Terms

VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Formal software verification
  2. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Software verification
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program verification

Recommendations

Demonstration of the VeriEQL Equivalence Checker for Complex SQL Queries

Equivalence checking for SQL queries has many real-world applications but typically requires supporting an expressive SQL language in order to be practical. We develop VeriEQL, a system that can prove and disprove equivalence of complex SQL queries. ...
Verifying equivalence of database-driven applications

This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database ...
A Relational Algebraic Approach to Protocol Verification

Communications protocols are usually modeled by a pair of finite-state machines that generate the interaction between processes. Protocol verification is a procedure to validate the logical correctness of these interaction sequences and to detect ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages

Proceedings of the ACM on Programming Languages Volume 8, Issue OOPSLA1

April 2024

1492 pages

EISSN:2475-1421

DOI:10.1145/3554316

Editor:
Michael Hicks
Amazon, USA

Issue’s Table of Contents

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 April 2024

Published in PACMPL Volume 8, Issue OOPSLA1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

Natural Sciences and Engineering Research Council of Canada
NSF (National Science Foundation)

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
325
Total Downloads

Downloads (Last 12 months)325
Downloads (Last 6 weeks)109

Reflects downloads up to 18 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents