Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Provenance-guided synthesis of Datalog programs

Published: 20 December 2019 Publication History

Abstract

We propose a new approach to synthesize Datalog programs from input-output specifications. Our approach leverages query provenance to scale the counterexample-guided inductive synthesis (CEGIS) procedure for program synthesis. In each iteration of the procedure, a SAT solver proposes a candidate Datalog program, and a Datalog solver evaluates the proposed program to determine whether it meets the desired specification. Failure to satisfy the specification results in additional constraints to the SAT solver. We propose efficient algorithms to learn these constraints based on “why” and “why not” provenance information obtained from the Datalog solver. We have implemented our approach in a tool called ProSynth and present experimental results that demonstrate significant improvements over the state-of-the-art, including in synthesizing invented predicates, reducing running times, and in decreasing variances in synthesis performance. On a suite of 40 synthesis tasks from three different domains, ProSynth is able to synthesize the desired program in 10 seconds on average per task—an order of magnitude faster than baseline approaches—and takes only under a second each for 28 of them.

Supplementary Material

Auxiliary Archive (popl20main-p236-p-aux.zip)
Appendix, containing one additional table of experimental data, and a detailed log file for the working example.
WEBM File (a62-raghothaman.webm)

References

[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. 1994. Foundations of Databases: The Logical Level (1st ed.). Pearson.
[2]
Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. 2017. Constraint-Based Synthesis of Datalog Programs. In Principles and Practice of Constraint Programming (CP 2017). Springer, 689–706.
[3]
Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo Martin, Mukund Raghothaman, Sanjit Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-Guided Synthesis. In Formal Methods in Computer-Aided Design (FMCAD 2013) . IEEE, 1–8.
[4]
Lars Ole Andersen. 1994. Program Analysis and Specialization for the C Programming Language. Ph.D. Dissertation. DIKU, University of Copenhagen.
[5]
Molham Aref, Balder ten Cate, Todd Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd Veldhuizen, and Geoffrey Washburn. 2015. Design and Implementation of the LogicBlox System. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2015) . ACM, 1371–1382.
[6]
Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. In Proceedings of the International Conference on Database Theory (ICDT 2001) . Springer, 316–330.
[7]
Donald Chamberlin and Raymond Boyce. 1974. SEQUEL: A Structured English Query Language. In Proceedings of the 1974 ACM SIGFIDET Workshop on Data Description, Access and Control (SIGFIDET 1974) . ACM, 249–264.
[8]
James Cheney, Laura Chiticariu, and Wang-Chiew Tan. 2009. Provenance in Databases: Why, How, and Where. Foundations and Trends in Databases 1, 4 (2009), 379–474.
[9]
Laura Chiticariu and Wang-Chiew Tan. 2006. Debugging Schema Mappings with Routes. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006) . VLDB Endowment, 79–90.
[10]
Andrew Cropper and Stephen Muggleton. 2015. Logical Minimisation of Meta-Rules Within Meta-Interpretive Learning. In Inductive Logic Programming . Springer, 62–75.
[11]
Jacek Czerniak and Hubert Zarzycki. 2003. Application of Rough Sets in the Presumptive Diagnosis of Urinary System Diseases. In Artificial Intelligence and Security in Computing Systems. Springer, 41–51.
[12]
Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2008) . Springer, 337–340.
[13]
Daniel Deutch, Amir Gilad, and Yuval Moskovitch. 2015. Selective Provenance for Datalog Programs Using Topk Queries. Proceedings of the VLDB Endowment 8, 12 (Aug. 2015), 1394–1405.
[14]
Daniel Deutch, Tova Milo, Sudeepa Roy, and Val Tannen. 2014. Circuits for Datalog Provenance. In Proceedings of the 17th International Conference on Database Theory (ICDT 2014) . OpenProceedings.org, 201–212.
[15]
Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017) . ACM, 422–436.
[16]
Pierre Flener and Serap Yilmaz. 1999. Inductive Synthesis of Recursive Logic Programs: Achievements and Prospects. The Journal of Logic Programming 41, 2 (1999), 141–195.
[17]
Todd Green, Grigoris Karvounarakis, Zachary Ives, and Val Tannen. 2007b. Update Exchange with Mappings and Provenance. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007). VLDB Endowment, 675–686. http://dl.acm.org/citation.cfm?id=1325851.1325929
[18]
Todd Green, Grigoris Karvounarakis, and Val Tannen. 2007a. Provenance Semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2007) . ACM, 31–40.
[19]
Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. In Proceedings of the International Conference on Computer Aided Verification (CAV 2016) . Springer, 422–430.
[20]
Grigoris Karvounarakis, Zachary Ives, and Val Tannen. 2010. Querying Data Provenance. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2010) . ACM, 951–962.
[21]
Emanuel Kitzelmann. 2010. Inductive Programming: A Survey of Program Synthesis Techniques. In Approaches and Applications of Inductive Programming . Springer, 50–73.
[22]
Sven Köhler, Bertram Ludäscher, and Yannis Smaragdakis. 2012. Declarative Datalog Debugging for Mere Mortals. In Datalog in Academia and Industry . Springer, 111–122.
[23]
Seokki Lee, Bertram Ludäscher, and Boris Glavic. 2019. PUG: A Framework and Practical Implementation for Why and Why-not Provenance. The VLDB Journal 28, 1 (Feb. 2019), 47–71.
[24]
Ana Milanova, Atanas Rountev, and Barbara Ryder. 2002. Parameterized Object Sensitivity for Points-to and Side-effect Analyses for Java. In Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2002) . ACM, 1–11.
[25]
Stephen Muggleton. 1995. Inverse Entailment and Progol. New Generation Computing 13, 3 (Dec. 1995), 245–286.
[26]
Stephen Muggleton. 1999. Scientific Knowledge Discovery Using Inductive Logic Programming. Commun. ACM 42, 11 (Nov. 1999), 42–46.
[27]
Stephen Muggleton, Dianhuan Lin, and Alireza Tamaddoni-Nezhad. 2015. Meta-interpretive Learning of Higher-order Dyadic Datalog: Predicate Invention Revisited. Machine Learning 100, 1 (01 July 2015), 49–73.
[28]
Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2005. Abstract DPLL and Abstract DPLL Modulo Theories. In International Conference on Logic for Programming Artificial Intelligence and Reasoning . Springer, 36–50.
[29]
J. Ross Quinlan and Mike Cameron-Jones. 1995. Induction of Logic Programs: FOIL and Related Systems. New Generation Computing 13, 3 (Dec. 1995), 287–312.
[30]
Luc De Raedt. 2008. Logical and Relational Learning. Springer.
[31]
Luc De Raedt and Kristian Kersting. 2008. Probabilistic Inductive Logic Programming. Springer, 1–27.
[32]
Anish Das Sarma, Martin Theobald, and Jennifer Widom. 2008. Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE 2008). IEEE, 1023–1032.
[33]
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-guided Synthesis of Datalog Programs. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018) . ACM, 515–527.
[34]
Xujie Si, Mukund Raghothaman, Kihong Heo, and Mayur Naik. 2019. Synthesizing Datalog Programs Using Numerical Relaxation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). AAAI Press, 6117–6124.
[35]
Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-sensitivity. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2011) . ACM, 17–30.
[36]
Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII) . ACM, 404–415.
[37]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing Highly Expressive SQL Queries from Input-output Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017) . ACM, 452–466.
[38]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2018. Speeding Up Symbolic Reasoning for Relational Queries. Proceedings of the ACM on Programming Languages 2, OOPSLA, Article 157 (Oct. 2018), 25 pages.
[39]
John Whaley and Monica Lam. 2004. Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2004) . ACM, 131–144.
[40]
Insu Yun, Changwoo Min, Xujie Si, Yeongjin Jang, Taesoo Kim, and Mayur Naik. 2016. APISan: Sanitizing API Usages Through Semantic Cross-Checking. In Proceedings of the 25th USENIX Security Symposium. USENIX Association, 363–378. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/yun
[41]
Andreas Zeller. 1999. Yesterday, My Program Worked. Today, It Does Not. Why?. In Proceedings of the Joint 7th European Software Engineering Conference and the 7th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE-7) . Springer, 253–267.
[42]
Qiang Zeng, Jignesh Patel, and David Page. 2014. QuickFOIL: Scalable Inductive Logic Programming. Proceedings of the VLDB Endowment 8, 3 (Nov. 2014), 197–208.
[43]
Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. 2014. On Abstraction Refinement for Program Analyses in Datalog. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2014) . ACM, 239–248.
[44]
David Zhao, Pavle Subotić, and Bernhard Scholz. 2019. Provenance for Large-scale Datalog. arXiv: 1907.05045 In submission.
[45]
Moshé Zloof. 1975. Query by Example. In Proceedings of the National Computer Conference and Exposition (AFIPS 1975). ACM, 431–438.

Cited By

View all
  • (2024)SYNTHBX: An Example-guided Synthesizer for Bidirectional Programs on RelationsJournal of Information Processing10.2197/ipsjjip.32.47132(471-486)Online publication date: 2024
  • (2024)Synthesis of Bidirectional Programs from Examples with Functional DependenciesJournal of Information Processing10.2197/ipsjjip.32.45132(451-465)Online publication date: 2024
  • (2023)Do machine learning models learn statistical rules inferred from data?Proceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619475(25677-25693)Online publication date: 23-Jul-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 4, Issue POPL
January 2020
1984 pages
EISSN:2475-1421
DOI:10.1145/3377388
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 December 2019
Published in PACMPL Volume 4, Issue POPL

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Counter-Example Guided Inductive Synthesis (CEGIS)
  2. Datalog
  3. Program synthesis
  4. SAT solvers
  5. Syntax-Guided Synthesis (SyGuS)
  6. data provenance

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)193
  • Downloads (Last 6 weeks)47
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)SYNTHBX: An Example-guided Synthesizer for Bidirectional Programs on RelationsJournal of Information Processing10.2197/ipsjjip.32.47132(471-486)Online publication date: 2024
  • (2024)Synthesis of Bidirectional Programs from Examples with Functional DependenciesJournal of Information Processing10.2197/ipsjjip.32.45132(451-465)Online publication date: 2024
  • (2023)Do machine learning models learn statistical rules inferred from data?Proceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619475(25677-25693)Online publication date: 23-Jul-2023
  • (2023)Relational Query Synthesis ⋈ Decision Tree LearningProceedings of the VLDB Endowment10.14778/3626292.362630617:2(250-263)Online publication date: 1-Oct-2023
  • (2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 24-Aug-2023
  • (2023)Saggitarius: A DSL for Specifying Grammatical DomainsProceedings of the ACM on Programming Languages10.1145/36228697:OOPSLA2(2023-2051)Online publication date: 16-Oct-2023
  • (2023)Mobius: Synthesizing Relational Queries with Recursive and Invented PredicatesProceedings of the ACM on Programming Languages10.1145/36228477:OOPSLA2(1394-1417)Online publication date: 16-Oct-2023
  • (2023)Programming by Example Made EasyACM Transactions on Software Engineering and Methodology10.1145/360718533:1(1-36)Online publication date: 7-Jul-2023
  • (2023)From SMT to ASP: Solver-Based Approaches to Solving Datalog Synthesis-as-Rule-Selection ProblemsProceedings of the ACM on Programming Languages10.1145/35712007:POPL(185-217)Online publication date: 11-Jan-2023
  • (2023)Synthesizing Formal Network Specifications From Input-Output ExamplesIEEE/ACM Transactions on Networking10.1109/TNET.2022.320855131:3(994-1009)Online publication date: Jun-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media