Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Mobius: Synthesizing Relational Queries with Recursive and Invented Predicates

Published: 16 October 2023 Publication History

Abstract

Synthesizing relational queries from data is challenging in the presence of recursion and invented predicates. We propose a fully automated approach to synthesize such queries. Our approach comprises of two steps: it first synthesizes a non-recursive query consistent with the given data, and then identifies recursion schemes in it and thereby generalizes to arbitrary data. This generalization is achieved by an iterative predicate unification procedure which exploits the notion of data provenance to accelerate convergence. In each iteration of the procedure, a constraint solver proposes a candidate query, and a query evaluator checks if the proposed program is consistent with the given data. The data provenance for a failed query allows us to construct additional constraints for the constraint solver and refine the search. We have implemented our approach in a tool named Mobius. On a suite of 21 challenging recursive query synthesis tasks, Mobius outperforms three state-of-the-art baselines Gensynth, ILASP, and Popper, both in terms of runtime and accuracy. We also demonstrate that the synthesized queries generalize well to unseen data.

References

[1]
Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley.
[2]
Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive Program Synthesis. In Computer Aided Verification. Springer Berlin Heidelberg, 934–950. https://doi.org/10.1007/978-3-642-39799-8_67
[3]
Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. 2017. Constraint-based synthesis of Datalog programs. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP).
[4]
Lars Ole Andersen. 1994. Program Analysis and Specialization for the C Programming Language. Ph. D. Dissertation. DIKU, University of Copenhagen.
[5]
James Cheney, Laura Chiticariu, and Wang-Chiew Tan. 2009. Provenance in Databases: Why, How, and Where. Foundations and Trends in Databases, 1, 4 (2009).
[6]
Andrew Cropper, Sebastijan Dumancic, and Stephen H. Muggleton. 2020. Turning 30: New Ideas in Inductive Logic Programming. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
[7]
Andrew Cropper and Rolf Morel. 2021. Learning Programs by Learning from Failures. Machine Learning, 110 (2021), 801–856.
[8]
Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, and Denny Zhou. 2019. Neural Logic Machines. In Proceedings of the 7th International Conference on Learning Representations (ICLR). https://doi.org/10.1145/3340531.3411949
[9]
Richard Evans and Edward Grefenstette. 2018. Learning Explanatory Rules from Noisy Data. J. Artif. Int. Res., 61, 1 (2018), Jan., 1–64. issn:1076-9757
[10]
Azadeh Farzan and Victor Nicolet. 2021. Counterexample-Guided Partial Bounding for Recursive Function Synthesis. In Computer Aided Verification. Springer International Publishing, 832–855. https://doi.org/10.1007/978-3-030-81685-8_39
[11]
Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis Using Conflict-Driven Learning. SIGPLAN Not., 53, 4 (2018), jun, 420–435. issn:0362-1340 https://doi.org/10.1145/3296979.3192382
[12]
Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis Using Conflict-Driven Learning. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). Association for Computing Machinery, New York, NY, USA. 420–435. isbn:9781450356985 https://doi.org/10.1145/3192366.3192382
[13]
Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL ' 11. ACM Press. https://doi.org/10.1145/1926385.1926423
[14]
Daniel Conrad Halbert. 1984. Programming by Example. Ph. D. Dissertation. AAI8512843
[15]
Shachar Itzhaky, Hila Peleg, Nadia Polikarpova, Reuben N. S. Rowe, and Ilya Sergey. 2021. Cyclic program synthesis. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. ACM. https://doi.org/10.1145/3453483.3454087
[16]
Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, and Zhenjiang Hu. 2020. Question selection for interactive program synthesis. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM. https://doi.org/10.1145/3385412.3386025
[17]
Ruyi Ji, Jingtao Xia, Yingfei Xiong, and Zhenjiang Hu. 2021. Generalizable synthesis through unification. Proceedings of the ACM on Programming Languages, 5, OOPSLA (2021), Oct., 1–28. https://doi.org/10.1145/3485544
[18]
Mark Law, Alessandra Russo, and Krysia Broda. 2020. The ILASP system for Inductive Learning of Answer Set Programs. CoRR, abs/2005.00904 (2020).
[19]
Vu Le, Daniel Perelman, Oleksandr Polozov, Mohammad Raza, Abhishek Udupa, and Sumit Gulwani. 2017. Interactive Program Synthesis. ArXiv, abs/1703.03539 (2017).
[20]
Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. DirectFix: Looking for Simple Program Repairs. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering. IEEE. https://doi.org/10.1109/icse.2015.63
[21]
Jonathan Mendelson, Aaditya Naik, Mukund Raghothaman, and Mayur Naik. 2021. GenSynth: Synthesizing Datalog Programs without Language Bias. In Proceedings of the Conference on Artificial Intelligence (AAAI).
[22]
Ana Milanova, Atanas Rountev, and Barbara Ryder. 2002. Parameterized Object Sensitivity for Points-to and Side-effect Analyses for Java. In Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2002). ACM, 1–11. isbn:1-58113-562-9
[23]
Stephen Muggleton, Dianhuan Lin, and Alireza Tamaddoni-Nezhad. 2015. Meta-interpretive Learning of Higher-order Dyadic Datalog: Predicate Invention Revisited. Machine Learning, 100, 1 (2015), 01 July, 49–73. issn:1573-0565 https://doi.org/10.1007/s10994-014-5471-y
[24]
Saswat Padhi, Prateek Jain, Daniel Perelman, Oleksandr Polozov, Sumit Gulwani, and Todd Millstein. 2018. FlashProfile: a framework for synthesizing data profiles. Proceedings of the ACM on Programming Languages, 2, OOPSLA (2018), Oct., 1–28. https://doi.org/10.1145/3276520
[25]
Mukund Raghothaman, Jonathan Mendelson, David Zhao, Mayur Naik, and Bernhard Scholz. 2020. Provenance-guided synthesis of Datalog programs. In Proceedings of the ACM Symposium on Principles of Programming Languages (POPL).
[26]
Veselin Raychev, Pavol Bielik, Martin Vechev, and Andreas Krause. 2016. Learning programs from noisy data. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM. https://doi.org/10.1145/2837614.2837671
[27]
Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end Differentiable Proving. In Advances in Neural Information Processing Systems (NeurIPS).
[28]
Tim Rocktäschel and Sebastian Riedel. 2017. End-to-end Differentiable Proving. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/b2ab001909a8a6f04b51920306046ce5-Paper.pdf
[29]
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-guided Synthesis of Datalog Programs. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). https://doi.org/10.1145/3236024.3236034
[30]
Xujie Si, Mukund Raghothaman, Kihong Heo, and Mayur Naik. 2019. Synthesizing Datalog programs using numerical relaxation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
[31]
Michael Sipser. 2012. Introduction to the Theory of Computation. Cengage Learning.
[32]
Yannis Smaragdakis, Martin Bravenboer, and Ondrej Lhoták. 2011. Pick Your Contexts Well: Understanding Object-sensitivity. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2011). ACM, 17–30. isbn:978-1-4503-0490-0 https://doi.org/10.1145/1926385.1926390
[33]
Aalok Thakkar, Aaditya Naik, Nathaniel Sands, Rajeev Alur, Mayur Naik, and Mukund Raghothaman. 2021. Example-guided synthesis of relational queries. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI).
[34]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Interactive Query Synthesis from Input-Output Examples. In Proceedings of the ACM Conference on Management of Data (SIGMOD).
[35]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Interactive Query Synthesis from Input-Output Examples. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM. https://doi.org/10.1145/3035918.3058738
[36]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI).
[37]
John Whaley and Monica Lam. 2004. Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2004). ACM, 131–144. isbn:1-58113-807-5 https://doi.org/10.1145/996841.996859
[38]
Fan Yang, Zhilin Yang, and William Cohen. 2017. Differentiable learning of logical rules for knowledge base reasoning. In Advances in Neural Information Processing Systems (NeurIPS).
[39]
Sai Zhang and Yuyin Sun. 2013. Automatically synthesizing SQL queries from input-output examples. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE. https://doi.org/10.1109/ase.2013.6693082
[40]
David Zhao, Pavle Subotic, and Bernhard Scholz. 2020. Debugging Large-scale Datalog: A Scalable Provenance Evaluation Strategy. ACM Trans. Program. Lang. Syst., 42, 2 (2020).

Cited By

View all
  • (2024)Computable Relations Mapping with Horn Clauses for Inductive Program SynthesisKnowledge Management and Acquisition for Intelligent Systems10.1007/978-981-96-0026-7_2(15-28)Online publication date: 16-Nov-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 7, Issue OOPSLA2
October 2023
2250 pages
EISSN:2475-1421
DOI:10.1145/3554312
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2023
Published in PACMPL Volume 7, Issue OOPSLA2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Programming by example
  2. example-guided synthesis
  3. recursive program synthesis

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)249
  • Downloads (Last 6 weeks)39
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Computable Relations Mapping with Horn Clauses for Inductive Program SynthesisKnowledge Management and Acquisition for Intelligent Systems10.1007/978-981-96-0026-7_2(15-28)Online publication date: 16-Nov-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media