
Component-based synthesis of table consolidation and transformation tasks from examples

Published: 14 June 2017


This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, rejecting one infeasible sketch rules out all of its completions at once, so SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially filled program sketches. We apply the proposed synthesis methodology to automate a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from online forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
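The workflow in the abstract (generate a sketch, check its feasibility against component specifications, then fill holes while re-checking with concrete intermediate results) can be illustrated with a small Python sketch. This is not the authors' implementation: the two components `select` and `filter`, their shape-based specifications, the candidate hole values, and the helper names are all hypothetical, and a bounded enumeration over table shapes stands in for the SMT solver.

```python
from itertools import combinations, product

# A table is a (header, rows) pair: a tuple of column names and a list of tuples.
def shape(table):
    header, rows = table
    return (len(rows), len(header))

def select(table, cols):                      # keep only the named columns
    header, rows = table
    idx = [header.index(c) for c in cols]
    return (tuple(cols), [tuple(r[i] for i in idx) for r in rows])

def filter_gt(table, arg):                    # keep rows where column > bound
    col, bound = arg
    header, rows = table
    i = header.index(col)
    return (header, [r for r in rows if r[i] > bound])

# Hypothetical over-approximate specs relating input and output (rows, cols).
COMPONENTS = {
    "select": (select, lambda i, o: o[0] == i[0] and o[1] < i[1]),
    "filter": (filter_gt, lambda i, o: o[0] < i[0] and o[1] == i[1]),
}

def feasible(sketch, in_shape, out_shape):
    """Stand-in for the SMT query: can the specs map in_shape to out_shape?"""
    max_r = max(in_shape[0], out_shape[0])
    max_c = max(in_shape[1], out_shape[1])
    universe = list(product(range(max_r + 1), range(max_c + 1)))
    reachable = {in_shape}
    for name in sketch:
        spec = COMPONENTS[name][1]
        reachable = {o for o in universe if any(spec(i, o) for i in reachable)}
    return out_shape in reachable

def candidates(name, table):
    """Enumerate hole fillings for one component (assumed, tiny search space)."""
    header, rows = table
    if name == "select":
        return [c for k in range(1, len(header)) for c in combinations(header, k)]
    out = []
    for i, col in enumerate(header):
        vals = [r[i] for r in rows]
        if vals and all(isinstance(v, (int, float)) for v in vals):
            out += [(col, v) for v in sorted(set(vals))]
    return out

def complete(sketch, table, target):
    """Fill holes innermost-first; evaluate each filled step and re-check the rest."""
    if not sketch:
        return [] if table == target else None
    name, rest = sketch[0], sketch[1:]
    for arg in candidates(name, table):
        mid = COMPONENTS[name][0](table, arg)          # partial evaluation
        if not feasible(rest, shape(mid), shape(target)):
            continue                                   # prune this partial program
        tail = complete(rest, mid, target)
        if tail is not None:
            return [(name, arg)] + tail
    return None

def synthesize(table, target, max_len=2):
    for n in range(1, max_len + 1):
        for sketch in product(COMPONENTS, repeat=n):
            if feasible(sketch, shape(table), shape(target)):
                prog = complete(sketch, table, target)
                if prog is not None:
                    return prog
    return None

inp = (("name", "age", "score"), [("a", 30, 5), ("b", 20, 9), ("c", 40, 7)])
out = (("name", "score"), [("a", 5), ("c", 7)])
program = synthesize(inp, out)
print(program)
```

In this toy setting, every one-component sketch is rejected by the shape check alone, before any hole is filled, which is the pruning effect the abstract attributes to deduction over sketches. A real system would replace `feasible` with a query to an SMT solver over richer specifications.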




Information & Contributors


Published In

ACM SIGPLAN Notices, Volume 52, Issue 6
PLDI '17
June 2017
708 pages

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017
708 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017
Published in SIGPLAN Volume 52, Issue 6


Author Tags

  1. Component-based synthesis
  2. Data preparation
  3. Program synthesis
  4. Programming by example
  5. SMT-based deduction



Bibliometrics & Citations


Article Metrics

  • Downloads (Last 12 months): 228
  • Downloads (Last 6 weeks): 44
Reflects downloads up to 08 Feb 2025


Cited By

  • (2024) Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code? Proceedings of the ACM on Software Engineering 1:FSE (2261-2284). DOI: 10.1145/3660807. Online publication date: 12-Jul-2024
  • (2024) Query Reverse Engineering of Pre-deleted Uncorrelated Operators. Data Mining and Big Data (45-58). DOI: 10.1007/978-981-97-0844-4_4. Online publication date: 22-Feb-2024
  • (2023) Programming by Example Made Easy. ACM Transactions on Software Engineering and Methodology 33:1 (1-36). DOI: 10.1145/3607185. Online publication date: 7-Jul-2023
  • (2023) Inductive Program Synthesis via Iterative Forward-Backward Abstract Interpretation. Proceedings of the ACM on Programming Languages 7:PLDI (1657-1681). DOI: 10.1145/3591288. Online publication date: 6-Jun-2023
  • (2023) Automated Translation of Functional Big Data Queries to SQL. Proceedings of the ACM on Programming Languages 7:OOPSLA1 (580-608). DOI: 10.1145/3586047. Online publication date: 6-Apr-2023
  • (2023) Survey of intelligent program synthesis techniques. International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023) (119). DOI: 10.1117/12.3011627. Online publication date: 7-Dec-2023
  • (2023) Visualizing the Scripts of Data Wrangling With Somnus. IEEE Transactions on Visualization and Computer Graphics 29:6 (2950-2964). DOI: 10.1109/TVCG.2022.3144975. Online publication date: 1-Jun-2023
  • (2023) SQL Synthesis with Input-Output Example Based on Deep Learning. 2023 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN54540.2023.10191168. Online publication date: 18-Jun-2023
  • (2023) PyEvolve: Automating Frequent Code Changes in Python ML Systems. Proceedings of the 45th International Conference on Software Engineering (995-1007). DOI: 10.1109/ICSE48619.2023.00091. Online publication date: 14-May-2023
  • (2023) Synthesising Programs with Non-trivial Constants. Journal of Automated Reasoning 67:2. DOI: 10.1007/s10817-023-09664-4. Online publication date: 13-May-2023
