DOI: 10.1145/3062341.3062351
research-article
Public Access

Component-based synthesis of table consolidation and transformation tasks from examples

Published: 14 June 2017

Abstract

This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, the use of SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially-filled program sketches. We apply the proposed synthesis methodology to automate a large class of data preparation tasks that commonly arise in data science. We evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from online forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
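The loop described in the abstract (enumerate a sketch, reject it by deduction if its specifications cannot produce the output, otherwise fill its holes and re-check after each partial evaluation) can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration and is not the paper's implementation: the two components `select` and `filter`, their shape-only specifications, and the helper names are hypothetical, and a finite set of possible (rows, columns) shapes stands in for the SMT solver.

```python
from itertools import combinations, product

# A table is (column_names, rows); column_names is a tuple, each row a tuple.
def shape(table):
    cols, rows = table
    return (len(rows), len(cols))

# Two hypothetical components with concrete semantics.
def select(table, keep):
    cols, rows = table
    idx = [cols.index(c) for c in keep]
    return (tuple(keep), [tuple(r[i] for i in idx) for r in rows])

def filter_rows(table, pred):
    cols, rows = table
    col, threshold = pred
    i = cols.index(col)
    return (cols, [r for r in rows if r[i] > threshold])

# Over-approximate specs: input shape -> set of shapes the output may have.
SPECS = {
    "select": lambda r, c: {(r, k) for k in range(1, c)},  # same rows, fewer columns
    "filter": lambda r, c: {(k, c) for k in range(0, r)},  # fewer rows, same columns
}
RUN = {"select": select, "filter": filter_rows}

def feasible(sketch, in_shape, out_shape):
    """Deduction stand-in: could some completion map in_shape to out_shape?"""
    shapes = {in_shape}
    for name in sketch:
        shapes = set().union(*(SPECS[name](r, c) for r, c in shapes))
    return out_shape in shapes

def candidates(name, table):
    """Enumerate possible arguments for the hole of component `name`."""
    cols, rows = table
    if name == "select":
        for k in range(1, len(cols)):
            yield from combinations(cols, k)
    else:
        for i, col in enumerate(cols):
            nums = {r[i] for r in rows if isinstance(r[i], (int, float))}
            for v in sorted(nums):
                yield (col, v)

def complete(sketch, table, target):
    """Fill holes one at a time, evaluating each filled step on the example."""
    if not sketch:
        return [] if table == target else None
    name, rest = sketch[0], sketch[1:]
    for arg in candidates(name, table):
        mid = RUN[name](table, arg)  # partial evaluation of the filled prefix
        if not feasible(rest, shape(mid), shape(target)):
            continue  # reject this partially filled sketch early
        tail = complete(rest, mid, target)
        if tail is not None:
            return [(name, arg)] + tail
    return None

def synthesize(table, target, max_len=3):
    for n in range(1, max_len + 1):
        for sketch in product(SPECS, repeat=n):
            if feasible(sketch, shape(table), shape(target)):
                prog = complete(sketch, table, target)
                if prog is not None:
                    return prog
    return None

inp = (("name", "age", "score"), [("ann", 31, 7), ("bob", 25, 9), ("cy", 40, 4)])
out = (("name", "score"), [("ann", 7), ("cy", 4)])
print(synthesize(inp, out))
```

In this toy setting, every sketch of length one is rejected without trying any arguments, because no single component can change both the row count and the column count. The partial-evaluation check plays the same role during completion: once a hole is filled, the concrete intermediate table has a known shape, which is more precise than the shape set the specification alone allows.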


Published In

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017
708 pages
ISBN: 9781450349888
DOI: 10.1145/3062341
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Component-based synthesis
  2. Data preparation
  3. Program synthesis
  4. Programming by example
  5. SMT-based deduction

Qualifiers

  • Research-article

Conference

PLDI '17
Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
