Sporq: An Interactive Environment for Exploring Code using Query-by-Example

Authors:

Jonathan Mendelson,

Nathaniel Sands,

Mukund Raghothaman

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology

Pages 84 - 99

https://doi.org/10.1145/3472749.3474737

Published: 12 October 2021

Abstract

There has been widespread adoption of IDEs and powerful tools for program analysis. However, programmers still find it difficult to conveniently analyze their code for custom patterns. Such systems either provide inflexible interfaces or require knowledge of complex query languages and compiler internals. In this paper, we present Sporq, a tool that allows developers to mine their codebases for a range of patterns, including bugs, code smells, and violations of coding standards. Sporq offers an interactive environment in which the user highlights program elements, and the system responds by identifying other parts of the codebase with similar patterns. The programmer can then provide feedback which enables the system to rapidly infer the programmer’s intent. Internally, our system is driven by high-fidelity relational program representations and algorithms to synthesize database queries from examples. Our experiments and user studies with a VS Code extension indicate that Sporq reduces the effort needed by programmers to write custom analyses and discover bugs in large codebases.
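The query-by-example loop described in the abstract can be illustrated with a minimal sketch. This is not Sporq's synthesis algorithm: the fact base, the relation names (`calls`, `in_loop`), the element ids, and the helper functions below are all hypothetical, and the search is a deliberately simple one that returns the most specific conjunction of features shared by the highlighted elements. It assumes at least one positive example is given.

```python
# Hypothetical relational facts about a tiny codebase: (relation, element_id, value).
FACTS = {
    ("calls", "e1", "strcpy"),
    ("calls", "e2", "strcpy"),
    ("calls", "e3", "strncpy"),
    ("calls", "e4", "strcpy"),
    ("in_loop", "e1", "true"),
    ("in_loop", "e2", "true"),
    ("in_loop", "e3", "true"),
}


def features(element):
    """Return the set of (relation, value) pairs that hold for an element."""
    return {(rel, val) for rel, elem, val in FACTS if elem == element}


def synthesize(positives, negatives):
    """Most specific conjunction of features shared by all positives.

    Returns None when that conjunction also matches a negative example,
    in which case no conjunction over these features separates the examples.
    """
    query = set.intersection(*(features(p) for p in positives))
    if any(query <= features(n) for n in negatives):
        return None
    return query


def run(query):
    """Return every element in the fact base that satisfies the query."""
    elements = {elem for _, elem, _ in FACTS}
    return sorted(e for e in elements if query <= features(e))


# Round 1: the user highlights e1 and rejects e3.
first = synthesize(["e1"], ["e3"])
print(run(first))

# Round 2: the user adds e4 as a positive, which generalizes the query.
second = synthesize(["e1", "e4"], ["e3"])
print(run(second))
```

In this toy setting, each round of feedback either generalizes the query (a new positive removes features from the conjunction) or shows that the current features cannot separate the examples (the function returns `None`).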

Supplementary Material

VTT File (p84-talk.vtt), 11.62 KB

VTT File (p84-video_preview.vtt), 0.75 KB

Demo video and subtitles walking through the illustrative overview section of the paper (p84-supplement_material.zip), 77.71 MB

MP4 File (p84-talk.mp4): Talk video and captions, 211.05 MB

MP4 File (p84-video_preview.mp4): Video preview and captions, 9.81 MB

Published In

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology

October 2021

1357 pages

ISBN:9781450386357

DOI:10.1145/3472749

Editors: Jeffrey Nichols (Apple, USA), Ranjitha Kumar (UIUC, USA), Michael Nebeling (University of Michigan, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

UIST '21

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology

October 10 - 14, 2021

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%
