DOI: 10.1145/3472749.3474737
research-article
Public Access

Sporq: An Interactive Environment for Exploring Code using Query-by-Example

Published: 12 October 2021

Abstract

There has been widespread adoption of IDEs and powerful tools for program analysis. However, programmers still find it difficult to conveniently analyze their code for custom patterns. Such systems either provide inflexible interfaces or require knowledge of complex query languages and compiler internals. In this paper, we present Sporq, a tool that allows developers to mine their codebases for a range of patterns, including bugs, code smells, and violations of coding standards. Sporq offers an interactive environment in which the user highlights program elements, and the system responds by identifying other parts of the codebase with similar patterns. The programmer can then provide feedback which enables the system to rapidly infer the programmer’s intent. Internally, our system is driven by high-fidelity relational program representations and algorithms to synthesize database queries from examples. Our experiments and user studies with a VS Code extension indicate that Sporq reduces the effort needed by programmers to write custom analyses and discover bugs in large codebases.
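
To make the query-by-example loop concrete, the following is a minimal sketch and not Sporq's actual algorithm. It assumes a hypothetical relational view of a codebase (the `CALLS` table and its attribute names are invented for illustration) and swaps in a deliberately simple technique: take the most specific conjunction of attribute tests shared by all user-highlighted rows, then reject it if it also matches a row the user marked as irrelevant.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, str]

# Hypothetical relational representation: one row per call expression.
# The table and every attribute name here are assumptions for illustration.
CALLS: List[Row] = [
    {"id": "c1", "callee": "malloc", "result_checked": "no", "in_loop": "no"},
    {"id": "c2", "callee": "malloc", "result_checked": "yes", "in_loop": "no"},
    {"id": "c3", "callee": "malloc", "result_checked": "no", "in_loop": "yes"},
    {"id": "c4", "callee": "fopen", "result_checked": "no", "in_loop": "no"},
    {"id": "c5", "callee": "malloc", "result_checked": "no", "in_loop": "no"},
]


def matches(query: Dict[str, str], row: Row) -> bool:
    """A row satisfies a query when every attribute=value test holds."""
    return all(row[attr] == value for attr, value in query.items())


def synthesize(
    rows: List[Row], positive_ids: Set[str], negative_ids: Set[str]
) -> Optional[Dict[str, str]]:
    """Return the most specific conjunctive query covering all positives.

    Returns None when there are no positives, or when the query also
    matches a negative example (no consistent conjunction exists in this
    simple query space).
    """
    positives = [r for r in rows if r["id"] in positive_ids]
    if not positives:
        return None
    # Start from the first positive row and keep only the tests that
    # every other positive row agrees with.
    query = {k: v for k, v in positives[0].items() if k != "id"}
    for row in positives[1:]:
        query = {k: v for k, v in query.items() if row[k] == v}
    negatives = [r for r in rows if r["id"] in negative_ids]
    if any(matches(query, r) for r in negatives):
        return None
    return query


def run(query: Dict[str, str], rows: List[Row]) -> List[str]:
    """Evaluate the query and return the ids of all matching rows."""
    return [r["id"] for r in rows if matches(query, r)]


if __name__ == "__main__":
    # The user highlights c1 and c3 and rejects c2 as feedback.
    query = synthesize(CALLS, {"c1", "c3"}, {"c2"})
    print(query)
    if query is not None:
        print(run(query, CALLS))
```

In this toy setting, highlighting two unchecked `malloc` calls generalizes away the `in_loop` attribute, so the resulting query also surfaces a third call site the user never pointed at. This mirrors the highlight, generalize, and refine interaction described in the abstract.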

Published In

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology
October 2021
1357 pages
ISBN:9781450386357
DOI:10.1145/3472749

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

UIST '21

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 279
  • Downloads (Last 6 weeks): 38
Reflects downloads up to 19 Feb 2025

Cited By

  • (2024) MiniCAT: Understanding and Detecting Cross-Page Request Forgery Vulnerabilities in Mini-Programs. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 525-539. DOI: 10.1145/3658644.3670294. Online publication date: 2-Dec-2024
  • (2024) Tyche: Making Sense of PBT Effectiveness. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-16. DOI: 10.1145/3654777.3676407. Online publication date: 13-Oct-2024
  • (2022) CodeSpider: Automatic Code Querying with Multi-modal Conjunctive Query Synthesis. Companion Proceedings of the 2022 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, 63-65. DOI: 10.1145/3563768.3563954. Online publication date: 29-Nov-2022
  • (2022) Synthesizing code quality rules from examples. Proceedings of the ACM on Programming Languages 6, OOPSLA2, 1757-1787. DOI: 10.1145/3563350. Online publication date: 31-Oct-2022
  • (2022) SemanticOn: Specifying Content-Based Semantic Conditions for Web Automation Programs. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1-16. DOI: 10.1145/3526113.3545691. Online publication date: 29-Oct-2022
