ControlFlag: a self-supervised idiosyncratic pattern detection system for software control structures

Authors: Niranjan Hasabnis, Justin Gottschlich

MAPS 2021: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming

Pages 32 - 42

https://doi.org/10.1145/3460945.3464954

Published: 20 June 2021

Abstract

Software debugging has been shown to consume upwards of half of developers’ time. Yet, machine programming (MP), the field concerned with the automation of software (and hardware) development, has recently made strides in both research and production-quality automated debugging systems. In this paper we present ControlFlag, a self-supervised MP system that aims to improve debugging by attempting to detect idiosyncratic pattern violations in software control structures. ControlFlag also suggests possible corrections when it detects an anomalous pattern. We present ControlFlag’s design and provide an experimental evaluation and analysis of its efficacy in identifying potential programming errors in production-quality software. As initial concrete evidence of improved software quality, ControlFlag has already found an anomaly in CURL that has been acknowledged and fixed by its developers. We also discuss future extensions of ControlFlag.
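To make the idea of an "idiosyncratic pattern violation" concrete, the following is a minimal illustrative sketch and not ControlFlag's actual algorithm, which the abstract does not describe. It assumes that control-structure conditions have already been abstracted into token tuples, that a pattern is suspicious when it is far rarer than a near-identical pattern, and that the frequent neighbours can serve as suggested corrections. The names `token_edit_distance` and `flag_anomaly`, the thresholds, and the toy corpus are all hypothetical.

```python
from collections import Counter


def token_edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def flag_anomaly(pattern, counts, max_distance=1, ratio=0.05):
    """Flag `pattern` if a near-identical pattern is far more common.

    Assumed rule (hypothetical): a pattern is anomalous when some other
    pattern within `max_distance` token edits occurs so often that this
    pattern's count is below `ratio` times the other's count.
    Returns (is_anomalous, suggested_corrections).
    """
    own = counts.get(pattern, 0)
    suggestions = [
        other
        for other, n in counts.most_common()
        if other != pattern
        and token_edit_distance(pattern, other) <= max_distance
        and own < ratio * n
    ]
    return bool(suggestions), suggestions


# Hypothetical toy corpus of abstracted `if` conditions.
corpus = (
    [("VAR", "==", "CONST")] * 95
    + [("VAR", "!=", "CONST")] * 60
    + [("VAR", "=", "CONST")] * 1   # assignment inside a condition: rare
)
counts = Counter(corpus)

print(flag_anomaly(("VAR", "=", "CONST"), counts))
print(flag_anomaly(("VAR", "==", "CONST"), counts))
```

In this toy corpus the rare assignment-in-condition pattern is flagged and the two common comparison patterns are offered as candidate corrections, while the common pattern itself is not flagged. A real system would need a parser to extract and abstract conditions from source code, and its notion of "rare" and "nearby" may differ substantially from the thresholds assumed here.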

Index Terms

ControlFlag: a self-supervised idiosyncratic pattern detection system for software control structures
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection
    2. Machine learning approaches
      1. Rule learning
2. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Information & Contributors

Information

Published In

MAPS 2021: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming

June 2021

52 pages

ISBN:9781450384674

DOI:10.1145/3460945

General Chair: Roopsha Samanta, Purdue University, USA

Program Chair: Isil Dillig, University of Texas at Austin, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PLDI '21

Sponsor:

SIGPLAN

PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

June 21, 2021

Virtual, Canada

Cited By

Wu C, Amiri M, Qin H, Mehta B, Marcus R, Loo B (2024). Towards Full Stack Adaptivity in Permissioned Blockchains. Proceedings of the VLDB Endowment 17:5 (1073-1080). Online publication date: 2-May-2024.
https://dl.acm.org/doi/10.14778/3641204.3641216

Schneider N, Kadosh T, Hasabnis N, Mattson T, Pinter Y, Oren G (2023). MPI-RICAL: Data-Driven MPI Distributed Parallelism Assistance with Transformers. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2-10). Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624063

Kadosh T, Schneider N, Hasabnis N, Mattson T, Pinter Y, Oren G (2023). Advising OpenMP Parallelization via A Graph-Based Approach with Transformers. OpenMP: Advanced Task-Based, Device and Compiler Programming (3-17). Online publication date: 13-Sep-2023.
https://dl.acm.org/doi/10.1007/978-3-031-40744-4_1

Wasay A, Tatbul N, Gottschlich J (2022). Machine programming. Proceedings of the VLDB Endowment 15:12 (3754-3757). Online publication date: 1-Aug-2022.
https://dl.acm.org/doi/10.14778/3554821.3554892
