DOI: 10.1145/3460945.3464954
Research article

ControlFlag: a self-supervised idiosyncratic pattern detection system for software control structures

Published: 20 June 2021

Abstract

Software debugging has been shown to consume upwards of half of developers’ time. Yet machine programming (MP), the field concerned with the automation of software (and hardware) development, has recently made strides in both research and production-quality automated debugging systems. In this paper, we present ControlFlag, a self-supervised MP system that aims to improve debugging by detecting idiosyncratic pattern violations in software control structures. When it detects an anomalous pattern, ControlFlag also suggests possible corrections. We present ControlFlag’s design and provide an experimental evaluation and analysis of its efficacy in identifying potential programming errors in production-quality software. As a first piece of concrete evidence towards improving software quality, ControlFlag has already found an anomaly in CURL that has been acknowledged and fixed by its developers. We also discuss future extensions of ControlFlag.
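
The abstract does not describe the detection mechanism in detail, so the following is only a toy illustration of the general idea of flagging rare control-structure patterns and suggesting a common neighbour. It is not ControlFlag's implementation. The regex tokenizer, the `VAR`/`NUM` abstraction, the token-level edit distance, the `rare_ratio` threshold, and the tiny corpus are all assumptions made for this sketch.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integers, a few two-character
# operators, then any other single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||[^\s\w]")

def abstract_condition(cond: str) -> str:
    """Map an if-condition to a pattern: names -> VAR, integers -> NUM."""
    out = []
    for tok in TOKEN_RE.findall(cond):
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("VAR")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)

def edit_distance(a: list, b: list) -> int:
    """Levenshtein distance over token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def suggest(cond: str, counts: Counter, max_distance: int = 1,
            rare_ratio: float = 0.05):
    """Return a more common nearby pattern if `cond` looks anomalous, else None.

    A pattern is treated as anomalous when it occurs far less often than
    its most frequent neighbour within `max_distance` token edits.
    """
    pattern = abstract_condition(cond)
    seen = counts[pattern]
    neighbours = [
        (p, c) for p, c in counts.items()
        if p != pattern
        and edit_distance(pattern.split(), p.split()) <= max_distance
    ]
    if not neighbours:
        return None
    best, best_count = max(neighbours, key=lambda pc: pc[1])
    return best if seen < rare_ratio * best_count else None

# Made-up "training" corpus of if-conditions; no labels are needed,
# only the observed frequencies of each abstracted pattern.
corpus = ["x == 0"] * 50 + ["y != 1"] * 30 + ["a = 5"]
counts = Counter(abstract_condition(c) for c in corpus)

print(suggest("flag = 1", counts))   # rare pattern, a common neighbour is suggested
print(suggest("n == 10", counts))    # common pattern, nothing is suggested
```

In this sketch the frequencies mined from unlabeled code play the role of the self-supervision signal: a condition is only suspicious relative to what the corpus typically contains.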


Published In

MAPS 2021: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming
June 2021
52 pages
ISBN:9781450384674
DOI:10.1145/3460945

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Source-code mining
  2. Self-supervised learning

Qualifiers

  • Research-article

Conference

PLDI '21

Cited By

  • (2024) Towards Full Stack Adaptivity in Permissioned Blockchains. Proceedings of the VLDB Endowment 17:5 (1073-1080). DOI: 10.14778/3641204.3641216. Online publication date: 2-May-2024
  • (2023) MPI-RICAL: Data-Driven MPI Distributed Parallelism Assistance with Transformers. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2-10). DOI: 10.1145/3624062.3624063. Online publication date: 12-Nov-2023
  • (2023) Advising OpenMP Parallelization via A Graph-Based Approach with Transformers. OpenMP: Advanced Task-Based, Device and Compiler Programming (3-17). DOI: 10.1007/978-3-031-40744-4_1. Online publication date: 13-Sep-2023
  • (2022) Machine programming. Proceedings of the VLDB Endowment 15:12 (3754-3757). DOI: 10.14778/3554821.3554892. Online publication date: 1-Aug-2022
