Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Seq2Parse: neurosymbolic parse error repair

Published: 31 October 2022 Publication History

Abstract

We present Seq2Parse, a language-agnostic neurosymbolic approach to automatically repairing parse errors. Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by the many error-correction rules that are not relevant to the particular program that requires repair. In contrast, Neural approaches are fooled by the large space of possible sequence level edits, but can precisely pinpoint the set of EC-rules that are relevant to a particular program. We show how to combine their complementary strengths by using neural methods to train a sequence classifier that predicts the small set of relevant EC-rules for an ill-parsed program, after which, the symbolic EC-parsing algorithm can make short work of generating useful repairs. We train and evaluate our method on a dataset of 1,100,000 Python programs, and show that Seq2Parse is accurate and efficient: it can parse 94% of our tests within 2.1 seconds, while generating the exact user fix in 1 out 3 of the cases; and useful: humans perceive both Seq2Parse-generated error locations and repairs to be almost as good as human-generated ones in a statistically-significant manner.

References

[1]
Alireza Ahadi, Raymond Lister, Shahil Lal, and Arto Hellas. 2018. Learning Programming, Syntax Errors and Institution-Specific Factors. In Proceedings of the 20th Australasian Computing Education Conference (ACE ’18). Association for Computing Machinery, New York, NY, USA. 90–96. isbn:9781450363402 https://doi.org/10.1145/3160489.3160490
[2]
Toufique Ahmed, Premkumar Devanbu, and Vincent J Hellendoorn. 2021. Learning lenient parsing & typing via indirect supervision. Empirical Software Engineering, 26, 2 (2021), mar, https://doi.org/10.1007/s10664-021-09942-y
[3]
A. V. Aho and S. C. Johnson. 1974. LR Parsing. ACM Comput. Surv., 6, 2 (1974), jun, 99–124. issn:0360-0300 https://doi.org/10.1145/356628.356629
[4]
Alfred V. Aho and Thomas G. Peterson. 1972. A Minimum Distance Error-Correcting Parser for Context-Free Languages. SIAM J. Comput., 1 (1972), 305–312.
[5]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473 (2015).
[6]
Sahil Bhatia and Rishabh Singh. 2016. Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks. https://doi.org/10.48550/ARXIV.1603.06129
[7]
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg. 209–210. isbn:0387310738
[8]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. https://doi.org/10.48550/ARXIV.2005.14165
[9]
Michael G. Burke and Gerald A. Fisher. 1987. A Practical Method for LR and LL Syntactic Error Diagnosis and Recovery. ACM Trans. Program. Lang. Syst., 9, 2 (1987), mar, 164–197. issn:0164-0925 https://doi.org/10.1145/22719.22720
[10]
Nigel P Chapman. 1987. LR Parsing: Theory and Practice. Cambridge University Press, New York, NY, USA. isbn:0-521-30413-X
[11]
Michael Collins. 2013. Probabilistic Context-Free Grammars (PCFGs). Lecture Notes, https://u.cs.biu.ac.il/~89-680/collins-pcfgs.pdf
[12]
Rafael Corchuelo, José A. Pérez, Antonio Ruiz, and Miguel Toro. 2002. Repairing Syntax Errors in LR Parsers. ACM Trans. Program. Lang. Syst., 24, 6 (2002), nov, 698–710. issn:0164-0925 https://doi.org/10.1145/586088.586092
[13]
Benjamin Cosman, Madeline Endres, Georgios Sakkas, Leon Medvinsky, Yao-Yuan Yang, Ranjit Jhala, Kamalika Chaudhuri, and Westley Weimer. 2020. PABLO: Helping Novices Debug Python Code Through Data-Driven Fault Localization. Association for Computing Machinery, New York, NY, USA. 1047–1053. isbn:9781450367936 https://doi.org/10.1145/3328778.3366860
[14]
Paul Denny, Andrew Luxton-Reilly, and Ewan Tempero. 2012. All Syntax Errors Are Not Equal. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE ’12). Association for Computing Machinery, New York, NY, USA. 75–80. isbn:9781450312462 https://doi.org/10.1145/2325296.2325318
[15]
Jay Earley. 1970. An Efficient Context-Free Parsing Algorithm. Commun. ACM, 13, 2 (1970), Feb., 94–102. issn:0001-0782 https://doi.org/10.1145/362007.362035
[16]
Madeline Endres, Georgios Sakkas, Benjamin Cosman, Ranjit Jhala, and Westley Weimer. 2019. InFix: Automatically Repairing Novice Program Inputs. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE ’19). IEEE Press, 399–410. isbn:9781728125084 https://doi.org/10.1109/ASE.2019.00045
[17]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press, 180–184. http://www.deeplearningbook.org
[18]
Sumit Gulwani, Ivan Radicek, and Florian Zuleger. 2018. Automated clustering and program repair for introductory programming assignments. Programming Language Design and Implementation, isbn:9781450356985 https://doi.org/10.1145/3192366.3192387
[19]
Philip J. Guo. 2013. Online Python Tutor: Embeddable Web-Based Program Visualization for Cs Education. In Proceeding of the 44th ACM Technical Symposium on Computer Science Education (SIGCSE ’13). Association for Computing Machinery, New York, NY, USA. 579–584. isbn:9781450318686 https://doi.org/10.1145/2445196.2445368
[20]
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31, 1 (2017), Feb., https://ojs.aaai.org/index.php/AAAI/article/view/10742
[21]
Momchil Hardalov, Ivan Koychev, and Preslav Nakov. 2018. Towards Automated Customer Support. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications. 48–59. https://doi.org/10.1007/978-3-319-99344-7_5
[22]
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer New York. isbn:9780387848570, 9780387848587 https://doi.org/10.1007/978-0-387-84858-7
[23]
Frederick Jelinek, John D Lafferty, and Robert L Mercer. 1992. Basic Methods of Probabilistic Context Free Grammars. In Speech Recognition and Understanding. Springer, 345–360. https://doi.org/10.1007/978-3-642-76626-8_35
[24]
Nan Jiang, Thibaud Lutellier, and Lin Tan. 2021. CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE.
[25]
Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. 2017. Structured Attention Networks. CoRR, abs/1702.00887 (2017), arXiv:1702.00887. arxiv:1702.00887
[26]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. 22 Dec., arxiv:1412.6980.
[27]
Donald E. Knuth. 1965. On the translation of languages from left to right. Information and Control, 8, 6 (1965), 607–639. issn:0019-9958 https://doi.org/10.1016/S0019-9958(65)90426-2
[28]
Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ Expectations on Automated Fault Localization. In International Symposium on Software Testing and Analysis. ACM, 165–176. isbn:9781450343909 https://doi.org/10.1145/2931037.2931051
[29]
Sarah K. Kummerfeld and Judy Kay. 2003. The Neglected Battle Fields of Syntax Errors. In Proceedings of the Fifth Australasian Conference on Computing Education - Volume 20 (ACE ’03). Australian Computer Society, Inc., AUS. 105–111. isbn:0909925984
[30]
Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. 2020. CoCoNuT: Combining Context-Aware Neural Translation Models Using Ensemble for Program Repair. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020). Association for Computing Machinery, New York, NY, USA. 101–114. isbn:9781450380089 https://doi.org/10.1145/3395363.3397369
[31]
Matias Martinez, Laurence Duchien, and Martin Monperrus. 2013. Automatically extracting instances of code change patterns with AST analysis. In 2013 IEEE international conference on software maintenance. 388–391.
[32]
Philippe McLean and R. Nigel Horspool. 1996. A Faster Earley Parser. In CC.
[33]
Eugene W Myers. 1986. An O(ND) difference algorithm and its variations. Algorithmica, 1, 1-4 (1986), 251–266.
[34]
Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning. 807–814.
[35]
Michael A Nielsen. 2015. Neural Networks and Deep Learning. Determination Press.
[36]
Chris Parnin and Alessandro Orso. 2011. Are Automated Debugging Techniques Actually Helping Programmers? In International Symposium on Software Testing and Analysis. ACM, 199–209. isbn:9781450305624 https://doi.org/10.1145/2001420.2001445
[37]
Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: a neural program corrector for MOOCs. https://doi.org/10.48550/ARXIV.1607.02902
[38]
Yizhou Qian and James Lehman. 2017. Students’ Misconceptions and Other Difficulties in Introductory Programming: A Literature Review. ACM Trans. Comput. Educ., 18, 1 (2017), Article 1, oct, 24 pages. https://doi.org/10.1145/3077618
[39]
Kia Rahmani, Mohammad Raza, Sumit Gulwani, Vu Le, Daniel Morris, Arjun Radhakrishna, Gustavo Soares, and Ashish Tiwari. 2021. Multi-Modal Program Inference: A Marriage of Pre-Trained Language Models and Component-Based Synthesis. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 158, oct, 29 pages. https://doi.org/10.1145/3485535
[40]
Sanguthevar Rajasekaran and Marius Nicolae. 2014. An error correcting parser for context free grammars that takes less than cubic time. https://doi.org/10.48550/ARXIV.1406.3405
[41]
David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-propagating Errors. Nature, 323, 6088 (1986), 533–536. https://doi.org/10.1038/323533a0
[42]
Georgios Sakkas, Madeline Endres, Benjamin Cosman, Westley Weimer, and Ranjit Jhala. 2020. Type Error Feedback via Analytic Program Repair. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). Association for Computing Machinery, New York, NY, USA. 16–30. isbn:9781450376136 https://doi.org/10.1145/3385412.3386005
[43]
George Sakkas, Ranjit Jhala, Westley Weimer, and Madeline Endres. 2022. gsakkas/seq2parse: OOPSLA2 2022 code release. https://doi.org/10.5281/zenodo.7080357
[44]
Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks, 61 (2015), Jan, 85–117. issn:0893-6080 https://doi.org/10.1016/j.neunet.2014.09.003
[45]
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. arxiv:1409.3215.
[46]
Richard A. Thompson. 1976. Language Correction Using Probabilistic Grammars. IEEE Trans. Comput., C-25, 3 (1976), 275–286. https://doi.org/10.1109/TC.1976.5009254
[47]
P. van der Spek, N. Plat, and C. Pronk. 2005. Syntax Error Repair for a Java-Based Parser Generator. SIGPLAN Not., 40, 4 (2005), April, 47–50. issn:0362-1340 https://doi.org/10.1145/1064165.1064173
[48]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc., 5998–6008. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[49]
Gust Verbruggen, Vu Le, and Sumit Gulwani. 2021. Semantic Programming by Example with Pre-Trained Models. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 100, oct, 25 pages. https://doi.org/10.1145/3485477
[50]
Ke Wang, Rishabh Singh, and Zhendong Su. 2018. Search, Align, and Repair: Data-driven Feedback Generation for Introductory Programming Exercises. In Programming Language Design and Implementation. 481–495. isbn:978-1-4503-5698-5 https://doi.org/10.1145/3192366.3192384
[51]
P.J. Werbos. 1990. Backpropagation through time: what it does and how to do it. Proc. IEEE, 78, 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337
[52]
Liwei Wu, Fei Li, Youhua Wu, and Tao Zheng. 2020. GGF: A Graph-Based Method for Programming Language Syntax Error Correction. Association for Computing Machinery, New York, NY, USA. 139–148. isbn:9781450379588 https://doi.org/10.1145/3387904.3389252

Cited By

View all
  • (2024)Revitalizing Language Processing with Intelligent Syntax Repair and Boolean Tensor Completion2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825203(5361-5368)Online publication date: 15-Dec-2024
  • (2024)Deep learning-based software engineering: progress, challenges, and opportunitiesScience China Information Sciences10.1007/s11432-023-4127-568:1Online publication date: 24-Dec-2024
  • (2023)OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL ReachabilityProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE56229.2023.00072(1200-1211)Online publication date: 11-Nov-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 6, Issue OOPSLA2
October 2022
1932 pages
EISSN:2475-1421
DOI:10.1145/3554307
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2022
Published in PACMPL Volume 6, Issue OOPSLA2

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. Automated Program Repair
  2. Error-Correcting Parsers
  3. Machine Learning

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)271
  • Downloads (Last 6 weeks)46
Reflects downloads up to 14 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Revitalizing Language Processing with Intelligent Syntax Repair and Boolean Tensor Completion2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825203(5361-5368)Online publication date: 15-Dec-2024
  • (2024)Deep learning-based software engineering: progress, challenges, and opportunitiesScience China Information Sciences10.1007/s11432-023-4127-568:1Online publication date: 24-Dec-2024
  • (2023)OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL ReachabilityProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE56229.2023.00072(1200-1211)Online publication date: 11-Nov-2023

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media