research-article

Open access

Seq2Parse: neurosymbolic parse error repair

Authors:

Georgios Sakkas,

Madeline Endres,

Westley Weimer,

Ranjit JhalaAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 6, Issue OOPSLA2

Article No.: 167, Pages 1180 - 1206

https://doi.org/10.1145/3563330

Published: 31 October 2022 Publication History

Abstract

We present Seq2Parse, a language-agnostic neurosymbolic approach to automatically repairing parse errors. Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by the many error-correction rules that are not relevant to the particular program that requires repair. In contrast, Neural approaches are fooled by the large space of possible sequence level edits, but can precisely pinpoint the set of EC-rules that are relevant to a particular program. We show how to combine their complementary strengths by using neural methods to train a sequence classifier that predicts the small set of relevant EC-rules for an ill-parsed program, after which, the symbolic EC-parsing algorithm can make short work of generating useful repairs. We train and evaluate our method on a dataset of 1,100,000 Python programs, and show that Seq2Parse is accurate and efficient: it can parse 94% of our tests within 2.1 seconds, while generating the exact user fix in 1 out 3 of the cases; and useful: humans perceive both Seq2Parse-generated error locations and repairs to be almost as good as human-generated ones in a statistically-significant manner.

References

[1]

Alireza Ahadi, Raymond Lister, Shahil Lal, and Arto Hellas. 2018. Learning Programming, Syntax Errors and Institution-Specific Factors. In Proceedings of the 20th Australasian Computing Education Conference (ACE ’18). Association for Computing Machinery, New York, NY, USA. 90–96. isbn:9781450363402 https://doi.org/10.1145/3160489.3160490

Digital Library

[2]

Toufique Ahmed, Premkumar Devanbu, and Vincent J Hellendoorn. 2021. Learning lenient parsing & typing via indirect supervision. Empirical Software Engineering, 26, 2 (2021), mar, https://doi.org/10.1007/s10664-021-09942-y

Digital Library

[3]

A. V. Aho and S. C. Johnson. 1974. LR Parsing. ACM Comput. Surv., 6, 2 (1974), jun, 99–124. issn:0360-0300 https://doi.org/10.1145/356628.356629

Digital Library

[4]

Alfred V. Aho and Thomas G. Peterson. 1972. A Minimum Distance Error-Correcting Parser for Context-Free Languages. SIAM J. Comput., 1 (1972), 305–312.

Digital Library

[5]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473 (2015).

[6]

Sahil Bhatia and Rishabh Singh. 2016. Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks. https://doi.org/10.48550/ARXIV.1603.06129

[7]

Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg. 209–210. isbn:0387310738

Digital Library

[8]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. https://doi.org/10.48550/ARXIV.2005.14165

[9]

Michael G. Burke and Gerald A. Fisher. 1987. A Practical Method for LR and LL Syntactic Error Diagnosis and Recovery. ACM Trans. Program. Lang. Syst., 9, 2 (1987), mar, 164–197. issn:0164-0925 https://doi.org/10.1145/22719.22720

Digital Library

[10]

Nigel P Chapman. 1987. LR Parsing: Theory and Practice. Cambridge University Press, New York, NY, USA. isbn:0-521-30413-X

[11]

Michael Collins. 2013. Probabilistic Context-Free Grammars (PCFGs). Lecture Notes, https://u.cs.biu.ac.il/~89-680/collins-pcfgs.pdf

[12]

Rafael Corchuelo, José A. Pérez, Antonio Ruiz, and Miguel Toro. 2002. Repairing Syntax Errors in LR Parsers. ACM Trans. Program. Lang. Syst., 24, 6 (2002), nov, 698–710. issn:0164-0925 https://doi.org/10.1145/586088.586092

Digital Library

[13]

Benjamin Cosman, Madeline Endres, Georgios Sakkas, Leon Medvinsky, Yao-Yuan Yang, Ranjit Jhala, Kamalika Chaudhuri, and Westley Weimer. 2020. PABLO: Helping Novices Debug Python Code Through Data-Driven Fault Localization. Association for Computing Machinery, New York, NY, USA. 1047–1053. isbn:9781450367936 https://doi.org/10.1145/3328778.3366860

Digital Library

[14]

Paul Denny, Andrew Luxton-Reilly, and Ewan Tempero. 2012. All Syntax Errors Are Not Equal. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE ’12). Association for Computing Machinery, New York, NY, USA. 75–80. isbn:9781450312462 https://doi.org/10.1145/2325296.2325318

Digital Library

[15]

Jay Earley. 1970. An Efficient Context-Free Parsing Algorithm. Commun. ACM, 13, 2 (1970), Feb., 94–102. issn:0001-0782 https://doi.org/10.1145/362007.362035

Digital Library

[16]

Madeline Endres, Georgios Sakkas, Benjamin Cosman, Ranjit Jhala, and Westley Weimer. 2019. InFix: Automatically Repairing Novice Program Inputs. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE ’19). IEEE Press, 399–410. isbn:9781728125084 https://doi.org/10.1109/ASE.2019.00045

Digital Library

[17]

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press, 180–184. http://www.deeplearningbook.org

Digital Library

[18]

Sumit Gulwani, Ivan Radicek, and Florian Zuleger. 2018. Automated clustering and program repair for introductory programming assignments. Programming Language Design and Implementation, isbn:9781450356985 https://doi.org/10.1145/3192366.3192387

Digital Library

[19]

Philip J. Guo. 2013. Online Python Tutor: Embeddable Web-Based Program Visualization for Cs Education. In Proceeding of the 44th ACM Technical Symposium on Computer Science Education (SIGCSE ’13). Association for Computing Machinery, New York, NY, USA. 579–584. isbn:9781450318686 https://doi.org/10.1145/2445196.2445368

Digital Library

[20]

Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31, 1 (2017), Feb., https://ojs.aaai.org/index.php/AAAI/article/view/10742

[21]

Momchil Hardalov, Ivan Koychev, and Preslav Nakov. 2018. Towards Automated Customer Support. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications. 48–59. https://doi.org/10.1007/978-3-319-99344-7_5

Digital Library

[22]

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer New York. isbn:9780387848570, 9780387848587 https://doi.org/10.1007/978-0-387-84858-7

[23]

Frederick Jelinek, John D Lafferty, and Robert L Mercer. 1992. Basic Methods of Probabilistic Context Free Grammars. In Speech Recognition and Understanding. Springer, 345–360. https://doi.org/10.1007/978-3-642-76626-8_35

[24]

Nan Jiang, Thibaud Lutellier, and Lin Tan. 2021. CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE.

[25]

Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. 2017. Structured Attention Networks. CoRR, abs/1702.00887 (2017), arXiv:1702.00887. arxiv:1702.00887

[26]

Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. 22 Dec., arxiv:1412.6980.

[27]

Donald E. Knuth. 1965. On the translation of languages from left to right. Information and Control, 8, 6 (1965), 607–639. issn:0019-9958 https://doi.org/10.1016/S0019-9958(65)90426-2

[28]

Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ Expectations on Automated Fault Localization. In International Symposium on Software Testing and Analysis. ACM, 165–176. isbn:9781450343909 https://doi.org/10.1145/2931037.2931051

Digital Library

[29]

Sarah K. Kummerfeld and Judy Kay. 2003. The Neglected Battle Fields of Syntax Errors. In Proceedings of the Fifth Australasian Conference on Computing Education - Volume 20 (ACE ’03). Australian Computer Society, Inc., AUS. 105–111. isbn:0909925984

[30]

Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. 2020. CoCoNuT: Combining Context-Aware Neural Translation Models Using Ensemble for Program Repair. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020). Association for Computing Machinery, New York, NY, USA. 101–114. isbn:9781450380089 https://doi.org/10.1145/3395363.3397369

Digital Library

[31]

Matias Martinez, Laurence Duchien, and Martin Monperrus. 2013. Automatically extracting instances of code change patterns with AST analysis. In 2013 IEEE international conference on software maintenance. 388–391.

Digital Library

[32]

Philippe McLean and R. Nigel Horspool. 1996. A Faster Earley Parser. In CC.

[33]

Eugene W Myers. 1986. An O(ND) difference algorithm and its variations. Algorithmica, 1, 1-4 (1986), 251–266.

Digital Library

[34]

Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning. 807–814.

Digital Library

[35]

Michael A Nielsen. 2015. Neural Networks and Deep Learning. Determination Press.

[36]

Chris Parnin and Alessandro Orso. 2011. Are Automated Debugging Techniques Actually Helping Programmers? In International Symposium on Software Testing and Analysis. ACM, 199–209. isbn:9781450305624 https://doi.org/10.1145/2001420.2001445

Digital Library

[37]

Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: a neural program corrector for MOOCs. https://doi.org/10.48550/ARXIV.1607.02902

[38]

Yizhou Qian and James Lehman. 2017. Students’ Misconceptions and Other Difficulties in Introductory Programming: A Literature Review. ACM Trans. Comput. Educ., 18, 1 (2017), Article 1, oct, 24 pages. https://doi.org/10.1145/3077618

Digital Library

[39]

Kia Rahmani, Mohammad Raza, Sumit Gulwani, Vu Le, Daniel Morris, Arjun Radhakrishna, Gustavo Soares, and Ashish Tiwari. 2021. Multi-Modal Program Inference: A Marriage of Pre-Trained Language Models and Component-Based Synthesis. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 158, oct, 29 pages. https://doi.org/10.1145/3485535

Digital Library

[40]

Sanguthevar Rajasekaran and Marius Nicolae. 2014. An error correcting parser for context free grammars that takes less than cubic time. https://doi.org/10.48550/ARXIV.1406.3405

[41]

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-propagating Errors. Nature, 323, 6088 (1986), 533–536. https://doi.org/10.1038/323533a0

[42]

Georgios Sakkas, Madeline Endres, Benjamin Cosman, Westley Weimer, and Ranjit Jhala. 2020. Type Error Feedback via Analytic Program Repair. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). Association for Computing Machinery, New York, NY, USA. 16–30. isbn:9781450376136 https://doi.org/10.1145/3385412.3386005

Digital Library

[43]

George Sakkas, Ranjit Jhala, Westley Weimer, and Madeline Endres. 2022. gsakkas/seq2parse: OOPSLA2 2022 code release. https://doi.org/10.5281/zenodo.7080357

Digital Library

[44]

Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks, 61 (2015), Jan, 85–117. issn:0893-6080 https://doi.org/10.1016/j.neunet.2014.09.003

Digital Library

[45]

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. arxiv:1409.3215.

Digital Library

[46]

Richard A. Thompson. 1976. Language Correction Using Probabilistic Grammars. IEEE Trans. Comput., C-25, 3 (1976), 275–286. https://doi.org/10.1109/TC.1976.5009254

Digital Library

[47]

P. van der Spek, N. Plat, and C. Pronk. 2005. Syntax Error Repair for a Java-Based Parser Generator. SIGPLAN Not., 40, 4 (2005), April, 47–50. issn:0362-1340 https://doi.org/10.1145/1064165.1064173

Digital Library

[48]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc., 5998–6008. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

[49]

Gust Verbruggen, Vu Le, and Sumit Gulwani. 2021. Semantic Programming by Example with Pre-Trained Models. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 100, oct, 25 pages. https://doi.org/10.1145/3485477

Digital Library

[50]

Ke Wang, Rishabh Singh, and Zhendong Su. 2018. Search, Align, and Repair: Data-driven Feedback Generation for Introductory Programming Exercises. In Programming Language Design and Implementation. 481–495. isbn:978-1-4503-5698-5 https://doi.org/10.1145/3192366.3192384

Digital Library

[51]

P.J. Werbos. 1990. Backpropagation through time: what it does and how to do it. Proc. IEEE, 78, 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337

[52]

Liwei Wu, Fei Li, Youhua Wu, and Tao Zheng. 2020. GGF: A Graph-Based Method for Programming Language Syntax Error Correction. Association for Computing Machinery, New York, NY, USA. 139–148. isbn:9781450379588 https://doi.org/10.1145/3387904.3389252

Digital Library

Cited By

Joshi A(2024)Revitalizing Language Processing with Intelligent Syntax Repair and Boolean Tensor Completion2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825203(5361-5368)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825203
Chen XHu XHuang YJiang HJi WJiang YJiang YLiu BLiu HLi XLian XMeng GPeng XSun HShi LWang BWang CWang JWang TXuan JXia XYang YYang YZhang LZhou YZhang L(2024)Deep learning-based software engineering: progress, challenges, and opportunitiesScience China Information Sciences10.1007/s11432-023-4127-568:1Online publication date: 24-Dec-2024
https://doi.org/10.1007/s11432-023-4127-5
Zhang WWang GChen JXiong YLiu YZhang LBissyandé TKlein JBird CSarro F(2023)OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL ReachabilityProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE56229.2023.00072(1200-1211)Online publication date: 11-Nov-2023
https://dl.acm.org/doi/10.1109/ASE56229.2023.00072

Index Terms

Seq2Parse: neurosymbolic parse error repair

Recommendations

Error repair in shift-reduce parsers

Local error repair of strings during CFG parsing requires the insertion and deletion of symbols in the region of a syntax error to produce a string that is error free. Rather than precalculating tables at parser generation time to assist in finding such ...
Error-repair parsing schemata

Robustness, the ability to analyze any input regardless of its grammaticality, is a desirable property for any system dealing with unrestricted natural language text. Error-repair parsing approaches achieve robustness by considering ungrammatical ...
Integrated control of chart items for error repair
ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2

This paper describes a system that performs hierarchical error repair for ill-formed sentences, with heterarchical control of chart items produced at the lexical, syntactic, and semantic levels. The system uses an augmented context-free grammar and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages

Proceedings of the ACM on Programming Languages Volume 6, Issue OOPSLA2

October 2022

1932 pages

EISSN:2475-1421

DOI:10.1145/3554307

Editor:
Philip Wadler
University of Edinburgh, UK

Issue’s Table of Contents

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2022

Published in PACMPL Volume 6, Issue OOPSLA2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
598
Total Downloads

Downloads (Last 12 months)271
Downloads (Last 6 weeks)46

Reflects downloads up to 14 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Joshi A(2024)Revitalizing Language Processing with Intelligent Syntax Repair and Boolean Tensor Completion2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825203(5361-5368)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825203
Chen XHu XHuang YJiang HJi WJiang YJiang YLiu BLiu HLi XLian XMeng GPeng XSun HShi LWang BWang CWang JWang TXuan JXia XYang YYang YZhang LZhou YZhang L(2024)Deep learning-based software engineering: progress, challenges, and opportunitiesScience China Information Sciences10.1007/s11432-023-4127-568:1Online publication date: 24-Dec-2024
https://doi.org/10.1007/s11432-023-4127-5
Zhang WWang GChen JXiong YLiu YZhang LBissyandé TKlein JBird CSarro F(2023)OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL ReachabilityProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE56229.2023.00072(1200-1211)Online publication date: 11-Nov-2023
https://dl.acm.org/doi/10.1109/ASE56229.2023.00072

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents