
DOI: 10.1109/ASE51524.2021.9678559

On multi-modal learning of editing source code

Published: 24 June 2022

Abstract

In recent years, neural machine translation (NMT) has shown promise in automatically editing source code. A typical NMT-based code editor takes only the code that needs to be changed as input and presents developers with a ranked list of candidate patches to choose from, where the correct one may not always appear at the top. While NMT-based code-editing systems generate a broad spectrum of plausible patches, the correct one depends on the developers' requirements and often on the context in which the patch is applied. Thus, NMT models can benefit from hints that developers provide, whether in natural language or as patch context.
As a proof of concept, in this research we leverage three modalities of information to automatically generate edits with NMT models: the edit location, the code context around the edit, and the commit message (as a proxy for the developer's hint in natural language). To that end, we build Modit, a multi-modal NMT-based code-editing engine. Through in-depth investigation and analysis, we show that developers' hints, as an input modality, can narrow the search space for patches and allow Modit to outperform state-of-the-art models by generating correctly patched code in the top-1 position.
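
To make the multi-modal input concrete, the sketch below shows one way the three modalities could be serialized into a single sequence for a pretrained sequence-to-sequence model and decoded with beam search into a ranked list of patches. This is a minimal illustration under assumptions, not the authors' released implementation: the uclanlp/plbart-base checkpoint, the separator convention, and the example strings are all stand-ins chosen for demonstration.

    # Minimal sketch (assumed setup, not Modit's exact pipeline): serialize
    # three modalities -- code to edit, surrounding context, and a commit
    # message as a natural-language hint -- into one encoder input.
    # Requires: pip install transformers sentencepiece torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    checkpoint = "uclanlp/plbart-base"  # assumed choice of pretrained model
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    code_to_edit = "if (x == null) return;"  # the edit location
    context = "void check(Object x) { if (x == null) return; use(x); }"
    commit_message = "Throw an exception instead of silently returning on null"

    # Join the modalities with the tokenizer's separator token so the
    # encoder can attend across all three at once.
    sep = tokenizer.sep_token or "</s>"
    source = f"{code_to_edit} {sep} {context} {sep} {commit_message}"
    inputs = tokenizer(source, return_tensors="pt", truncation=True)

    # Beam search yields a ranked list of candidate patches; the paper's
    # claim is that a good hint pushes the correct patch toward rank 1.
    outputs = model.generate(
        **inputs, num_beams=5, num_return_sequences=5, max_length=64
    )
    for rank, ids in enumerate(outputs, start=1):
        print(rank, tokenizer.decode(ids, skip_special_tokens=True))

Under this scheme, dropping the commit message from the serialized source gives a single-modality baseline, which is one way to probe how much the natural-language hint narrows the beam.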


Published In

ASE '21: Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering
November 2021
1446 pages
ISBN: 9781665403375

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Author Tags

  1. automated programming
  2. neural machine translator
  3. neural networks
  4. pretraining
  5. source code edit
  6. transformers

Qualifiers

  • Research-article

Conference

ASE '21

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 27 Feb 2025

Cited By

  • (2024) Unsupervised evaluation of code LLMs with round-trip correctness. Proceedings of the 41st International Conference on Machine Learning, pp. 1050-1066, 10.5555/3692070.3692114. Online publication date: 21-Jul-2024.
  • (2024) Improving Source Code Pre-Training via Type-Specific Masking. ACM Transactions on Software Engineering and Methodology, 34(3), pp. 1-34, 10.1145/3699599. Online publication date: 11-Oct-2024.
  • (2024) Automated Commit Intelligence by Pre-training. ACM Transactions on Software Engineering and Methodology, 33(8), pp. 1-30, 10.1145/3674731. Online publication date: 1-Jul-2024.
  • (2024) Automatic Programming vs. Artificial Intelligence. Proceedings of the 1st ACM International Conference on AI-Powered Software, pp. 144-146, 10.1145/3664646.3664775. Online publication date: 10-Jul-2024.
  • (2024) CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 466-478, 10.1145/3650212.3652142. Online publication date: 11-Sep-2024.
  • (2024) Towards AI-Assisted Synthesis of Verified Dafny Methods. Proceedings of the ACM on Software Engineering, 1(FSE), pp. 812-835, 10.1145/3643763. Online publication date: 12-Jul-2024.
  • (2024) On the Reliability and Explainability of Language Models for Program Generation. ACM Transactions on Software Engineering and Methodology, 33(5), pp. 1-26, 10.1145/3641540. Online publication date: 3-Jun-2024.
  • (2024) Automated Code Editing With Search-Generate-Modify. IEEE Transactions on Software Engineering, 50(7), pp. 1675-1686, 10.1109/TSE.2024.3376387. Online publication date: 1-Jul-2024.
  • (2023) A Survey of Learning-based Automated Program Repair. ACM Transactions on Software Engineering and Methodology, 33(2), pp. 1-69, 10.1145/3631974. Online publication date: 23-Dec-2023.
  • (2023) LExecutor: Learning-Guided Execution. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1522-1534, 10.1145/3611643.3616254. Online publication date: 30-Nov-2023.
