Abstract
Translating source code into natural-language text helps people understand computer programs better and faster. Previous code translation methods mainly exploit human-specified syntax rules. Since handcrafted syntax rules are expensive to obtain and not always available, a PL-independent, automatic code translation method is highly desirable. However, existing sequence translation methods generally regard the source text as a plain sequence, which cannot capture the rich hierarchical characteristics that inherently reside in code. In this work, we exploit the abstract syntax tree (AST), which summarizes the hierarchical information of a code snippet, to build a structure-aware code translation method. We propose a syntax annotation network called Code2Text that incorporates both the source code and its AST into the translation. Code2Text features dual encoders, one for the sequential input (code) and one for the structural input (AST). We also propose a novel dual-attention mechanism that guides the decoding process by accurately aligning the output words with both the tokens in the source code and the nodes in the AST. Experiments on a public collection of Python code demonstrate that Code2Text outperforms several state-of-the-art methods and generates accurate, human-readable descriptions.
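To illustrate the kind of structural input the abstract describes, the sketch below extracts an AST node sequence from a Python snippet using the standard-library `ast` module. This is a minimal, hypothetical preprocessing step, not the paper's exact pipeline: the traversal order and node vocabulary here are illustrative assumptions.

```python
import ast


def ast_node_sequence(source: str) -> list[str]:
    """Parse a Python snippet and return the names of its AST node
    types, as one possible linearization of the structural input."""
    tree = ast.parse(source)
    # ast.walk yields the root and then all descendant nodes.
    return [type(node).__name__ for node in ast.walk(tree)]


snippet = "def add(a, b):\n    return a + b"
print(ast_node_sequence(snippet))
```

A structure-aware encoder would consume this node sequence (or the tree itself) alongside the token sequence of the raw code, giving the model access to hierarchy that a plain token stream loses.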
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Projects No. U1636207 and U1936213, the NSF under Grants No. III-1526499, III-1763325, III-1909323, and CNS-1930941, and the Shanghai Science and Technology Development Fund under Grants No. 19DZ1200802 and 19511121204.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Xiong, Y. et al. (2020). Code2Text: Dual Attention Syntax Annotation Networks for Structure-Aware Code Translation. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12114. Springer, Cham. https://doi.org/10.1007/978-3-030-59419-0_6
DOI: https://doi.org/10.1007/978-3-030-59419-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59418-3
Online ISBN: 978-3-030-59419-0
eBook Packages: Computer Science, Computer Science (R0)