
Code2Text: Dual Attention Syntax Annotation Networks for Structure-Aware Code Translation

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12114)


Abstract

Translating source code into natural language text helps people understand computer programs better and faster. Previous code translation methods mainly rely on human-specified syntax rules. Since handcrafted syntax rules are expensive to obtain and not always available, a PL-independent, automatic code translation method is highly desirable. However, existing sequence translation methods generally treat source code as a plain token sequence and therefore cannot capture the rich hierarchical structure inherent in code. In this work, we exploit the abstract syntax tree (AST), which summarizes the hierarchical information of a code snippet, to build a structure-aware code translation method. We propose a syntax annotation network, Code2Text, that incorporates both the source code and its AST into the translation. Code2Text features dual encoders, one for the sequential input (code) and one for the structural input (AST). We also propose a novel dual-attention mechanism that guides the decoding process by aligning each output word with both the tokens in the source code and the nodes in the AST. Experiments on a public collection of Python code demonstrate that Code2Text outperforms several state-of-the-art methods and generates accurate, human-readable descriptions.
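As a rough illustration of the dual-attention idea, the sketch below computes one attention context over code-token encoder states and one over AST-node encoder states, then concatenates them for the decoder. The dot-product scoring, plain-list vectors, and the `attend`/`dual_attention_context` names are simplifying assumptions; the paper's exact formulation is not reproduced here.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    # Dot-product attention: weight each key vector by its similarity
    # to the decoder query, then return the weighted average (context).
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]

def dual_attention_context(decoder_state, code_states, ast_states):
    # One context from the code encoder, one from the AST encoder;
    # concatenating them lets the decoder align each output word
    # with both the source tokens and the AST nodes.
    return attend(decoder_state, code_states) + attend(decoder_state, ast_states)
```

In a trained model the two lists of states would come from the sequential and structural encoders; here any vectors of matching dimension can be used to see the mechanics.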


Notes

  1. https://docs.python.org/3/library/ast.html.
  2. https://github.com/javaparser/javaparser.
  3. https://github.com/foonathan/cppast.
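Footnote 1's `ast` module makes the structural input easy to obtain for Python. A minimal sketch of turning a snippet into a linearized node sequence follows; the pre-order traversal and the `(type, depth)` encoding are illustrative choices, not the paper's exact AST representation.

```python
import ast

def linearize(source):
    # Pre-order walk of the abstract syntax tree, recording each
    # node's type name and its depth below the Module root.
    def walk(node, depth):
        yield type(node).__name__, depth
        for child in ast.iter_child_nodes(node):
            yield from walk(child, depth + 1)
    return list(walk(ast.parse(source), 0))

nodes = linearize("def add(a, b):\n    return a + b\n")
# The root is the Module node; the function body sits deeper in the tree.
```

Similar parsers exist for other languages (footnotes 2 and 3), which is what makes an AST-based method PL-independent in practice.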


Acknowledgements

This work is supported in part by National Natural Science Foundation of China Projects No. U1636207 and U1936213, NSF grants No. III-1526499, III-1763325, III-1909323, and CNS-1930941, and Shanghai Science and Technology Development Fund No. 19DZ1200802 and 19511121204.

Author information

Correspondence to Yun Xiong.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Xiong, Y., et al. (2020). Code2Text: Dual Attention Syntax Annotation Networks for Structure-Aware Code Translation. In: Nah, Y., Cui, B., Lee, S.W., Yu, J.X., Moon, Y.S., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12114. Springer, Cham. https://doi.org/10.1007/978-3-030-59419-0_6


  • DOI: https://doi.org/10.1007/978-3-030-59419-0_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59418-3

  • Online ISBN: 978-3-030-59419-0

  • eBook Packages: Computer Science, Computer Science (R0)
