Abstract
Translating source code into natural-language text helps people understand computer programs better and faster. Previous code translation methods mainly exploit human-specified syntax rules. Since handcrafted syntax rules are expensive to obtain and not always available, a PL-independent, automatic code translation method is highly desirable. However, existing sequence translation methods generally regard the source text as a plain sequence, which cannot capture the rich hierarchical characteristics that inherently reside in code. In this work, we exploit the abstract syntax tree (AST), which summarizes the hierarchical information of a code snippet, to build a structure-aware code translation method. We propose a syntax annotation network called Code2Text that incorporates both the source code and its AST into the translation. Code2Text features dual encoders, one for the sequential input (code) and one for the structural input (AST). We also propose a novel dual-attention mechanism that guides the decoding process by accurately aligning the output words with both the tokens in the source code and the nodes in the AST. Experiments on a public collection of Python code demonstrate that Code2Text outperforms several state-of-the-art methods and generates accurate, human-readable descriptions.
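To illustrate the kind of structural input the abstract describes, the sketch below extracts an AST node sequence from a Python snippet using the standard-library `ast` module. This is a minimal, hypothetical preprocessing step, not the paper's exact pipeline: the traversal order and node vocabulary here are illustrative assumptions.

```python
import ast


def ast_node_sequence(source: str) -> list[str]:
    """Parse a Python snippet and return the names of its AST node
    types, as one possible linearization of the structural input."""
    tree = ast.parse(source)
    # ast.walk yields the root and then all descendant nodes.
    return [type(node).__name__ for node in ast.walk(tree)]


snippet = "def add(a, b):\n    return a + b"
print(ast_node_sequence(snippet))
```

A structure-aware encoder would consume this node sequence (or the tree itself) alongside the token sequence of the raw code, giving the model access to hierarchy that a plain token stream loses.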
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Projects No. U1636207 and U1936213, the NSF under Grants No. III-1526499, III-1763325, III-1909323, and CNS-1930941, and the Shanghai Science and Technology Development Fund under Grants No. 19DZ1200802 and 19511121204.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Xiong, Y. et al. (2020). Code2Text: Dual Attention Syntax Annotation Networks for Structure-Aware Code Translation. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12114. Springer, Cham. https://doi.org/10.1007/978-3-030-59419-0_6
DOI: https://doi.org/10.1007/978-3-030-59419-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59418-3
Online ISBN: 978-3-030-59419-0
eBook Packages: Computer Science, Computer Science (R0)