Code Summarization with Abstract Syntax Tree

Qiuyuan Chen⁹,
Han Hu¹⁰ &
Zhaoyi Liu¹¹

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1143))

Included in the following conference series:

International Conference on Neural Information Processing

2745 Accesses

Abstract

Code summarization, which provides a high-level description of the function implemented by code, plays a vital role in software maintenance and code retrieval. Traditional approaches focus on retrieving similar code snippets to generate summaries, and recently researchers pay increasing attention to leverage deep learning approaches, especially the encoder-decoder framework. Approaches based on encoder-decoder suffer from two drawbacks: (a) Lack of summarization in functionality level; (b) Code snippets are always too long (more than ten words), regular encoders perform poorly. In this paper, we propose a novel code representation with the help of Abstract Syntax Trees, which could describe the functionality of code snippets and shortens the length of inputs. Based on our proposed code representation, we develop Generative Task, which aims to generate summary sentences of code snippets. Experiments on large-scale real-world industrial Java projects indicate that our approaches are effective and outperform the state-of-the-art approaches in code summarization.

Q. Chen, H. Hu and Z. Liu—Equal contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

FCSO: Source Code Summarization by Fusing Multiple Code Features and Ensuring Self-consistency Output

LenANet: A Length-Controllable Attention Network for Source Code Summarization

Code Summarization with Project-Specific Features

Notes

1.
The camel-cased and underline identifiers are split into several words, for example, split checkJavaFile to three words: check, Java, File, so a leaf may consist of several tokens.
2.
https://github.com/javaparser/javaparser.
3.
http://help.eclipse.org/mars/index.jsp.
4.
https://www.oracle.com/technetwork/articles/java/index-137868.html.
5.
https://pytorch.org/tutorials/.

References

Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: International Conference on Machine Learning, pp. 2091–2100 (2016)
Google Scholar
Hu, X., Li, G., Xia, X., Lo, D., Jin, Z.: Deep code comment generation. In: Proceedings of the 26th Conference on Program Comprehension, pp. 200–210. ACM (2018)
Google Scholar
Hu, X., Li, G., Xia, X., Lo, D., Lu, S., Jin, Z.: Summarizing source code with transferred API knowledge. In: IJCAI, pp. 2269–2275 (2018)
Google Scholar
Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 2073–2083 (2016)
Google Scholar
Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Mapping language to code in programmatic context (2018)
Google Scholar
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. Comput. Sci. (2014)
Google Scholar
Liang, Y., Zhu, K.Q.: Automatic generation of text descriptive comments for code blocks. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Google Scholar
Ling, W., et al.: Latent predictor networks for code generation (2016)
Google Scholar
McBurney, P.W., McMillan, C.: Automatic documentation generation via source code summarization of method context. In: Proceedings of the 22nd International Conference on Program Comprehension, pp. 279–290. ACM (2014)
Google Scholar
Moreno, L., Aponte, J., Sridhara, G., Marcus, A., Pollock, L., Vijay-Shanker, K.: Automatic generation of natural language summaries for java classes. In: 2013 21st International Conference on Program Comprehension (ICPC), pp. 23–32. IEEE (2013)
Google Scholar
Movshovitz-Attias, D., Cohen, W.W.: Natural language models for predicting programming comments. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), vol. 2, pp. 35–40 (2013)
Google Scholar
Sridhara, G., Hill, E., Muppaneni, D., Pollock, L., Vijay-Shanker, K.: Towards automatically generating summary comments for java methods. In: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, pp. 43–52. ACM (2010)
Google Scholar
Sridhara, G., Pollock, L., Vijay-Shanker, K.: Generating parameter comments and integrating with method summaries. In: 2011 IEEE 19th International Conference on Program Comprehension, pp. 71–80. IEEE (2011)
Google Scholar
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
MathSciNet MATH Google Scholar
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
Google Scholar

Download references

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Qiuyuan Chen
School of Software, Tsinghua University, Beijing, China
Han Hu
School of Shenzhen Graduate, Peking University, Shenzhen, 518055, China
Zhaoyi Liu

Authors

Qiuyuan Chen
View author publications
You can also search for this author in PubMed Google Scholar
Han Hu
View author publications
You can also search for this author in PubMed Google Scholar
Zhaoyi Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Han Hu .

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, Q., Hu, H., Liu, Z. (2019). Code Summarization with Abstract Syntax Tree. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1143. Springer, Cham. https://doi.org/10.1007/978-3-030-36802-9_69

Download citation

DOI: https://doi.org/10.1007/978-3-030-36802-9_69
Published: 05 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36801-2
Online ISBN: 978-3-030-36802-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics