Article

Leveraging Comment Retrieval for Code Summarization

Authors:

Yanfang Ye

Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II

Pages 439–447

https://doi.org/10.1007/978-3-031-28238-6_34

Published: 02 April 2023

Abstract

Open-source code often suffers from mismatched or missing comments, which makes code harder to comprehend and burdens software development and maintenance. In this paper, we design a novel code summarization model, CodeFiD, to address this challenge. Inspired by retrieval-augmented methods for open-domain question answering, CodeFiD first retrieves a set of relevant comments from code collections for a given code snippet, and then aggregates the representations of the code and these comments to produce a natural language sentence that summarizes the code's behavior. Unlike current code summarization work, which focuses on improving code representations, our model draws on external knowledge to improve summarization performance. Extensive experiments on public code collections demonstrate the effectiveness of CodeFiD, which outperforms state-of-the-art counterparts across all programming languages.
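The retrieve-then-aggregate idea in the abstract can be illustrated with a minimal sketch. This is not the CodeFiD implementation: the abstract does not specify the retriever or the generator, so the sketch assumes a simple bag-of-words cosine similarity over code tokens as a stand-in retriever, and it assumes a Fusion-in-Decoder style input layout in which each retrieved comment is paired with the query code as a separate passage. The function names (`tokenize`, `retrieve_comments`, `build_reader_inputs`) and the `code: ... comment: ...` template are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens, breaking camelCase and snake_case."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_comments(query_code, corpus, k=2):
    """Return the comments attached to the k corpus snippets most similar to query_code.

    corpus is a list of (code, comment) pairs. Assumption: similarity is
    computed code-to-code with bag-of-words cosine, a stand-in for whatever
    retriever the paper actually uses.
    """
    query_vec = Counter(tokenize(query_code))
    scored = [
        (cosine(query_vec, Counter(tokenize(code))), comment)
        for code, comment in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [comment for _, comment in scored[:k]]


def build_reader_inputs(query_code, comments):
    """Pair the query code with each retrieved comment as one passage per comment.

    Assumption: a Fusion-in-Decoder style generator would encode each passage
    independently and let the decoder attend over all of them jointly.
    The template string is hypothetical.
    """
    return [f"code: {query_code} comment: {comment}" for comment in comments]


if __name__ == "__main__":
    corpus = [
        ("def add(a, b): return a + b", "Add two numbers."),
        ("def read_file(path): return open(path).read()",
         "Read a file and return its contents."),
    ]
    query = "def read_text(path): return open(path).read()"
    for passage in build_reader_inputs(query, retrieve_comments(query, corpus, k=1)):
        print(passage)
```

In a real system, the passages produced by `build_reader_inputs` would be fed to a trained sequence-to-sequence model that generates the summary; the sketch stops before that step because the abstract gives no details about the generator.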



Published In

Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II

Apr 2023

734 pages

ISBN: 978-3-031-28237-9

DOI: 10.1007/978-3-031-28238-6

Editors:
Jaap Kamps, University of Amsterdam, Amsterdam, The Netherlands
Lorraine Goeuriot, Université Grenoble-Alpes, Saint-Martin-d’Hères, France
Fabio Crestani, Università della Svizzera Italiana, Lugano, Switzerland
Maria Maistro, University of Copenhagen, Copenhagen, Denmark
Hideo Joho, University of Tsukuba, Ibaraki, Japan
Brian Davis, Dublin City University, Dublin, Ireland
Cathal Gurrin, Dublin City University, Dublin, Ireland
Udo Kruschwitz, Universität Regensburg, Regensburg, Germany
Annalina Caputo, Dublin City University, Dublin, Ireland

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

Publisher

Springer-Verlag, Berlin, Heidelberg
