A Neural Architecture for Detecting Identifier Renaming from Diff

Qiqi Gu & Wei Ke

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13113)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning


Abstract

In software engineering, code review controls code quality and prevents bugs. Although many commits to a codebase add features, some are refactorings, including the renaming of identifiers. Reviewing a refactoring requires somewhat different effort from reviewing a functional change. For instance, the reviewer of an identifier renaming has to make sure that the new name is more descriptive, follows the naming convention of the institution, and does not collide with any other identifier. In this paper, we propose a machine learning model that automatically identifies commits consisting purely of identifier renaming, using only the diff files. This technique helps code review enforce the naming and coding conventions of the institution, and lets quality assurance testers focus more on functional changes. The traditional way of detecting such changes parses the full source code before and after the commit, which is less efficient and requires rigorous syntactical completeness and correctness. In contrast, our approach, based on neural networks, reads only the diff and gives a confidence value for whether the commit is a renaming. Since no labeled dataset of repository commits existed, we labeled a dataset drawn from more than 1,000 GitHub repositories using Java syntax analysis. We then trained a neural network to classify whether these commits are renamings, obtaining a test accuracy of 85.65% and a false positive rate of 2.03%. The methods in our experiment are also relevant to general static analysis with neural network approaches.
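The two figures reported in the abstract, test accuracy and false positive rate, can be read against the usual confusion-matrix definitions. The sketch below is only an illustration of those definitions: the function names, the 0.5 decision threshold, and the toy labels and confidence values are all assumptions made for this example and are not taken from the paper or its dataset.

```python
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int],
    confidences: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not a value from the paper
) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for a binary 'is this commit a renaming?' task.

    labels: 1 for a renaming commit, 0 otherwise.
    confidences: model confidence that each commit is a renaming.
    """
    tp = tn = fp = fn = 0
    for label, confidence in zip(labels, confidences):
        predicted = 1 if confidence >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all commits that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-renaming commits wrongly flagged as renaming."""
    return fp / (fp + tn)


# Toy data, invented for illustration only.
toy_labels = [1, 1, 0, 0, 0, 0, 1, 0]
toy_confidences = [0.9, 0.3, 0.1, 0.2, 0.7, 0.4, 0.8, 0.05]

tp, tn, fp, fn = confusion_counts(toy_labels, toy_confidences)
toy_accuracy = accuracy(tp, tn, fp, fn)
toy_fpr = false_positive_rate(fp, tn)
```

A low false positive rate matters for the use case described above: a commit wrongly flagged as a pure renaming could draw less reviewer attention than its functional change deserves.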


Notes

1. https://github.com/antlr/grammars-v4


Acknowledgement

This work is part of the research project (RP/ESCA-03/2020) funded by Macao Polytechnic Institute, Macao SAR.

Author information

Authors and Affiliations

School of Applied Sciences, Macao Polytechnic Institute, Macao SAR, China
Qiqi Gu & Wei Ke

Authors

Qiqi Gu
Wei Ke

Corresponding author

Correspondence to Qiqi Gu.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Universidad Politecnica de Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Manchester, Manchester, UK
Richard Allmendinger
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
Southern University of Science and Technology, Shenzhen, China
Ke Tang
Yonsei University, Seoul, Korea (Republic of)
Sung-Bae Cho
University of Minho, Braga, Portugal
Paulo Novais
NOVA University of Lisbon, Lisbon, Portugal
Susana Nascimento

About this paper

Cite this paper

Gu, Q., Ke, W. (2021). A Neural Architecture for Detecting Identifier Renaming from Diff. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_4


DOI: https://doi.org/10.1007/978-3-030-91608-4_4
Published: 23 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91607-7
Online ISBN: 978-3-030-91608-4
eBook Packages: Computer Science, Computer Science (R0)
