A Neural Architecture for Detecting Identifier Renaming from Diff

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2021 (IDEAL 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13113)

Abstract

In software engineering, code review controls code quality and prevents bugs. Although many commits to a codebase add features, some are refactorings, including the renaming of identifiers. Reviewing a refactoring requires a somewhat different effort from reviewing a functional change. For instance, a reviewer of a renaming has to make sure that the new name is more descriptive, follows the naming convention of the institution, and does not collide with any other identifier. In this paper we propose a machine learning model that automatically identifies commits consisting purely of identifier renaming, using only the diff files. This technique helps code review enforce the naming and coding conventions of the institution and lets quality assurance testers focus more on functional changes. The traditional way of detecting such changes parses the full source code before and after the commit, which is less efficient and requires the code to be syntactically complete and correct. In contrast, our neural-network-based approach reads only the diff and outputs a confidence value for whether the commit is a renaming. Since no labeled dataset of repository commits existed, we labeled a dataset drawn from more than 1,000 GitHub repositories by Java syntax analysis. We then trained a neural network to classify these commits as renaming or not, obtaining a test accuracy of 85.65% and a false positive rate of 2.03%. The methods in our experiment are also relevant to general static analysis with neural network approaches.
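
To make the task concrete, the following is a minimal rule-based sketch of what "pure identifier renaming, judged from the diff alone" could mean. It is not the neural network proposed in the paper; it is a hand-written heuristic given only for illustration. The function names, the simplified tokenizer, and the partial keyword list are all assumptions introduced here. The sketch pairs removed and added lines in order and accepts the diff only if the lines differ by a consistent, collision-free mapping of identifiers.

```python
import re

# Simplified tokenizer (an assumption): identifiers, integer literals, single symbols.
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# A small, incomplete subset of Java keywords; keywords never count as renamed.
KEYWORDS = {"int", "return", "class", "public", "private", "static",
            "void", "new", "if", "else", "for", "while"}


def changed_lines(diff_text):
    """Split a unified diff into removed and added lines.

    File headers ('---' / '+++') are skipped. A removed code line that itself
    begins with '--' would be skipped too; this sketch accepts that limitation.
    """
    removed, added = [], []
    for line in diff_text.splitlines():
        if line.startswith("---") or line.startswith("+++"):
            continue
        if line.startswith("-"):
            removed.append(line[1:])
        elif line.startswith("+"):
            added.append(line[1:])
    return removed, added


def looks_like_pure_renaming(diff_text):
    """Heuristic: True if old and new lines differ only by consistent renames."""
    removed, added = changed_lines(diff_text)
    if not removed or len(removed) != len(added):
        return False
    mapping = {}  # old identifier -> new identifier
    for old_line, new_line in zip(removed, added):
        old_tokens = TOKEN.findall(old_line)
        new_tokens = TOKEN.findall(new_line)
        if len(old_tokens) != len(new_tokens):
            return False
        for old, new in zip(old_tokens, new_tokens):
            if old == new:
                continue
            # Any differing token pair must be two non-keyword identifiers.
            if not (IDENT.fullmatch(old) and IDENT.fullmatch(new)):
                return False
            if old in KEYWORDS or new in KEYWORDS:
                return False
            # Each old name must always map to the same new name.
            if mapping.setdefault(old, new) != new:
                return False
    # Two old names mapped to one new name would be a collision.
    return bool(mapping) and len(set(mapping.values())) == len(mapping)


RENAME_DIFF = """--- a/Foo.java
+++ b/Foo.java
@@ -1,2 +1,2 @@
-    int cnt = 0;
-    return cnt + 1;
+    int count = 0;
+    return count + 1;
"""

FUNCTIONAL_DIFF = """--- a/Foo.java
+++ b/Foo.java
@@ -1,1 +1,1 @@
-    return cnt + 1;
+    return cnt + 2;
"""

print(looks_like_pure_renaming(RENAME_DIFF))      # True
print(looks_like_pure_renaming(FUNCTIONAL_DIFF))  # False
```

A heuristic like this only sees names that clash within the diff itself and breaks down on reordered or reformatted lines. It returns a hard yes or no, whereas the model described above outputs a confidence value.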


Notes

  1. https://github.com/antlr/grammars-v4


Acknowledgement

This work is part of the research project (RP/ESCA-03/2020) funded by Macao Polytechnic Institute, Macao SAR.

Author information

Corresponding author

Correspondence to Qiqi Gu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Gu, Q., Ke, W. (2021). A Neural Architecture for Detecting Identifier Renaming from Diff. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2021. IDEAL 2021. Lecture Notes in Computer Science, vol 13113. Springer, Cham. https://doi.org/10.1007/978-3-030-91608-4_4

  • DOI: https://doi.org/10.1007/978-3-030-91608-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91607-7

  • Online ISBN: 978-3-030-91608-4

  • eBook Packages: Computer Science, Computer Science (R0)
