DOI: 10.1109/ICSE.2019.00019
Research article

Learning to spot and refactor inconsistent method names

Published: 25 May 2019

Abstract

To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations. Debugging method names remains an important topic in the literature, where various approaches analyze commonalities among method names in a large dataset to detect inconsistent method names and suggest better ones. We note that state-of-the-art approaches do not analyze the implemented code itself to assess consistency. We thus propose a novel automated approach to debugging method names based on the analysis of consistency between method names and method code. The approach leverages deep feature representation techniques adapted to the nature of each artifact. Experimental results on over 2.1 million Java methods show that we can achieve an improvement of up to 15 percentage points over the state of the art, establishing a record F1-measure of 67.9% in identifying inconsistent method names. We further demonstrate that our approach yields up to 25% accuracy in suggesting full names, while the state of the art lags far behind at 1.1% accuracy. Finally, we report on our success in fixing 66 inconsistent method names in a live study on projects in the wild.
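
The abstract does not spell out the mechanism, so the following is only a toy illustration of what checking a method name against its body could look like, not the paper's implementation. It swaps the deep feature representations for plain token overlap (Jaccard similarity): it finds the corpus methods whose bodies look most like the target body, then asks whether the target name shares any word with the names of those neighbours. The corpus, the function names (`check_name`, `name_tokens`, `body_tokens`), and the threshold-free "any shared token" rule are all assumptions made for this sketch.

```python
import re
from collections import Counter

# Splits camelCase / snake_case identifiers: "parseURLString" -> parse, URL, String
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def name_tokens(name):
    """Lowercased sub-tokens of one identifier."""
    return [t.lower() for t in TOKEN.findall(name)]


def body_tokens(body):
    """Set of sub-tokens of every identifier or keyword in a method body."""
    return {t for ident in IDENT.findall(body) for t in name_tokens(ident)}


def jaccard(a, b):
    """Jaccard similarity of two token sets (a stand-in for learned embeddings)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def check_name(name, body, corpus, k=3):
    """Return (consistent, suggested_first_token) for a (name, body) pair.

    The name is treated as consistent if it shares at least one token with
    the names of the k corpus methods whose bodies are most similar.
    """
    target = body_tokens(body)
    ranked = sorted(corpus, key=lambda m: jaccard(target, body_tokens(m[1])),
                    reverse=True)
    neighbours = [n for n, _ in ranked[:k]]
    neighbour_tokens = {t for n in neighbours for t in name_tokens(n)}
    consistent = bool(set(name_tokens(name)) & neighbour_tokens)
    # Most frequent leading token among the neighbours, offered as a hint.
    first = Counter(name_tokens(n)[0] for n in neighbours if name_tokens(n))
    suggestion = first.most_common(1)[0][0] if first else None
    return consistent, suggestion


# Hypothetical mini-corpus of (method name, method body) pairs.
CORPUS = [
    ("getName", "return this.name;"),
    ("getTitle", "return this.title;"),
    ("getSize", "return this.size;"),
    ("setName", "this.name = name;"),
    ("setTitle", "this.title = title;"),
]

if __name__ == "__main__":
    print(check_name("setCount", "return this.count;", CORPUS))
    print(check_name("getCount", "return this.count;", CORPUS))
```

In this toy corpus, a method called `setCount` whose body simply returns a field ends up next to the getters, shares no name token with them, and is flagged, with `get` offered as the leading word. A real system would need learned representations and a far larger corpus to cope with bodies that are similar in behaviour but not in surface tokens.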

Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering
May 2019
1318 pages

Publisher

IEEE Press

Conference

ICSE '19

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
