DOI: 10.1145/2635868.2635883
research-article

Learning natural coding conventions

Published: 11 November 2014

Abstract

Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project’s coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94% accuracy in its top suggestions for identifier names. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
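To make the idea of learning a codebase's style and suggesting names more concrete, the following is a minimal sketch, not the NATURALIZE implementation. It assumes a plain bigram language model with add-alpha smoothing over whitespace-separated tokens; the abstract does not specify the model, and the helper names (`train_bigrams`, `log_prob`, `rank_names`) and the toy corpus are hypothetical. Candidate names are ranked by how probable the snippet becomes once the identifier is renamed.

```python
import math
from collections import Counter


def train_bigrams(token_streams):
    """Count unigrams and bigrams over tokenized lines of code."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in token_streams:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams) + 1  # one extra slot for unseen tokens
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigrams[(prev, cur)] + alpha
        denominator = unigrams[prev] + alpha * vocab
        total += math.log(numerator / denominator)
    return total


def rank_names(tokens, identifier, candidates, unigrams, bigrams):
    """Rank candidate names by the score of the snippet after renaming."""
    scored = []
    for name in candidates:
        renamed = [name if t == identifier else t for t in tokens]
        scored.append((log_prob(renamed, unigrams, bigrams), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy "codebase" (assumed data): loop counters are mostly named i.
corpus = [
    "for ( int i = 0 ; i < n ; i ++ )".split(),
    "for ( int i = 0 ; i < size ; i ++ )".split(),
    "for ( int j = 0 ; j < n ; j ++ )".split(),
]
unigrams, bigrams = train_bigrams(corpus)

snippet = "for ( int idx = 0 ; idx < n ; idx ++ )".split()
ranking = rank_names(snippet, "idx", ["i", "j", "idx"], unigrams, bigrams)
print(ranking)
```

In this toy setting the name used most often in the surrounding code scores highest, which mirrors the abstract's notion of an emergent consensus on style; a real tool would need a proper tokenizer, scope-aware renaming, and a threshold for when to stay silent.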


    Published In

    FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering
    November 2014
    856 pages
    ISBN:9781450330565
    DOI:10.1145/2635868
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Coding conventions
    2. naturalness of software

    Qualifiers

    • Research-article

    Conference

    SIGSOFT/FSE'14

    Acceptance Rates

    Overall Acceptance Rate 17 of 128 submissions, 13%
