Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Code completion with statistical language models

Published: 09 June 2014 Publication History

Abstract

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.
Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.
Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.

References

[1]
Android-er. http://android-er.blogspot.ch/2011/03/set-wallpaper-using-wallpapermanager.html.
[2]
Android how-to's. https://sites.google.com/site/androidhowto/how-to-1/display-a-web-page.
[3]
Stack overflow. http://www.stackoverflow.com/.
[4]
Tutorial for android. http://www.tutorialforandroid.com/2009/01/changing-screen-brightness.html.
[5]
Tutorial for android. http://www.tutorialforandroid.com/2009/10/turn-off-turn-on-wifi-in-android-using.html.
[6]
Vogella tutorials. http://www.vogella.com/articles/AndroidMedia/article.html.
[7]
Alnusair, A., Zhao, T., and Bodden, E. Effective API navigation and reuse. In IRI (aug. 2010), pp. 7--12.
[8]
Ammons, G., Bodík, R., and Larus, J. R. Mining specifications. In POPL '02 (2002).
[9]
Beckman, N., Kim, D., and Aldrich, J. An empirical study of object protocols in the wild. In ECOOP'11.
[10]
Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137--1155.
[11]
Cook, J. E., and Wolf, A. L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7, 3 (1998), 215--249.
[12]
Dagenais, B., and Hendren, L. J. Enabling static analysis for partial Java programs. In OOPSLA'08, pp. 313--328.
[13]
Elman, J. L. Finding structure in time. Cognitive Science 14, 2 (1990), 179--211.
[14]
Gulwani, S. Dimensions in program synthesis. In symp. on Principles and practice of declarative programming (2010), PPDP '10.
[15]
Gvero, T., Kuncak, V., Kuraj, I., and Piskac, R. Complete completion using types and weights. In PLDI '13 (2013).
[16]
Gvero, T., Kuncak, V., and Piskac, R. Interactive synthesis of code snippets. In CAV'11, vol. 6806 of LNCS. 2011.
[17]
Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. On the naturalness of software. In ICSE 2012 (2012).
[18]
Holmes, R., and Murphy, G. C. Using structural context to recommend source code examples. In ICSE '05.
[19]
Holmes, R., Walker, R. J., and Murphy, G. C. Strathcona example recommendation tool. In FSE'05, pp. 237--240.
[20]
Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans. on Acoustics, Speech and Singal processing (March 1987), vol. ASSP-35.
[21]
Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (May 1995), vol. I.
[22]
Kombrink, S., Mikolov, T., Karafiát, M., and Burget, L. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH (2011), pp. 2877--2880.
[23]
Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. Jungloid mining: Helping to navigate the api jungle. In PLDI '05 (2005).
[24]
Mikolov, T., Deoras, A., Povey, D., Burget, L., and Cernocky, J. Strategies for training large scale neural network language models. In ASRU 2011 (2011), IEEE Signal Processing Society.
[25]
Mishne, A., Shoham, S., and Yahav, E. Typestate-based semantic code search over partial programs. In OOPSLA '12 (2012).
[26]
Perelman, D., Gulwani, S., Ball, T., and Grossman, D. Type-directed completion of partial expressions. In PLDI (2012).
[27]
Reiss, S. P. Semantics-based code search. In ICSE'09.
[28]
Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE (2000), p. 2000.
[29]
Shoham, S., Yahav, E., Fink, S., and Pistoia, M. Static specification mining using automata-based abstractions. In ISSTA '07 (2007).
[30]
Solar-Lezama, A. The sketching approach to program synthesis. In APLAS '09 (2009).
[31]
Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S. A., and Saraswat, V. A. Combinatorial sketching for finite programs. In ASPLOS (2006), pp. 404--415.
[32]
Srivastava, S., Gulwani, S., and Foster, J. S. From program verification to program synthesis. In POPL '10 (2010).
[33]
Steensgaard, B. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (1996), POPL '96, pp. 32--41.
[34]
Stolcke, A. SRILM-an Extensible Language Modeling Toolkit. International Conference on Spoken Language Processing (2002).
[35]
Thummalapenta, S., and Xie, T. Parseweb: a programmer assistant for reusing open source code on the web. In ASE '07 (2007).
[36]
Vallée-Rai, R., et al. Soot - a Java Optimization Framework. In Proceedings of CASCON 1999 (1999), pp. 125--135.
[37]
Vechev, M., and Yahav, E. Deriving linearizable fine-grained concurrent objects. In PLDI '08 (2008).
[38]
Wasylkowski, A., and Zeller, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18.
[39]
Weimer, W., and Necula, G. Mining temporal specifications for error detection. In TACAS'05, vol. 3440 of LNCS. 2005, pp. 461--476.
[40]
Witten, I. H., and Bell, T. C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37, 4 (1991), 1085--1094.
[41]
Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06, pp. 282--291.
[42]
Yessenov, K., Xu, Z., and Solar-Lezama, A. Data-driven synthesis for object-oriented frameworks. In OOPSLA '11 (2011).
[43]
Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. MAPO: Mining and recommending API usage patterns. In ECOOP'09.

Cited By

View all
  • (2024)CodeScore: Evaluating Code Generation by Learning Code ExecutionACM Transactions on Software Engineering and Methodology10.1145/3695991Online publication date: 13-Sep-2024
  • (2024)Deep Learning for Code Intelligence: Survey, Benchmark and ToolkitACM Computing Surveys10.1145/3664597Online publication date: 18-May-2024
  • (2024)An Analysis of the Costs and Benefits of Autocomplete in IDEsProceedings of the ACM on Software Engineering10.1145/36607651:FSE(1284-1306)Online publication date: 12-Jul-2024
  • Show More Cited By

Index Terms

  1. Code completion with statistical language models

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 49, Issue 6
    PLDI '14
    June 2014
    598 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2666356
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
    • cover image ACM Conferences
      PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2014
      619 pages
      ISBN:9781450327848
      DOI:10.1145/2594291
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 June 2014
    Published in SIGPLAN Volume 49, Issue 6

    Check for updates

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)289
    • Downloads (Last 6 weeks)23
    Reflects downloads up to 16 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)CodeScore: Evaluating Code Generation by Learning Code ExecutionACM Transactions on Software Engineering and Methodology10.1145/3695991Online publication date: 13-Sep-2024
    • (2024)Deep Learning for Code Intelligence: Survey, Benchmark and ToolkitACM Computing Surveys10.1145/3664597Online publication date: 18-May-2024
    • (2024)An Analysis of the Costs and Benefits of Autocomplete in IDEsProceedings of the ACM on Software Engineering10.1145/36607651:FSE(1284-1306)Online publication date: 12-Jul-2024
    • (2024)Feedback-Directed Partial ExecutionProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680320(781-793)Online publication date: 11-Sep-2024
    • (2024)Generating Java Methods: An Empirical Assessment of Four AI-Based Code AssistantsProceedings of the 32nd IEEE/ACM International Conference on Program Comprehension10.1145/3643916.3644402(13-23)Online publication date: 15-Apr-2024
    • (2024)Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative ApproachInternational Journal of Software Engineering and Knowledge Engineering10.1142/S021819402350068734:05(705-727)Online publication date: 13-Jan-2024
    • (2023)User-Driven Support for Visualization Prototyping in D3Proceedings of the 28th International Conference on Intelligent User Interfaces10.1145/3581641.3584041(958-972)Online publication date: 27-Mar-2023
    • (2023)Code Search: A Survey of Techniques for Finding CodeACM Computing Surveys10.1145/356597155:11(1-31)Online publication date: 9-Feb-2023
    • (2023)Specializing Neural Networks for Cryptographic Code Completion ApplicationsIEEE Transactions on Software Engineering10.1109/TSE.2023.326536249:6(3524-3535)Online publication date: 1-Jun-2023
    • (2023)Investigating Execution Trace Embedding for Test Case Prioritization2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS)10.1109/QRS60937.2023.00036(279-290)Online publication date: 22-Oct-2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media