DOI: 10.1145/2950290.2983931
research-article
Public Access

T2API: synthesizing API code usage templates from English texts with statistical translation

Published: 01 November 2016

Abstract

In this work, we develop T2API, a statistical machine translation-based tool that takes an English description of a programming task as a query and synthesizes an API usage template for the task by learning from training data. T2API works in two steps. First, it derives the API elements relevant to the task described in the input by statistically learning from a StackOverflow corpus of text descriptions and corresponding code. To infer those API elements, it also considers the context of the words in the textual input and the context of API elements that often go together in the corpus. Second, the inferred API elements, together with their relevance scores, are assembled into an API usage by our novel API usage synthesis algorithm, which learns API usages from a large code corpus via a graph-based language model. Importantly, T2API is capable of generating new API usages from smaller, previously seen usages.
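
The first step described above, mapping the words of a query to relevant API elements, can be illustrated with a minimal sketch. This is not T2API's implementation: it trains a plain IBM Model 1-style word-to-API-element translation table with EM on a tiny hand-made corpus, and it leaves out the context modeling and the graph-based synthesis step. The toy corpus `PAIRS`, the function names `train_model1` and `rank_api_elements`, and the API element strings are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: (words of a text description, API elements in its code).
PAIRS = [
    (["read", "file"], ["FileReader.new", "BufferedReader.readLine"]),
    (["write", "file"], ["FileWriter.new", "BufferedWriter.write"]),
    (["read", "line"], ["BufferedReader.readLine"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(api, word)] ~ P(api | word) with IBM Model 1-style EM."""
    words = {w for ws, _ in pairs for w in ws}
    apis = {a for _, elems in pairs for a in elems}
    # Start from a uniform translation table.
    t = {(a, w): 1.0 / len(apis) for a in apis for w in words}
    for _ in range(iterations):
        count = defaultdict(float)  # expected (api, word) alignment counts
        total = defaultdict(float)  # expected counts per word
        # E-step: softly align each API element to the words of its description.
        for ws, elems in pairs:
            for a in elems:
                norm = sum(t[(a, w)] for w in ws)
                for w in ws:
                    frac = t[(a, w)] / norm
                    count[(a, w)] += frac
                    total[w] += frac
        # M-step: renormalize so the probabilities for each word sum to 1.
        for a, w in t:
            t[(a, w)] = count[(a, w)] / total[w]
    return t


def rank_api_elements(t, query_words, top_k=3):
    """Score each API element by summing P(api | word) over the query words."""
    scores = defaultdict(float)
    for (a, w), p in t.items():
        if w in query_words and p > 0.0:
            scores[a] += p
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]


if __name__ == "__main__":
    table = train_model1(PAIRS)
    for api, score in rank_api_elements(table, ["read", "line"]):
        print(f"{api}: {score:.3f}")
```

In this sketch the relevance score of an API element is just a sum of word-level translation probabilities. A real system would need a far larger corpus and, as the abstract notes, would also account for the context of words and of API elements that tend to occur together.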

Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2016
1156 pages
ISBN:9781450342186
DOI:10.1145/2950290
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. API Usage Synthesis
  2. Graph-based Statistical Machine Translation
  3. Text-to-Code Translation

Qualifiers

  • Research-article

Conference

FSE'16
Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 75
  • Downloads (Last 6 weeks): 26

Reflects downloads up to 13 Nov 2024

Cited By

  • (2024) DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI [Studies on generating programming language code from natural language texts: a review study]. İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi. DOI: 10.55071/ticaretfbd.1354040. Online publication date: 21-Mar-2024
  • (2023) Improving Code Completion by Solving Data Inconsistencies in the Source Code with a Hierarchical Language Model. Electronics 12:7 (1576). DOI: 10.3390/electronics12071576. Online publication date: 27-Mar-2023
  • (2023) Employing Source Code Quality Analytics for Enriching Code Snippets Data. Data 8:9 (140). DOI: 10.3390/data8090140. Online publication date: 31-Aug-2023
  • (2023) Explaining Transformer-based Code Models: What Do They Learn? When They Do Not Work? 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 96-106. DOI: 10.1109/SCAM59687.2023.00020. Online publication date: 2-Oct-2023
  • (2023) Source Code Recommender Systems: The Practitioners' Perspective. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 2161-2172. DOI: 10.1109/ICSE48619.2023.00182. Online publication date: May-2023
  • (2023) On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 2149-2160. DOI: 10.1109/ICSE48619.2023.00181. Online publication date: May-2023
  • (2022) AutoTransform. Proceedings of the 44th International Conference on Software Engineering, pages 237-248. DOI: 10.1145/3510003.3510067. Online publication date: 21-May-2022
  • (2022) Evaluating Automatic Program Repair Capabilities to Repair API Misuses. IEEE Transactions on Software Engineering 48:7 (2658-2679). DOI: 10.1109/TSE.2021.3067156. Online publication date: 1-Jul-2022
  • (2022) A3: Assisting Android API Migrations Using Code Examples. IEEE Transactions on Software Engineering 48:2 (417-431). DOI: 10.1109/TSE.2020.2988396. Online publication date: 1-Feb-2022
  • (2021) Natural mapping between voice commands and APIs. Open Computer Science 11:1 (135-145). DOI: 10.1515/comp-2020-0125. Online publication date: 13-Jan-2021
