T2API: synthesizing API code usage templates from English texts with statistical translation

Authors:

Peter C. Rigby,

Anh Tuan Nguyen,

Tien N. Nguyen

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

Pages 1013 - 1017

https://doi.org/10.1145/2950290.2983931

Published: 01 November 2016

Abstract

We present T2API, a tool based on statistical machine translation that takes an English description of a programming task as a query and synthesizes an API usage template for that task by learning from training data. T2API works in two steps. First, it derives the API elements relevant to the described task by statistically learning from a StackOverflow corpus of text descriptions paired with corresponding code. To infer those elements, it considers both the context of the words in the input text and the context of API elements that often occur together in the corpus. Second, the inferred API elements, together with their relevance scores, are assembled into an API usage by our novel API usage synthesis algorithm, which learns API usages from a large code corpus via a graph-based language model. Importantly, T2API can generate new API usages from smaller, previously seen usages.
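To make the two-step idea concrete, the following is a minimal sketch and not the tool's actual implementation. It assumes a tiny hand-made corpus (`PAIRS`) and hypothetical element labels such as `FileReader.new`. It also swaps in much simpler techniques than the ones the abstract names: a word/API co-occurrence estimate in place of the statistical translation model, and bigram counts over API sequences in place of the graph-based language model. All function names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (English description, API elements used in the code).
PAIRS = [
    ("read lines from a text file",
     ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine"]),
    ("read a file line by line",
     ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine",
      "BufferedReader.close"]),
    ("write text to a file",
     ["FileWriter.new", "FileWriter.write", "FileWriter.close"]),
]


def train_relevance(pairs):
    """Step 1 (stand-in): estimate P(api | word) from co-occurrence counts."""
    word_count = Counter()
    cooc = defaultdict(Counter)
    for text, apis in pairs:
        for word in set(text.lower().split()):
            word_count[word] += 1
            for api in set(apis):
                cooc[word][api] += 1
    return {w: {a: c / word_count[w] for a, c in seen.items()}
            for w, seen in cooc.items()}


def rank_apis(query, model, top_k=3):
    """Score each API element by summing P(api | word) over the query words."""
    scores = Counter()
    for word in set(query.lower().split()):
        for api, p in model.get(word, {}).items():
            scores[api] += p
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def train_bigrams(pairs):
    """Step 2 (stand-in): count how often one API element directly follows another."""
    follows = defaultdict(Counter)
    for _, apis in pairs:
        for a, b in zip(apis, apis[1:]):
            follows[a][b] += 1
    return follows


def order_apis(apis, follows):
    """Greedily chain the selected elements using the bigram counts."""
    remaining = set(apis)

    def incoming(api):
        # How often another selected element was seen right before `api`.
        return sum(follows[other][api] for other in remaining if other != api)

    # Start from the element that is least often preceded by the others.
    ordered = [min(sorted(remaining), key=incoming)]
    remaining.remove(ordered[0])
    while remaining:
        nxt = max(sorted(remaining), key=lambda a: follows[ordered[-1]][a])
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered


model = train_relevance(PAIRS)
selected = [api for api, _ in rank_apis("read file lines", model)]
template = order_apis(selected, train_bigrams(PAIRS))
print(template)
```

On this toy corpus, the query "read file lines" selects the three reader-related elements and chains them as `FileReader.new`, `BufferedReader.new`, `BufferedReader.readLine`. A real system would also need the context sensitivity and the graph structure (data and control dependencies) that the abstract describes, which this sketch leaves out.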

Index Terms

T2API: synthesizing API code usage templates from English texts with statistical translation
1. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments
      1. Integrated and visual development environments
    2. Software libraries and repositories

Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

November 2016

1156 pages

ISBN: 9781450342186

DOI: 10.1145/2950290

General Chair: Thomas Zimmermann (Microsoft Research, USA)

Program Chairs: Jane Cleland-Huang (University of Notre Dame, USA), Zhendong Su (University of California at Davis, USA)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

FSE'16: 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering

Sponsor: SIGSOFT

November 13 - 18, 2016

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%
