DOI: 10.1145/3196398.3196411
research-article

Towards extracting web API specifications from documentation

Published: 28 May 2018

Abstract

Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the growing number of web APIs, specifications are rare, and their creation and maintenance rely heavily on manual effort by third parties. In this paper, we propose an automatic approach and an associated tool, called D2Spec, for extracting significant parts of such specifications from web API documentation pages. Given a seed online documentation page of an API, D2Spec first crawls all documentation pages of the API and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods, which collectively describe the endpoints of the API.
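To make the three extracted parts concrete, the following Python sketch assembles a base URL and a list of (HTTP method, path template) pairs into a minimal OpenAPI 3.0 document. It illustrates the kind of specification such extraction can feed, not D2Spec's implementation: the abstract does not state which format D2Spec emits, and the helper `build_openapi`, the API at `https://api.example.com/v1`, and its endpoints are hypothetical.

```python
import json


def build_openapi(title, base_url, endpoints):
    """Assemble a minimal OpenAPI 3.0 document from extracted endpoint parts.

    Hypothetical helper: `endpoints` is an iterable of (http_method, path_template)
    pairs such as ("GET", "/users/{id}"); `base_url` becomes the server URL.
    """
    paths = {}
    for method, template in endpoints:
        # OpenAPI groups operations under their path template, keyed by lowercase method.
        operations = paths.setdefault(template, {})
        # OpenAPI 3.0 requires a `responses` object; responses are not extracted here.
        operation = {"responses": {"default": {"description": "Not extracted"}}}
        # Whole-segment placeholders like {id} are declared as required path parameters.
        names = [
            segment[1:-1]
            for segment in template.split("/")
            if segment.startswith("{") and segment.endswith("}")
        ]
        if names:
            operation["parameters"] = [
                {"name": name, "in": "path", "required": True, "schema": {"type": "string"}}
                for name in names
            ]
        operations[method.lower()] = operation
    return {
        "openapi": "3.0.0",
        "info": {"title": title, "version": "unknown"},
        "servers": [{"url": base_url}],
        "paths": paths,
    }


# Hypothetical extraction result for an example API.
spec = build_openapi(
    "Example API",
    "https://api.example.com/v1",
    [("GET", "/users"), ("POST", "/users"), ("GET", "/users/{id}")],
)
print(json.dumps(spec, indent=2))
```

Keeping the base URL apart from the path templates mirrors how OpenAPI separates `servers` from `paths`. Typing every path parameter as a string is a simplifying assumption, since parameter types are not among the parts the abstract says D2Spec extracts.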
We evaluate whether D2Spec can accurately extract endpoints from the documentation of 116 web APIs. The results show that D2Spec achieves a precision of 87.1% in identifying base URLs, a precision of 80.3% and a recall of 80.9% in generating path templates, and a precision of 83.8% and a recall of 77.2% in extracting HTTP methods. In addition, in an evaluation on 64 APIs with pre-existing API specifications, D2Spec revealed many inconsistencies between the web API documentation and the corresponding publicly available specifications. API consumers would benefit from D2Spec pointing out such inconsistencies, which in turn lets them fix these inconsistencies.
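Both the accuracy figures and the inconsistency study can be framed as comparing a set of endpoints against a reference set. The Python sketch below assumes that two endpoints match only when their HTTP method and path template are identical strings; the abstract does not describe D2Spec's actual matching criteria, and the function `compare_endpoints` and the sample endpoints are hypothetical.

```python
def compare_endpoints(extracted, reference):
    """Compare two endpoint collections given as (HTTP method, path template) pairs.

    Assumption: exact string equality of the pair decides a match.
    """
    extracted, reference = set(extracted), set(reference)
    matched = extracted & reference
    # Precision: share of extracted endpoints that appear in the reference.
    precision = len(matched) / len(extracted) if extracted else 0.0
    # Recall: share of reference endpoints that were extracted.
    recall = len(matched) / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Candidate inconsistencies, one list per direction.
        "only_in_docs": sorted(extracted - reference),
        "only_in_spec": sorted(reference - extracted),
    }


# Hypothetical endpoints: the documentation lists DELETE, the published spec lists PUT.
docs = [("GET", "/users"), ("GET", "/users/{id}"), ("DELETE", "/users/{id}")]
published = [("GET", "/users"), ("GET", "/users/{id}"), ("PUT", "/users/{id}")]
result = compare_endpoints(docs, published)
print(result["only_in_docs"])  # [('DELETE', '/users/{id}')]
print(result["only_in_spec"])  # [('PUT', '/users/{id}')]
```

Applying the same F1 formula to the reported figures gives about 80.6% for path templates (precision 80.3%, recall 80.9%) and about 80.4% for HTTP methods (precision 83.8%, recall 77.2%). These harmonic means are derived here and are not stated in the abstract.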

Information

Published In

ACM Conferences
MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
May 2018
627 pages
ISBN:9781450357166
DOI:10.1145/3196398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '18

Article Metrics

  • Downloads (last 12 months): 61
  • Downloads (last 6 weeks): 2
Reflects downloads up to 25 Nov 2024

Cited By

  • (2024) Leveraging Natural Language Processing and Data Mining to Augment and Validate APIs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1906-1908. DOI: 10.1145/3650212.3685554. Online publication date: 11-Sep-2024.
  • (2023) Enhancing REST API Testing with NLP Techniques. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1232-1243. DOI: 10.1145/3597926.3598131. Online publication date: 12-Jul-2023.
  • (2023) Understanding and Mitigating Twin Function Misuses in Operating System Kernel. IEEE Transactions on Computers 72, 8, 2181-2193. DOI: 10.1109/TC.2023.3240365. Online publication date: 1-Aug-2023.
  • (2023) Carving UI Tests to Generate API Tests and API Specification. Proceedings of the 45th International Conference on Software Engineering, 1971-1982. DOI: 10.1109/ICSE48619.2023.00167. Online publication date: 14-May-2023.
  • (2023) Exploiting Service-Discovery and OpenAPI in Multi-Agent MicroServices (MAMS) Applications. Engineering Multi-Agent Systems, 78-84. DOI: 10.1007/978-3-031-48539-8_5. Online publication date: 29-May-2023.
  • (2023) An Empirical Study of Web API Versioning Practices. Web Engineering, 303-318. DOI: 10.1007/978-3-031-34444-2_22. Online publication date: 6-Jun-2023.
  • (2021) Towards Large-Scale Empirical Assessment of Web APIs Evolution. Web Engineering, 124-138. DOI: 10.1007/978-3-030-74296-6_10. Online publication date: 18-May-2021.
  • (2020) Putting the semantics into semantic versioning. Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, 157-179. DOI: 10.1145/3426428.3426922. Online publication date: 18-Nov-2020.
  • (2020) A First Look at the Deprecation of RESTful APIs: An Empirical Study. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), 151-161. DOI: 10.1109/ICSME46990.2020.00024. Online publication date: Sep-2020.
  • (2020) Automated Web Service Specification Generation Through a Transformation-Based Learning. Services Computing – SCC 2020, 103-119. DOI: 10.1007/978-3-030-59592-0_7. Online publication date: 13-Sep-2020.