DOI: 10.1145/3550356.3561542
Research article

Combining OCL and natural language: a call for a community effort

Published: 09 November 2022

Abstract

The growing popularity and availability of pretrained natural language models open the door to many interesting applications that combine natural language (NL) with software artefacts. Two examples are the generation of code excerpts from NL instructions and the verbalization of programs in NL to facilitate their comprehension.
Many of these language models have been trained on open source software datasets and therefore "understand" a variety of programming languages, but not OCL.
We argue that OCL needs to jump on the machine learning bandwagon or risk losing its appeal as a constraint specification language. To that end, the key first task is to jointly create an OCL corpus dataset amenable to natural language processing.


Cited By

  • (2024) Exploring Dependencies Among Inconsistencies to Enhance the Consistency Maintenance of Models. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 147-158. DOI: 10.1109/SANER60148.2024.00023. Online publication date: 12-Mar-2024
  • (2023) On Codex Prompt Engineering for OCL Generation: An Empirical Study. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 148-157. DOI: 10.1109/MSR59073.2023.00033. Online publication date: May-2023

Published In

MODELS '22: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings
October 2022
1003 pages
ISBN:9781450394673
DOI:10.1145/3550356
Conference Chairs: Thomas Kühn, Vasco Sousa
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • Univ. of Montreal: University of Montreal
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OCL
  2. community
  3. corpus
  4. dataset
  5. natural language

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 144 of 506 submissions, 28%

Article Metrics

  • Downloads (Last 12 months): 23
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 21 Nov 2024
