Welcome to the 1st edition of the International Workshop on Natural Language-Based Software Engineering (NLBSE). The potential of Natural Language Processing (NLP) and Natural Language Generation (NLG) to support developers and engineers in a wide range of software engineering tasks (e.g., requirements engineering, extraction of knowledge and patterns from software artifacts, and summarization and prioritization of development and maintenance activities) is increasingly evident. Furthermore, the current availability of libraries (e.g., NLTK, CoreNLP, and fastText) and models (e.g., BERT) that handle the low-level aspects of natural language processing and representation efficiently and easily has pushed more and more researchers to work closely with industry to solve software engineers' real-world problems.
Proceeding Downloads
Unsupervised extreme multi label classification of stack overflow posts
Knowing the topics of a software forum post, such as those on StackOverflow, allows for greater analysis and understanding of the large amounts of data that come from these communities. One approach to this problem is using extreme multi label ...
Understanding digits in identifier names: an exploratory study
Before any software maintenance can occur, developers must read the identifier names found in the code to be maintained. Thus, high-quality identifier names are essential for productive program comprehension and maintenance activities. With developers ...
From zero to hero: generating training data for question-to-cypher models
Graph databases employ graph structures such as nodes, attributes and edges to model and store relationships among data. To access this data, graph query languages (GQL) such as Cypher are typically used, which might be difficult to master for end-...
Automatic identification of informative code in stack overflow posts
Despite Stack Overflow's popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific ...
NLBSE'22 tool competition
We report on the organization and results of the first edition of the Tool Competition from the International Workshop on Natural Language-based Software Engineering (NLBSE'22). This year, five teams submitted multiple classification models to ...
Issue report classification using pre-trained language models
This paper describes our participation in the tool competition organized in the scope of the 1st International Workshop on Natural Language-based Software Engineering. We propose a supervised approach relying on fine-tuned BERT-based language models for ...
BERT-based GitHub issue report classification
Issue tracking is one of the integral parts of software development, especially for open source projects. GitHub, a commonly used software management tool, provides its own issue tracking system. Each issue can have various tags, which are manually ...
Predicting issue types with seBERT
Pre-trained transformer models are the current state of the art for natural language processing. seBERT is one such model: it was developed based on the BERT architecture but trained from scratch with software engineering data. We fine-tuned ...
GitHub issue classification using BERT-style models
Recent innovations in natural language processing techniques have led to the development of various tools for assisting software developers. This paper provides a report of our proposed solution to the issue report classification task from the NL-Based ...
CatIss: an intelligent tool for categorizing issues reports using transformers
Users use Issue Tracking Systems to keep track of and manage issue reports in their repositories. An issue is a rich source of software information that contains different reports including a problem, a request for new features, or merely a question about ...
On the evaluation of NLP-based models for software engineering
NLP-based models have been increasingly incorporated to address SE problems. These models are either employed in the SE domain with little to no change, or they are greatly tailored to source code and its unique characteristics. Many of these approaches ...
Identification of intra-domain ambiguity using transformer-based machine learning
Recently, the application of neural word embeddings for detecting cross-domain ambiguities in software requirements has gained significant attention from the requirements engineering (RE) community. Several approaches have been proposed in the ...
Can NMT understand me?: towards perturbation-based evaluation of NMT models for code generation
Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for translation between different languages and has aroused interest in different research areas, including software engineering. A key step to ...
Supporting systematic literature reviews using deep-learning-based language models
Background: Systematic Literature Reviews are an important research method for gathering and evaluating the available evidence regarding a specific research topic. However, the process of conducting a Systematic Literature Review manually can be ...
Story point level classification by text level graph neural network
Estimating the effort of software projects developed with agile methods is important for project managers and technical leads. It provides a first view of how many hours and developers are required to complete the tasks. There are research ...
Recommendations
Summary of the 1st Natural Language-based Software Engineering Workshop (NLBSE 2022)
Natural language processing (NLP) refers to automatic computational processing of human language, including both algorithms that take human-produced text as input and algorithms that produce natural-looking text as outputs. There is a widespread and ...