Abstract
While AI is extensively transforming Software Engineering (SE) fields, SE still needs a framework that considers all phases holistically to facilitate Automated Software Evolution (ASEv), particularly for intelligent applications that are context-rich, instead of conquering each phase independently. The complexity comes from the intricacy of the intelligent applications, the heterogeneity of the data sources, and the constant changes in the context. This study proposes a conceptual framework for achieving automated software evolution, emphasizing the importance of multimodality learning. A Selective Sequential Scope Model (3S model) is developed based on the conceptual framework, and it can be used to categorize existing and future research according to the SE phases and multimodal learning tasks it covers. This research is a preliminary step toward a blueprint of higher-level ASEv. The proposed conceptual framework can act as a practical guideline for practitioners preparing to dive into this area. Although the study focuses on intelligent applications, the framework and analysis methods may be adapted to other types of software as AI brings more intelligence into their life cycles.
1 Introduction
With the advancements of Artificial Intelligence (AI) and Natural Language Processing (NLP) in the past decades, especially the rise of Large Language Models (LLMs) and multimodality learning, software engineering has welcomed AI techniques into every aspect of the software life cycle. Meanwhile, research on intelligent applications has continuously been a hotspot [73] because of the increasing amount of multimodal data generated in various domains. This type of software is designed to adapt to constantly changing, context-rich scenarios [70, 73]; some examples are listed in part C of Fig. 1. One primary characteristic of these applications is that a significant portion of their system behavior is learned from continuous interaction with users and the environment, involving detection and analysis of states and activities [10, 62, 67], unlike banking or insurance applications with more mature and stable business logic.
The rapid evolution of hardware and software brings more capabilities to intelligent applications while making the creation and maintenance of such software more intricate [13, 74]. Both software engineering and intelligent applications are therefore eager for breakthroughs in higher-level automation (HLA): collaboratively resolving these challenges by benefiting from AI techniques. One form of HLA for the two fields is Automated Software Evolution / Evolvement (ASEv) [29, 47], a term derived from software evolution and Automated Software Engineering (ASE). Software evolution [36] refers to a continuous change from a lesser, simpler, or worse state to a higher or better state. ASEv can be the hero for Context-Rich Intelligent Applications (CRIA) in providing a fast response to the changing context, not only at the traditional context-adaptation (decision-making) level but also at the system-evolution level, yielding new requirements and making the relevant design and development changes automatically, as shown by the interaction arrows between the ASEv facilities, intelligent systems, and the intelligent system evolution process in Fig. 1.
To introduce HLA or intelligence into software engineering, much related research has applied Machine Learning (ML) and NLP techniques to requirement analysis [12], bug detection [17, 56], verification automation [18, 42], and model-driven development [65]. ASEv methodologies aim to build an automated procedure that changes an application, or creates a new application based on old system statuses and available data, when the current version of the system cannot fulfill the requirements of the present context.
An important essence of AI is that it is fundamentally built upon data-driven mathematical analysis, logical reasoning, statistical learning, and algorithmic search. The data is sometimes referred to as the context of things. The things, such as applications, objects (e.g., a camera or an elevator), people, and even logic, include anything with meaningful data that can benefit AI computing in making simple or complex context-aware decisions or recommendations; this can benefit from multimodality learning, since the data comes from various sources and may have different formats. Thus, ASEv for CRIA requires AI to thoroughly consider the context from both the software development side and the application running side. After the integration of various AI procedures, the final target is to produce products that can be used by existing systems, methodologies, or human users. In software engineering, this can start with generating simple products like new code snippets.
However, HLA for generating new intelligent applications or features for CRIA requires researchers to have an overview of the combination of the ASEv process and multimodality learning instead of conquering only part of the task. A data-to-product multimodal conceptual framework, proposed in this research as a synthesis of interrelated components and variables, is designed as a response to this challenge and can shed light on related research. A conceptual framework is an integrated way of looking at problems [38], giving a broader understanding of a research problem, or drawing a bigger map of possible relationships by joining together small individual concepts [28]. Intelligent applications are rich in contextual data, which comes not only from environments but also from systems and software engineering processes. From this perspective, this framework addresses both "intelligent software" engineering and intelligent "software engineering"; the latter covers applications whose software engineering process is context-rich, which may not necessarily be CRIA.
The subsequent sections of the article are organized as follows: Sect. 2 describes the background and related work. Section 3 outlines the key dimensions of the conceptual framework. Section 4 introduces the 3S model for categorizing solutions in ASEv research. Section 5 evaluates the framework and the 3S model by applying them to ASEv-related research. Section 6 discusses related topics of interest, limitations, and concerns. Finally, Sect. 7 provides the concluding remarks.
2 Background and related work
In order to better present the problem of achieving ASEv for CRIA, this section discusses related concepts and literature in software evolution, the data-to-product view, context dynamism, and multimodality.
2.1 Automated software evolution
Software evolution refers to "a continuous change from a lesser, simpler, or worse state to a higher or better state" [36]. To describe the HLA of CRIA, automated software evolution, a term derived from software evolution and Automated Software Engineering (ASE), is used to emphasize the updating and improving of software. In this study, the term includes not only the process of software design, development, and maintenance; it also considers the deprecation of software and the generation of software from scratch (i.e., based on patterns learned from other software or from old, similar systems).
Historically, Computer Aided Software Engineering (CASE) [63] is a traditional approach that aids designers and developers in software engineering activities, including development and integration. However, CASE tools were criticized for being mainly descriptive: they provided neither a theoretical framework nor an attempt to explain the factors influencing adoption [27].
An interdisciplinary survey on challenges and the state of the art in the evolution of automated production systems (APS) was conducted by Birgit et al. [64]. They argued that fundamental techniques such as variability modeling and tracing, which facilitate software evolution, are confined to the software domain. To develop such APS, not only the software but also the system specifics and system design, which must account for the hardware, need to be considered throughout the whole process of product development. Although their research targets are more related to mechatronic automated products, such as washing machines, or other automated systems that produce automated products, the perceived underlying rule can be applied to other software-hardware pairs. Take the iPhone's hardware and software ecosystems as an example: not only do operating system updates follow hardware updates, but mobile applications also have to adapt themselves to the new features provided by the new hardware and software. A vanward idea about hardware even considers it a new type of software [3]. Software often needs to be revised to meet changing market and regulatory demands.
Martin [45] presented a survey on automatic software repair to resolve software bugs autonomously without interventions from human experts. The article discusses behavioral repair and state repair. Behavioral repair treats test suites, contracts, models, and crashing inputs as an oracle. State repair works at runtime using checkpoints and restarts, system reconfiguration, and invariant rehabilitation techniques. This survey work notably covers an extensive array of concepts in automatic software repair and spans a broad range of research areas, including not only software engineering but also operating systems, programming languages, and security.
A conceptual framework is presented by Nouredine et al. [21] for modeling the classification of software architecture evolution approaches based on six explicit dimensions (the What, Where, When, Who, Why, and How questions), treating the architectural viewpoint as the primary focus. Taking "When" as an example: it refers to the time of evolution, which consists of design-time evolution and runtime evolution. The former happens at the design stages of software, allowing improvement and extension of the architecture by predicting the evolution earlier. The latter, considered a primary theme in architecture adaptation, encompasses compile-time, load-time, and dynamic-time evolution. As a comparison, this research studies a conceptual framework for how multimodalities and AI shape future automatic software evolution, which encompasses architectural evolution.
2.2 Data-to-product view
This subsection discusses the "Data-to-Product" view, which describes a software process that utilizes data to produce the final product instead of using it only to generate intermediate decisions or recommendations. Compared with "requirement" and "feature" in terms like "data-to-feature" and "data-to-requirement," a "product" is the final result of software evolution and implies high usability. This requires AI techniques to be embedded into all ASEv life cycles, as described in the framework and illustrated in Sect. 5.
One example of Data-to-Product can be found in Fig. 2: some new requirements can be categorized by evaluating general features learned from contextual data [60]. Automatic or semi-automatic selections can be made, and the selections are sent to the decision-making and artifact generation step, which can issue specific requests that combine extra considerations, forming a loop that improves decision-making and artifact generation. A specific request can ask for a new round of learning based on different features from a specific data set, feedback from other phases of product generation, or the result of feature integration. This is an optimization process using a learning loop. Of course, the structure can differ; this one merely demonstrates that the product will be final or approaching the final version. This view requires constant collection of contextual data from SE cycles, applications, and environments.
More specifically, all types of data that can be used to improve the application or the development process count as related context in Fig. 2. These data are sent to a unimodal or multimodal learning model (more detail can be found in Fig. 3), from which some initial features can be learned. After another round of learning through an LLM or expert system, decisions about product generation can be made automatically, or semi-automatically if human experts are needed in the decision-making process. The effect of human experts can be predefined, for example using a weight-based mechanism. Depending on the source data, different types of products can be generated, such as new requirements, new code snippets, new test cases, etc. By combining those new products, an application can even be generated automatically by learning from source data of requirements, code artifacts, configuration, and the relationships between them. The potential of product generation and evolvement is unlimited, given enough raw data.
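The data-to-product loop described above can be sketched as a minimal pipeline. This is an illustrative toy, not an implementation from the paper: the function names (extract_features, deduce_decision, generate_product), the weight-based expert mechanism, and the threshold are all hypothetical placeholders.

```python
# Hypothetical sketch of the data-to-product learning loop.
# All names and numeric choices below are illustrative assumptions.

def extract_features(contextual_data):
    """Learn initial features from raw contextual data (placeholder)."""
    return {"avg_length": sum(len(d) for d in contextual_data) / len(contextual_data)}

def deduce_decision(features, expert_weight=0.5):
    """Combine a model score with a weighted human-expert signal."""
    model_score = min(features["avg_length"] / 100.0, 1.0)
    expert_score = 1.0  # stand-in for an expert's approval signal
    return expert_weight * expert_score + (1 - expert_weight) * model_score

def generate_product(decision, threshold=0.4):
    """Emit a product (e.g., a requirement draft) if the decision passes."""
    return "new_requirement_draft" if decision >= threshold else None

def data_to_product(contextual_data, max_rounds=3):
    """Re-learn features with feedback until a product is produced."""
    for _ in range(max_rounds):
        features = extract_features(contextual_data)
        decision = deduce_decision(features)
        product = generate_product(decision)
        if product is not None:
            return product
        contextual_data = contextual_data + ["feedback"]  # close the loop
    return None
```

The key structural point is the feedback branch: when no product is approved, new contextual data (here a trivial marker) re-enters the learning step, mirroring the loop between learning, decision-making, and artifact generation in Fig. 2.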
2.3 White-box, black-box, and gray-box machine learning models
Black-box and white-box are software engineering terms describing whether the inner logic is known to the examiners. ML models are classified as X-box (white-, black-, or gray-box) with respect to how they analyze, model, and encode the context data. As summarized by Michael et al. [1], black-box ML techniques refer to methods that generate models whose internal workings are either concealed or too complex to analyze, producing outputs based on the inputs. In contrast, white-box modeling involves transparent models that can be analyzed in detail. Decision trees, linear regression models, Bayesian networks, and fuzzy cognitive maps are common examples of white-box models [20, 52]; in comparison, deep neural networks, support vector machines, and Large Language Models (LLMs) are common examples of black-box models [54].
A gray-box model is an ensemble of black-box and white-box models that combines and acquires the benefits of both, building a more efficient global composite model [5, 52]. If an ensemble of ML algorithms contains both black-box and white-box models, such as neural networks and linear regression, it can be considered a gray-box.
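A minimal sketch of such an ensemble follows. It is not from any cited work: the linear model stands in for the white-box part, and an opaque nonlinear scorer (a tanh here, in place of a trained neural network) stands in for the black-box part; the blending weight `alpha` is an assumed hyperparameter.

```python
# Illustrative gray-box ensemble: a transparent linear predictor
# combined with an opaque nonlinear one. Names are hypothetical.
import math

def white_box_predict(x, slope=2.0, intercept=1.0):
    """Linear regression: the coefficients are fully interpretable."""
    return slope * x + intercept

def black_box_predict(x):
    """Opaque nonlinear model; internals are not meant to be inspected."""
    return 3.0 * math.tanh(x)

def gray_box_predict(x, alpha=0.5):
    """Weighted combination of both, retaining partial interpretability."""
    return alpha * white_box_predict(x) + (1 - alpha) * black_box_predict(x)
```

Setting `alpha=1.0` recovers the pure white-box model, and `alpha=0.0` the pure black-box one, which makes the trade-off between transparency and flexibility explicit.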
Recent gray-box research by Pan et al. [48] outlines a forward-looking roadmap for integrating Large Language Models (LLMs) and Knowledge Graphs (KGs). This roadmap features three primary frameworks: first, KG-enhanced LLMs, which integrate KGs during both the pre-training and inference phases of LLMs to improve their understanding of acquired knowledge; second, LLM-augmented KGs, which utilize LLMs to perform various KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering; and third, Synergized LLMs + KGs, where LLMs and KGs collaborate equally to enhance each other, facilitating bidirectional reasoning that leverages both data and knowledge.
Takeishi et al. [59] define deep gray-box models as compositions of data-driven models and theory-driven models, with the former being deep neural networks and the latter being human-understandable models with a theory as a backbone. They empirically analyze the sum of regularizers, including those that can prevent the theory-driven models from being ignored.
2.4 Context dynamism
One primary characteristic of CRIA is that a great portion of their system behavior is learned from continuous interaction with users and the environment, involving detection and analysis of states and activities [10, 62, 67], unlike banking or insurance applications with more mature and stable business logic.
For CRIA, contextual data can change frequently, and so can the decision processes of the applications. Thus, contextual reasoning logic needs to adapt to the changes accordingly. The dynamism can sometimes lead to a new system feature [70]. For example, a smart elevator transports people in most scenarios, but for a period of time it sees people conveying large devices. How it updates itself to fit this new normal, and many more new normals, can be a challenging problem. Other types of dynamism can be perceived given enough data and time for analysis. Thus, the software team or a designated system should be smart enough to take care of updating the intelligent elevator system by identifying the dynamism automatically.
Thus, context dynamism refers to the dynamic aspects of entities, attributes, and states and the corresponding relationship changes within intelligent system domains [69]. One type of context dynamism is the joining of a new entity, attribute, or state; an example instance is "a new student is joining a smart campus." Other types include relationship changes, hotspot-situation forming, and cycles and steps of transitions affecting decision-making [69].
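The dynamism types above can be encoded as tagged context events that a monitoring component filters. This encoding is a hypothetical sketch; the type names and fields are illustrative, not taken from [69].

```python
# Hypothetical encoding of context-dynamism events as tagged records.
from dataclasses import dataclass

# Illustrative labels for the dynamism types discussed in the text.
DYNAMISM_TYPES = {"entity_joined", "relationship_changed", "hotspot_formed"}

@dataclass
class ContextEvent:
    kind: str     # one of DYNAMISM_TYPES, or any other system event
    subject: str  # the entity, attribute, or state involved
    detail: str

def is_dynamism(event: ContextEvent) -> bool:
    """True if the event signals context dynamism the system should track."""
    return event.kind in DYNAMISM_TYPES

# Example instance: a new entity joining a smart campus.
joined = ContextEvent("entity_joined", "student-42",
                      "a new student is joining a smart campus")
```

A designated evolution system could subscribe to such events and trigger re-learning only when `is_dynamism` fires, rather than on every routine log entry.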
Although big changes to business-mature applications can be rare, small or medium changes due to context changes are unavoidable, such as adaptation to regulations, new versions of underlying software, or workflows optimized by analyzing contextual data. Although the main focus of ASEv in this study is CRIA, it can also be applied to business-stable applications. Changes in the data can result in changes in the code or configuration. A conceptual framework is needed to support building this automated application evolution process.
2.5 Multimodality
Multimodality is extensively studied in AI/ML research [2, 44, 51] and industrial engineering [22, 26]. It refers to harnessing data from heterogeneous sources to achieve a better understanding of some target data, or to make a more precise decision, by reasoning over data from multiple modalities. Due to the intrinsic similarity, the popularity of multimodality can be traced to data fusion [15, 34], while multimodality is a natural fact of our daily life, as Tadas et al. pointed out in their article [2]: "Our experience of the world is multimodal - we see objects, hear sounds, feel the texture, smell odors, and taste flavors."
Multimodal Deep Learning models typically consist of multiple neural networks, each specialized in analyzing a particular modality. The outputs of these networks are then combined using various fusion techniques, such as early fusion, late fusion, or hybrid fusion, to create a joint representation of the data. A fascinating observation from empirical multimodal learning is that a model trained with multiple modalities can outperform a fine-tuned unimodal model, even on population data of the same unimodal task [41].
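The contrast between early and late fusion can be shown in a few lines. This is a deliberately simplified sketch using plain lists and scores in place of real network features; the weight `w_text` is an assumed parameter.

```python
# Minimal sketch contrasting early and late fusion of two modalities
# (e.g., text and image). Real systems fuse learned feature tensors;
# plain lists and scalar scores stand in for them here.

def early_fusion(text_feats, image_feats):
    """Concatenate modality features before a single joint model sees them."""
    return text_feats + image_feats  # one joint representation

def late_fusion(text_score, image_score, w_text=0.6):
    """Each modality has its own model; only their outputs are combined."""
    return w_text * text_score + (1 - w_text) * image_score
```

Hybrid fusion, the third technique mentioned above, mixes the two: some features are concatenated early while other per-modality outputs are merged late.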
A definition of multimodality for ML was also given by Letitia et al. [49]: an ML task is multimodal when inputs or outputs are represented differently or are composed of distinct types of atomic units of information. From this perspective, achieving automated software evolution by utilizing data with different formats from various phases of software engineering is a multimodal learning process, as shown in Fig. 3, which contains a structure adapted from [50].
Thus, multimodal learning in software engineering can be defined as the learning process that uses data from heterogeneous sources across various phases of software engineering. The data includes text, documents, code, images, videos, models, etc., from the external and internal environments of applications. The definition emphasizes the differences between sources: as long as the sources are different, they contribute to multimodality, regardless of whether their formats are the same or distinct.
3 A bigger map and the key dimension/feature analysis
Figure 4 demonstrates the Data-to-Product multimodal framework for ASEv. It includes the dimensions of Context Sources (C), Data Modalities (D), Multimodal Learning (M), Key features of ASEv (K), and Products (P). The internal relations among them are as follows: C provides data to D, D is the source of M, K comprises the key features of ASEv enabled by M, and P is the result of applying K. The process of generating P is not trivial and may need a tailored platform for each of the tasks.
The items listed in the framework dimensions are not exhaustive; they are examples that, to the author's best knowledge, should be considered with higher priority. In the following subsections, the key dimensions, exemplified items, and features of the framework are analyzed.
3.1 Context sources and data modalities
Context sources for software design, development, and maintenance encompass activities in every phase of the software engineering life cycle, as well as internal and historical data from intelligent applications and correlated environments. The underlying philosophy behind incorporating such a broad spectrum of context sources is that the more related details are collected, the better the decisions that can be made in generating the final software products. This rests on the hypothesis that the learning methods can effectively leverage the diverse data pool. Contextual data derived from applications includes users' habits and other historical information, alongside sensor data from the surrounding environment, which can help to capture whether a user is correlated with a specific state, such as weather temperature, body movement, or noise level.
The software engineering process yields various types of data, including presentations, videos, discussion audio, images of graphical design, text from the requirement documents, and maintenance tickets, which record the steps of communications and complex business logic discussions. However, leveraging certain types of this data requires careful handling of privacy concerns and obtaining consent from participants [30, 55]. Techniques have been developed to safeguard the privacy of individuals in videos, such as face obfuscation. A study of face obfuscation by Kaiyu et al. [66] found that the features learned on obfuscated images are equally transferable when performing experiments with transfer learning to downstream tasks such as object recognition, scene recognition, and face attribute classification.
3.2 Multimodal learning
The definitions of multimodalities are comprehensively studied by Letitia et al. [49]. Three types of definitions are discussed in their work: human-centered (relying on human perceptual experience such as hearing and seeing), machine-centered (how information is represented), and ML task-related (inputs and outputs are represented differently or are composed of distinct types of atomic units of information). According to the ML task-related definition, a process is multimodal learning even if only text data is involved, as long as the input and output formats differ or it handles distinct units of data.
There are many techniques in multimodal learning. To better describe this component, a categorization is used that divides them into white-box, black-box, and gray-box ML models. First, these are terms that originated in the software engineering area. Second, the three categories provide complete coverage of the models. Within each category, specific methods are listed in the framework component, as shown in Fig. 4. Meanwhile, multimodal learning also considers how to preprocess data and how to fuse it. So this component of the framework consists of subcomponents representing data processing tasks and machine learning methods that work with different inputs or outputs or distinct types of atomic units of information.
3.3 Key features and products
Key features of ASEv, as the objectives of multimodal learning, are essential components of the learning process. They are the functional modules for building final products, including new features, programs, systems, and their updates. After all, the ultimate objective remains the creation of useful products. This perspective underscores a shift towards a Data-to-Product approach (the connection from D to P in Fig. 4), emphasizing the transformation of data into tangible outcomes rather than focusing solely on the evolution procedures. Consequently, the automated software evolution process must encompass a decision-making loop to address this aspect effectively; e.g., a new feature can be examined by a new phase of learning. Thus, the conceptual framework outlined here not only delves into the intricacies of various domains and subproblems but also explores their integration to fulfill the Data-to-Product view.
Feature generation and enhancement benefit function design and can be learned from users' habits and changes in the environmental context. They map to the software requirement phase. A feedback loop module is particularly necessary because of the second-order effects discussed in [58], which can be observed due to the dynamic nature of human learning systems. When working with intelligent software agents, human users continually refine their models, beliefs, and expectations through trial and error, resulting in a learning process akin to reinforcement learning. In some cases, the learning dynamics may exhibit positive feedback loops, potentially amplifying small effects over extended periods.
Dynamic code generation deals with the software development phase. For example, the work [37] delves into the realm of dynamic code generation, particularly focusing on competition-level code generation with AlphaCode. System integration, as an important consideration of today’s complex system development, cannot be neglected in order to provide robust and fault-tolerant services. Automated bug detection and correction, as mapped to the phase of software testing, ensures the correctness and quality of the software. Configuration, deployment, and dependency management are other crucial factors that facilitate the automation of the overall software evolution process.
4 3-S model
The tasks in achieving ASEv through multimodality learning correlate with the SE phases, as shown in the rows of Fig. 5, as do the products mentioned in the conceptual framework. One benefit of developing the conceptual framework is identifying products by analyzing the concepts related to ASEv and their relations. Each product is generated through an automation approach that can cover multiple phases of SE. Given ASEv research that generates a product mentioned in the conceptual framework, it is important to have a way to clearly describe the research coverage in terms of SE phases and tasks.
This section presents the Selective Sequential Scope Model (SSSM, or 3S model) for categorizing research work that spans different scopes (one or more phases correlated with one or more tasks). It is sequential because it is practical to work on one phase based on its prior phase. As a counterexample, it is still not practical to directly connect the environment and design phases without covering the requirement phase. Although there could be ways to achieve that, whether in the future or beyond the author's knowledge, another model could be used to describe those approaches. This model is solely for the sequential coverage of the scopes.
4.1 Tasks
To support the transition from data to product, it is necessary to consider another dimension, data-logic-model, as shown in Fig. 5. Data processing can include data modeling, preprocessing, fusion, and the extraction of initial or high-level features. Some ML algorithms can be applied initially to learn high-level features. Logic can include business logic or entity-relationship logic. Logic deduction uses multimodality learning to generate decisions, knowledge, or other types of models described using ML models.
Logic deduction is the reasoning process, including utilizing LLM or other multimodality learning techniques. The result of multimodal learning is a model or system that can perform tasks or make predictions based on a more comprehensive and holistic understanding of the input data. For example, in a multimodality model that processes both textual and visual data, the product can have improved performance in tasks like image captioning [68], sentiment analysis on multimedia content [57], or any other applications where combining information from different modalities provides a richer and more accurate representation of the underlying patterns or semantics.
The formal model is a way to represent those generated ML models or reasoning systems, or it is a visualization of them; either way, it can be used to represent the result of learning from the data. Generally speaking, models here refer to complex functions that represent the products or resolve the decision task given input data. Notably, general-purpose LLMs are not formal models in this dimension and are treated as a logic deduction tool if included in a methodology.
4.2 Software engineering (SE) phases
The main body of the SE phases comes from the classical waterfall model. However, a special phase zero is added as the start phase, and phase seven is added as a flexible phase. Phase zero is the environment on which all the requirements and context data are based. Phase seven is named "PhasingOut" in Fig. 5, but it can be documenting, configuration, or other activities.
Particularly, the "phasing out" stage takes care of how to store or delete the data while considering privacy protection and other regulations. Meanwhile, the data processed in this stage may preserve the business logic and database logic so that it can be used for future development.
4.3 Scope
The 3S model has two dimensions: tasks and SE phases. The scope of the 3S model means the intersections of the two dimensions covered by a piece of ASEv research. The research can be analyzed with respect to the two dimensions to decide which scope it relates to. For example, if a study explores using UML designs to generate code and test cases in an automated way, it covers p2 to p4 (including p3) in the SE phase dimension. If it uses data processing and logic deduction, it also covers t0 to t1 in the task dimension.
Some notations are defined to describe a scope in a simplified way. First, a pair represents the coverage in the two dimensions. Second, > and < describe the direction of the coverage. If both are used, the study allows phases to affect each other in both directions; e.g., in P3<>P4, the data model learned from the development phase can generate the test cases, and meanwhile the model generated from the test phase can help with code development. If the coverage is not continuous, a comma is used to list the indexes of phases or tasks that are not contiguous. The arrows shown in Fig. 5 denote scopes: the type with a wider area covering multiple columns considers two or more tasks simultaneously, while the type that covers only one column involves one task.
Here are some more examples. If the scope is represented by (t0, p0>1), it covers one task and spans only two adjacent phases. Another common route is (t0, p3>4), which considers code generation and the generation of test code or test cases from the code. If the route is (t0, p3<4), the research can be test-driven development.
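The scope notation can be made machine-readable with a small parser. The paper defines the notation informally, so the encoding below is one possible interpretation, assuming a single task index and a contiguous phase span; the bidirectional (<>) and comma-separated forms would need further rules.

```python
# A small parser for 3S scope pairs such as "(t0, p3>4)" or "(t0, p3<4)".
# The dictionary layout is an illustrative choice, not defined by the paper.
import re

def parse_scope(scope):
    """Parse '(tN, pA>B)' into a task index, covered phases, and direction."""
    m = re.fullmatch(r"\(\s*t(\d+)\s*,\s*p(\d+)\s*(<>|>|<)\s*(\d+)\s*\)", scope)
    if m is None:
        raise ValueError(f"unrecognized scope notation: {scope}")
    task = int(m.group(1))
    lo, hi = int(m.group(2)), int(m.group(4))
    return {
        "task": task,
        "phases": list(range(min(lo, hi), max(lo, hi) + 1)),  # contiguous span
        "direction": m.group(3),  # ">" forward, "<" backward, "<>" both ways
    }
```

For instance, the ChatDev scope discussed later, (t2, p2>4), parses to task 2 over phases 2 through 4 in the forward direction.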
5 Applying the framework and 3S model to ASEv-related research
This section presents some endeavors that target ASEv or can help with ASEv. The data-to-product conceptual framework is applied to analyze these studies, and the 3S model categorizes them. This is not an exhaustive search of the literature; only some representative research is selected as a demonstration.
In a broad sense, deep learning falls under the category of machine learning, and LLMs are special deep learning models trained on large amounts of data. In this section, we differentiate them into two categories, machine learning and LLM-based methods, mainly because LLMs are trained on significantly larger and more diverse data corpora.
5.1 Agent-based software development
ChatDev [53], a virtual chat-powered software development company, brings together "software agents" with diverse social identities, including chief officers, professional programmers, test engineers, and art designers. When presented with a human client’s request, "collaborative chatting" takes place among these agents. This approach enables the automatic crafting of comprehensive software solutions encompassing source code, environment dependencies, and user manuals.
ChatDev employs the widely adopted waterfall model and divides the software development process into four distinct phases: designing, coding, testing, and documenting. In particular, a chat chain is designed to break each phase down into atomic subtasks, allowing solutions to be proposed and validated through context-aware communication.
Their agent-based approach depends on AI tools such as ChatGPT to generate and evaluate code across the stages of designing, coding, testing, and documenting. Since documenting is not listed in the 3 S model, only the first three stages are considered; a customized extension of the 3 S model could add phases such as documenting. Thus, according to the scope definition in the 3 S model, the pair (t2, p2>4) represents the scope of this research. The scopes of the other research efforts mentioned in this section are derived similarly and shown in Table 1.
5.2 Genetic programming
GenProg [35] uses an extended form of genetic programming (GP) to evolve program variants that repair a defective program automatically. GP uses computational analogs of biological mutation and crossover to generate new program variants, which are evaluated by a user-defined fitness function; GenProg evaluates fitness with the input test cases, and individuals with high fitness are selected for continued evolution. The GP process succeeds when it produces a variant that passes all the test cases, which encode both the required behavior and the defect to repair. Notably, GenProg reuses only statements from the program itself to repair errors and does not invent new code.
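The mutate-evaluate-select cycle can be sketched as follows. This is a toy illustration, not GenProg itself: the "program" is a list of statement strings, the single test encodes the required behavior, crossover is omitted, and all names are hypothetical.

```python
import random

random.seed(0)

def fitness(variant, tests):
    """Fraction of test cases the candidate variant passes."""
    return sum(t(variant) for t in tests) / len(tests)

def mutate(variant):
    """Copy one existing statement over another: like GenProg, the search
    reuses statements from the program itself and never invents new code."""
    v = list(variant)
    i, j = random.randrange(len(v)), random.randrange(len(v))
    v[i] = v[j]
    return v

def repair(program, tests, generations=100, pop_size=20):
    """Evolve variants until one passes every test, or give up."""
    population = [mutate(program) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda v: fitness(v, tests), reverse=True)
        if fitness(population[0], tests) == 1.0:
            return population[0]          # variant passing all tests
        survivors = population[: pop_size // 2]
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(pop_size - len(survivors))
        ]
    return None

# Toy defect: the last statement should read 'return a + b', and a correct
# copy of that statement already exists elsewhere in the program.
buggy = ["a = x", "b = y", "return a + b", "return a - b"]
tests = [lambda v: v[-1] == "return a + b"]   # encodes the required behavior
fixed = repair(buggy, tests)
```

With high probability the loop copies the correct statement over the defective one within a few generations, mirroring GenProg's reuse-only repair strategy.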
5.3 Ontology-based
A lightweight semantic processing approach [31] was designed to develop requirement documents using an ontology technique, establishing a mapping between a requirements specification and ontological elements. Changes in the requirement ontology can be used to predict the next changes, which enables the method to incrementally improve the completeness of the requirements specification. The method works only within the requirements phase; however, it involves all three tasks.
5.4 Case-based
ACAI (Automated Coder using Artificial Intelligence), an automated coding system, is proposed by Danilchenko and Fox [16]. They treat code generation as a design problem whose solutions are plans. ACAI is thus built by combining the techniques of routine design, template-based programming, and case-based reasoning. Given an objective and specifications for how to achieve it, ACAI yields a working Java program. ACAI relies heavily on tailored input templates and program templates, and, more importantly, its user must design the target code by applying the mechanisms the system provides. Thus, this system is categorized as working on the formal model in the development phase without covering automated data processing or logic deduction.
5.5 State-based
CSMEngine [70] is proposed to automatically model and reason about high-level context. The study applied the methodology to an intelligent elevator system. CSMEngine is based on context state machine modeling methods [71], and CSM-H-R [69] is a framework that extends the core model to facilitate the modeling of context dynamism. CSMEngine works with high-level context information, i.e., data fused from the environment or applications. After automatically modeling the collected context, it can reason over the context to identify new rules that the applications can adapt to or that the design process can utilize.
5.6 Machine learning
Fixing bugs is an essential activity in software maintenance. Bugs can be identified in the testing phase or reported by analyzing the failure information of a running system. However they are identified, bugs must be corrected. In this study, bug fixing is considered to belong to the development and testing phases of the waterfall model.
DeepFix [24] is a method for fixing common C language errors with deep learning. It treats fixing a programming error as a sequence-to-sequence learning problem, which requires a program to be represented as a sequence. The authors constructed their dataset from correct and erroneous programs: by mutating correct programs, they built training examples consisting of the mutated programs and their fixes. The trained model achieved a successful fix rate of 27%. By collecting data from code examples and designing an attention-based sequence-to-sequence neural architecture, long-term dependencies can be captured to exploit context when fixing bugs.
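The mutation-based construction of training pairs can be illustrated with a minimal sketch. The single mutation rule below (deleting a semicolon) is a simplified stand-in for DeepFix's actual mutation set, and the token representation and function name are assumptions for illustration.

```python
import random

random.seed(1)

def make_training_pair(correct_tokens):
    """Mutate a correct C token sequence by deleting one semicolon, producing
    an (erroneous program, fix) training example for a seq2seq repair model."""
    positions = [i for i, tok in enumerate(correct_tokens) if tok == ";"]
    i = random.choice(positions)
    broken = correct_tokens[:i] + correct_tokens[i + 1:]
    fix = (i, ";")        # training target: re-insert ';' at position i
    return broken, fix

correct = ["int", "x", "=", "0", ";", "x", "++", ";", "return", "x", ";"]
broken, fix = make_training_pair(correct)
# 'broken' paired with 'fix' forms one supervised example; generating many
# such pairs approximates the distribution of common programming errors.
```

Because the corruption is applied programmatically, arbitrarily many labeled examples can be produced from a corpus of correct programs without manual annotation.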
ArduCode [8] is a predictive framework for assisting automation engineers in classifying automation code, finding similar code snippets, and reasoning about the hardware selection of sensors and actuators to address the challenges of hardware and software integration. It uses paragraph embeddings for classification and achieves precision close to human annotation; hardware recommendation is studied using autoencoder models. Thus, this work is categorized in the 3 S model as working with the requirements and development phases.
5.7 LLM-based
CodeBERT [19] is a transformer-based neural model pre-trained on an extensive corpus of source code. Mashhadi and Hemmati [43] apply it to automated program repair of Java simple bugs. In their experiments, the method generated fixes identical to those implemented by developers in 19–72% of the cases, depending on the dataset type.
Nejjar et al. [46] conducted an empirical study on code generation tasks using LLM-based tools; specifically, the code generation tasks include a matrix multiplication function, scripts for data analysis, and data visualization. They performed the study by providing prompts to tools including GPT-3.5, GPT-4, Bing Chat, Google Bard, and YouChat, and report code quality with respect to correctness, efficiency, comprehensibility, and an overall rating. Liu et al. [40] conducted an empirical study on generating software code for safety-critical domains using GPT-4. Compared with Nejjar’s work, this study applied more approaches to utilizing the LLM tool, including overall requirements, specific requirements, and augmented prompts. Their approach works on requirements and code development. The LLMs used are general-purpose models rather than formal models built specifically for ASEv tasks; the LLMs serve only as a logic deduction tool.
Liu et al. [39] explore an LLM-powered test case generation method through a framework called AID, which consists of three main stages: generation of program variants, generation of program inputs, and differential testing. AID first feeds problem descriptions, program descriptions, and programs under test into LLMs to generate program variants and test inputs; then diversity-first differential testing is applied to create test cases. AID is considered to work in the testing phase and the logic deduction task.
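The differential-testing stage can be sketched in a few lines. The two median implementations below stand in for LLM-generated program variants and are illustrative assumptions; AID's actual pipeline additionally generates the variants and inputs with LLMs and applies a diversity-first strategy.

```python
def median_v1(xs):
    """Program under test: wrong for even-length inputs."""
    return sorted(xs)[len(xs) // 2]

def median_v2(xs):
    """Variant: averages the two middle elements for even lengths."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def differential_test(variants, inputs):
    """Return the inputs on which the variants disagree; each disagreement
    is a candidate test case exposing a potential bug."""
    disagreements = []
    for xs in inputs:
        outputs = {repr(v(list(xs))) for v in variants}
        if len(outputs) > 1:
            disagreements.append(xs)
    return disagreements

inputs = [(1, 3, 2), (4, 1, 3, 2)]    # stand-ins for generated inputs
print(differential_test([median_v1, median_v2], inputs))  # → [(4, 1, 3, 2)]
```

The disagreement on the even-length input becomes a test case without needing a ground-truth oracle, which is the appeal of differential testing.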
5.8 Summary
The research presented in this section is not based on a comprehensive survey. However, to the best of the authors’ knowledge, little research covers more than two stages in the SE phase dimension. As more data is collected and multimodal learning is applied to more specific tasks, it is predicted that future research will cover a broader scope so that a higher level of automation of the software evolution process can be achieved.
6 Discussion
This section presents an anticipated future trend, related prospective exploration directions, and some topics that are closely relevant to the conceptual framework but were not discussed thoroughly in the main body because of the article’s focus on multimodality and the data-to-product process. Open Research Questions (ORQs) are identified for advancing the proposed ASEv conceptual framework and the 3 S model.
6.1 Future trends
A trend is anticipated in research regarding leveraging data across multiple software engineering phases. Several prospective avenues for exploration are envisioned as follows:
1. Initiating tasks, such as code generation, from raw environmental data or other types of contextual data.
2. Implementing logic adaptation strategies to dynamically adjust software behavior based on evolving requirements.
3. Harnessing Artificial General Intelligence [33] to map and synthesize code logic, followed by iterative optimization and refinement processes.
6.2 Full automation and semi-automation
All the aforementioned ASEv features could be approached through two methods: full automation (with minimal exterior intervention, e.g., only initial configuration) and semi-automation (with significant exterior assistance, e.g., help in data generation or logic deduction). While simple tasks can be initiated through automated means, complex tasks necessitate an initial semi-automated approach with the potential to transition gradually to full automation, since the initial stages of achieving full automation may pose technical challenges. This study presents a conceptual framework to facilitate the full automation of software evolution. Identifying which tasks suit full automation and which require a semi-automated approach can support a gradual progression toward automation as data thresholds are met.
6.3 X-Box learning
Explainability does not come for free. To make the model and learning process more understandable, extra effort must be made to combine white-box learning with black-box learning so that knowledge is represented in a way human beings can comprehend. Although black-box models are often more time-efficient on large datasets and, as many studies in the past decades show, frequently more accurate, they are not always so: the work of Pintelas et al. [52] suggests that white-box or gray-box models can outperform black-box models, especially when the amount of data is small. The real performance of the various ASEv tasks involved therefore needs to be investigated in practice.
6.4 Practical managerial significance
Management is a crucial facet in software engineering practices and theories [23, 32, 61], including project management, people management, progress management, quality management, risk management, and so on. The ’manager side’ can be human participants and management tools, while the ’managed side’ consists of human participants, software, and other control points. Practical Managerial Significance (PMS) can be a significant concern, as the ASEv process can affect the human participants and the software on both sides. Although this article primarily focuses on technical aspects and thus does not delve deeply into management concepts within the framework or the 3 S model, exploring the integration of management practices in the framework and discussing the ASEv’s PMS remains an interesting direction for future research.
6.5 Data amount and privacy
Multimodality learning presupposes the availability of sufficient data; if the data does not meet a threshold, effective training can become impossible. Privacy considerations often present obstacles to data acquisition, and these concerns should be thoroughly discussed and resolved. For some intelligent systems intended for public use with few privacy concerns, obtaining data from the systems’ context can be relatively accessible. Nevertheless, it might still be hard to obtain data from software development cycles because of the privacy and other ethical concerns of the development participants. Possible mitigation strategies include leveraging data from different software types and conducting cross-domain research to reduce data scarcity within specific software development cycles.
6.6 Other ethical concerns
Technological Determinism (TD) [25] is a term in sociology suggesting that technology is not just a tool for achieving human goals but also controls and shapes human behavior. This subsection does not aim to support or debate TD; rather, among the many influences technologies can have on human society, persuasion is a particular ethical concern for ASEv and the framework, because black-box models such as LLMs exhibit the traits of persuasion [9] and the possibility of being persuaded [72]. In an analysis of the interaction between intelligent software agents and human users [7], technologies that influence human user behavior toward specific objectives are examined, such as increasing the time spent on a website; this goal-driven behavior has been discussed previously in the literature [14]. It is imperative to critically analyze and regulate any technology with the potential to steer human behavior in the processes and products of ASEv.
6.7 Limitations and open research questions
This conceptual framework considers the software engineering phases as context sources and as one dimension of the 3 S model. However, it is important to note that different methodologies within a specific phase can influence the workflow in nuanced ways, a factor that needs to be carefully considered. For instance, in the design phase, viewed along the spectrum from technology-driven to usability-driven approaches, system design methodologies include hard systems, soft systems [11], and socio-technical design [4] as a relatively balanced approach. Soft systems take into account human activity dynamics and tailored needs [6], while hard systems focus more on techniques and engineering methods. Such distinctions can lead to different data or logic flows under the ASEv framework. Thus, the following ORQs are proposed for the framework and the 3 S model.
1. Extend the framework by considering various methodologies of SE phases. For example, in the design phase, the system design methodologies can be emphasized for a particular type of software.
2. The framework is a general one that treats software as a whole. Software in different domains may have varying needs in the process of ASEv, so specific domains are another dimension to consider. Besides, domains can be a third axis for the 3 S model.
3. Expand the framework to incorporate ethical concerns and social impacts, such as PMS, providing valuable insights for software management.
7 Conclusion
ASEv of CRIS is imperative because of the unceasing accumulation of contextual data from multiple sources and the intrinsic complexity and dynamism of these systems’ business logic. The constantly changing context confronts intelligent applications with new and updated requirements, which may be beyond the capability of manual design and development by system analysts, designers, and developers. Meanwhile, multimodality learning has shown its advantages in utilizing such data to solve real-world problems and is promising for helping achieve ASEv.
This article studies a blueprint for using multimodality learning by crafting a data-to-product conceptual framework for ASEv. To categorize research in this area, a 3 S model is developed that uses scope to represent the coverage of each study. The model provides an efficient way to categorize and compare related research, as demonstrated in this study. This research is a preliminary step toward automated software evolution, and the proposed framework can act as a practical guideline for practitioners preparing to dive into this research area. Although the study is motivated by CRIA, the framework and analysis methods may be adapted for other types of software as AI brings more intelligence into their life cycles.
References
Affenzeller M, Burlacu B, Dorfer V, Dorl S, Halmerbauer G, Königswieser T, Kommenda M, Vetter J, Winkler S (2020) White box vs. black box modeling: On the performance of deep learning, random forests, and symbolic regression in solving regression problems. In Computer Aided Systems Theory–EUROCAST 2019: 17th International Conference, Las Palmas de Gran Canaria, Spain, February 17–22, 2019, Revised Selected Papers, Part I 17, pp. 288–295. Springer
Baltrušaitis T, Ahuja C, Morency LP (2018) Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41(2):423–443
Baumann A (2017) Hardware is the new software. In Proceedings of the 16th Workshop on Hot Topics in Operating Systems, HotOS ’17, New York, NY, USA, pp. 132-137. Association for Computing Machinery
Baxter G, Sommerville I (2011) Socio-technical systems: from design methods to systems engineering. Interact Comput 23(1):4–17
Bohlin TP (2006) Practical grey-box process identification: theory and applications. Springer, Berlin
Burge S (2015) An overview of the soft systems methodology. System Thinking, Approaches and Methodologies, pp 1–14
Burr C, Cristianini N, Ladyman J (2018) An analysis of the interaction between intelligent software agents and human users. Mind Mach 28:735–774
Canedo A, Goyal P, Huang D, Pandey A, Quiros G (2020) Arducode: predictive framework for automation engineering. IEEE Trans Autom Sci Eng 18(3):1417–1428
Carrasco-Farre C (2024) Large language models are as persuasive as humans, but why? about the cognitive effort and moral-emotional language of llm arguments. arXiv preprint arXiv:2404.09329
Cassavia N, Masciari E, Pulice C, Sacca D (2017) Discovering user behavioral features to enhance information search on big data. ACM Trans Interact Intell Syst 7(2):1–33
Checkland P, Poulter J (2020) Soft systems methodology. Systems approaches to making change, A practical guide, pp 201–253
Cho BS, Lee SW (2020) A comparative study on requirements analysis techniques using natural language processing and machine learning. J Korea Soc Comput Inf 25(7):27–37
Chu W, Wuniri Q, Du X, Xiong Q, Huang T, Li K (2021) Cloud control system architectures, technologies and applications on intelligent and connected vehicles: a review. Chin J Mech Eng 34(1):139
Cristianini N (2010) Are we there yet? Neural Netw 23(4):466–470
Dalla Mura M, Prasad S, Pacifici F, Gamba P, Chanussot J, Benediktsson JA (2015) Challenges and opportunities of multimodality and data fusion in remote sensing. Proc IEEE 103(9):1585–1601
Danilchenko Y, Fox R (2012) Automated code generation using case-based reasoning, routine design and template-based programming. In: Midwest Artificial Intelligence and Cognitive Science Conference, pp. 119–125
Deshmukh J, Annervaz K, Podder S, Sengupta S, Dubash N (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In: 2017 IEEE International conference on software maintenance and evolution (ICSME), pp. 115–124. IEEE
Durelli VH, Durelli RS, Borges SS, Endo AT, Eler MM, Dias DR, Guimarães MP (2019) Machine learning applied to software testing: a systematic mapping study. IEEE Trans Reliab 68(3):1189–1212
Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, et al. (2020) Codebert: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155
Garcia IDCG, Sengupta D, Lorenzo MMG, Nowe A (2016) Grey-box model: An ensemble approach for addressing semi-supervised classification problems. In 25th Belgian-Dutch Conference on Machine Learning, pp. 1–3
Gasmallah N, Amirat A, Oussalah M, Seridi H (2018) Developing a conceptual framework for software evolution methods via architectural metrics. In Computational Intelligence and Its Applications: 6th IFIP TC 5 International Conference, CIIA 2018, Oran, Algeria, May 8-10, 2018, Proceedings 6, pp. 140–149. Springer
Gaw N, Yousefi S, Gahrooei MR (2022) Multimodal data fusion for systems improvement: a review. IISE Trans 54(11):1098–1116
Gilb T, Finzi S et al (1988) Principles of software engineering management, vol 11. Addison-Wesley, Reading
Gupta R, Pal S, Kanade A, Shevade S (2017) Deepfix: Fixing common c language errors by deep learning. In Proceedings of the aaai conference on artificial intelligence, Volume 31
Hallström J (2022) Embodying the past, designing the future: technological determinism reconsidered in technology education. Int J Technol Des Educ 32(1):17–31
Hou I, Man O, Mettille S, Gutierrez S, Angelikas K, MacNeil S (2024) More robots are coming: Large multimodal models (chatgpt) can solve visually diverse images of parsons problems. In Proceedings of the 26th Australasian Computing Education Conference, pp. 29–38
Iivari J (1996) Why are case tools not used? Commun ACM 39(10):94–103
Imenda S (2014) Is there a conceptual difference between theoretical and conceptual frameworks? J Soc Sci 38(2):185–195
Ivers J, Ozkaya I, Nord RL, Seifried C (2020) Next generation automated software evolution refactoring at scale. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1521–1524
Kagan D, Alpert GF, Fire M (2023) Zooming into video conferencing privacy. IEEE Transactions on Computational Social Systems
Kaiya H, Saeki M (2005) Ontology based requirements analysis: lightweight semantic processing approach. In Fifth international conference on quality software (QSIC’05), pp. 223–230. IEEE
Karolak DW, Karolak N (1995) Software engineering risk management: A just-in-time approach. IEEE Computer Society Press
Kumpulainen S, Terziyan V (2022) Artificial general intelligence vs. industry 4.0: Do they need each other? Proc Comput Sci 200:140–150
Lahat D, Adali T, Jutten C (2015) Multimodal data fusion: an overview of methods, challenges, and prospects. Proc IEEE 103(9):1449–1477
Le Goues C, Nguyen T, Forrest S, Weimer W (2011) Genprog: a generic method for automatic software repair. IEEE Trans Software Eng 38(1):54–72
Lehman M, Fernáandez-Ramil JC (2006) Software evolution. Software evolution and feedback, Theory and practice, pp 7–40
Li Y, Choi D, Chung J, Kushman N, Schrittwieser J, Leblond R, Eccles T, Keeling J, Gimeno F, Dal Lago A et al (2022) Competition-level code generation with alphacode. Science 378(6624):1092–1097
Liehr P, Smith MJ (1999) Middle range theory: Spinning research and practice to create knowledge for the new millennium. Adv Nurs Sci 21(4):81–91
Liu K, Liu Y, Chen Z, Zhang JM, Han Y, Ma Y, Li G, Huang G (2024) Llm-powered test case generation for detecting tricky bugs. arXiv preprint arXiv:2404.10304
Liu M, Wang J, Lin T, Ma Q, Fang Z, Wu Y (2024) An empirical study of the code generation of safety-critical software using llms. Appl Sci 14(3):1046
Lu Z (2024) A theory of multimodal learning. Advances in Neural Information Processing Systems 36
Ma Y, Sun C, Chen J, Cao D, Xiong L (2022) Verification and validation methods for decision-making and planning of automated vehicles: a review. IEEE Transactions on Intelligent Vehicles
Mashhadi E, Hemmati H (2021) Applying codebert for automated program repair of java simple bugs. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp. 505–509. IEEE
Massaro DW (2012) Multimodal learning. Encyclopedia of the Sciences of Learning: 2375–2378
Monperrus M (2018) Automatic software repair: a bibliography. ACM Comput Surv 51(1):1–24
Nejjar M, Zacharias L, Stiehle F, Weber I (2023) Llms for science: usage for code generation and data analysis. J Softw Evol Process 5:e2723
Ó Cinnéide M, Nixon P (2001) Automated software evolution towards design patterns. In Proceedings of the 4th international workshop on Principles of software evolution, pp. 162–165
Pan S, Luo L, Wang Y, Chen C, Wang J, Wu X (2024) Unifying large language models and knowledge graphs: a roadmap. IEEE Trans Knowl Data Eng
Parcalabescu L, Trost N, Frank A (2021) What is multimodality? arXiv preprint arXiv:2103.06304
Pawłowski M, Wróblewska A, Sysko-Romańczuk S (2023) Effective techniques for multimodal data fusion: a comparative analysis. Sensors 23(5):2381
Peña A, Serna I, Morales A, Fierrez J, Ortega A, Herrarte A, Alcantara M, Ortega-Garcia J (2023) Human-centric multimodal machine learning: recent advances and testbed on ai-based recruitment. SN Comput Sci 4(5):434
Pintelas E, Livieris IE, Pintelas P (2020) A grey-box ensemble model exploiting black-box accuracy and white-box intrinsic interpretability. Algorithms 13(1):17
Qian C, Cong X, Yang C, Chen W, Su Y, Xu J, Liu Z, Sun M (2023) Communicative agents for software development. arXiv preprint arXiv:2307.07924
Robnik-Šikonja M, Kononenko I (2008) Explaining classifications for individual instances. IEEE Trans Knowl Data Eng 20(5):589–600
Saini MK, Atrey PK, Mehrotra S, Kankanhalli MS (2013) Privacy aware publication of surveillance video. Int J Trust Manag Comput Commun 1(1):23–51
Singh SK, Chaturvedi A (2020) Applying deep learning for discovery and analysis of software vulnerabilities: a brief survey. Soft Comput Theor Appl Proc SoCTA 2019:649–658
Stappen L, Baird A, Schumann L, Schuller B (2021) The multimodal sentiment analysis in car reviews (muse-car) dataset: collection, insights and improvements. IEEE Trans Affect Comput 14(2):1334–1350
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT press, London
Takeishi N, Kalousis A (2023) Deep grey-box modeling with adaptive data-driven models toward trustworthy estimation of theory-driven models. In: International Conference on Artificial Intelligence and Statistics, pp. 4089–4100. PMLR
Talele P, Phalnikar R (2021) Software requirements classification and prioritisation using machine learning. In: Machine Learning for Predictive Analysis: Proceedings of ICTIS 2020, pp. 257–267. Springer
Tsui F, Karam O, Bernal B (2022) Essentials of software engineering. Jones & Bartlett Learning
Tzafestas S (2012) Intelligent systems, control and automation: science and engineering
Vessey I, Sravanapudi AP (1995) Case tools as collaborative support technologies. Commun ACM 38(1):83–95
Vogel-Heuser B, Fay A, Schaefer I, Tichy M (2015) Evolution of software in automated production systems: challenges and research directions. J Syst Softw 110:54–84
Wiesmayr B, Zoitl A, Prenzel L, Steinhorst S (2022) Supporting a model-driven development process for distributed control software. In 2022 IEEE 27th International Conference on Emerging Technologies and Factory Automation (ETFA), pp. 1–8. IEEE
Yang K, Yau JH, Fei-Fei L, Deng J, Russakovsky O (2022) A study of face obfuscation in imagenet. In: International Conference on Machine Learning, pp. 25313–25330. PMLR
Yang R, Newman MW (2013) Learning from a learning thermostat: lessons for intelligent systems for the home. In: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing, pp. 93–102
Yu J, Li J, Yu Z, Huang Q (2019) Multimodal transformer with multi-view visual representation for image captioning. IEEE Trans Circuits Syst Video Technol 30(12):4467–4480
Yue S, Hong X, Smith RK (2024) CSM-H-R: a context modeling framework in supporting reasoning automation for interoperable intelligent systems and privacy protection. IEEE Access
Yue S, Smith RK (2021) Applying context state machines to smart elevators: Design, implementation and evaluation. In: 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–9. IEEE
Yue S, Yue S, Smith RK (2017) A state-based approach to context modeling and computing. IEEE Ubiquitous Intelligence and Computing
Zeng Y, Lin H, Zhang J, Yang D, Jia R, Shi W (2024) How johnny can persuade llms to jailbreak them: Rethinking persuasion to challenge ai safety by humanizing llms. arXiv preprint arXiv:2401.06373
Zhao M, Ning Z, Wang B, Peng C, Li X, Huang S (2021) Understanding the evolution and applications of intelligent systems via a tri-x intelligence (ti) model. Processes 9(6):1080
Zheng Z, Wang F, Gong G, Yang H, Han D (2023) Intelligent technologies for construction machinery using data-driven methods. Autom Constr 147:104711
Acknowledgements
The author claims that there is no conflict of interest.
Funding
Open access funding provided by the Carolinas Consortium.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Yue, S. A Multimodal conceptual framework to achieve automated software evolution for context-rich intelligent applications. Innovations Syst Softw Eng (2024). https://doi.org/10.1007/s11334-024-00591-0