1 Introduction
Systems engineering (SE) is a paradigm that involves various processes and methodologies for the lifecycle management of systems [9]. In its basic form, a system is defined in the SE handbook by the International Council on Systems Engineering (INCOSE) as “... an integrated set of elements, subsystems, or assemblies that accomplish a defined objective” [88]. The practices of SE are meant to help engineers and practitioners from various disciplines communicate and rigorously perform the activities needed to ensure that the system is delivered correctly. Indeed, SE prescriptions and guidelines are followed for larger projects and activities, typically encompassing a multitude of disciplines across various fields (e.g., satellites, aeroplanes, nuclear plants); in these development endeavours, it is vital that the participants can communicate efficiently and align their efforts via standard practices and methods. The INCOSE SE handbook makes clear that a multitude of processes and methodologies could be utilised when performing activities related to a system’s lifecycle. The lifecycle of a system is usually divided into clearly separated stages. Notably, the ISO generic lifecycle standard (ISO/IEC/IEEE 15288:2015)¹ formulates the following stages of a system lifecycle: Concept, Development, Production, Utilisation, Support, and Retirement. Similar definitions are found in various SE methodologies across different domains, and although the scope of the stages might differ depending on the specific standard or methodology, most standards draw a clear distinction between activities before implementation begins and after implementation has started (e.g., the V-model is a very common example [35, 88]). Indeed, standard SE practices place strong emphasis on the precondition that building/implementing a System of Interest (SoI) should only start once there is enough confidence that it will meet stakeholder expectations and needs [88]. Therefore, the initial phases of the system lifecycle, related to stakeholders’ requirements and system design, are key for the rest of the development. In fact, these activities are performed before the SoI enters the development or production stages and need to provide strong arguments that the design meets all the considered requirements.
Providing guarantees that a system will meet the stakeholders’ expectations first requires that these expectations are correctly interpreted in the system requirements, and poor communication with system stakeholders directly leads to poor development results [46]. Once the requirements have been analysed and deemed correct, a rigorous process is needed to design the system accordingly. It is important to note that many potential designs often exist that satisfy a particular set of requirements, in which case it must additionally be decided which design is the most suitable for a particular SoI [69]. The INCOSE handbook recommends, among other things, that early validation be performed during the concept stage to align requirements with stakeholder expectations and to identify problems in the concepts used in the system design [88]. As a matter of fact, discovering errors or faulty design issues during the implementation of the SoI is costly, and improper estimations of system properties at design time can lead to dramatic increases in effort and even to the termination of the system implementation [81].
While the arguments for assuring that a design will meet the requirements are widely accepted, there is less agreement about the assurance processes and approaches involved. In fact, providing evidence, with some level of confidence, that a system not yet developed will be produced and will perform correctly is not simple [88]. A common strategy adopted at this stage is the re-use of previously successful processes and solutions; this strategy, however, comes with the risk of overlooking certain viable and perhaps more attractive alternatives [1, 22]. Moreover, always re-using the same solutions will eventually lead to missing out on potential new advancements or improvements, regardless of how important and necessary re-use is from an industrial perspective.
Traditionally, SE has been document-centric in its activities; however, with the rise in complexity and the growing needs of industry, more methods and practices are becoming model-centric, i.e., utilising models as the main artefacts during the SE phases, in particular the stages before implementation [45, 61]. Confirming this trend, the INCOSE 2035 vision states: “The future of Systems Engineering is predominantly Model-Based.”² A model is any description of a system that is not the thing-in-itself [56]. The quote “all models are wrong, but some are useful,” from Box and Draper [10], captures a vital essence of modelling. Modelling is often not about describing something in great detail; in fact, it is often the opposite: describing a subject “good enough” with as high a degree of abstraction as possible in a particular context. A natural tradeoff arises between abstraction and detail, and defining a model with a high level of abstraction that still contains the necessary details is a difficult task, but it is often a precondition for a “useful” model [30, 52].
INCOSE defines Model-Based Systems Engineering (MBSE) as “[...] the formalised application of modelling to support system requirements, design, analysis, verification and validation activities beginning in the conceptual design phase and continuing throughout development and later lifecycle phases” [88]. Therefore, modelling is expected to assist the user from the first to the last phases of the system lifecycle, with models as the primary artefacts supporting the SE activities. However, modelling at early phases of a system lifecycle comes with important challenges: By definition, capturing properties and making design decisions at an early phase is performed with a limited understanding of the system, and parts of the system might not even be understood or might remain to be determined.
The behaviour of a system is one aspect that is paramount to validate and verify, but in the early phases of development the lack of detail particularly limits the understanding and the analytical power of the modelled behaviour. Moreover, the tradeoff between abstraction and detail mentioned above becomes critical, since hidden details might convey implicit hypotheses and/or design decisions that could be contradicted in later development phases. With this in mind, it is of interest to understand how behavioural models are created, both in industry and academia, which validation or verification purposes the models serve, and what limitations have been identified. Therefore, in this article we contribute a systematic literature review that aims to answer various questions regarding the validation and verification of system behaviour in early MBSE phases. Through the review process we identified 149 papers, which were processed systematically and from which key data were extracted for presentation in this work. We illustrate our findings from the review and discuss their implications, with a specific emphasis on the industrial perspective in contrast with the academic one. Notably, from an industrial perspective, we aim to elicit the kinds of V&V analyses available in the state of the art, together with the preconditions and efforts they demand, from a modelling perspective, for adoption. From an academic point of view, instead, we aim to identify open challenges that could be worth investigating as future research directions.
The rest of the article is structured as follows: Section 2 describes the background and motivation for the review. In Section 3, related work on MBSE and V&V is discussed. Our research methodology and research questions are described in Section 4. Section 5 discusses the threats to validity. In Section 6, we present our results, and in Section 7 we perform a horizontal mapping of the results. We discuss our findings in Section 8 and present challenges for the industrial adoption of early behaviour validation in MBSE. Finally, Section 9 summarises the article with our conclusions, key findings, and future work.
2 Background and Motivation
Model-based practices are often considered an enabler for performing early V&V as part of the lifecycle of SE processes [27, 73]. In particular, models are expected to improve early analysis, as they enable more robust reasoning and evaluation compared to traditional document-centric development [36]. In the early phases of SE processes, much emphasis is put on requirements management and conceptual design, often related to some system architecture. In this respect, MBSE is often deemed to have several benefits compared to traditional SE; some of these benefits are more intuitive and easy to identify, like the added capabilities of traceability and understandability/communication through diagrams [31, 55]. Nonetheless, as models are the main artefacts of MBSE at all stages of development, it becomes important to understand and demonstrate the benefits and advantages of modelling activities. Notably, a key motivation often cited as a main benefit of MBSE is the increased opportunity for V&V [17, 36]. On the one hand, when dealing with models, the balance between fidelity and purpose of modelling has large implications, leading to modelling tradeoffs with respect to partially or fully developed models: With too few details, the model cannot provide much in terms of analysis; with too many details, the benefit of modelling early is reduced, as developing models often involves significant effort [77]. On the other hand, SE practices and experience demonstrate that the cost of addressing faults and errors increases rapidly as development progresses [88], giving a clear incentive to move V&V activities earlier rather than later. Early behaviour analysis often requires understanding the dynamics of the system at a finer level of granularity than what is captured by standard languages like SysML³; these additional details must either be embedded/hard-coded in the analysis tools or be provided explicitly by adding more details to the models. In addition, most tools do not support the execution of many standard MBSE languages, and therefore transformations to other languages and platforms are typically required to perform behavioural analyses.
Representing and analysing system behaviour at an early stage falls within the SE modelling activities discussed so far, both in terms of potential benefits and challenges [64]. In particular, a benefit of utilising models as the primary artefacts is the improved semantic integration of digital assets, enabling more robust early analysis of the integrated system behaviour. Notably, the simulation of systems is seen as an essential capability of MBSE [25]. However, the current landscape lacks system simulation maturity for commonly used languages such as SysML [67, 94]. The lack of analytical capabilities, further noted from an industrial perspective [25, 83], hampers potential adoption. Additionally, tooling is often seen as a limiting factor for model-based practices, especially in industry [20, 53]. A notable gap in this regard is the interoperability between tools and languages: Since the analysis of complex system behaviour often entails incorporating several domains, the lack of interoperability is a prominent limiting factor, particularly for industrial SE processes [66]. Consequently, there is a need to further investigate the common view of what early verification and validation (V&V) entails for system behaviour, considering the benefits and limitations of current technologies and techniques.
To summarise the discussion so far, it is widely accepted that a move toward early V&V is an attractive endeavour, and MBSE is considered a possible way forward; however, the adoption of MBSE is not trivial, and the research literature reports several practical and fundamental challenges to its adoption [61, 83, 87]. Therefore, in this article we survey the state of the art related to early V&V of system behaviour to elicit the properties of interest for the existing analyses and the corresponding proposed approaches.
2.1 Motivating Example
To situate our article more clearly, we draw on a typical industrial scenario from our MBSE experience in Construction Equipment (CE) [18, 83]. We refer to the SE discipline and to the notion of different stages of development as defined by standards such as ISO 15288 and common processes such as the V-model. In particular, we focus on the early phases of system development, where the system under study still involves much uncertainty regarding the eventual design and implementation. At this stage, the system is often made up of system views at high levels of abstraction. These views are typically guided by standards such as ISO 42010⁴ and have clear industrial definitions. Here, it is worth noting that “early stage” is a relative concept: It could refer to a completely new system design, starting from “scratch” with the customer’s requirements and a feasibility analysis; it could also refer to re-designing a system already in use, perhaps a new variant of a product family (typical for domains such as CE). What these cases have in common is that a new idea or concept is to be evaluated while the available artefacts and information contain limited details. Nonetheless, these evaluations are important, since they serve as a basis for reducing the solution space, which could otherwise be considered practically endless, and hence support the engineers’ progress in the development. With the advent of MBSE, companies are incentivised to change their processes to incorporate models to increase design effectiveness and improve decision-making. However, changing the way of working in industrial processes is a costly endeavour, thus requiring convincing evidence that the return on investment justifies the change.
The CE domain has a strong legacy in hardware-intensive SE, which relies heavily on re-use and product families with high variability [8]. The development processes are mature and rooted in well-established standards. With the digitalisation paradigm shift, there has been growing interest in MBSE as a critical enabling technology to manage the increase in complexity. Due to the heavily integrated variability, change and configuration management techniques already rely on modular model approaches, particularly for system architectures. A product can be customised depending on stakeholder needs in the form of customer or regulatory concerns. Various drawings/diagrams are typically used in conjunction with tables to present variable design options, and the customer is free to pick the options suitable for their context. Similarly, engineers utilise various system definitions and drawings to define valid system compositions and perform feasibility/tradeoff analyses against customer demands. In this respect, traditional SE development already includes system model views of various types. Figure 1 provides some examples from the CE domain at its early stages (architecture and high-level design). Although we do not discuss any of the views in detail, we emphasise the wide range of artefacts available as a result of several parallel/joint engineering activities. Even though many of these system views are models, they are often not linked meaningfully and/or rely on informal semantics (e.g., Visio drawings), which limits model-based analysis “as is.”
In these early stages, what-if analyses or high-level tradeoffs are often required to make high-impact early design decisions. A common example to be considered even at the very start of a design process is a machine’s brake functionality, as industrial standards require design guarantees for several aspects (e.g., maximum brake distance, fault tolerance, and environmental robustness). In this context, early V&V of both functional and non-functional requirements could front-load activities to speed up the decisions related to high-level design and reduce the risk of extensive design iterations. However, providing guarantees that a system will meet strict requirements demands strong confidence; in early phases, the high degree of uncertainty makes that challenging, and pessimistic assumptions are often used to make “safe” estimations for the design. With the considered legacy as a starting point, MBSE is being gradually adopted in the CE domain. In particular, taking the model views already in place and further leveraging them for model-based analytical capabilities not currently available could increase competitiveness by improving the ability to perform valid early analysis. Notably, the behavioural aspects of systems become easier to analyse via rigorous methods compared to legacy SE methods that rely on semi-formal or non-formal models and often on implicit expert knowledge. By leveraging the traditional SE views with MBSE technologies, design decisions could be made earlier than in traditional workflows. The value proposition of early V&V relates to the difference from traditional decision-making, for example, in the time required to reach decision maturity or in the analysis coverage achieved with the system descriptions. In practice, information to make informed decisions about system viability can become available earlier by leveraging model-based methods for re-use, analysis, simulation, and so on.
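To make the role of pessimistic assumptions concrete, the following is a minimal sketch of a worst-case check against a maximum brake distance requirement. It assumes an idealised kinematic model (constant deceleration after a fixed reaction time) and hypothetical parameter ranges; the names `Interval`, `worst_case_distance`, and `meets_max_brake_distance` are illustrative only and are not taken from the CE case or from any reviewed approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed range [low, high] for an uncertain, strictly positive early-stage parameter."""
    low: float
    high: float

    def __post_init__(self) -> None:
        if not 0 < self.low <= self.high:
            raise ValueError("expected 0 < low <= high")


def stopping_distance(speed: float, reaction_time: float, deceleration: float) -> float:
    # Idealised kinematics: distance covered during the reaction time plus the
    # braking distance at constant deceleration, d = v*t_r + v^2 / (2*a).
    return speed * reaction_time + speed ** 2 / (2 * deceleration)


def worst_case_distance(speed: Interval, reaction_time: Interval,
                        deceleration: Interval) -> float:
    # d grows with speed and reaction time and shrinks with deceleration,
    # so the pessimistic ("safe") bound combines the highest speed, the
    # longest reaction time, and the weakest deceleration.
    return stopping_distance(speed.high, reaction_time.high, deceleration.low)


def meets_max_brake_distance(max_distance: float, speed: Interval,
                             reaction_time: Interval, deceleration: Interval) -> bool:
    # The requirement is judged satisfied only if even the worst case holds.
    return worst_case_distance(speed, reaction_time, deceleration) <= max_distance


if __name__ == "__main__":
    # Hypothetical early-stage estimates in SI units, not data from the CE case.
    v = Interval(5.0, 10.0)    # speed, m/s
    t_r = Interval(0.5, 1.0)   # reaction time, s
    a = Interval(2.5, 4.0)     # deceleration, m/s^2
    print(worst_case_distance(v, t_r, a))             # 30.0
    print(meets_max_brake_distance(35.0, v, t_r, a))  # True
    print(meets_max_brake_distance(25.0, v, t_r, a))  # False
```

Picking a single corner of the parameter ranges is only valid here because the formula is monotone in each parameter; for less well-behaved behavioural models, interval arithmetic, sampling, or formal analysis would be needed to obtain a sound bound, which is where the modelling effort discussed in this article comes in.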
However, as anticipated earlier in this article, MBSE adoption is an industrial challenge. In our experience within CE, model-based methods generally map poorly to the overall SE context and are overly complex and/or abstract. The value proposition is also challenging to demonstrate in practice, so stakeholders have to be “convinced” of the potential value. Additionally, MBSE is associated with tooling and extensive training to change the way of working.
With industrial contexts like the CE domain in mind, this article and its research questions aim to provide a better understanding of the literature landscape for early V&V of system behaviour. We aim to elicit and disseminate the current research results and the industrial readiness for early behaviour validation utilising MBSE. Notably, we want to understand how early V&V is defined and motivated; how it is implemented; what tools, languages, and methods are used; what aims the authors have in mind with early V&V; and, finally, what limitations are observed. In this way, the review can provide the information needed for a step forward in MBSE adoption by SE disciplines that already consider models part of development. Moreover, it can highlight potential challenges to be addressed in the wider community. In particular, the review results would clarify what types of early V&V behaviour analysis can be expected in MBSE, what the required effort is in terms of modelling activities, and what methods, languages, and tools are involved. In other words, this review aims to elicit the distance between currently used early stage artefacts, such as the ones described in Figure 1, and the models necessary to enable early MBSE analysis techniques. Furthermore, we aim to shed light on open research challenges and possible future research directions, especially to close the gap between research and practice.
3 Related Work
Although we have found no other survey or literature review regarding the subject described and reported in this article, several other works address similar or related issues. Therefore, in this section we highlight other reviews discussing related aspects of MBSE.
Ma et al. [60] aimed to understand the state of the art and state of practice for the tool chains used for MBSE. They define a tool chain as two or more modelling, simulation, and design tools that, when combined, support/construct SE workflows with advanced features. The review identifies SysML as the most adopted modelling language for MBSE tool chains. The authors note that although tool chains based on SysML are the most mature, there are still major challenges for robust industrial adoption. Furthermore, the authors highlight some primary indicators of tool readiness, namely integration capabilities, interoperability, and traceability. While Ma et al. discuss some aspects of MBSE that are related to this review, there are significant differences in scope, depth, and context. Notably, we emphasise the notion of early phases in MBSE and focus our analysis on system behaviour analysis, which their work does not necessarily consider. In addition, we emphasise an industrial perspective through a motivating example and present a deeper analysis from that context, in addition to adoption barriers for early V&V.
Another study by Rashid et al. [73] investigated the tools used for MBSE activities within the embedded systems domain. Similarly to other reviews, they identify UML and its profiles SysML and MARTE as the most commonly utilised means of modelling. They also note that UML profiles or UML alone do not meet the existing modelling challenges, and some combination of languages is often employed. However, the authors note that SysML provides a sufficient foundation to model structure and behaviour. The work by Rashid et al. focuses on embedded systems, while the work in this article is not tailored to a particular domain. Additionally, their paper is oriented more toward code implementation than toward early V&V, so in this regard the two reviews capture different aspects of MBSE.
De Saqui-Sannes et al. [25] provide a taxonomy of MBSE approaches in terms of languages, tools, and methods. In the review, they note the prominence of SysML as an MBSE language, with an assortment of tools supporting it. However, the authors note that robust MBSE methods are still lacking. In addition, they argue that many challenges still exist for the industrial adoption of MBSE and that SE education needs to better capture the current reality of MBSE. The work by De Saqui-Sannes et al. has a different focus compared to this work, as it aims to give a brief overview of the field. Further, the work discusses the authors’ experience with a drone project. Comparatively, our work has a narrower view on the behaviour of systems but provides a more complete review in terms of literature coverage and corresponding results.
Nigischer et al. [66] provide a systematic review on multi-domain simulation utilising SysML. In their review, they argue that MBSE provides support for analysis at early stages, utilising SysML to capture information that can be exchanged with suitable simulation tools. The review discusses various means of managing simulation via SysML and notes that the Functional Mock-up Interface (FMI)⁵ is a promising standard. However, the authors conclude that there are still many issues with SysML-based simulation, and a particular challenge is interoperability between tools. While the review by Nigischer et al. covers aspects present in this review, it focuses on SysML and simulation, which can be considered a subset of early V&V of system behaviour.
Zeigler et al. [94] argue that simulation is an essential capability for MBSE toolsets, which they identify as lacking in the current landscape. Their paper states that MBSE practices need to be expanded to manage more complex systems engineering practices, especially when dealing with System of Systems (SoS). The authors identify that most MBSE efforts concern implementations in notations such as UML and SysML. However, such notations are limited in their simulation capabilities, which, the authors argue, could raise questions regarding the adequacy of those notations for developing complex systems. The review by Zeigler et al. provides arguments and insight into simulation; instead, the work presented in this article aims to discuss and review concepts related to early V&V. Although simulation is a commonly used technique in early phases of systems engineering, we do not focus on specific simulation techniques and their applicability. Further, the distinct focus on SoS is not present in this review.
Henderson and Salado [41] review the reported benefits and value of MBSE practices in the existing literature. Their primary finding is that most of the benefits argued in the literature are expected rather than measured, leading to their conclusion that MBSE benefits remain inconclusive. Li et al. [58] identify that although MBSE-affiliated research is growing, several independent research clusters exist with little interaction. These works discuss topics also reflected in our review; however, our focus is different. First, our work has a strict focus on early V&V in the context of MBSE, whereas the other reviews address MBSE more broadly. Second, while our work discusses similar topics, they form part of the work rather than its main focus, creating a more holistic view of the aforementioned area of early V&V. Finally, the aims of the reviews differ: We catalogue and review aspects of interest for early V&V, while Henderson and Salado call for more empirical studies in MBSE, and Li et al. aim to promote collaboration across existing domains to address future concerns.
Laing et al. survey industrial MBSE practitioners in France regarding model-based verification [53]. They identify several success criteria that, if met, are believed to lead to positive effects in the adoption of MBSE. They also note two major weaknesses of model-based verification from an industrial perspective, namely the verification of system architectures and the integration of multi-physics or multi-disciplinary designs in MBSE frameworks. While the review by Laing et al. focuses on verification in MBSE, it catalogues industrial views on the subject and presents success criteria, whereas the review presented in this article focuses on V&V itself via a systematic study.
Araujo et al. [5] perform a systematic literature review on testing, verification, and validation of robotic and autonomous systems (RAS). Their review indicates a growing need to extend traditional means of testing and V&V for complex RAS. Similarly to our work, their review also focuses on the industrial perspective, targeting both industry and academia, and finds a lack of industrial applications of methods and tools. While the review discusses concepts similar to this work, its focus is narrower, and system descriptions are not expected to be given by means of low-fidelity models (as investigated in this review). In addition, their review focuses closely on technical aspects, while our work also discusses MBSE from a more holistic view.
Ahmad et al. survey model-based testing utilising UML activity diagrams [2]. The review highlights several results, namely the lack of non-functional testing, the lack of industrial or elaborate evaluation, the high representation of domain-specific solutions with tight restrictions, and the lack of holistic approaches. While the review shares some common aspects with the work presented in this article, it focuses specifically on UML and model-based testing, as opposed to the broader and differently positioned review presented here. Further, the work by Ahmad et al. does not consider the context of MBSE.
Chaudemar and De Saqui-Sannes [21] investigated the combination of Multidisciplinary Design Analysis and Optimisation (MDAO) and MBSE for early validation of design. The authors argue that MDAO could be a good fit with MBSE since, among other things, it can be integrated with low-fidelity models and take model uncertainty into account. However, the authors note that these methodologies remain mostly separated in the literature, and some challenges must be addressed to join them. While our work discusses many concepts related to design, we do not share the strict focus of Chaudemar and De Saqui-Sannes on early validation. Further, this review considers early behaviour validation independently of its coupling with MDAO.
Tsioptsias et al. [86] investigated simulation model validation and testing via a literature review. They observe three distinct fields of research: Operational Research, Modelling & Simulation, and Computer Science. Some of their main findings include the distinct lack of common terminology for the reviewed concepts, a lack of linkage between theory and practice, and insufficient empirical studies to support the claims in papers. The authors also argue that validation should be performed continuously and that modellers and users should work closely together during simulation model validation. Although it discusses models, the review by Tsioptsias et al. has a different scope from the review presented in this article: Their focus is on simulation model validation, while this review is centred on V&V in the context of MBSE. While simulation model validity is central to V&V, it can be considered a sub-problem of (early) V&V as a whole, and our review presents a broader discussion.
4 Research Method
This section presents the research methodology used to conduct the survey. We followed the steps described by Kitchenham for a systematic literature review [51]. In particular, the research method included three distinct phases: planning, performing, and reporting.
The purpose of the planning phase included (i) the identification of gaps in the literature and of the needs for the review, discussed in Sections 2 and 3; (ii) the definition of the research questions to drive the work, presented in Section 4.1; and (iii) the definition of the review process and guidelines for the involved authors, illustrated in the remainder of this section.
During the performing phase, we executed the review in several concrete steps, namely Search, Selection, Snowball, Definition of data collection table, Data extraction, and Data analysis. The search step consisted of defining a search string and a subsequent automated search for relevant papers through several scientific databases. The selection step consisted of a rigorous process for identifying primary studies for the review. We complemented the identified papers via an exhaustive snowballing process [91] to identify potentially missing papers. At this point, a data collection table was constructed and validated on a few pilot papers. Finally, we performed the data extraction on the included papers and coded the data for easier interpretation. We analysed the extracted data vertically and horizontally, resulting in the findings of this review.
In the reporting phase, we documented the findings resulting from the review (see Section 6). Further, we analysed potential threats to validity and the corresponding mitigation strategies (see Section 5).
4.1 Research Questions
This study aims to investigate the current methods and practices for describing and analysing behaviour by means of system models in early stages of development when adopting MBSE. We are also interested in inspecting model-based techniques from both an academic and an industrial perspective, highlighting the current similarities and differences with respect to the adopted methods and tools. With this and the motivating scenario illustrated in Section 2.1 in mind, we formulated the following research questions (RQs) to drive the work:
RQ1: How is early V&V defined and motivated in the MBSE literature? This question investigates the definition of early V&V activities in the literature together with the main motivations reported for performing those activities.
RQ2: What are the means for describing system behaviour at an early stage of development? By considering the tradeoff between the purpose of modelling and the fidelity of the models, it is of interest to understand how system behaviour is initially described. Moreover, it is relevant to capture the languages and formalisms utilised for analytical purposes, since they might differ from the initial behavioural descriptions. Finally, in the cases where different models are used for behaviour description and for analysis, it is of interest to elicit the types of approaches adopted to map between the different representations.
RQ3: What are the results of interest for the early V&V, and what techniques are employed for performing the analysis? Given the early stage of the development, it is relevant to understand what type of analysis results are reported in the literature. Similarly, it is important to elicit what methods or techniques are suitable for computing the analysis results and to understand how these results are presented to the user.
RQ4: Which are the application domains employing early V&V? By considering the tradeoff between modelling efforts and reliability of the analysis results, it is interesting to know which application domains adopt the proposed solutions and whether these solutions are domain specific or not. Additionally, it is important to report whether the solutions have been validated in an industrial setting or not, aiming to elicit potential gaps between academia and industry.
RQ5: What are the limitations of the existing approaches for early V&V? By considering the growth in complexity of the developed systems and the impacts of problems discovered late in the development process, it is critical to understand what limitations are reported when performing early V&V, both specific for the proposed solution and more broadly for early V&V in general.
4.2 Search Process
To find the papers included in the study, we opted for an automatic search across several scientific databases. A preliminary exploration of the databases of interest returned a limited number of hits on relevant publications; moreover, based on our own experience, we expected a considerable spread of keywords and definitions, since MBSE is broad in nature. For example, the notion of validation and verification differs between domains, and the “MBSE” keyword is used in several orthogonal disciplines. Therefore, we kept the search string relatively tight and decided to perform an exhaustive snowballing to ensure that the search process would capture as many relevant papers as possible. We searched the following databases: ACM,⁶ IEEE,⁷ ScienceDirect,⁸ and Scopus.⁹ Moreover, we used the following search string:
(“MBSE” OR “Model-based systems engineering” OR “Model based systems engineering”) AND (“Validation” OR “Verification” OR “V&V” OR “Evaluation”) AND (“Behavior” OR “Behaviour”).
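To make the Boolean structure of the search string explicit, the following Python sketch evaluates it against a title or abstract held in memory. It is only an illustration under stated assumptions: the term lists mirror the string above, but the matching rule (case-insensitive, exact phrases that are not part of a longer word) is our assumption, and each of the searched databases applies its own query syntax and field scoping.

```python
import re

# Term groups mirroring the search string; the matching rule is an assumption.
MBSE_TERMS = ["MBSE", "Model-based systems engineering", "Model based systems engineering"]
VV_TERMS = ["Validation", "Verification", "V&V", "Evaluation"]
BEHAVIOUR_TERMS = ["Behavior", "Behaviour"]


def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive match of an exact phrase that is not part of a longer word."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def any_of(text: str, phrases: list) -> bool:
    """OR over one group of quoted terms."""
    return any(contains_phrase(text, p) for p in phrases)


def matches_search_string(text: str) -> bool:
    """AND over the three OR-groups of the search string."""
    return (any_of(text, MBSE_TERMS)
            and any_of(text, VV_TERMS)
            and any_of(text, BEHAVIOUR_TERMS))


print(matches_search_string("MBSE-based verification of system behaviour"))  # True
```

Under this exact-phrase assumption, a title that only mentions “behavioural” would not match, which is one way a tight search string can miss relevant papers.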
4.3 Inclusion Process with Inclusion and Exclusion Criteria
Starting from an initial set of papers collected via the search string across the chosen databases, we selected additional papers through an iterative process until a final set of papers was identified. The process was guided by well-defined inclusion criteria (IC) and exclusion criteria (EC), summarised as follows:
IC1: The paper regards the problem of model-based early V&V of system behaviour
IC2: The paper presents one or more concrete solutions for early V&V
EC3: Scope outside of model-based systems engineering
EC4: Not available in full text
EC5: Short papers, tutorials, WiPs, research agendas, papers shorter than 5 pages
EC6: Paper overlaps with a more complete paper (e.g., a conference publication extended by a journal article).
IC1 and IC2 are the main criteria we consider for a paper to be included in the review, and each included paper meets both criteria. If any EC is met, a publication is excluded from the review regardless of the IC. EC1, EC2, and EC4 remove papers not meeting the basic criteria. EC3 removes papers that do not discuss early validation in the MBSE context, such as papers discussing non-model-based approaches or parallel domains such as model-based software engineering (which is also abbreviated as MBSE). With EC5, we aim to remove any work that does not present a complete research or application paper. Furthermore, with EC6 we aim to avoid the bias of including the same underlying work several times with minor revisions; in the case of overlap, the more mature paper is included.
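The decision rule above, under which any EC excludes a paper while both IC must hold, can be sketched in Python as follows. The `ScreeningDecision` record and its fields are hypothetical illustrations and do not reflect how Covidence stores screening votes.

```python
from dataclasses import dataclass, field

REQUIRED_IC = {"IC1", "IC2"}


@dataclass
class ScreeningDecision:
    """Hypothetical record of the criteria a reviewer found to hold for a paper."""
    ic_met: set = field(default_factory=set)  # e.g. {"IC1", "IC2"}
    ec_met: set = field(default_factory=set)  # e.g. {"EC5"}


def is_included(decision: ScreeningDecision) -> bool:
    # Any exclusion criterion excludes the paper, regardless of the IC.
    if decision.ec_met:
        return False
    # Otherwise, both inclusion criteria must be met.
    return REQUIRED_IC <= decision.ic_met


print(is_included(ScreeningDecision({"IC1", "IC2"}, set())))   # True
print(is_included(ScreeningDecision({"IC1", "IC2"}, {"EC6"})))  # False
```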
Figure 2 provides an overview of how the number of papers changed over the process of applying the IC/EC to arrive at the final set of 149 papers. The original search was conducted on March 15, 2022, and the additional search on March 31, 2023.
The steps taken in the search process and shown in Figure 2 were performed in a review management tool called Covidence.¹⁰ First, the database search was performed, resulting in 495 papers, which became 431 after duplicate removal. Then, the IC and EC were applied to the title and abstract of each paper, resulting in 179 papers to be included for the full-text review. We note a drastic decrease in the number of papers at this stage and ascribe much of it to filtering out papers not focused on system behaviour; in fact, the “MBSE” search term caught many unrelated papers. In the full-text review, each paper was read again with the IC and EC in mind, and, finally, 69 papers were chosen as the set of primary papers. Subsequently, as planned when defining the search string, an exhaustive snowballing process took place following the guidelines in Reference [91]. The snowballing process took eight rounds until no new papers were found, resulting in 206 newly identified papers for the review process and eventually 152 included papers in total. Again, we note a large increase in the number of papers after the snowballing process, in part due to its exhaustive nature but partly due to the initial search string missing some key papers that adopt slightly different terminology. Authors refer to the notion of “early” in different ways, for example, “early stage,” “early phase,” or simply “early,” and, most commonly for the papers found through snowballing, do not mention it explicitly in the title at all. It is also worth noting that the snowballing phase followed the same screening procedure adopted for the initial set of papers (for the sake of readability, the snowballing cycles are omitted from the figure).
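The exhaustive snowballing described above is essentially a fixed-point iteration: each round examines the references (backward) and citations (forward) of the papers included in the previous round, and the process stops when a round includes no new paper. A minimal Python sketch follows; the callables `references`, `citations`, and `is_relevant` are hypothetical stand-ins for bibliographic lookups and for the IC/EC screening, and we assume that only newly included papers are expanded in the next round. The round counter includes the final round that finds nothing new; whether the eight rounds reported above are counted the same way is not stated, so this convention is also an assumption.

```python
from typing import Callable, Iterable, Set, Tuple


def snowball(start: Set[str],
             references: Callable[[str], Iterable[str]],
             citations: Callable[[str], Iterable[str]],
             is_relevant: Callable[[str], bool]) -> Tuple[Set[str], int]:
    """Backward and forward snowballing until a round includes no new paper.

    Returns the included papers and the number of rounds, counting the
    final round that finds nothing new.
    """
    included = set(start)
    examined = set(start)   # papers already screened, relevant or not
    frontier = set(start)   # papers included in the previous round
    rounds = 0
    while frontier:
        rounds += 1
        candidates = set()
        for paper in frontier:
            candidates.update(references(paper))  # backward snowballing
            candidates.update(citations(paper))   # forward snowballing
        candidates -= examined                    # screen each paper only once
        examined |= candidates
        frontier = {p for p in candidates if is_relevant(p)}
        included |= frontier
    return included, rounds
```

Tracking `examined` separately from `included` ensures that a paper rejected once is not screened again when another paper cites it.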
The final round of review concerned the removal of overlapping papers (e.g., a conference paper and a journal paper with the same introduction and background sections) and a final check of the IC and EC, resulting in the papers finally considered for data extraction. As visible in the figure, given the extent of this work, we decided to perform a search update at this stage to catch any new papers published during the review and writing process. This update resulted in new entries to be considered, whose numbers are detailed in the figure (starting from “Additional database search”). Due to the aforementioned removal of overlapping papers, the total number of publications was reduced from 156 (152 + 4) to 149.
When applying the IC and EC, two reviewers were assigned to each paper, and in case of disagreement a third reviewer would make a final decision. In the snowballing phase, a single reviewer identified the potential papers, while the selection process followed the procedure mentioned before. With a final set of 149 papers, the extracted data were analysed vertically and horizontally for presentation in this article.
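The selection procedure with two independent reviewers and a third reviewer resolving disagreements can be expressed as the following Python sketch. The function name, the vote representation, and the `tie_breaker` callback are illustrative assumptions rather than part of the tooling used in the review.

```python
from typing import Callable, Dict, Set, Tuple


def screen(votes: Dict[str, Tuple[bool, bool]],
           tie_breaker: Callable[[str], bool]) -> Tuple[Set[str], int]:
    """Combine two reviewers' include/exclude votes per paper.

    Agreeing votes are final; on disagreement the hypothetical tie_breaker
    (the third reviewer) decides. Returns included ids and the conflict count.
    """
    included: Set[str] = set()
    conflicts = 0
    for paper_id, (vote_a, vote_b) in votes.items():
        if vote_a == vote_b:
            decision = vote_a
        else:
            conflicts += 1
            decision = tie_breaker(paper_id)
        if decision:
            included.add(paper_id)
    return included, conflicts
```

Returning the conflict count alongside the decisions makes it easy to report how often the third reviewer had to intervene.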
4.4 Data Collection and Analysis
Once the set of studies had been identified, the relevant data were extracted as shown in Table 1. In more detail, Question 1 targets general publication details. Questions 2 and 3 aim to answer RQ1 by extracting the authors’ definition of early V&V and the specified motivations, respectively. Questions 4, 5, and 6 target RQ2 by extracting the languages and formalisms used for the description and analysis of behaviour. It is worth noting that we differentiate between language and formalism based on an existing classification.¹¹ RQ3 is answered by Questions 7, 8, and 9, which target the techniques, results of interest, and tools utilised in the process of analysis. To answer RQ4, Questions 10 and 11 extract more information about the domain of application considered in the paper; moreover, Question 12 targets how the solution was validated, based on the classification by Shaw [79]. As part of RQ4, we also extract whether the study evaluates the solution in an industrial context, based on the case study description (or lack thereof) in the paper. Finally, RQ5 is answered by Question 13, which extracts the limitations of the presented solutions as explicitly discussed in each paper.
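The mapping between extraction questions and research questions described above can be captured in a small data structure, for instance to check that every question serves exactly one purpose. The sketch below is a hypothetical Python encoding based only on the question numbers given in the text; the comments paraphrase the text rather than reproduce Table 1, and the industrial-context flag for RQ4 is omitted because the text assigns it no question number.

```python
from typing import Dict, List

# Question numbers per research question, as stated in the text.
RQ_TO_QUESTIONS: Dict[str, List[int]] = {
    "General": [1],        # publication details
    "RQ1": [2, 3],         # definition of early V&V, motivations
    "RQ2": [4, 5, 6],      # languages/formalisms for description and analysis
    "RQ3": [7, 8, 9],      # techniques, results of interest, tools
    "RQ4": [10, 11, 12],   # application domain, validation type (Shaw)
    "RQ5": [13],           # limitations discussed in each paper
}


def question_to_rq(mapping: Dict[str, List[int]]) -> Dict[int, str]:
    """Invert the mapping, failing if a question is assigned twice."""
    inverse: Dict[int, str] = {}
    for rq, questions in mapping.items():
        for q in questions:
            if q in inverse:
                raise ValueError(f"Question {q} is assigned to {inverse[q]} and {rq}")
            inverse[q] = rq
    return inverse
```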
Similarly to the previous steps, the Covidence tool was used to perform the data extraction. In particular, Covidence allows for automatic identification of conflicts in data extraction. This feature was utilised on a set of pilot studies to verify whether two researchers would extract the same data and hence to elicit possible issues with the extraction form. After performing the data extraction on 10 randomly selected pilot papers and comparing the results for inconsistencies, only minor issues were identified, which led to a minor refinement of the data extraction guidelines to better match the expected outcome. Subsequently, one researcher took charge of the data extraction for all the papers. Nonetheless, after the completion of the extraction task, another researcher performed a round of random checks on the extracted data. This round covered half of the extracted papers, and no major inconsistencies were discovered.
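The conflict check used on the pilot papers can be approximated as a field-by-field comparison of two reviewers’ extraction forms. The Python sketch below is a hypothetical stand-in that does not reproduce Covidence’s conflict detection; the field keys and the normalisation rule (ignoring case and redundant whitespace) are our assumptions.

```python
from typing import Dict, List, Optional


def _normalise(value: Optional[str]) -> Optional[str]:
    """Ignore case and redundant whitespace; keep missing values as None."""
    return " ".join(value.split()).lower() if value is not None else None


def extraction_conflicts(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Fields on which two reviewers' extractions of the same paper disagree.

    A field filled in by only one reviewer also counts as a conflict.
    """
    fields = sorted(set(a) | set(b))
    return [f for f in fields if _normalise(a.get(f)) != _normalise(b.get(f))]
```

Counting a field left empty by one reviewer as a conflict is deliberate: during piloting, such gaps often point to ambiguous instructions in the extraction form.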
Based on the data collected according to Table 1, we performed vertical and horizontal analyses, the results of which are reported in Section 6. Vertical analysis refers to the in-depth discussion of each particular RQ and the corresponding data extraction. Horizontal analysis instead focuses on cross-data patterns and correlations. In this respect, it is important to note that additional coding has been adopted on certain categories to perform the horizontal analysis (see Section 7). Examples of such coding include whether a paper uses SysML, the type of licensing for tools, and categories for limitations. The coding has been especially helpful for the categories where input varied greatly. In those categories, almost every entry appeared at most once or twice, spreading the extracted information across a large number of distinct values. For the interested reader, we refer to the publicly available review replication package, which includes the review protocol, the set of collected papers, the ones selected for the extraction, the complete set of extracted data, and the adopted coding for specific subsets.¹²
5 Threats to Validity
The review presented in this article has been performed according to well-established research guidelines. Moreover, the research protocol and the related data are available in a public replication package. Still, we acknowledge that a study of this magnitude and scope might contain some threats to the overall validity. Therefore, in the following we discuss the potential threats, adopting the terminology of Wohlin et al. [92], together with the corresponding countermeasures we have considered.
5.1 Data Reproducibility
For the sake of readability and conciseness, we summarise the results and highlight selected notable outcomes. A complete list of the included papers and the corresponding data extraction, along with other technical details related to the systematic review process, can be found in a publicly available repository, both for replication and for use as a data source for other kinds of analyses and investigations. The replication package consists of the search strings used, in addition to a table that places unique publications in the rows and the extracted data in the columns. Some columns have been created explicitly for the horizontal analysis presented later in the paper, detailing, for example, whether a publication uses SysML or what types of license exist for the tools used in the publications. Furthermore, we have shared the tables and graphs used for the various analyses.
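As an illustration of how derived columns support the horizontal analysis, the Python sketch below adds a boolean column stating whether a publication uses SysML and then cross-tabulates two columns. The column names (`description_languages`, `target_phase`, `uses_sysml`) and the semicolon-separated encoding of languages are hypothetical and need not match the replication package.

```python
from collections import Counter
from typing import Dict, List


def code_uses_sysml(row: Dict[str, object]) -> Dict[str, object]:
    """Copy a publication row and add a derived boolean 'uses_sysml' column.

    Assumes 'description_languages' holds a semicolon-separated list.
    """
    raw = str(row.get("description_languages", ""))
    languages = [s.strip().lower() for s in raw.split(";")]
    coded = dict(row)  # leave the original row untouched
    coded["uses_sysml"] = any(lang.startswith("sysml") for lang in languages)
    return coded


def crosstab(rows: List[Dict[str, object]], col_a: str, col_b: str) -> Counter:
    """Count publications for each combination of values in two columns."""
    return Counter((row[col_a], row[col_b]) for row in rows)
```

Keeping the derived column separate from the raw extracted text means the coding can be revised without touching the original extraction.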
5.2 External Validity
Threats to external validity primarily relate to the retrieval of the papers to be analysed. In fact, the selected papers are the source of all the analyses and significantly impact the extracted results and their quality. We applied the search string to several databases to identify potential studies for our analysis, using a relatively refined search string to reduce the initial set of papers. To mitigate the risk of missing relevant papers, e.g., due to a missing explicit reference to MBSE or V&V, an exhaustive snowballing (forward and backward) procedure was performed until no new papers were identified. The snowballing was performed in eight rounds, which allowed us to capture the papers missed by the initial search string and to accurately retrieve the relevant papers for analysis. Additionally, we repeated the search process at a later stage of this research work to catch any papers released during the data analysis and writing.
5.3 Internal Validity
Internal validity refers to any threats primarily arising from the bias of the reviewers involved in the study. To mitigate the bias of individual researchers, we required a majority consensus of the reviewers for all the selection steps until the data extraction. In other words, two researchers performed the selection in parallel and in case of decision conflicts a third researcher took the final decision. Moreover, before performing the data extraction, several pilot extractions were performed to evaluate and harmonise how different reviewers would interpret the extraction forms. The pilot extractions did not reveal any significant conflicts, which gave us confidence that an individual reviewer could perform the extraction. Nonetheless, at the end of the data extraction a sanity check was performed on a significant subset of random papers to verify the similarity of the results.
To reduce the bias even further, the researchers were not tasked with evaluating any of the papers’ claims. Instead, the extraction consisted of reporting the claims made by the authors of the papers. As such, any interpretation of the data was deferred until after the extraction. For the entire process, we utilised the Covidence tool for maintaining consistency among reviews, as the tool highlights potential mismatches in the extraction. After extraction was complete, we exported all data from the tool and performed the horizontal and vertical analyses directly on the data.
5.4 Construct Validity
Construct validity mainly relates to the risk of deriving an incorrect conclusion from the relations between treatment and outcome. For this article, this means that the way we searched and selected the papers and the approach adopted for the extraction could have affected the results we obtained. To mitigate this risk, we applied the search string to four literature sources. Moreover, we performed exhaustive snowballing to mitigate the threats of poorly formulated search strings or of papers missing from the databases. The snowballing was done according to the best practices recommended in the literature and was performed exhaustively, i.e., until no new papers were found.
5.5 Conclusion Validity
Threats to conclusion validity refer to any risk of misinterpreting the results of the findings. To mitigate these risks, we have followed well-established systematic literature review approaches [51, 92]. Moreover, we did not adopt any preliminary interpretation of the papers’ contents, and we applied automated analysis tools to collect data and elicit relevant cross-relations.
Admittedly, there could still be some risk of bias due to our close experience with the CE domain and hence a potentially limited/incomplete interpretation of the extracted results. Nonetheless, our experiences in other industrial domains point to very similar MBSE adoption scenarios and issues, making us confident about the broader validity of our reasoning. Additionally, we provide a publicly accessible replication package with all the details about how the study was performed and the corresponding extracted data.
6 Findings
In this section, we present the findings of the data extraction with respect to the research questions formulated in Section 4. We refer to the online replication package for the precise data extracted from each publication. The appendix explicitly maps the papers to the RQ categories, which are presented and summarised in the following subsections.
6.1 Publication Details
In this section, we present the overarching publication details. First, we map the trend of publications over the years, as illustrated in Figure 3.
Figure 3 indicates a rise in interest in the topics discussed by this review over the years. Moreover, even though no constraint was set on the earliest publication year, the earliest publications date from 2000, and few publications are found before 2008. Since then, the general trend shows an increase in the number of publications, hinting at a growing interest in the topic. This trend also matches quite well the birth and maturation of MDE techniques, which are a key enabler for early V&V [78]. In the figure, we also distinguish the selected papers by publication type and notice a large majority of either conference or journal publications. Such prevalence might indicate a degree of complexity and/or maturity of the proposed solutions that is difficult to fit within the limits of a workshop publication.
Figure 4 presents a word cloud of the publication keywords, where each keyword is included regardless of the number of occurrences, and no keyword clustering has been performed. Ignoring the references to MBSE or modelling in general in Figure 4, the most common keywords in the publications are SysML (\(n=30\)), Simulation (\(n=20\)), Verification (\(n=16\)), UML (\(n=10\)), Model checking (\(n=10\)), Requirements (\(n=9\)), and Model transformations (\(n=9\)). We note a clear bias towards SysML and related concepts, as well as towards simulation and related tools, languages, and techniques. We also highlight the absence of the word “design” or similar concepts. Apart from the more represented topics, many keywords appear only once or twice, often related to specific techniques or domain-specific concepts.
6.2 RQ1 - How Is Early Validation and Verification Motivated and Defined in the Literature for MBSE?
This section summarises the data extracted to answer RQ1, specifically Questions 2 and 3 in Table 1.
6.2.1 How Does the Community Define Early V&V?
Although all the analysed papers regard the concept of V&V at an early stage of development, few papers explicitly state what that means in their context. Instead, most of the papers implicitly assume that the targeted V&V takes place at some point during system design or requirements elicitation, often referring to the INCOSE definition, which states that verification and validation begin in the conceptual design phase [88]. Of the papers explicitly describing the phase or context of early V&V, a majority refers to the design phase and, as previously stated, often refers to the INCOSE definition of the “Conceptual design phase.” Apart from the design phase, some papers argue that early V&V targets the “requirements phase,” and some authors report that their solution targets both the requirements and design phases. Figure 5 visualises the target phase of the solutions among the papers.
In more detail, 107 of the 149 publications (71.8%) report that the proposed solution applies to the design phase, while 29 (19.5%) report that the solution targets the requirements phase. Thirteen papers (8.7%) report that their solution covers both requirements and design; these tend to be large in scope. As an example, Lemazurier et al. [P28] use requirement boilerplates as a starting point to leverage five distinct domain-specific languages and several views to generate a functional architecture. Bouffaron et al. [P44] instead define an iterative process of system refinement that spans several stages of development. In both cases, there is a strong emphasis on the process, and the solutions play a supporting technical role.
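The percentages above follow directly from the reported counts, which sum to the 149 included papers; the Python sketch below reproduces the computation, assuming rounding to one decimal place. The function name is illustrative.

```python
from typing import Dict


def phase_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of publications per target phase, rounded to one decimal."""
    total = sum(counts.values())
    return {phase: round(100 * n / total, 1) for phase, n in counts.items()}


# Counts reported in the text (149 publications in total).
counts = {"Design": 107, "Requirements": 29, "Both": 13}
print(phase_shares(counts))  # {'Design': 71.8, 'Requirements': 19.5, 'Both': 8.7}
```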
From the extracted data, we can see that authors consider both system requirements and system design as targets for early V&V of system behaviour. While few authors explicitly define early V&V, it is evident that the majority of them consider early validation to concern system design rather than system requirements. Moreover, the question “will a particular design meet the requirements?” prevails over “will a particular set of requirements capture the system of interest adequately?” Finally, papers dealing with both the requirements and design phases use the available information to perform cross-checks, notably to better understand the system and reason about constraints [P4, P48, P149]; to raise the quality of models and reduce “bad smells” [P15, P28, P53, P95]; to improve traceability and understand artefact dependencies [P28]; to validate non-functional requirements [P44, P60, P96]; and to reason about tradeoffs or to reduce the design space [P51, P63].
6.2.2 What Are the Main Motivating Reasons for Doing Early V&V?
To understand the motivations for doing early V&V, we extracted from each paper the reason(s) the authors use to motivate their activities. The exact extractions are found in the replication package, while Figure 6 summarises the findings. In particular, it displays all the motivations mentioned at least twice in the papers, while those mentioned only once are grouped as “Other” in the figure.
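The grouping rule used for Figure 6, where motivations mentioned only once are merged into “Other,” can be sketched as follows. We assume that a paper contributes at most one count per motivation and that the “Other” entry sums the counts of all rare motivations; the function name and the `min_count` parameter are illustrative.

```python
from collections import Counter
from typing import Iterable


def group_rare(mentions_per_paper: Iterable[Iterable[str]], min_count: int = 2) -> Counter:
    """Count papers per category and merge categories below min_count into 'Other'.

    Each paper counts at most once per category; the 'Other' entry is the sum
    of the counts of all merged categories.
    """
    counts: Counter = Counter()
    for mentions in mentions_per_paper:
        counts.update(set(mentions))  # deduplicate within one paper
    grouped: Counter = Counter()
    for category, n in counts.items():
        grouped[category if n >= min_count else "Other"] += n
    return grouped
```

Since many papers report several motivations, the grouped counts can sum to more than the number of papers, which is consistent with the overlap noted below.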
The reasons listed in Figure 6 can partially overlap, and many papers report multiple reasons for performing V&V activities. The most common motivation for performing early V&V is to ensure a desired level of quality for the design before proceeding with the implementation. This is followed by reducing the risks of late flaw detection, reducing time to market, reducing the risk of incomplete requirements, and exploring system behaviour before implementation. In this respect, the predominant motivations seem to directly target the reduction of risks associated with introducing errors or creating incomplete specifications in the early phases, which should lead to a more streamlined process. This is also reflected in the fact that reducing time to market is an often-quoted motivation for the activities performed by authors. Other potential risks to be prevented are those caused by sub-optimal (or even wrong) design decisions that could impact critical quality attributes of the system. Accordingly, several of the more frequently mentioned reasons relate to performance and dependability (safety in particular).
A set of papers does not clearly motivate the reasons for employing their solutions targeting early V&V; we note that many of these papers describe more theoretical work without any real target case study, which may explain the lack of motivation for the activities. For example, Kahani and Cordy discuss bounded verification for state machines in a train system controller [P40]. However, they do not discuss the motivation for their solution at length, as they deem the concepts already well established. Similarly, Liu et al. [P138] discuss Assume-Guarantee Reasoning in the context of scheduled components and present a general theory without a concrete motivation. In other words, these research works deal with techniques that might support early V&V solutions but do not explicitly mention concrete usage scenarios.
RQ1 discussion: To summarise, while there is some spread in the extracted motivating reasons, there seems to be a consensus that V&V in the earlier phases of development is key to minimising development costs and reducing the risks of flaws later in the development lifecycle. Although the main category of early V&V motivation can be formulated as “anticipating V&V before implementation,” which is perhaps to be expected, it does not map to even half of the papers.¹³ Apart from the more expected results, commonly reported motivations include “reducing risks associated with faulty design or requirements,” “performing tradeoff studies,” and “improving communication and integration between system aspects.” Most of the reported motivations for V&V are well in line with the SE state of the art and practice [88], in which MBSE is seen as a means of performing the involved activities earlier while maintaining the necessary rigour. Additionally, the authors mostly regard their solutions as situated in the design phase, considering the requirements phase to a smaller extent. Moreover, phases after design are not represented to any significant extent in the proposed solutions. The extracted data indicate that MBSE is reaching maturity for using methods of analysis in system design with significant benefits, and they may also signify that the requirements phase has less need for capabilities beyond current methods. However, the smaller number of papers targeting the requirements phase could also indicate that solutions for this phase are less mature, which is reflected in the lower representation of industrial cases among the selected papers targeting the requirements phase. We also observe that few papers target both the requirements and design phases, probably due to the issues related to connecting different artefacts (and corresponding tools), often reported as a weak point of MBSE [41].
Perhaps more interesting is what the authors do not report to any significant extent, notably aspects such as reuse (of the V&V artefacts in other phases of the development process as well as in other projects) and improved traceability between development stages. This is surprising, since these aspects are often argued to be strong points for MBSE adoption [41]. In the same fashion, there is little discussion on the broader integration of the V&V activities in the SE landscape, which is argued to be a benefit of MBSE. Notably, there are few mentions of the digital thread [80], which can be enabled by model-based methods. In conjunction with this, few papers discuss cross-domain integration, although it is also an expected benefit of leveraging abstraction. In essence, the solutions overall seem to lack a holistic motivation, that is, an account of how early V&V supports parallel and downstream activities.
6.3 RQ2 - What Are the Means of Describing System Behaviour at an Early Stage of Development?
This section summarises the data extracted to answer RQ2, namely Questions 4, 5, and 6 in Table 1.
6.3.1 How Is System Behaviour Represented in Early V&V?
Figure 7 shows the languages utilised in the solutions to describe system behaviour. The category “Other” groups entries that appear only once in the extraction and were not created specifically for the solution; languages created specifically for a solution are grouped under “Custom language.” When we consider the implementation of the solutions, we differentiate between tools, languages, and formalisms (where possible) based on existing classifications, as presented in Section 4.
The data in Figure 7 are based on the languages/formalisms reported in the analysed papers. Here it is worth noting that the input varied considerably in level of detail. Notably, some papers state “SysML” without specifying or showing what subset is utilised. Similarly, different types of diagrams or means of representing behaviour, such as state-based formalisms, are shared among the various languages. Moreover, the selected papers reported languages or formalisms inconsistently. These issues do not allow us to perform meaningful clustering of the results; therefore, we limit ourselves to reporting the languages as they are mentioned in the analysed papers.
While there is a large spread of languages to represent system behaviour in the early phases of development, SysML is clearly by far the most common language, and much of the language is utilised, especially the behavioural diagrams and block diagrams. Figure 8 illustrates the subsets of SysML utilised in the papers adopting SysML as part of the proposed solution. We note a similar distribution of diagram types for UML and other UML profiles.
When considering the subsets of SysML utilised in the selected papers, activity diagrams and state machine diagrams are the most popular means of describing system behaviour; in particular, they are utilised in many papers aiming for automated translations. For example, Staskal et al. [P140] map SysML activity diagrams to the symbolic model checker nuXmv, and Mahani et al. [P36] similarly map SysML state machine diagrams to the NuSMV model checker. Many solutions use more than one diagram of the language to describe the behaviour, and it is often not entirely clear from the paper alone which diagrams are used and which are not. Nonetheless, of the four diagrams classified as behavioural diagrams in the SysML standard, activity and state machine diagrams are preferred over use case diagrams and sequence diagrams. In addition, one paper states that the solution utilises the entire SysML specification [P114], and four others that all of the behavioural diagrams are utilised [P12, P46, P91, P122].
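To illustrate what such an automated translation involves, the Python sketch below emits a NuSMV model from a flat state machine. It is a deliberately simplified, hypothetical mapping and not the translation of Mahani et al. [P36] or Staskal et al. [P140]: guards, events, actions, and hierarchical or concurrent states are ignored, a state with several outgoing transitions becomes a nondeterministic choice, and states without outgoing transitions keep their value. The `CTLSPEC` line adds an example property, chosen by us, stating that the initial state is always reachable again.

```python
from typing import Dict, List


def state_machine_to_smv(states: List[str], initial: str,
                         transitions: Dict[str, List[str]]) -> str:
    """Emit a NuSMV model for a flat state machine (hypothetical mapping).

    Guards, events, actions, and hierarchy are ignored. A state with several
    targets becomes a nondeterministic set choice; a state without outgoing
    transitions falls through to the default branch and keeps its value.
    """
    lines = [
        "MODULE main",
        "VAR",
        "  state : {" + ", ".join(states) + "};",
        "ASSIGN",
        f"  init(state) := {initial};",
        "  next(state) := case",
    ]
    for source in states:
        targets = transitions.get(source, [])
        if not targets:
            continue
        if len(targets) == 1:
            rhs = targets[0]
        else:
            rhs = "{" + ", ".join(targets) + "}"
        lines.append(f"    state = {source} : {rhs};")
    lines += [
        "    TRUE : state;",  # default branch: stay in the current state
        "  esac;",
        # Example CTL property: the initial state is always reachable again.
        f"CTLSPEC AG EF (state = {initial})",
    ]
    return "\n".join(lines)


print(state_machine_to_smv(
    ["Idle", "Running", "Error"], "Idle",
    {"Idle": ["Running"], "Running": ["Idle", "Error"]}))
```

In the usage example, `Error` has no outgoing transitions, so no path leads back from `Error` to `Idle`; a model checker would therefore be expected to report the example property as false and return a counterexample.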
SysML, as a general-purpose language, aims to be a solution for all types of systems. Further, it is evident that UML and UML-related languages, such as MARTE, are among the most commonly used languages and formalisms apart from SysML (which has indeed typically been implemented as a UML profile, until the recent SysML v2). This might indicate that general-purpose languages such as SysML and UML are suitable for describing early system specifications in general and behaviour in particular. Nonetheless, a large number of papers utilise a means of description that is observed only once in the selected papers, indicating a large variety of domains and possibly the need for domain-specific support. For example, Supremica, which extends finite automata to handle large complex industrial systems, is used by Markovski [P26]; KARMA is used by Ding et al. [P78] to unify formalisms across several MBSE models and simulations; and SWRL is used by Chen et al. [P95] in combination with OWL to leverage ontology reasoning for verification.
A few papers propose custom languages or formalisms in their solutions. Miao et al. [P1] present a Python-like custom informal language for their system descriptions. Lemazurier et al. [P28] utilise a domain-specific language for nuclear power plants, along with several other domain-specific languages, to manage their system descriptions. Similarly, Deng et al. [P48] utilise a custom language and environment for mechanical product design, and Stachtiari et al. [P60] use their custom language for describing satellites. Singh and Muller [P80] use a tool-based approach to describe their system of interest based on needs observed in their industrial context of manufacturing systems. Bernaerts et al. [P126] use a specific type of model for describing safety-related aspects in the automotive domain. Miyazawa et al. [P99] use their custom state machine formalism for describing robotic systems. Zhang et al. [P141] propose an integrated intelligent modelling and simulation language to cope with the combination of domain-specific models in Systems of Systems.
6.3.2 What Language or Formalism Is Utilised for Behaviour Analysis?
To present the results of this data extraction, we differentiate between the reported languages and the reported formalisms. As with the behaviour description, authors rarely report precisely how a language is implemented or which parts are employed for what purpose. Further, some papers report on the particular formalisms employed while others do not. In particular, Figure 9 shows the reported languages and their frequency for analysing behaviour, where “Other” groups the entries that are reported only once and are not custom made for a particular solution; Figure 10 depicts the reported formalisms for analysing behaviour.
Figures 9 and 10 clearly show that the languages and formalisms used for analysis are typically case-dependent, since a large majority of the entries appear only once or twice in the papers. In this respect, many reported languages and formalisms serve narrow/domain-specific purposes that are not easily portable to a more general case. Examples include languages such as Event-B [P24], OWL [P104], LabVIEW [P97], or Sabotage [P137]. In addition, despite the widespread use of SysML- and UML-based languages for the system description, there is no evidence of potential “standard” analysis approaches for early V&V of specific properties.
Apart from the case-dependent solutions, some general-purpose languages and formalisms are frequently used, namely SysML, MATLAB/Simulink, Modelica, Petri nets, and various state-based formalisms. Moreover, it is interesting to notice that a group of solutions are presented as implementation agnostic with respect to the particular type of formalism and language used for the analysis [P77, P94, P95, P125]. Friedl et al. [P77] argue that their solution can be adapted to cover various case-dependent languages. Similarly, the solution from Castet et al. [P94] involves ontologies and is presented as being adaptable in terms of languages, with an implementation example given in the Modelica language. Chen et al. [P95] also utilise a method for automatically creating specific design ontologies and rules from requirements and argue that the target implementation depends on the context considered for the system under study. Damm et al. [P125] showcase a method for contract-based virtual integration testing and consider it to be language and tool agnostic. As with the languages used for behaviour description, there are a few custom languages and formalisms, two of which overlap with the custom languages for describing behaviour [P1, P28]. Moreover, Kang et al. [P4] extend previous work to analyse EAST-ADL models, and Brandstetter et al. [P59] use validation rules that do not depend on any particular tooling to validate the software requirements of automation processes.
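To illustrate the kind of check that contract-based virtual integration builds on, the following minimal Python sketch encodes assume/guarantee contracts over a single numeric signal as closed intervals. It then checks refinement: a component contract refines a specification if it accepts at least the inputs the specification assumes and guarantees no more than the specification allows. The interval encoding, class names, function names, and example values are hypothetical simplifications chosen for illustration. This is not the method of Damm et al. [P125].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] over one numeric signal (hypothetical encoding)."""
    lo: float
    hi: float

    def contains(self, other: "Interval") -> bool:
        # True when 'other' is a subset of this interval.
        return self.lo <= other.lo and other.hi <= self.hi


@dataclass(frozen=True)
class Contract:
    """Assume/guarantee contract: if the input stays within 'assume',
    the component keeps its output within 'guarantee'."""
    assume: Interval
    guarantee: Interval


def refines(impl: Contract, spec: Contract) -> bool:
    """impl refines spec if it has a weaker assumption (accepts at least
    the inputs spec assumes) and a stronger guarantee (promises at most
    what spec guarantees)."""
    return impl.assume.contains(spec.assume) and spec.guarantee.contains(impl.guarantee)


# Hypothetical system-level contract (input voltage -> output temperature)
# and two candidate component contracts.
spec = Contract(assume=Interval(10.0, 14.0), guarantee=Interval(0.0, 80.0))
good = Contract(assume=Interval(9.0, 16.0), guarantee=Interval(5.0, 70.0))
bad = Contract(assume=Interval(11.0, 14.0), guarantee=Interval(0.0, 70.0))

print(refines(good, spec))  # True
print(refines(bad, spec))   # False: inputs in [10, 11) are not accepted
```

Real contract frameworks reason over behaviours or temporal formulas rather than intervals. The interval form is used here only because subset checks make the direction of refinement easy to see.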
6.3.3 If the Description and Analysis Language Differ, Then How Is the Transformation Performed?
Although SysML is the primary language used to represent system behaviour, it is seldom used for analysis. More generally, the language or formalism used for representation and for analysis is the same in only 24 of the papers, 13 of which use SysML [P22, P31, P50, P51, P55, P58, P63, P70, P101, P104, P107, P114, P130, P147]. As a consequence, all the other cases require a transformation (which can be composed of several sub-transformations) between the different languages and formalisms, at least to translate concepts from the representation language to the analysis one. In more complex scenarios, such a transformation would also extract and synthesise the information necessary to perform the early V&V analysis.
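As a deliberately simplified illustration of such a transformation, the sketch below maps a state machine onto a Petri net. Each state becomes a place, and each transition becomes a Petri net transition with one input and one output place. The initial state receives the only token. The input is a hypothetical dictionary-based stand-in for an exported SysML state machine, not real SysML/XMI. The data layout and function names are assumptions for illustration only; real transformations operate on metamodels and must also handle hierarchy, guards, and concurrency.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    places: set[str] = field(default_factory=set)
    # transition name -> (input places, output places)
    transitions: dict[str, tuple[list[str], list[str]]] = field(default_factory=dict)
    marking: dict[str, int] = field(default_factory=dict)

    def enabled(self, t: str) -> bool:
        inputs, _ = self.transitions[t]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, t: str) -> None:
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


def state_machine_to_petri_net(sm: dict) -> PetriNet:
    """Map each state to a place and each (source, event, target) transition
    to a Petri net transition; the initial state holds one token."""
    net = PetriNet()
    for state in sm["states"]:
        net.places.add(state)
        net.marking[state] = 0
    for src, event, dst in sm["transitions"]:
        net.transitions[f"{src}--{event}->{dst}"] = ([src], [dst])
    net.marking[sm["initial"]] = 1
    return net


# Hypothetical pump controller description (not taken from any cited paper).
pump = {
    "states": ["Idle", "Pumping", "Fault"],
    "initial": "Idle",
    "transitions": [
        ("Idle", "start", "Pumping"),
        ("Pumping", "stop", "Idle"),
        ("Pumping", "overheat", "Fault"),
    ],
}

net = state_machine_to_petri_net(pump)
net.fire("Idle--start->Pumping")
print(net.marking)  # {'Idle': 0, 'Pumping': 1, 'Fault': 0}
```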
Of the analysed papers, 94 (63.0%) utilise fully automated transformations, 19 (12.7%) utilise semi-automated transformations, and 12 (8%) use some form of manual transformation. Authors of automated solutions rarely provide proof of transformation correctness, which possibly highlights a gap in the literature. Moreover, as previously mentioned, 24 papers (16.1%) use no transformation. The extracted data thus indicate that transformations are mostly performed by automatic means; otherwise, solutions typically either avoid transformations or employ semi-automated ones. Singh and Muller [P80] use Dynamic A3 Architectures as a validation approach that leverages tool support without the need for transformations. By promoting view management and cross-team communication through integrated tool capabilities, they achieve early validation through increased communication, collaboration, and integration between engineering teams. Farooq [P84] models directly in Simulink and leverages the tool's capabilities for simulation and verification. Since some of the proposed solutions utilise several languages and formalisms for both description and analysis, semi-automated transformations also include cases where some of the multiple translations are automatic while others are not. For example, González et al. [P62] leverage co-simulation mechanisms in conjunction with SysML models and transform only a subset of the model information to MATLAB equations. Friedl et al. [P77] discuss the use of SysML architecture models in a process for model integration between system architects and model experts, and highlight a semi-automated, guided approach to integration. Only a few papers rely on a completely manual approach for mapping different languages and formalisms [P1, P30, P43, P44, P48, P59, P65, P85, P120, P123, P127]. Interestingly, their authors tend to report these solutions as not validated in an industrial context [P30, P43, P44, P48, P65, P120, P123, P127], and the solutions tend to target the requirements phase. Conversely, the solutions deploying automated transformations are often validated in industry and do not target requirements.
RQ2 discussion: For describing a system's behaviour, there is a clear predominance of SysML, UML, and UML profiles such as MARTE. The prominence of these languages is not surprising and has been reported previously in other types of reviews [21, 60, 66]. Of the different types of behavioural diagrams in SysML, activity diagrams and state machine diagrams are the most commonly used. In contrast, use case diagrams and sequence diagrams are used significantly less, which suggests that they are not good candidates for describing systems at this stage to enable V&V. However, whereas the description languages are fairly uniform, the analysis mechanisms vary widely. Indeed, most papers utilise languages or formalisms found only once or twice in the set of selected papers. The most frequently reported languages are SysML, Simulink, and Modelica; Petri nets (with various formalisms), UPPAAL, and other similar languages are also used often. Where behaviour representation and analysis use different means (which is mostly the case), the mappings between the languages show a clear tendency toward automated or semi-automated solutions. This is consistent with the pragmatics of model-based development (and consequently of MBSE) [78], since handling such discontinuities by hand would introduce accidental complexity into the solution, making its adoption difficult or even impossible.
In a broader perspective, the wide adoption of SysML- and UML-based languages for behaviour description might confirm their status as de facto standards for early systems modelling. In fact, retaining those languages for early behaviour description eases the technology transfer of the developed analysis mechanisms, since practitioners do not need to learn new languages (or adopt new tools). However, as SysML- and UML-based languages are general purpose, they often do not provide enough support for domain-specific analysis. Information therefore has to be extracted and translated toward semantic domains in which the analysis can be performed. Relying heavily on such technologies introduces further complexity into the process, which might make industrial adoption a greater challenge.
While some of the results are expected, several important topics are largely missing. Relying on model transformations to create analytical models creates a significant dependency on these transformations, yet there is little discussion of their viability. Similarly, consistency management across different languages, as well as interoperability and scalability, is mostly missing. Implementing MBSE in industrial contexts often relies on large tool-chains, and depending on a set of model transformations in a landscape of changing tools, standards, and users is a considerable risk. Among the solutions not using transformations, many are implemented in advanced tooling like Simulink or in integrated MBSE toolkits like Cameo Systems Modeler or MagicDraw. This hints that these solutions are more closely tied to industry needs. Overall, the large range of analytical languages and notations provides powerful flexibility but also introduces complexity into the process, which is a considerable trade-off.
6.4 RQ3 - What Are the Results of Interest for Authors Performing Early V&V, and What Techniques Are Employed for the Required Analysis?
This section summarises the data extracted to answer RQ3, specifically questions 7, 8, and 9 from Table 1.
6.4.1 What Methods and Techniques Are Used for Analysis?
Figure 11 reports the V&V analysis methods and their frequency as extracted from the papers. As before, entries that appear only once in the papers are not displayed individually but are grouped as “Other” in the figure.
The most frequently reported method is simulation, which some papers also refer to as model execution, virtual testing, or co-simulation. The papers not utilising simulation employ many different means of analysing models, often static analysis methods. Other common types of analysis are model checking and manual inspection/review. There is some overlap between the reported techniques. This is partly because some solutions employ several means of analysing system behaviour, and partly because some solutions are broad, in the sense that several aspects are analysed either in parallel or through different steps. As an example, Quadri et al. [P81] discuss a framework with several interconnected components for V&V on SysML/MARTE models, including simulation, compile-time visualisation, and temporal property checks. Similarly, Liu et al. [P16] introduce a methodology that leverages a sub-language of AADL for simulation, schedulability analysis, syntax analysis, lexical analysis, and more.
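To make the idea of simulation, or model execution, more concrete, the following sketch executes a small state machine against a time-stamped sequence of input events and records the resulting execution trace. The transition table, event names, and the (time, state) trace format are hypothetical. The semantics of ignoring unmatched events is one common choice, assumed here for simplicity. Tools such as Simulink or SysML simulators provide far richer semantics (continuous dynamics, concurrency, run-to-completion steps).

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("Idle", "start"): "Pumping",
    ("Pumping", "stop"): "Idle",
    ("Pumping", "overheat"): "Fault",
    ("Fault", "reset"): "Idle",
}


def simulate(initial: str, events: list[tuple[float, str]]) -> list[tuple[float, str]]:
    """Execute the state machine over time-stamped events.

    Events with no matching transition in the current state are ignored.
    Returns the execution trace as (time, state entered) pairs, starting at t = 0.
    """
    state = initial
    trace = [(0.0, state)]
    for time, event in sorted(events):
        nxt = TRANSITIONS.get((state, event))
        if nxt is not None:
            state = nxt
            trace.append((time, state))
    return trace


trace = simulate("Idle", [(1.0, "start"), (2.5, "reset"), (4.0, "overheat"), (6.0, "reset")])
print(trace)
# [(0.0, 'Idle'), (1.0, 'Pumping'), (4.0, 'Fault'), (6.0, 'Idle')]
```

The `reset` at t = 2.5 is dropped because no transition handles it in `Pumping`. Whether such events should be ignored, deferred, or flagged as errors is exactly the kind of semantic decision a real simulator must fix.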
Methods with fewer entries are usually devoted to domain-specific analysis, and many of the reported entries relate to techniques for reasoning about or analysing the models directly without any need for execution. Examples include program analysis [P13], compositional verification [P138], model reasoning [P116], and correctness by construction [P60].
6.4.2 What Results Are of Interest for the Authors?
This section presents the results of the data extraction related to question 8 in Table 1. To further understand early V&V from the authors' perspective, we map the results of interest in Figure 12. Again, entries that appear only once are not included individually but are classified as “Other.”
As with the previous results, the analysis feedback and outcomes of interest are widely spread. Moreover, the results for this question are affected by a lack of harmony and consistency in the terms used and in how authors present their data, which is likely due to the broad range of domains involved in the selected papers. Execution traces refer to works that use the execution traces themselves for analysis. For example, Delvi et al. [P74] plot execution traces to compare and evaluate energy consumption, while Kotronis et al. [P100] visualise and compare traces to evaluate the performance of railway system configurations. Safety features can include, for example:
- guarantees of execution, as in the case of Dragomir et al. [P148];
- safety assessments from Fault Trees or Failure Mode and Effect Analysis, as in the case of Krishnan et al. [P139];
- more comprehensive V&V safety analysis through a set of techniques, as in the case of Bozzano et al. [P67].

Inconsistency can refer to inconsistency between different system behavioural views, as in the case of Duhil et al. [P58], or between different tools and notations belonging to traditionally separated teams, as in the case of Gregory et al. [P117]. Intended functional behaviour can refer to simulation used to validate the execution of generated code, as for Bocciarelli et al. [P10], or to the detection of design and integration errors with model checking, as in the case of Braspenning et al. [P88]. Functional requirements verification, in turn, can refer to explicit requirement satisfaction checks, as in Kang et al. [P4] or in Anwar et al. [P86]. In the latter, SystemVerilog assertion code is generated for a particular SoI from design requirements defined using OCL, and this code can be leveraged during the eventual detailed system design for requirements verification.
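As an illustration of how execution traces can be post-processed into a result of interest, in the spirit of comparing designs by energy consumption, the sketch below integrates a per-state power figure over the time spent in each state of a trace. The constant-power-per-state assumption, the trace format, the power values, and the function name are hypothetical. The sketch does not reproduce the analysis of Delvi et al. [P74].

```python
def energy_from_trace(trace, power_w, end_time):
    """Energy in joules for a trace of (time_s, state) pairs.

    Assumes each state draws constant power (watts) from the moment it is
    entered until the next trace entry, or until end_time for the last one.
    """
    energy = 0.0
    for (t0, state), (t1, _) in zip(trace, trace[1:] + [(end_time, None)]):
        energy += power_w[state] * (t1 - t0)
    return energy


# Hypothetical power draw per state and traces of two candidate designs.
power = {"Idle": 2.0, "Pumping": 50.0, "Fault": 5.0}
design_a = [(0.0, "Idle"), (1.0, "Pumping"), (4.0, "Idle")]
design_b = [(0.0, "Idle"), (1.0, "Pumping"), (3.0, "Idle")]

print(energy_from_trace(design_a, power, end_time=10.0))  # 164.0
print(energy_from_trace(design_b, power, end_time=10.0))  # 116.0
```

Design B spends one second less in the high-power `Pumping` state, which dominates the difference. This is the kind of comparison that trace plots make visible early, before hardware exists.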
Domains such as power plants [P44], canal/waterway systems [P85], web applications [P132], and medical systems [P140] use vocabularies that differ from those of the more represented domains, such as aerospace or automotive. Nevertheless, some clear trends can be identified; for example, execution traces and intended functional behaviour are of high interest. Similarly, results commonly obtained through static analysis, such as liveness, inconsistency, deadlock-freeness, and reachability, are also reported often.
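Properties such as reachability and deadlock-freeness are typically established by exploring the state space of a model. The following minimal sketch performs a breadth-first exploration of an explicit, finite transition graph and reports which states are reachable from the initial state. It also flags reachable states without outgoing transitions as deadlocks. The graph encoding and the example states are hypothetical simplifications. Model checkers such as UPPAAL or NuSMV work on symbolic or timed representations and check far richer temporal properties.

```python
from collections import deque


def explore(graph: dict[str, list[str]], initial: str) -> tuple[set[str], set[str]]:
    """Breadth-first state-space exploration.

    Returns (reachable states, reachable deadlock states), where a deadlock
    is a reachable state with no outgoing transitions.
    """
    reachable = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for succ in graph.get(state, []):
            if succ not in reachable:
                reachable.add(succ)
                queue.append(succ)
    deadlocks = {s for s in reachable if not graph.get(s)}
    return reachable, deadlocks


# Hypothetical state graph: 'Fault' has no recovery transition, and
# 'Maintenance' is never entered from the initial state.
graph = {
    "Idle": ["Pumping"],
    "Pumping": ["Idle", "Fault"],
    "Fault": [],
    "Maintenance": ["Idle"],
}

reachable, deadlocks = explore(graph, "Idle")
print(sorted(reachable))  # ['Fault', 'Idle', 'Pumping']
print(deadlocks)          # {'Fault'}
```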
A few papers argue that their analysis approaches are adaptable to the application of interest in a specific context; as such, the results of interest might vary depending on the usage scenario. Notably, Anwar et al. [P41] present a meta-model for modelling FPGA-related concepts and argue that their solution can be used to generate test benches for various purposes. Votintseva et al. [P47] reason about multi-domain systems and propose solutions for managing cross-domain analysis, suggesting that their approach depends on the chosen domains and applications. Kaslow et al. [P55] use various inspection forms to perform their analysis, which they argue can be adapted to stakeholder needs. González et al. [P62] propose a means of using co-simulation for early V&V and argue that the results of interest depend on the evaluation context. Hecht and Chen [P130] utilise various query-based operations on SysML models and argue that this analysis should be adapted to the particular system of interest and the corresponding models. Zhang et al. [P141] use their custom language X with prototype tooling that covers an extensive array of potential analyses, and the results vary accordingly.
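Query-based analyses of the kind mentioned above can be pictured as simple queries over a model's elements and relationships. The sketch below lists requirements that no design element claims to satisfy. It assumes a hypothetical, flattened export of a SysML model into requirements and «satisfy» links; the class names, field names, and example data are illustrative assumptions, not any specific tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    rid: str
    text: str


@dataclass(frozen=True)
class Satisfy:
    """A «satisfy» link from a design element to a requirement id (hypothetical export)."""
    element: str
    requirement: str


def unsatisfied(requirements: list[Requirement], links: list[Satisfy]) -> list[str]:
    """Return the ids of requirements that no design element satisfies."""
    covered = {link.requirement for link in links}
    return [r.rid for r in requirements if r.rid not in covered]


# Hypothetical model excerpt.
reqs = [
    Requirement("R1", "The pump shall stop within 2 s of an overheat event."),
    Requirement("R2", "The pump shall report faults to the operator."),
    Requirement("R3", "The pump shall restart after a manual reset."),
]
links = [Satisfy("PumpController", "R1"), Satisfy("OperatorPanel", "R2")]

print(unsatisfied(reqs, links))  # ['R3']
```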
6.4.3 What Are the Tools Used for Analysis?
This section discusses the outcomes of the data extraction related to question 9 in Table 1. We present a categorisation of the tools used in Figure 13 and refer to the appendix for an explicit mapping of tools to solutions.
Consistent with the previous results on languages for system description and V&V analysis, the number of tools used is extensive, and most of the tools are found only once or twice among the analysed papers. The more common categories are integrated MBSE toolkits, graphical programming/simulation environments, model checkers, modelling frameworks, and simulation toolkits. The most commonly used individual tools are MATLAB/Simulink with the corresponding libraries and EMF-based solutions. EMF is one of the leading free/open source platforms for modelling, which makes it an attractive tool/platform for academic investigations [82]. MATLAB/Simulink, similarly, is suitable for both academia and industry and has a rich history of applications in the model-based community [4]. Apart from these tools, many proprietary SysML editors are commonly used for their integrated analysis capabilities, eliminating, at least partially, the need for different tools for describing and analysing behaviour. In some cases, the tools are not reported [P12, P31, P37, P38, P43, P54, P66, P92, P98, P104], and in other cases it is argued that the solution is not tool dependent [P47, P64, P76, P95, P101, P109, P125, P134]. Moreover, some authors have proposed custom tools or environments for their analysis [P27, P48, P68, P82, P87, P141, P144].
RQ3 discussion: Simulation is by far the most commonly used means of analysis. This perhaps indicates that meaningful analysis at early stages requires advanced means of analysis, particularly for dynamic behaviour. Apart from simulation, there is a broader spread of methods, most of which can be reduced to model checking or automated reviews/inspections.
The results of interest are typically tightly coupled with the domain, the adopted languages, the tools, and so forth, hence showing a large variety of target properties. A more explicit and shared set of results of interest can be observed when model checking is adopted, namely deadlock freedom, liveness, safety, and reachability. In general, the embedded systems domain seems to have a clearer view of the process and scope of early V&V. This is reflected in its more compact set of tools, languages, and results compared to the other domains. Moreover, the embedded systems domain shows some maturity regarding early validation practices, with earlier publications in the observed timeline for early V&V.
Another interesting aspect is that models describing behaviour in a relatively uniform way, often in SysML, can lead to a wide array of analyses; this shows the powerful flexibility of the semi-formal nature of general-purpose languages like SysML and UML profiles. Of course, such semi-formal descriptions often entail the need for transformations to more structured representations for analysis. This increases complexity and reduces the freedom of modelling, as automatic transformations require models that follow a structured format.
The tooling reported for analysis in the papers is the category with the widest spread. Very few tools are seen as good enough for the general MBSE audience. The most commonly reported tools are Eclipse/EMF, MATLAB/Simulink, Papyrus, OpenModelica, and UPPAAL. Some tools that do not perform transformations between languages are also represented in the papers, like MagicDraw and Cameo Systems Modeler. These two tools are used in solutions where early analysis is performed directly on SysML diagrams thanks to functionalities embedded in the tooling. Moreover, as expected, these solutions are often observed in industrial applications, where the tooling may also be customised on demand to perform specific tasks.
Although tooling is tightly connected with MBSE, it is surprising that so few solutions claim to be tool agnostic. In fact, while tooling is central, MBSE equally considers methods, methodologies, and languages. However, many of the solutions rely on tool-specific analysis, regardless of whether the solution is extensive or not. Academic tools tend to be more compact and openly accessible for the best impact, while industrial tooling often needs to integrate into larger processes. Moreover, industry is reluctant to adopt openly available tooling: it is rarely scalable and maintainable enough for large enterprises, and intellectual property protection is an additional concern. Therefore, the fact that much of the tooling is very case-specific feeds into the known problems of interoperability and maintainability.
6.5 RQ4 - Who Are the Users of Early V&V?
This section summarises the data extracted to answer RQ4, more specifically questions 10, 11, and 12 from Table 1.
6.5.1 What Is the Reported Domain?
Figure 14 displays the domains identified in the selected papers. There is a multitude of domains observed in the papers, of which the most prominent are Aerospace and Avionics. Other domains with a significant presence are embedded systems, cyber-physical systems, safety-critical systems, automotive, and railway. Finally, there is also a large category of domains reported only once in the papers, such as nuclear power plants [P28], canal systems [P85], web applications [P132], and cloud computing [P134].
It is somewhat expected that aerospace is one of the most commonly reported and investigated domains, given the complexity of the developed systems and the historical relevance of this field. Indeed, there are prominent promoters of MBSE, such as NASA, which produces quality research on the topic [42, 70]. Similarly, embedded systems are quite mature regarding formal methods and model checking [54, 85], which are often reported in the papers. Moreover, the automotive and railway industries have strong foundations in model-based practices [7, 24], and standards such as AUTOSAR exist for automotive [32]. Cyber-physical and safety-critical systems are broader categories than the previously mentioned ones, but they still show a strong presence. Here, CPSs typically benefit from the unified view that models provide (e.g., for integration analysis purposes), while safety-critical systems require early analysis to provide the necessary evidence that the systems will meet the imposed requirements.
6.5.2 Is the Solution Domain Specific?
Of the analysed papers, 117 (78.5%) contain domain-specific solutions, 16 (10.7%) claim to be domain independent, and 16 (10.7%) illustrate a domain-specific solution but make a case for it being extensible to more domains. Examples include Bocciarelli et al. [P10], who describe an approach based on the IEEE 1516-2010 (HLA) standard and apply it concretely in the SoS domain but argue for wide applicability. Similarly, Stewart et al. [P17] describe a process for safety analysis in the aerospace domain but argue that it can be extended to any safety-critical domain, such as automotive or nuclear power plants. In the same way, Herzig et al. [P30] apply a SysML-based simulation analysis method to a telescope use case but argue that the approach can be extended to other domains. The large majority of domain-specific papers and the many different reported domains are consistent with the broad range of solutions proposed in the analysed papers.
Of the non-domain-specific papers, only three are validated in an industrial setting [P1, P34, P105], indicating that such solutions might not have the maturity required for industry. All three papers discuss their application in at least two domains. For example, Cheng et al. [P34] discuss an approach for automatic analysis of multi-view architectures and present both an automotive application and a micro-service cloud application.
6.5.3 How Was the Solution Validated?
This section answers question 12 from Table 1 by extracting whether the solution was validated in an industrial setting and what kind of validation is discussed in the selected papers.
Fifty-one papers (34.2%) evaluate their results in an industrial setting, which showcases a strong industrial presence. We re-emphasise here that the validation of the approach is extracted from the publication itself, and hence we rely on the definition adopted by the authors.
One hundred fourteen (76.5%) of the analysed papers use some form of running example, as defined by Shaw [79], to validate the solution; here, we group toy examples with slice-of-life examples. A subset of contributions, 18 (12%), present empirical observations of the solutions [P3, P4, P5, P11, P14, P40, P49, P60, P62, P66, P67, P80, P87, P103, P110, P134, P149]. For example, Mens et al. [P3] introduce a new method for testing and validating statecharts, and part of their evaluation is a controlled user study to measure the tool's effectiveness and usability. Andrade et al. [P5] introduce a method to map SysML activity diagrams to Petri nets for the evaluation of real-time systems with energy constraints, and compare the obtained analysis results with hardware measurements, which confirm the adequacy of the analysis results. Anwar et al. [P11] introduce a model-based method for the design of FPGAs and compare the time, in man-hours, required by baseline methods and by their approach. Finally, 17 (11.4%) papers provide no example or empirical observation for their solution [P15, P25, P31, P32, P42, P53, P58, P59, P70, P98, P100, P104, P119, P124, P128]. For example, Lima et al. [P25] describe a semantics for reasoning about SysML diagrams via refinement and support their approach through abstract, high-level, disjoint diagrammatic examples. Gauthier et al. [P119], instead, explain a process to transform SysML models to Modelica code without any example in the publication apart from the transformation rules.
The outcome regarding the users of early validation is consistent with the other categories in the sense that there is a large and significant spread. The outliers in this regard are aerospace (together with the related domain of avionics), embedded systems, CPS, safety-critical systems, railway, and automotive. Interestingly, the solutions proposed in the embedded systems domain tend to be consistent with one another, while the same cannot be said for the other domains. In fact, in most of the larger reported domains there is no common/shared view of how early V&V is expected to be performed. For example, in aerospace there are examples of requirement analysis [P7], simulation [P10], schedulability analysis [P16], model checking [P17], inspection of diagrams [P55], and so on.
RQ4 discussion: As might be expected, most of the solutions are domain specific, and only around 11% claim to be applicable to any domain. Performing analysis early in the SE process requires assumptions to be made or uncertainties to be managed. Solutions therefore tend to be domain specific, because the domain provides more precise and reliable information on which to build the analysis. Further, as many target languages or tools are strongly coupled to a domain, profiles and constraints need to be in place to enable SysML and similar semi-formal (general-purpose) languages to be applied with enough rigour.
Around 34% of the analysed papers describe research performed in an industrial setting, seemingly indicating the industrial applicability of many solutions. Indeed, MBSE is a paradigm with a solid industrial perspective, which imposes many requirements on the types of solutions. Related to this substantial industrial scope, we also note that most of the solutions are based on examples to various degrees, with less focus on empirical measurements. Indeed, only a few papers discuss empirical evidence to support the solutions; most instead rely on arguing for the perceived benefits. These observations are consistent with a previously reported weakness of MBSE [41], namely the lack of empirical evidence. In this respect, we argue that empirical evidence could be hard to produce for solutions of this nature. The measurements are intricate to define and might be challenging to perform in industrial settings without a high risk of introducing bias and other validity threats. Nevertheless, this is a recurring problem for model-based practices that hinders evidence-based discussions on a broader scale [16].
6.6 RQ5 - What Limitations Do Authors See, if Any, with Their Implemented Solutions?
This section summarises the data extracted to answer RQ5, namely question 13 in Table 1.
Not all authors identify limitations of their proposed approaches for early V&V. The limitations identified by those who do are presented in Figure 15. The most frequently identified limitation is that the proposed solution is not fully developed, often covering only part of what is of interest to the authors. Concretely, the solution by Zhu et al. [P2] does not consider the validity of the SysML models used as input for their automated approach, and Ghitri et al. [P118] cannot automate all parts of their transformation from SysML to UPPAAL. The second most frequent limitation relates to the proposed analysis, owing to the simplifications introduced by the level of abstraction adopted in the modelling activities. For example, Zhang et al. [P45] note that, due to the simplicity of the models, their approach does not adequately cover systems more complex than their case study. Brandstetter et al. [P59] conclude that their approach can only confirm whether the model of requirements is valid, not whether the actual system is. Another significant challenge arises from integrating languages when the description and analysis languages or formalisms differ [P26, P36, P47, P57, P62, P64, P86, P109, P123, P130, P134]. For example, Mahani et al. [P36] find automation between SysML and NuSMV difficult due to the differences in their representations. Other reported issues include the lack of automation [P15, P19, P22, P47, P65, P106, P116, P140]. For instance, Staskal et al. [P140] argue that the semi-formal nature of SysML modelling makes consistent automation difficult to achieve. Limited expressiveness of the behaviour description language is another common limitation [P30, P34, P70, P122, P128, P134, P135, P138, P148]. For example, Liu et al. [P138] discuss how AADL semantics makes it difficult to express timing and execution behaviour for analytical purposes, and Schamai et al. [P90] discuss how ModelicaML cannot describe all kinds of requirements used in their context for verification.
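To give a flavour of the translation step between a behaviour description and a model checker's input language, the sketch below emits a NuSMV model from a simplified, hypothetical state machine description. The state is encoded as an enumerated variable, the transitions as a nondeterministic `next` assignment, and an optional CTL property is appended. It deliberately abstracts away events, guards, hierarchy, and concurrency, which are precisely the aspects where differences in representation make automation difficult. It is not the translation used by Mahani et al. [P36]; the function name and the variable name `mode` are assumptions for illustration.

```python
def to_nusmv(states, initial, transitions, spec=None):
    """Emit a NuSMV model for a flat, event-free state machine (hypothetical encoding).

    From each state, the next state is chosen nondeterministically among its
    targets; states without outgoing transitions keep their current value.
    """
    targets = {s: [] for s in states}
    for src, dst in transitions:
        targets[src].append(dst)

    lines = [
        "MODULE main",
        "VAR",
        "  mode : {" + ", ".join(states) + "};",
        "ASSIGN",
        f"  init(mode) := {initial};",
        "  next(mode) := case",
    ]
    for s in states:
        succ = targets[s]
        if len(succ) == 1:
            lines.append(f"    mode = {s} : {succ[0]};")
        elif succ:
            lines.append(f"    mode = {s} : {{" + ", ".join(succ) + "};")
        # States with no successors fall through to the default branch below.
    lines += ["    TRUE : mode;", "  esac;"]
    if spec:
        lines.append(f"CTLSPEC {spec}")
    return "\n".join(lines)


model = to_nusmv(
    states=["Idle", "Pumping", "Fault"],
    initial="Idle",
    transitions=[("Idle", "Pumping"), ("Pumping", "Idle"), ("Pumping", "Fault")],
    spec="AG (mode = Fault -> AF (mode != Fault))",
)
print(model)
```

Because `Fault` has no outgoing transition in this hypothetical model, a model checker would be expected to report the property as violated. Surfacing such a counterexample early is the purpose of the translation.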
RQ5 discussion: Many of the stated limitations relate directly or indirectly to being early in the development process. Indeed, if the level of abstraction is high and/or few details of the system are known, the analysis should be expected to be limited or superficial. Still, the authors clearly state that this is an observed liability of their solutions, highlighting the difficulty of balancing the fidelity and abstraction of models in the early stages. Similarly, we notice many issues relating to tooling and the learning curve, which are commonly reported problems in software and systems modelling.
RQ5 yielded the least extracted input. Indeed, many papers did not discuss to any considerable extent the limitations of the developed solutions and/or of early V&V in a broader perspective. However, we highlight some patterns that emerge from the works that did discuss limitations specifically. The most commonly reported limitation is that the solutions presented in the paper are not fully developed. An incomplete solution is most likely a common limitation regardless of the review subject, as papers often discuss partial solutions or work in progress toward specific goals. It is therefore hard to say how relevant the connection between this pattern and the subject of this review is. Nevertheless, this could be seen as a weakness of MBSE, as mature solutions are needed for substantial industrial adoption.
Finally, more typical limitations of model-based practices are also frequently discussed, namely issues due to tooling, interoperability, learning curves, and scalability. These limitations are not surprising and highlight the need for more maturity in the field toward industrial applicability, as reported in the research literature and mentioned earlier in this article.
In a broader perspective, MBSE and SE generally target the entire system lifecycle. Many MBSE methodologies, such as MagicGrid [65], provide a framework for activities and methods across the various lifecycle stages. Interestingly, only a few solutions discuss the potential limitations of interconnecting early V&V activities and results with later stages. Notably, while some papers make a connection between phases, the discussion and application of traceability between the artefacts of each phase are limited. In the visions presented by organisations such as INCOSE, it is often reiterated that future modelling should encompass all parts of a system lifecycle. However, without more sophisticated means of traceability between the models produced at various stages, there is an inherent risk of introducing inconsistencies or additional effort related to managing information across artefacts. This is, in fact, one of the identified issues with document-centric development, and it has often been used as an argument for moving toward model-based practices [38].
9 Conclusions
This article reports the results of a systematic literature review on early behaviour validation and verification in model-based systems engineering. From a set of 701 papers retrieved through searches and snowballing, we selected 149 relevant contributions, extracted and coded their data, and performed the analyses whose results and findings are presented in this work. We observe an increasing interest in early V&V and a broad range of domains in the analysed papers, with a corresponding variety of methods, tools, and languages. Further, we note a strong industrial presence in the literature and several industrial trends that differ from academic ones. Among our findings, SysML is the most represented language for describing system behaviour in both industry and academia. In contrast, the language or formalism used for analysis varies across solutions. Additionally, several limitations are identified, indicating a lack of maturity in the solutions as well as concerns about managing analysis with low-fidelity models. Finally, a significant divide emerges between academic and industrial implementations of solutions. This divide is especially visible for SysML, which is used in all contexts but relies on different tooling in each.
We contextualise the review findings and discuss the current status of early validation of system behaviour in the context of industrial MBSE adoption. The review is structured around the needs of industry in order to promote the eventual adoption of early V&V and of MBSE processes at large. It provides actionable insights for each of the five research questions to encourage further investigation in this area. Furthermore, we distinguish three areas, Model-Based, Systems Engineering, and Validation & Verification, and highlight for each a set of barriers that we believe must be addressed to promote and support industrial adoption of early V&V techniques. We hope the findings of this review provide an adequate state-of-the-art view and pave the way for future investigations by researchers and practitioners.