research-article
Open access

A Combinatorial Reliability Analysis of Generic Service Function Chains in Data Center Networks

Published: 02 December 2021

Abstract

In data center networks, the reliability of a Service Function Chain (SFC)—an end-to-end service presented by a chain of Virtual Network Functions (VNFs)—is a complex and specific function of placement, configuration, and application requirements, both in hardware and software. Existing approaches to reliability analysis do not jointly consider multiple features of system components, including (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency. To this end, we develop a novel analysis of the service reliability of the so-called generic SFC, consisting of n = k + r sub-SFCs, whereby k ≥ 1 and r ≥ 0 are the numbers of arbitrarily placed primary and backup (redundant) sub-SFCs, respectively. Our analysis is based on combinatorics and a reduced binomial theorem—resulting in a simple approach, which, however, can be utilized to analyze rather complex SFC configurations. The analysis is practically applicable to various VNF placement strategies in arbitrary data center configurations and topologies, and can be effectively used for evaluation and optimization of reliable SFC placements.

1 Introduction

Reliability of complex systems can be defined as a probability that the system will successfully complete the processing of the intended service. As such, reliability analysis in system engineering is critically important for designing systems that are robust in case of failures of any system component. Within the framework of Network Function Virtualization (NFV), the reliability of Service Function Chain (SFC)—a chain of Virtual Network Functions (VNFs)—is a complex function of VNF placement, network configurations, and application requirements on SFC deployment, both in hardware and software. Generally, SFCs suffer from processing vulnerability, whereby failures caused by hardware or software can interrupt the entire service chain [1, 17]. To maintain the service in case of failures, the most common technique is based on redundant configurations [3, 18, 20, 32]. With redundant configurations, the failed processes can be redirected to backup components. However, even a full backup system may not yield 100% reliability if backup resources are placed in shared servers or racks. Moreover, SFCs can deploy replications to improve response times and flexibility of scaling, thus further complicating the calculation of reliability.
Previous approaches to reliability analysis cannot be generalized to SFCs, because they do not jointly consider several important aspects of system components in data center networks (DCNs): (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency, the latter resulting in the so-called common cause failures and failure propagations. A joint consideration of all these features, including components' dependencies, is, however, of major practical relevance. For instance, when VNFs are placed in the same rack, they create dependencies, which implies that a failure of the rack causes a failure of all servers and VNFs inside it. Furthermore, all components, be it racks or servers, or servers with different hardware, exhibit different component reliability by design, as well as due to different lifetimes and levels of utilization. SFC reliability calculation based on individual system components, i.e., without these dependencies, is well studied, especially in the context of the so-called reliable SFC mapping problem [6, 10, 14, 15, 21]. References [11, 12] explored the role of server redundancy in data centers with server failures. In Reference [30], a scalable and efficient algorithm for VNF placement to address physical link failures was presented, whereas Reference [34] focused on disjointness while addressing server failures. References [24, 25] studied a joint optimization of SFC placement and flow routing under the assumption of VM failures.
Component disjointness and independence is a frequent assumption made to assess reliability independently of SFC and VNF placement. In such scenarios, the reliability calculation is straightforward. In practice, however, one rack can host multiple servers hosting multiple VNFs of the same SFC, which requires a different analytical approach. References [4, 5] address this problem: they provide a hierarchical topology model without backup protection to assess reliability based on queuing and graph theory and apply Bayesian analysis to predict service reliability. This analysis assumes that a service is reliable when all service components are available during service runtime, whereby the reliability calculation reduces to a multiplication of all component reliability values. The failure of a rack hierarchically propagates to the corresponding servers and VMs, which is also referred to as hierarchical failure propagation [16]. In Reference [22], a hierarchical modeling framework for tree-based data center networks was proposed, where servers, links, switches, and routers fail hierarchically. The resulting reliability model yields an approximate reliability, due to the underlying assumptions of component disjointness and independence. The assumption of component disjointness cannot, however, always be met in practice. In addition, SFC reliability is placement-dependent, as some VNFs can be hosted by different servers, racks, and data centers.
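Under the disjointness-and-independence assumption discussed above, the serial-chain calculation reduces to a product of component reliabilities. The following Python sketch illustrates this baseline; the component list and reliability values are illustrative assumptions, not taken from the article:

```python
def serial_reliability(component_reliabilities):
    """Reliability of a serial chain that fails if any single component fails:
    under independence, the product of the individual reliabilities."""
    result = 1.0
    for p in component_reliabilities:
        result *= p
    return result

# Example: a 3-VNF chain hosted on one server in one rack of one DC.
# The service is up only if DC, rack, server, and all three VNFs are up.
components = [0.9999, 0.999, 0.995, 0.99, 0.99, 0.99]  # DC, rack, server, VNF1..VNF3
print(serial_reliability(components))
```

Note that any shared component (here the DC, rack, and server) enters the product once for the whole chain, which is precisely why placement changes the result once sharing is modeled explicitly.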
Another practical feature to consider is splitting and parallel processing of the workload using multiple servers, a widely deployed technique to improve load balancing, response times, and flexibility of scaling in data centers [19, 31, 33]. Therefore, an SFC is not only a serial configuration of various processing workloads (VNFs) but also a system that can be fully parallelized to provide the end-to-end service in the form of replicas or backups. Hence, a generic SFC can be made of multiple so-called sub-SFCs running in parallel. Despite their practical relevance, such generic SFCs have been widely overlooked in the literature. In our previous studies [8, 9], we analyzed SFC reliability for cases of parallelized processing as a function of interdependent hardware (server) and software (VNF) component failures and VNF placement strategies. In Reference [7], we investigated SFC reliability as a function of the VNF placement strategy in inter- and intra-DCNs, taking into account failures of data centers, racks, servers, and VMs. We generalize and substantially extend our analytical approach from these papers and summarize the main contributions of this article as follows:
Reliability analysis of generic SFCs, defined as rank-ordered VNFs allocated to n = k + r sub-SFCs, with k ≥ 1 and r ≥ 0 being the numbers of primary and backup (redundant) sub-SFCs, respectively.
Joint consideration of the hierarchical, component-dependent placement of generic SFCs, with consideration of the level of components’ disjointness, interdependency, and sharing.
Analysis of hierarchical failure propagation over data centers, racks, servers, and VNFs, with derivations of the so-called Acceptable Component Failures and the corresponding amount of components available for utilization upon failures.
Analysis of the suitability of the model to derive best placement strategies according to reliability requirements in generic inter- and intra-data center network scenarios.
The rest of the article is organized as follows: In Section 2, we provide background and assumptions. Section 3 provides the analysis of SFC reliability. Section 4 discusses the numerical results. Section 5 concludes the article.

2 Preliminaries

In this section, we introduce the basic concepts related to SFC placement, define generic SFCs, discuss hierarchical component failures, and provide definitions used for modeling. All notations utilized here and throughout the article are summarized in Table 1.
Table 1.
k: number of primary sub-SFCs;
r: number of backup sub-SFCs;
n: total number of pre-reserved sub-SFCs, i.e., n = k + r;
Ψ: number of VNFs in an SFC and, thus, in each sub-SFC;
ψ: type of a VNF in an SFC, where 1 ≤ ψ ≤ Ψ;
number of hierarchy levels, i.e., different component types;
c: a certain component type;
highest value of shared component c;
H: total number of reliability classes;
ξ: reliability class of a sub-SFC and related components;
number of sub-SFCs of reliability class ξ;
number of components from the highest hierarchy of reliability class ξ;
number of components of type c inside components of type c − 1;
reliability of any shared component of type c utilized to host class ξ;
reliability of any unshared component of type c utilized to host class ξ and VNFψ;
number of component types used to disjointedly place VNFs of the same type;
Δ: number of component types used to disjointedly place VNFs of different types;
acceptable number of failures of a shared component of type c belonging to class ξ;
acceptable number of failures of an unshared component of type c used to host VNFψ of class ξ;
number of available components of type c and reliability class ξ;
number of failed components of type c and class ξ;
w: set of reliability classes hosted by a common root;
cw: type of the common root component, which hosts |w| reliability classes;
Φ: set of root components for a certain SFC placement;
logical function ensuring that the common roots of any type c of any class are considered only once in the calculation of SFC reliability;
number of available common roots of any type c and class;
acceptable number of failures of a common root of type c and class;
number of failed components of the lowest hierarchy level and a certain type, e.g., VNF1, due to failed components of the same class ξ at higher hierarchy levels;
placement-independent SFC reliability, if H ≥ 1 and …;
placement-independent SFC reliability, if H ≥ 1 and Ψ = 1;
R: placement-independent SFC reliability, if … and …;
placement-dependent SFC reliability, if H = 1 and …;
placement-dependent SFC reliability, if H = 1 and Ψ = 1;
placement-dependent SFC reliability, if H ≥ 1 and …;
placement-dependent SFC reliability, if H ≥ 1, Ψ = 1 and …;
placement-dependent SFC reliability, if H ≥ 1, Ψ = 1 and …;
placement-dependent SFC reliability, if H ≥ 1, … and ….
Table 1. Notation

2.1 SFC Placement

An SFC is defined as an ordered chain of Ψ virtual network functions (VNFs). VNFs are typically placed in Virtual Machines (VMs), which in turn are installed in networked servers in data center networks (DCNs), whereby a number of servers can be located in the same rack [2, 23, 27, 28]. Let us generalize the data center network architecture as a hardware fabric of switches, links, racks with multiple servers and multiple VMs per server, including core routers, Top-of-Rack (ToR) switches, forwarding switches building a switching fabric of DC, and Virtual Switch (vS) (Figure 1(a)). Core routers provide connectivity between geographically distant DCNs over wide area networks. ToR switches forward traffic within a rack. The switching fabric provides a connection between racks and to the core routers. Each server can host multiple VNFs with as many resources as required. Thus, the main task of the programmable hypervisor switch (denoted as vS) with a load balancer (LB) is to select the appropriate VNF instance as well as to provide connectivity to ToR.
Fig. 1.
Fig. 1. Composition, deployment, and placement of generic SFC in inter- and intra-DCNs: (a) composition and placement of generic SFC; (b) deployment and backup protection of generic SFC to ensure reliability.
Embedding the VNFs of an SFC within DCNs is a known challenge from the performance perspective. As illustrated in Figure 1(a), an SFC created from three VNF types, i.e., VNF1, VNF2, VNF3, can be placed in either one or multiple DCNs. Let us define the VNF type ψ, where 1 ≤ ψ ≤ Ψ, by the application implemented by this VNF, examples of which are firewall, Network Address Translation (NAT), Intrusion Detection System (IDS), or traffic shaper. As previously noted, the VNFs of various types are placed in an SFC in a rank-ordered fashion. For simplicity, and without loss of generality, we assume that the VNF type is defined by its index, i.e., ψ = 1 indicates VNF1, meaning that it is both the type and the order of the VNF in the chain. Inside a DCN, there are multiple options to place various types of VNFs. For instance, in DC4, different VNFs of an SFC can be placed either in the same or in different racks, in the same or different servers, and so on. For example, Rack1 in DC4 accommodates an entire SFC, which is distributed over servers S1, S2, and S3, and in Rack2, two SFCs can be allocated to three servers, whereby each server provides two VNFs of the same type. In contrast, VNF1, VNF2, and VNF3 in Rack3 (DC4) are all placed in one server, e.g., S2. In addition, the same SFC can be placed in DC1, DC2, and DC3, and so on.

2.2 Generic SFCs in DCN

As previously noted, we define a generic SFC of n≥ 1 sub-SFCs, whereby n = k + r. In its simplest form (k = 1, r = 0) a generic SFC corresponds to the traditional and most widely used serial SFC, as illustrated by the solid line in Figure 1(a). Without redundant resources, SFC reliability depends on the reliability of the underlying hardware and software components. If we now assume that another SFC (dashed line) is additionally used as a redundant resource or backup, then we get closer to the idea of a generic SFC, here with k = 1 and r = 1. The backup VNFs can replace any failed primary VNF of the same type. Similar to primary SFCs, also backup SFCs can fail during service runtime. As shown in References [21, 29], there are multiple different strategies for fast VNF failure detection and effective deployment of primary and backup VNFs. For the purpose of our analysis, however, we do not need to explicitly consider any specific backup deployment strategy. Instead, we assume that any SFC is available and functional during service runtime, if at least k≥ 1 out of n VNFs of each type are available, while at most r VNFs of each type can fail during service runtime.
Figure 1(a) can be used again to illustrate this, whereby a request for VNF1, VNF2, and VNF3 is now split into k = 2 primary sub-SFCs, i.e., r = 0, presented by solid and dashed lines. The resulting SFC is created over two sub-SFCs, one placed in DC4 in Rack1 and another in DC1, DC2, and DC3. The SFC is reliable/available only when both sub-SFCs are available during service runtime. It should be noted here that in a generic SFC, VNFs of the same type (e.g., all VNF1s) need to be synchronized in the case of stateful operation. To this end, an external state repository can be utilized to store the internal states of VNFs. Once the SFC workload is split between k sub-SFCs, each sub-SFC uses its own VNFs. This can be implemented with SFC Encapsulation, as proposed in References [13, 26]. From the practical perspective, a generic SFC can deploy traffic parallelism, SFC replicas, and network routing to provide resilient networking, all of which are widely deployed techniques. We note that each of these techniques can additionally increase the reliability; for instance, disjoint network paths can help various SFCs exhibit higher reliability. Again, our analysis does not make any particular assumption on routing and traffic splitting and merely calculates the likelihood that at least k ≥ 1 out of n VNFs of each type are available during service runtime, allowing at most r VNFs of each type to fail.
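The availability condition above (at least k of the n = k + r instances of each VNF type must survive) can be sketched numerically. The following Python fragment is a simplified, placement-independent illustration assuming independent, identically reliable VNF instances, an assumption that the rest of the article relaxes; the function names are ours, not the article's notation:

```python
from math import comb

def at_least_k_of_n(k, n, p):
    """P(at least k of n independent components with reliability p are up):
    the upper tail of a Binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def generic_sfc_reliability(k, r, p, num_types):
    """Placement-independent sketch of a generic SFC: each of the `num_types`
    (Psi) VNF types independently needs at least k of its n = k + r
    instances available during service runtime."""
    n = k + r
    return at_least_k_of_n(k, n, p) ** num_types
```

For k = 1, r = 1, the per-type term is 1 − (1 − p)², i.e., the classic 1:1 backup case; for r = 0 the expression collapses to the serial product p^(kΨ).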
Figure 1(b) illustrates a generic SFC with backup protection, where a network traffic request f with three VNF types (VNF1-VNF2-VNF3) consists of three workload flows (f1, f2, f3) and is served by a generic SFC with k = 3 primary sub-SFCs and r = 1 backup sub-SFC. Any backup VNFψ allocated to DC1, DC2, and DC3 is able to replace any primary VNF of the same type ψ, where 1 ≤ ψ ≤ 3. All primary sub-SFCs are placed in DC4, with the first and second sub-SFCs being placed in Rack5. All VNFs of sub-SFC f1 are placed in server S4. In contrast, all VNFs for f2 are distributed over different servers, i.e., S1, S2, and S3 in Rack5. The third sub-SFC for f3 is distributed over three different racks in DC4, so each VNF is placed in a different rack and server, i.e., VNF1, VNF2, and VNF3 in S1 (Rack 2), S3 (Rack 3), and S3 (Rack 4), respectively. In contrast, each backup VNF is placed in an individual DC, i.e., DC1, DC2, and DC3. In DC4, VNF1 and VNF2 of the primary sub-SFC for f2 can fail; for instance, VNF1 in S1 fails and server S2 fails in Rack 5. As a result, workload flow f2 is redirected toward DC1 and DC2 to be processed by the backup VNF1 and VNF2. If primary VNF3 of the sub-SFC for f3 also fails, for instance, due to the failure of Rack 4 in DC4, then sub-request f3 can be redirected to the backup VNF3 placed in DC3. Although there are two hardware failures and one VM failure, resulting in the loss of three different primary VNFs, the SFC remains available as long as the corresponding backup VNFs, i.e., VNF1, VNF2, and VNF3, are available.

2.3 Hierarchical Component Failures

Even though our analysis, as we show, does not depend on a specific data center network topology, it is important to recognize the hierarchical nature of data center network topologies, which is critical to failure propagation:
Data Center (DC). The failure of an entire data center can be caused by connectivity failures in wide area network or between core routers and forwarding switches, as well as by failures of all core routers. DC failure hence propagates as failure of all racks, servers, and VMs.
Rack (R). Rack failures can be caused by failures of forwarding switches and ToR switches, as well as of their links and connectivity. Any failure of a rack causes the failure of all servers and all VNFs in that rack.
Server (S). A server can fail and become unavailable in case of failure of any server’s hardware components, i.e., power supply, memory, and so on. A server can also fail due to a failed link between server and ToR or when vS fails resulting in unavailability of all VMs hosted by the server.
VM/Container. We assume that one VM or container is reserved for one VNF, indicating that a failure of VM or the container, e.g., due to misconfigurations or application software issues, causes failure of that VNF only.
In sum, a failure of a component in a higher level of topological hierarchy generally results in failures of all connected components in lower levels of the hierarchy.
Each component can be qualified by a component-dependent reliability or availability, which are temporal functions of the Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR), respectively [21]. In this article, since our focus is on the spatial interdependency and the related hierarchical failure propagation, we treat the temporal reliability (availability) of a single component as an input parameter. Let us illustrate this with a DCN presented as a hierarchical tree in Figure 2. The root node is the DC, which belongs to the highest hierarchy level; VNFs represent leaf nodes at the lowest hierarchy level. The links between nodes define the interdependency between components, whereas components of the same type within the same hierarchy level are independent of each other. The components of a lower hierarchy level do not impact components at a higher hierarchy level. For example, for SFCs placed in DC4, a failure of VNF1 will not affect the availability of VNF2 and VNF3 placed in R1 and servers S2 and S3, respectively (Figure 2). The availability of S1, R1, and DC4 will not be affected by the VNF1 failure either. In contrast, components of a higher hierarchy level impact the lower hierarchy components, e.g., a failure of R1 (level 2) in DC4 will result in failures of all servers (level 3) inside this rack and all VMs (level 4), i.e., failure of all VNFs (VNF1, VNF2, and VNF3). Thus, the failure of an entire sub-SFC needs to be considered. As shown in Figure 2, VNFs can be placed in shared components: the SFC allocated to R1 has two shared components (R1 and DC4), the VNFs of SFCs placed in R2 share R2 and DC4, and the SFCs hosted by S2 and S3 in R3 have a shared server (S2 or S3), R3, and DC4. As a result, a failure of any shared component results not only in failures of components from the lower hierarchy level but also in a failure of the entire sub-SFC, i.e., of Ψ VNFs of different types.
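The downward-only propagation just described can be captured by a simple tree model. The following Python sketch uses component names mirroring Figure 2; the class and tree structure are our own illustrative assumptions. Failing a leaf stays local, while failing a rack takes down its whole subtree:

```python
class Component:
    """A node in the DCN hierarchy tree (DC > rack > server > VNF)."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.failed = False

    def fail(self):
        # Hierarchical failure propagation: a failure reaches every
        # descendant, but never ancestors or siblings.
        self.failed = True
        for child in self.children:
            child.fail()

# DC4 -> R1 -> {S1, S2, S3} -> VNFs, mirroring the example in Figure 2.
vnf1, vnf2, vnf3 = Component("VNF1"), Component("VNF2"), Component("VNF3")
s1, s2, s3 = Component("S1", [vnf1]), Component("S2", [vnf2]), Component("S3", [vnf3])
r1 = Component("R1", [s1, s2, s3])
dc4 = Component("DC4", [r1])

vnf1.fail()                                   # a VNF failure stays local ...
assert not s1.failed and not r1.failed
r1.fail()                                     # ... a rack failure fails all servers and VNFs
assert all(c.failed for c in (s1, s2, s3, vnf2, vnf3))
```

The model is deliberately minimal: it encodes only the interdependency structure, while the temporal reliability of each component remains an input parameter, as stated above.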
Fig. 2.
Fig. 2. Placement of SFCs: Modeling of hierarchical component placements.
Our assumption is that the DCN topology exhibits a high level of connectivity, such as a leaf-spine topology [2, 23, 28], which in turn allows us to assume that the availabilities of DCs, racks, servers, and VMs are statistically independent of switch and link failures. For instance, we assume that a certain rack is available as long as at least one core router, one forwarding switch, the ToR switch, and at least one link between them are all available. This assumption, however, can lead to an overestimation of the service availability in DCNs with lower connectivity, such as a fat-tree topology [2, 14]. For example, DC and rack availability can depend on the availability of core routers, as the failure of a certain core router will lead to a failure of some forwarding switches and links, reducing the reliability of pods and racks. This limitation can be addressed by a slight modification of the hierarchical network model presented in Figure 2: instead of the DC, core routers can be used as the highest hierarchy level, and an additional hierarchy level (level 2) for the forwarding switches of the aggregation layer can be included in the model. In other words, more hierarchy levels would yield a more accurate reliability analysis for DCNs with lower connectivity. Without loss of generality, and to simplify further discussion as well as to limit the number of parameters and variables in our reliability analysis, we consider a highly connected DCN with four hierarchy levels.

2.4 Definitions and Parameters for VNF Placement

Let us assume four hierarchy levels in the DCN topology, in consideration of sharing, disjointness, interdependency, and heterogeneity of components. Each hierarchy level is described by a variable c, 1 ≤ c ≤ 4, which at the same time uniquely determines the component type in the DCN, i.e., DC, rack, server, and VM (container). For instance, at hierarchy level c = 1 the component type is defined as DC, while at hierarchy level c = 4 it is VM/container. Without loss of generality, all VNFs of a certain sub-SFC, i.e., each VNFψ, where 1 ≤ ψ ≤ Ψ, can be placed in the same component of a higher hierarchy level, e.g., in the same DC or same rack or same server, and, thus, all Ψ VNFs share this component.
Definition 1.
Any component c that is utilized by all VNFs of the same sub-SFC, i.e., by VNFs of different types, is referred to as a shared component. The number of unshared component types for a certain sub-SFC is then determined by the highest value of c, i.e., the highest hierarchy level, over all component types shared by the VNFs in a sub-SFC.
Since any VNF is placed in a separate virtual machine or container, i.e., disjointedly from any other VNFs, the highest value of a shared component is c = 3 (server), and the minimal number of unshared component types is then one, namely the VM level. Figure 3 represents a hierarchical model of the generic SFC shown earlier in Figure 1(b). In this example, a generic SFC consists of n = 4 sub-SFCs, whereby each sub-SFC has a different number of shared and unshared components. For instance, the whole sub-SFC serving f1 is placed in the same server, with VNF1, VNF2, and VNF3 sharing S4, R5, and DC4. In contrast, the sub-SFC placed in DC1, DC2, and DC3 has only unshared components, as its different VNF types (VNF1, VNF2, VNF3) are placed in different DCs, racks, servers, and VMs; in this case, there are no shared components. Generally, the type of a shared component indicates its lowest hierarchy level. To model a VNF placement strategy, we next introduce an additional parameter:
Fig. 3.
Fig. 3. Hierarchical structure of a generic SFC with three primary and one backup sub-SFCs from Figure 1(b), component and reliability classification of sub-SFCs based on placement strategy.
Definition 2.
Considering all sub-SFCs of a generic SFC, we define a parameter called the disjointness degree, used to indicate the number of different component types, i.e., hierarchy levels, that are utilized to separate the n VNFs of the same type ψ as they are placed rank-ordered.
When the VNFs of a certain type ψ, for instance, ψ = 3 meaning VNF3, of a generic SFC are placed in different DCs, racks, servers, and VMs, the disjointness degree is 4. In Figure 3, the SFC with four sub-SFCs placed in DC1, DC2, DC3, and DC4 has an overall disjointness degree of 2, as all VNFs of each type ψ in the generic SFC are placed in different servers and different VMs, cf. sub-SFC1 and sub-SFC2. When we now assume that the generic SFC presented in Figure 3 consists of sub-SFC3 and a backup sub-SFC only, which are placed in DC4 (R2, R3, R4) and in DC1, DC2, and DC3, respectively, the VNFs of a certain type ψ are placed in different DCs, resulting in a disjointness degree of 4.
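The disjointness degree can be computed mechanically from a placement. In the following Python sketch, the tuple representation of a placement (dc, rack, server, vm) is our own assumption, not the article's notation; a component is identified by its full path so that, e.g., two servers named "S1" in different racks count as different servers:

```python
def disjointness_degree(placements):
    """Number of hierarchy levels (DC, rack, server, VM) at which all
    instances of one VNF type are pairwise disjoint.
    `placements` is a list of (dc, rack, server, vm) tuples."""
    degree = 0
    for level in range(4):
        # Identify each component by its path from the root down to `level`.
        paths = [p[:level + 1] for p in placements]
        if len(set(paths)) == len(paths):  # all instances disjoint at this level
            degree += 1
    return degree
```

For example, two VNF instances in the same rack but on different servers yield a degree of 2 (server and VM levels), while instances in different DCs yield the maximum degree of 4, matching the cases discussed above.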
We consider component heterogeneity to reflect the fact that different component types, such as racks and servers, as well as components of the same type, e.g., VNFs, can have different component reliability. This assumption is of practical importance, as different component types provide different reliability by design, e.g., DCs are more reliable than VMs, and components of the same type can have different reliability due to different lifetimes and levels of utilization. Thus, we assume that the component reliability of any component c depends on the VNF type ψ allocated to this component. We distinguish the reliability of unshared and shared components accordingly (see Table 1). Considering component heterogeneity and the VNF placement strategy, we additionally assume that each (sub-)SFC builds a certain reliability class ξ.
Definition 3.
Each sub-SFC in a generic SFC belongs to a certain reliability class ξ, 1 ≤ ξ ≤ H, defined by its placement strategy, i.e., its shared component types and Δ, and by the reliability value of each used component of each hierarchy level c. Multiple sub-SFCs can belong to the same class ξ if and only if they are placed in the same fashion and use components characterized by the same reliability per hierarchy level.
Generally, there can be multiple components of type c inside a component of type c − 1 utilized to accommodate the reliability class ξ. In Figure 3, for example, a data center (DC4), a rack (R5), servers (S1, S2, S3), and one virtual machine per server (hosting VNF1, VNF2, VNF3) are required to place the sub-SFC of the corresponding reliability class. In real networks, multiple sub-SFCs can follow the same or different placement strategies and may share some components or be placed disjointedly from each other, as discussed above. In Figure 3, all four sub-SFCs belong to four different reliability classes, because each sub-SFC utilizes a different VNF placement strategy. Referring back to Figure 2, however, the basic placement strategy shown in Figure 3 for one reliability class is utilized by both sub-SFCs placed in DC4 and R3. When all shared components of type c = 3, i.e., both servers S2 and S3, as well as all unshared components, i.e., VNFs, have the same component reliability, respectively, both sub-SFCs will belong to the same reliability class. Thus, these two sub-SFCs that follow the same VNF placement strategy can belong to the same reliability class, when the corresponding component reliabilities per hierarchy level are equal, or to different reliability classes otherwise. In contrast, each unshared component of any type c and reliability class ξ utilized to allocate a certain VNF type ψ can have a different component-dependent reliability. As a result, there can be at most n reliability classes, whereby multiple sub-SFCs can belong to the same reliability class ξ. Depending on the placement strategy, multiple sub-SFCs of different reliability classes can be allocated to the same components; see the three sub-SFCs in Figure 3 that are all placed in DC4.
Definition 4.
The components at any hierarchy level c, whereby , which are jointly utilized by sub-SFCs of different reliability classes are referred to as common root components.
For instance, in Figure 3, the sub-SFCs placed in Rack5 have two common roots, i.e., R5 and DC4, while DC4 is the common root of three reliability classes. The backup sub-SFC is placed separately from the other reliability classes and does not have any common roots.
To describe the common root components of any type, i.e., hierarchy level, c and all reliability classes combined by such a component, we introduce a set Φ, where each element describes a component type cw together with the set w of indexes of the reliability classes hosted by this common root. Since all components of type c = 4 are always disjoint, i.e., different VMs, they cannot be common root components; a common root is thus a DC, rack, or server. Note that using component type c = 3, i.e., a server, as a common root will result in the same placement strategies for all sub-SFCs that utilize this root. Considering the example in Figure 3, the set of common roots contains R5, which hosts two reliability classes, and DC4, which hosts three reliability classes.

3 SFC Reliability Analysis

Without loss of generality, we define the reliability of a generic SFC as the probability that at least k ≥ 1 sub-flows can successfully traverse k out of n = k + r sub-SFCs, where r ≥ 0. Since each sub-SFC consists of Ψ different VNF types, at least k VNFs of each type ψ have to be available during service runtime, i.e., kΨ VNFs in total. As different sub-SFCs can belong to different reliability classes, we distinguish the component-dependent reliability of each shared and unshared component of type c utilized to accommodate VNFs of class ξ (see Table 1). Generally, the reliabilities of any data center, any rack, any server, and any VM can differ (heterogeneity). At the same time, only shared components of a certain type c utilized to host any sub-SFC belonging to a certain class ξ have an equal component-dependent reliability, where c relates to the hierarchy level, i.e., defines the component type, utilized for a placement of class ξ.
With r backup sub-SFCs, any component from any hierarchy level c and any reliability class ξ is allowed to fail as long as these failures result in a failure of at most r VNFs of any type. To assess the number of components that can fail without affecting the SFC, we introduce a parameter called Acceptable Component Failures (ACF) for common root, shared, and unshared components of type c. The ACF defines the number of failed hardware and software components of any type that does not lead to service interruption. Obviously, when no backup or other protection is applied, there is no ACF, and all primary components, i.e., DCs, racks, servers, and VNFs, must remain available/reliable during service runtime. The ACF strongly depends on the number of backup components and the placement of the VNFs of a certain reliability class inside DCs, racks, and servers. Additionally, we introduce a variable describing the number of remaining available components of a certain type c and reliability class ξ, which are not affected by failures of other components but can still fail during service runtime. Using these two parameters, we can derive the placement-independent and placement-dependent SFC reliability. Without loss of generality, SFC reliability can be generalized as the probability that, out of N available components of type c with component reliability R_c, at least N − ACF are available to maintain at least k sub-SFCs. This can be described by the well-known binomial formula, i.e., the sum over i from N − ACF to N of (N choose i) R_c^i (1 − R_c)^(N−i), where each summand is the probability mass function of the binomial distribution.
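As a numerical sketch of that binomial calculation, the following Python function (the names are ours, not the article's notation) evaluates the probability that at least N − ACF of N identical components remain available:

```python
from math import comb

def binomial_reliability(n_avail, acf, p):
    """P(at least n_avail - acf out of n_avail components with reliability p
    remain available), i.e., at most `acf` Acceptable Component Failures."""
    return sum(comb(n_avail, i) * p**i * (1 - p)**(n_avail - i)
               for i in range(n_avail - acf, n_avail + 1))
```

With acf = 0 the sum collapses to the single term p**n_avail, i.e., the unprotected serial case, while acf = n_avail makes the sum cover all outcomes and yields 1.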

3.1 Available Components and Acceptable Component Failures

The acceptable component failure (ACF) of shared , unshared , and common root components generally shows the bound of any summation in Binomial formula used for reliability calculation and is a function of available () and failed () components of type c from the same reliability class ξ and the failures of any other components of higher hierarchy level (, ), which cause failures of components of the lowest hierarchy level over all reliability classes.
As at most r VNFs of each type of any reliability class are allowed to fail, we need to define ACF for each reliability class and each component type over all reliability classes. This requires, however, a consideration of failure interdependency between components and reliability classes as, e.g., and , where each previously assumed failure reduces the acceptable failures of component and the amount of remaining available components of type c and class ξ, respectively.
Depending on the VNF placement strategy, different reliability classes can utilize the same common root components, while a failure of a common root will result in a complex failure propagation affecting multiple reliability classes, i.e., multiple sub-SFCs. A failure of one common root can generally cause a failure of more than r sub-SFCs and lead to service interruption. Thus, let us first derive ACF of the common root components of any reliability class from set , i.e., . Since there is only one common root component per reliability class, the amount of available roots and the ACF of the root can be either 0 or 1. Both depend on the number of VNFs of each type that will fail if a certain root fails, and on the number of already failed roots. Generally, the root component can only fail if it is available. A specific common root component is, however, only available when the other roots from the higher hierarchy have not previously failed, leading to the failure of , i.e., , where is the number of failed common roots . Then, the amount of available common roots is defined as
(1)
When , ACF of root can hypothetically be . However, to maintain the service, the total amount of failed VNFs due to the failure of root and other root components may not be larger than the number of backup VNFs, i.e., r. The amount of VNFs of a certain type belonging to a reliability class can be calculated as VNFs, which will fail with the failure of the root component type . Since there are reliability classes affected by the failure of root component of type , the total number of failed VNFs of a certain type of any reliability class , is determined as , where and . Since there can be other common roots from the higher hierarchy level that affect the root under consideration as well as other roots, we need to take into account all other VNF failures due to failures of all other roots. Thus, the amount of failed VNFs of a certain type over root components can be calculated as a function of any , i.e., , where , iff , and ensures that the failures of VNFs from the same reliability class are considered only once. The latter formulation can take into account any VNF placement strategy, where, for instance, a DC and a rack inside this DC represent two common roots for the same reliability classes, whereby the DC can combine more reliability classes and lead to the failure of the rack if it fails, as shown in Figure 3. Summarizing all constraints above, the ACF of the common roots is
(2)
where and , iff .
Since all components of a certain reliability class are placed separately from components of any other reliability class per definition, and only some common root components can be utilized to combine different classes, we need to additionally consider each reliability class ξ separately. The number of available components and the ACF are thereby functions of the VNF placement strategy and the amount of available and failed components of any type in reliability class ξ. As DCs are the components from the highest hierarchy level and independent of failures of any other component types, the amount of available DCs that are not common root components is equal to the amount of data centers required to accommodate sub-SFCs of reliability class ξ, i.e., . The failure of DCs causes a failure of racks, reducing the overall number of available racks as , where . Then, the remaining amount of servers after DC and rack failures is , where . After failures of DCs, racks, and servers, the remaining amount of VNFs of any type is , if . Since some reliability classes have a common root component , which impacts multiple reliability classes, i.e., any ξ, , we need to consider the availability and failure of common root components from a set Φ as for any . Thus, the amount of available primary and backup components of any component type of any reliability class ξ, after the failure of any component of higher hierarchy level, i.e., of type 1 to , is determined as follows
(3)
where the first case describes the number of the common root components of a certain type c and is the same for any reliability class ξ from set . However, when there is only one reliability class, H = 1, there are no common root components and the first case in Equation (3) will never be true. The other two cases describe the amount of available shared and unshared components.
As multiple VNFs of different types of a certain reliability class ξ can share some components, the number of failed components of type can vary as for these shared components and as for disjoint (unshared) components in reliability class ξ. Here, any out of components of type of reliability class ξ can fail without interrupting the end-to-end service. Generally, ACF, i.e., for a component shared by different VNF types and for an unshared component, is a function of the available components of type c and reliability class ξ, the amount of provided backup VNFs r, and the number of VNFs considered as already failed after components of any type c, , and any class ξ, failed. The amount of failed components of the lowest hierarchy level and a certain type, e.g., VNFs1, due to failure () of any component types , i.e., , caused by failure of component can be generalized as an iterative function:
(4)
where failures of any component from the higher hierarchy levels, from 1 to c, are taken into account. The failure of components of type is caused by failures of components of types 1 to c, i.e., , and reduces the amount of available backup VNFs r, which are generally allowed to fail, i.e., reduces ACF. The failure of any common root can be taken into account by Equation (4) with the following assumption: , if and .
Then, ACF of any shared component is either the total number of available VNFs provided by a reliability class ξ and placed inside component c, i.e., , or the total number of available backup VNFs r reduced by any failures of common roots and shared components of any type and any reliability class, whichever is lower. The resulting ACF shows how many components c of a reliability class ξ may fail while the service remains reliable. The amount of backup VNFs is reduced by the VNF failures due to failures of the components from the higher hierarchy levels, i.e., from 1 to c-1, of the same reliability class ξ, which can be calculated with Equation (4) as . Additionally, the number of available backup VNFs is reduced by VNF failures due to a failure of any shared component from any other reliability class, from class 1 to class . These failures can also be determined with Equation (4) as , where defines the lowest hierarchy level of a shared component related to the reliability class l. Thus, to define ACF of component , we need to take into account the remaining backup VNFs that can fail without service interruption; the amount of VNFs of class ξ that may fail is defined as . Since we are interested in ACF of component , which can be from any hierarchy level, i.e., , we need to normalize by the amount of VNFs allocated to component only, i.e., by . This is also the number of VNFs that will fail if one component fails. As a result, ACF for a component type shared by multiple VNFs of different types and a reliability class ξ can be derived as follows:
(5)
where the sum of all identifies all failures of component types from 1 to of reliability classes from 1 to and, finally, all prior failures of shared components. Obviously, if any prior reliability class has unshared components, then all Ψ VNFs as well as all related component types of this class l are disjoint and there are no shared components, i.e., .
Similarly, ACF of unshared components of any type and any reliability class ξ, i.e., , is a function of the remaining available components of type c from a reliability class ξ, the amount of remaining backup VNFs out of r reserved VNFs, and the number of VNFs considered as failed due to failures of common roots, shared components, and unshared components of different reliability classes. Thus, there is a need to define the amount of VNFs that may still fail without interrupting the service, i.e., the minimum between the total number of available VNFs provided by the reliability class ξ and placed inside component , i.e., , and the amount of remaining backup VNFs after failures of any other common roots, shared and unshared components from any other reliability classes. The number of backup VNFs r is already reduced by possible failures of shared and unshared components from hierarchy level 1 to of any reliability class l, . These failed VNFs from different reliability classes can be calculated with Equation (4) as . Some VNFs of reliability class ξ could fail due to failures of components from a higher hierarchy level belonging to the same class ξ. Their number can also be determined by Equation (4) as . Finally, any other reliability class l, , can affect ACF of class ξ if it utilizes some shared components that could fail, resulting in additional VNF failures. Thus, the amount of remaining backup VNFs of a certain type is determined as and defines the maximal number of VNFs that may fail. To calculate the amount of components that can fail resulting in the allowed VNF failures calculated above, the required normalization is performed with , similarly to Equation (5). This product determines the number of VNFs placed inside component . As a result, ACF for a component utilized for disjoint placement of Ψ different VNFs of a reliability class ξ is given as follows:
(6)
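The min-then-normalize structure common to the two ACF expressions can be illustrated with a small sketch. This is a deliberate simplification of Equations (5) and (6) for a single component type of a single reliability class; the function name, parameter names, and the flat `already_failed_vnfs` argument are our assumptions:

```python
def acf(avail_components: int, vnfs_per_component: int,
        r: int, already_failed_vnfs: int) -> int:
    """Acceptable Component Failures for one component type of one
    reliability class: how many such components may still fail without
    interrupting the service (illustrative simplification)."""
    vnfs_hosted = avail_components * vnfs_per_component  # VNFs at risk in this type
    backups_left = max(r - already_failed_vnfs, 0)       # backups not yet consumed
    # Normalize the tolerable VNF failures by the VNFs lost per component failure.
    return min(vnfs_hosted, backups_left) // vnfs_per_component

# 4 servers hosting 2 VNFs each, r = 5 backup VNFs, 1 VNF already failed:
print(acf(4, 2, 5, 1))  # 2 servers may fail (4 remaining backups // 2 per server)
```

The integer division reflects the normalization step in the text: a component failure always costs a whole batch of co-located VNFs.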

3.2 Placement-independent SFC Reliability

SFC reliability is independent of the VNF placement strategy when either (i) only one component type can fail, e.g., servers only, whereas a single VNF in the server does not fail, or (ii) all VNFs are placed in a fully disjoint manner, i.e., with maximal disjointness degree and maximal number of unshared components , e.g., each VNF is placed in a different DC. Without loss of generality, in case of fully disjoint VNF placement, the reliability of any VNF type ψ of reliability class ξ can be defined as and considered as a failure of one component, e.g., a VNF, which just contains multiple sub-components from 1 to . To this end, we follow the most common approach assuming component disjointness and failure of one component type c only. This enables us to significantly reduce the complexity of the reliability analysis, as there are no shared components and no failure interdependency, while the component reliability is only a function of the component-dependent reliability and ACF . Thus, we assume that components from a certain hierarchy level c, which contain one VNF only, can fail and all other components from any other hierarchy level , , where , have a reliability of 100%, i.e., . As a result, the end-to-end service reliability is independent of the VNF placement strategy. We extend this approach by an assumption of multiple reliability classes, whereby only components of type c of any reliability class ξ can fail. Then, the service is successful if at least k out of n components of type c and, thus, k VNFs of each type ψ from any reliability class ξ are available.
Lemma 1.
The placement-independent SFC reliability is a function of the number of reliability classes H, SFC length Ψ, ACF for any reliability class and the number of available components required to maintain k sub-SFCs, i.e.,
(7)
where and .
Proof.
Let us first assume that (i) only VMs (VNFs) can fail, i.e., c = 4, (ii) a generic SFC consists of Ψ = 1 VNF types, and (iii) all n VNFs have the same VNF reliability, i.e., there is only one reliability class (H = 1) and . Thus, the SFC reliability follows the parallel reliability model described by Binomial formula for calculation of the probability that at least k out of components are available, i.e., .
For instance, when the reliability of primary VNFs differs from the reliability of backup VNFs of a certain type ψ, such as , there are two reliability classes, H = 2. Then, the SFC reliability is determined as the probability that at least k VNFs over all VNFs of each type ψ do not fail and at least k VNFs of any type are available to serve k sub-flows. Since we first assumed an SFC with one VNF type, Ψ = 1, at least primary VNFs, where , and at least backup VNFs, where , have to be available, while backup VNFs can also fail without service interruption. Thus, ACF of primary and backup VNFs can be determined as and , respectively. Following the Binomial formula and, additionally, applying the serial reliability model, as VNFs from both reliability classes, i.e., at least k in total, have to be available at the same time, the SFC reliability is determined as , if Ψ = 1, , and H = 2.
It is obvious that this example can be extended to reliability classes, which would require us to extend the equation above with up to H summations, i.e., one per reliability class. Then, the ACF of any reliability class ξ can be calculated with Equation (6) and simplified as , where the number of failed VNFs over all reliability classes, i.e., from 1 to , is not larger than the number of backup VNFs provided. Thus, the SFC reliability for and is
(8)
where is a probability that there are at least k available out of n pre-reserved VNFs over all H reliability classes.
Next, let us assume that a generic SFC consists of different VNFs. Then, the SFC reliability is determined as a probability that at least k out of VNFs of each type ψ out of Ψ types do not fail and at least k sub-SFCs can be composed by VNFs of each type to serve k sub-flows. When all VNFs of the same type ψ have the same VNF reliability , i.e., c = 4 and , there is only one reliability class (H = 1, ) and SFC reliability is a probability that at least k VNFs of each type ψ, e.g., k VNF1, k VNF2, are available as described by above. The SFC reliability is then . Finally, the service reliability can be generalized as
(9)
Similarly, when there are H≥ 1 reliability classes and all VNFs of type ψ from the same reliability class ξ have the same VNF reliability , the overall service reliability is derived with Equations (8) and (9), as , which results in Equation (7).□
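The semantics of the proof — at most r = n - k VNF failures of each type summed over all H reliability classes, raised to the power of the Ψ types — can be sketched by convolving the per-class binomial failure distributions. This is an illustrative implementation of the lemma's logic, not the paper's code; all names are assumptions:

```python
from math import comb

def convolve(a, b):
    """Distribution of the sum of two independent discrete variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def placement_independent_reliability(k, class_sizes, class_rel, psi):
    """P(at least k of the n = sum(class_sizes) VNFs of each of the psi
    types survive), with per-class VNF reliabilities class_rel."""
    n = sum(class_sizes)
    r = n - k
    dist = [1.0]  # distribution of the total number of failed VNFs of one type
    for n_xi, p in zip(class_sizes, class_rel):
        pmf = [comb(n_xi, f) * (1 - p)**f * p**(n_xi - f) for f in range(n_xi + 1)]
        dist = convolve(dist, pmf)
    p_one_type = sum(dist[:r + 1])  # at most r failures of this VNF type
    return p_one_type ** psi        # all psi types must each keep k VNFs

# One class (H = 1), n = 2, k = 1, p = 0.9, psi = 1: 1 - 0.1^2 = 0.99
print(placement_independent_reliability(1, [2], [0.9], 1))
```

For H = 1 this collapses to the plain binomial survival function, matching the first step of the proof.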

3.3 Placement-dependent SFC Reliability

The SFC reliability is a function of the VNF placement strategy when we consider the amount of available and failed primary and backup DCs, racks, servers, and VNFs involved as well as their interdependency. Let us further simplify the notation as and for shared and unshared components, respectively, to reduce the size of some formulas below, where , , , and , . Our main assumption that all shared components from a hierarchical level c utilized to place sub-SFCs of reliability class ξ and all unshared components utilized to place VNFs of type ψ and class ξ have the same component reliability and , respectively, is still valid.

3.3.1 One Reliability Class.

There is only one reliability class, H = 1, when all n primary and backup sub-SFCs are placed in the same fashion and all shared components of each type c have the same component-dependent reliability, i.e., . In contrast, each unshared component of any type c can have a different component-dependent reliability, i.e., , . As we do not need to differentiate between reliability classes here, we omit the index ξ for clarity. To place all n VNFs of a certain type ψ, e.g., VNF1, primary and backup DCs, primary or backup racks inside any DC, i.e., in total primary and backup racks, and primary or backup servers inside any rack, i.e., in total primary and backup servers, and VNFs are required, whereby any server can host VNFs of the same type ψ. Similarly, the same amount of DCs, racks, and servers are utilized to place each other VNF type from the same sub-SFC, e.g., VNF2, ..., VNFΨ. Thus, the number of components from hierarchy level c utilized for SFC placement can be calculated as , if c = 1, or else as . However, some of these components, i.e., Δ unshared component types from the lower hierarchy level, are utilized for a certain VNF type ψ only; other component types are shared by VNFs of different types. Thus, we need to consider the VNF placement strategy and the resulting number of shared and unshared components by taking into account the components' interdependency.
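The cumulative component counting above (DCs, then racks per DC, then servers per rack, then VNFs per server) amounts to a running product over the hierarchy levels. A minimal sketch with assumed names:

```python
from math import prod

def components_at_level(per_parent, c):
    """Number of components of hierarchy level c (1 = DC, 2 = rack,
    3 = server, 4 = VNF) utilized for placement, where per_parent[i]
    is the number of level-(i+1) components placed per parent
    (per_parent[0] = total number of DCs). Illustrative sketch."""
    return prod(per_parent[:c])

# 2 DCs, 2 racks per DC, 3 servers per rack, 1 VNF per server:
per_parent = [2, 2, 3, 1]
print([components_at_level(per_parent, c) for c in (1, 2, 3, 4)])  # [2, 4, 12, 12]
```

The counts grow multiplicatively down the hierarchy, which is exactly why deeper placements expose more hardware components whose failures must be accounted for.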
Lemma 2.
When there is one reliability class, H = 1, the placement-dependent SFC reliability is a function of the amount of unshared components Δ, SFC length Ψ, ACF, and the amount of available components for any component type c, i.e.,
(10)
where , if .
Proof.
Let us first assume that SFC consists of Ψ = 1 VNF types, e.g., VNF1 only, resulting in no shared components. Moreover, all n = k + r primary and backup VNFs as well as all DCs, racks, and servers involved to place these n VNFs have the same component reliability p4, p1, p2, and p3, respectively. Then, the SFC reliability can be calculated with Binomial formula using the previously discussed in Section 3.2, whereby it is required to consider the availability of all component types involved , i.e., a minimal amount of DC, racks, and servers required to ensure availability of at least k out of n VNFs, i.e.,
(11)
To extend Equation (11) to an arbitrary SFC length, i.e., VNFs, we need to consider the VNF placement strategy and the resulting number of shared and unshared components Δ. Let us first assume that each VNF type ψ of a certain sub-SFC is allocated to different components of any hierarchy level c, , i.e., there are no shared components. Thus, any VNF type ψ is placed in a different DC, different rack, different server, and VM, resulting in unshared component types. Then, service reliability can be determined as a function of the number of unshared components Δ and the SFC reliability defined in Equation (11). Using the serial reliability model similar to Equation (9), the service reliability can be calculated as . However, for any specific placement strategy based either on SFC or VNF type, there can be shared components that accommodate a whole sub-SFC. When different VNF types, i.e., sub-SFCs, are placed in the same DC but different racks, servers, and VMs (), the failure of the DC, i.e., of the shared component, affects all Ψ VNFs of a certain sub-SFC and, thus, needs to be considered once and not separately for each of the Ψ VNF types. Thus, the SFC reliability can be calculated by separating the probability that there are at least available DCs required for the maintenance of at least k sub-SFCs, whereby the following is still valid: . Similarly, when different VNF types, i.e., sub-SFCs, are placed in the same DC and same rack, but different servers, resulting in unshared component types, there are two shared components, i.e., the DC and the rack. Thus, any failure of the DC or rack will affect all Ψ VNFs and needs to be considered once to calculate the probability that at least DCs and racks are available to build k sub-SFCs of Ψ VNFs each. Then, the SFC reliability is determined as . 
When only the VNFs themselves are disjoint (), i.e., the different VNF types are placed in the same server and, thus, in the same rack and same DC, i.e., three shared component types, a failure of a server, rack, or DC will lead to a failure of a whole sub-SFC, i.e., a failure of all Ψ VNFs. Thus, only the VNFs are independent of each other, resulting in an SFC reliability defined as . As a result, the number of summations outside and inside the brackets in the formulas for reliability calculation above is a function of the number of unshared (Δ) and shared () components. In other words, the summations for the shared components are outside the brackets, i.e., summations, and the Δ summations for component types whose failure impacts one VNF type ψ only are placed inside the brackets. Thus, without loss of generality, the SFC reliability for any value of Δ is determined by Equation (10).□
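As a sanity check of the serial/parallel structure in the proof, consider the special case where each sub-SFC occupies its own DC, rack, and server, with its Ψ VNFs co-located on that server. Sub-SFCs then fail independently, and the nested sums collapse to a serial model inside a sub-SFC combined with a binomial over sub-SFCs. An illustrative sketch under this assumption (names are ours):

```python
from math import comb

def sfc_reliability_disjoint(n, k, p_dc, p_rack, p_srv, p_vm, psi):
    """Special case: each of the n sub-SFCs occupies its own DC, rack,
    and server, with its psi VNFs co-located on that server. Sub-SFCs
    fail independently; the service needs >= k survivors."""
    p_sub = p_dc * p_rack * p_srv * p_vm**psi  # serial model inside one sub-SFC
    # parallel (binomial) model across the n independent sub-SFCs
    return sum(comb(n, a) * p_sub**a * (1 - p_sub)**(n - a)
               for a in range(k, n + 1))

# n = 2 sub-SFCs, k = 1, psi = 2, illustrative component reliabilities:
print(sfc_reliability_disjoint(2, 1, 0.9999, 0.999, 0.99, 0.99, 2))
```

The general Equation (10) reduces to exactly this form when every component type is shared within, and disjoint across, sub-SFCs.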

3.3.2 H-Reliability Classes.

We next consider a generic SFC with n primary and backup sub-SFCs, where any sub-SFC can belong to one out of H, , reliability classes. Then, an index ξ, , indicates the reliability class of any primary or backup component utilized to place the sub-SFCs from this reliability class ξ. Thus, any component type c utilized to host reliability class ξ and its component reliability are denoted as and for shared or for unshared components, respectively. Multiple sub-SFCs can belong to a certain reliability class ξ if these sub-SFCs are placed following a certain placement strategy and all involved component types have a certain component reliability value or each. Note that unshared components of hierarchy level c utilized for placement of a certain VNF type ψ of a certain SFC reliability class ξ have the same component reliability, i.e., , which can differ from the component reliability of unshared components used to place another VNF type from the same reliability class ξ, i.e., . Without loss of generality, there are VNFs per sub-SFC, an arbitrary number of involved component types ( types) to place these VNFs, and an arbitrary number of reliability classes (H for primary and backup sub-SFCs), whereby each reliability class has its own number of unshared components, . Thus, the service reliability needs to consider the placement strategy and the resulting number of unshared components of any reliability class ξ. Depending on the placement strategy, some sub-SFCs from different reliability classes can have common root components, e.g., are allocated to the same DC or the same rack. The failure of these common root components results in failures of multiple relevant component types and of whole sub-SFCs. All reliability classes combined by a common root and the related root component are collected in set , where is an index, . 
Any describes a component type c of the common root component, where w is a set of reliability classes accommodated by the common root component of type c, i.e., and . Thus, is the th set of reliability classes, which are allocated in component c from collection Φ. Then, the resulting expression for service reliability needs to take into account H≥ 1 reliability classes of SFCs of length and the shared and common root components utilized for allocation of sub-SFCs from any reliability class ξ, as stated in the following lemma.
Lemma 3.
When there are multiple reliability classes, H≥ 1, the placement-dependent SFC reliability is a function of the number of unshared components Δ, SFC length Ψ, ACF, and the number of available components for any component type c of a certain class ξ, whereby the shared and common root components utilized for allocation of sub-SFCs from a certain reliability class ξ need to be taken in consideration as follows:
(12)
where the function ensures that the common root components for any reliability class are considered only once by the first summation over , i.e.,
(13)
Proof.
First, we assume that the components from different reliability classes do not have any common root components, which corresponds to an inter-DCN placement of primary and backup sub-SFCs. Thus, the sub-SFCs of a certain reliability class ξ are placed disjointly from any other reliability class. The assumption that any SFC contains only one VNF, Ψ = 1, additionally relaxes the placement dependency, as there cannot be any shared components, i.e., . Then, service reliability can be determined based on Equations (8) and (11) as follows:
(14)
where the failure and availability of each component type and each reliability class are described by their own summation and , i.e., . Thus, all component types, i.e., from type 1 to type , are considered for each reliability class ξ, , and any summation determines the probability that the service will not fail in case of failures of any component type c of a reliability class ξ, whereby at most failures of each unshared component are allowed.
Let us next assume that some sub-SFCs from different reliability classes have common roots, while sub-SFCs consist of one VNF, i.e., Ψ = 1, avoiding component sharing by different VNF types. A failure of the common root components generally results in a failure of all component types of lower hierarchy levels utilized to place corresponding reliability classes and, thus, a failure of multiple sub-SFCs. Set Φ collects all reliability classes in set accommodated by the common root , which affects ACF and the number of available components of all associated classes, i.e., . Thus, to take into account the common roots for different reliability classes, Equation (14) for Ψ = 1 is modified as follows:
(15)
where , i.e., , and the function determined by Equation (13) ensures that the common root components for any reliability class are considered only once, i.e., by the first summation over .
Next, let us assume that the SFC consists of VNFs and there are no common root components, but some sub-SFCs are placed in the same shared component, i.e., all Ψ different VNF types are hosted in the same component. Thus, each reliability class ξ has its own number of unshared components , , which needs to be considered by the expression for the service reliability. Taking into account a reliability class ξ that provides unshared components, i.e., , and shared components, i.e., , due to a certain placement strategy, there are summations for each reliability class ξ, whereby the summations for shared components and the summations for unshared components should be separated similarly to Equation (10). Thus, the service reliability can be determined as follows:
(16)
where and describe ACF of a shared and unshared component type of any reliability class ξ, respectively.
When the number of unshared components is maximal, i.e., , all component types are disjoint and all related summations will be inside the brackets in Equation (16), i.e., . Similarly, in the case of , only components of type , i.e., VNFs, are disjoint. This results in only one summation per reliability class ξ inside the brackets. When , i.e., only component type 1 (DCs) is a shared component for all Ψ different VNF types of reliability class ξ, there is one summation outside the brackets, which describes the probability that there are enough available components of type c = 1 to maintain the service.
When VNFs from sub-SFCs of length are placed in such a way that there are shared components and common root components described by a set Φ, we can derive the service reliability with Equations (16) and (15). The resulting equation for SFC reliability takes into account H≥ 1 reliability classes of SFCs of length , their common root and shared components, as well as failures of any out of component types involved in the allocation of sub-SFCs from a certain reliability class ξ, i.e., , and represents Equation (12).
Finally, we can derive Equation (13), where needs to ensure that the common root components are considered only once by the first summation in Equations (12) and (15), and that any component type that is not a common root component is considered in the following summation for calculating the probability that there are enough available components of type to maintain the service. As a result, needs to be additionally considered when either the reliability class ξ of component type does not have any common root components, i.e., , or ξ has some common root components with other reliability classes, i.e., , but the considered component type c is not the common root, i.e., .□
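The case distinction in the proof can be sketched as an indicator function; the `(root_level, classes)` encoding of the set Φ and all names below are our assumptions:

```python
def delta(xi, c, common_roots):
    """Indicator sketched from the proof: component type c of reliability
    class xi gets its own summation only if it is not one of the class's
    common-root components (those are counted once, in the summation
    over the set Phi). common_roots: iterable of (root_level, classes)."""
    for root_level, classes in common_roots:
        if xi in classes and c == root_level:
            return 0  # c is the common root for class xi: already covered
    return 1  # no common roots for xi, or c is not the root

roots = [(1, {1, 2})]  # one DC (level 1) shared by reliability classes 1 and 2
print(delta(1, 1, roots), delta(1, 2, roots), delta(3, 1, roots))  # 0 1 1
```

Class 1's DC summation is suppressed (handled via Φ), while its lower-level component types and all component types of the root-free class 3 keep their own summations.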

4 Numerical Results and Discussion

This section provides numerical results that show the reliability of generic SFCs and verify the accuracy of the proposed analytical model. We investigate the placement-dependent SFC reliability considering all possible VNF placement strategies using racks, servers, and VMs to place VNFs over inter- and intra-DCNs, and show the impact of the number of sub-SFCs and backups, the SFC length, the component reliability, and the number of reliability classes on the resulting SFC reliability.
We present analytical results of placement-independent and placement-dependent service reliability calculated with Equation (10) for and and Equation (12) for all other placement strategies; ACF of shared, unshared, and common root components were determined with Equations (5), (6), and (2); the number of available components from any hierarchy level and the number of failed VNFs were calculated with Equations (3), (1), and (4), respectively. We verified the analytical model and the presented analytical results by Monte Carlo simulations. Since the simulation results overlapped with the analytical results (with 95% confidence interval), we present analytical results only, where and, in most cases, SFC reliability values are sorted in descending order.
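A Monte Carlo verification in the spirit described above can be sketched for a simplified, fully disjoint placement (each sub-SFC on its own DC, rack, and server, with its Ψ VNFs on that server). The parameter names and this simplified failure model are our assumptions, not the paper's simulator:

```python
import random

def mc_reliability(n, k, psi, rel, trials=100_000, seed=7):
    """Monte Carlo estimate: sub-SFC i survives iff its DC, rack, server,
    and all psi of its VMs survive; the service needs >= k survivors.
    rel = (p_dc, p_rack, p_srv, p_vm)."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        survivors = 0
        for _ in range(n):
            alive = all(rng.random() < p for p in rel[:3])       # DC, rack, server up
            alive = alive and all(rng.random() < rel[3] for _ in range(psi))  # VMs up
            survivors += alive
        ok += survivors >= k
    return ok / trials

print(mc_reliability(2, 1, 2, (0.9999, 0.999, 0.99, 0.99)))
```

Comparing such estimates (with their confidence intervals) against the closed-form values is the standard way to cross-check nested-sum reliability formulas.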
We first evaluate the placement and component-dependent reliability of generic SFCs. Assuming that all sub-SFCs follow the same placement strategy, we consider all 16 possible placement strategies over DCs, racks, servers, and VMs, as presented in Figure 4. Each placement strategy is characterized by different amounts of unshared components Δ and minimal number of protected component types . Some DCN components can host either multiple sub-SFCs or multiple VNFs of the same type. Thus, we distinguish between “SFC based placement,” i.e., the entire sub-SFC is placed in the same hardware components and VNFs of the same type are separated, and “VNF type based placement,” i.e., VNFs of the same type are placed together in any hardware component. The related placement strategy (Pl.) in the tables below is marked with v for “VNF type based placement” and with s for “SFC based placement.”
Fig. 4.
Fig. 4. All basic VNF placement strategies over inter- and intra-DCNs, when n = 2 and Ψ = 2.
The placement strategies described as (, ), (, ), (, ), and (, ) in Figure 4 present special cases, as they have features of both placement strategies, i.e., “v” and “s,” whereby each VNF of each type is placed in different DCs, same DC and disjoint racks, same rack and disjoint servers, and in the same server, respectively. In other words, sub-SFCs and VNFs of different types are not separated. While the number of unshared components Δ, the disjointness degree , the number of reliability classes, and the common root components collected in Φ are defined by the selected placement strategy, the component-dependent reliability , SFC length, and the number of primary and backup sub-SFCs (k and r) required to maintain and to protect the service have to be predefined and adapted to a certain use case. To be able to see the generic dependence of SFC reliability on the placement strategy, we assume for the numerical evaluation that component reliability is independent of reliability class and VNF type, i.e., . Thus, sub-SFCs can only belong to different reliability classes if they utilize different placement strategies, resulting in at most reliability classes. We then investigate different scenarios by defining the component reliability in two ways: (1) Case 1, a more practically relevant scenario, where , , , [14]; and (2) Case 2, an idealized scenario, where . We investigate the SFC reliability over primary sub-SFCs with VNFs and different numbers of backup sub-SFCs r.
Let us first assume that there is only one reliability class, H = 1, and one primary sub-SFC (k = 1) and evaluate the impact of Δ, , Ψ, and r on the resulting service reliability. Table 2 shows the SFC reliability for Case 1 and Case 2 as a function of different VNF placement strategies defined through Δ and and shown in Figure 4. Without backup protection (r = 0), the SFC reliability is independent of , since only one (sub-)SFC is deployed and no VNFs of the same type are utilized. However, as expected, for a fixed the SFC reliability decreases with an increasing number of unshared components Δ and increasing SFC length Ψ; this is because a larger Δ implies more components that can fail due to hardware issues. For a fixed Δ but different , the SFC reliability is identical. As expected, Case 2 provides higher SFC reliability due to the higher component reliability defined, but reflects the same functional dependency. Without any backup protection, the placement strategies resulting in provide the highest SFC reliability.
(C1 = Case 1, C2 = Case 2; "Disj." = disjointness degree)

| Disj. | Δ | Pl. | C1, Ψ=4, r=0 | r=1 | r=100 | C1, Ψ=8, r=0 | r=1 | r=100 | C2, Ψ=4, r=0 | r=1 | r=100 | C2, Ψ=8, r=0 | r=1 | r=100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | s | 0.95952 | 0.99951234 | → 1 | 0.9217 | 0.9990295 | → 1 | 0.99993 | 0.9999999963 | → 1 | 0.99989 | 0.9999999935 | → 1 |
| 4 | 2 | s | 0.95665 | 0.99950754 | → 1 | 0.9152 | 0.9990157 | → 1 | 0.99990 | 0.9999999948 | → 1 | 0.99982 | 0.9999999900 | → 1 |
| 4 | 3 | s | 0.95636 | 0.99950737 | → 1 | 0.9146 | 0.9990150 | → 1 | 0.99987 | 0.9999999939 | → 1 | 0.99975 | 0.9999999879 | → 1 |
| 4 | 4 | v, s | 0.95633 | 0.99950735 | → 1 | 0.9145 | 0.9990149 | → 1 | 0.99984 | 0.9999999936 | → 1 | 0.99968 | 0.9999999872 | → 1 |
| 3 | 1 | s | 0.95952 | 0.99950314 | 0.99998999 | 0.9217 | 0.9990210 | 0.999989 | 0.99993 | 0.9999899976 | 0.999989 | 0.99989 | 0.9999899956 | 0.999989 |
| 3 | 2 | s | 0.95665 | 0.99949841 | 0.99998999 | 0.9152 | 0.9990074 | 0.999989 | 0.99990 | 0.9999899967 | 0.999989 | 0.99982 | 0.9999899935 | 0.999989 |
| 3 | 3 | v, s | 0.95636 | 0.99949824 | 0.99998999 | 0.9146 | 0.9990067 | 0.999989 | 0.99987 | 0.9999899964 | 0.999989 | 0.99975 | 0.9999899928 | 0.999989 |
| 3 | 4 | v | 0.95633 | 0.99946825 | 0.99996000 | 0.9145 | 0.9989367 | 0.999920 | 0.99984 | 0.9999599970 | 0.999960 | 0.99968 | 0.9999199956 | 0.999920 |
| 2 | 1 | s | 0.95952 | 0.99941118 | 0.99989000 | 0.9217 | 0.9989366 | 0.999890 | 0.99993 | 0.9999799988 | 0.999980 | 0.99989 | 0.9999799976 | 0.999980 |
| 2 | 2 | v, s | 0.95665 | 0.99940702 | 0.99989000 | 0.9152 | 0.9989242 | 0.999890 | 0.99990 | 0.9999799985 | 0.999980 | 0.99982 | 0.9999799969 | 0.999980 |
| 2 | 3 | v | 0.95636 | 0.99910722 | 0.99959006 | 0.9146 | 0.9982252 | 0.999190 | 0.99987 | 0.9999499994 | 0.999950 | 0.99975 | 0.9999100004 | 0.999910 |
| 2 | 4 | v | 0.95633 | 0.99907725 | 0.99956007 | 0.9145 | 0.9981553 | 0.999120 | 0.99984 | 0.9999200012 | 0.999920 | 0.99968 | 0.9998400088 | 0.999840 |
| 1 | 1 | v, s | 0.95952 | 0.99849061 | 0.99889011 | 0.9217 | 0.9980912 | 0.998890 | 0.99993 | 0.9999699999 | 0.999970 | 0.99989 | 0.9999699995 | 0.999970 |
| 1 | 2 | v | 0.95665 | 0.99549813 | 0.99589643 | 0.9152 | 0.9911255 | 0.991918 | 0.99990 | 0.9999400011 | 0.999940 | 0.99982 | 0.9999000037 | 0.999900 |
| 1 | 3 | v | 0.95636 | 0.99519951 | 0.99559769 | 0.9146 | 0.9904319 | 0.991224 | 0.99987 | 0.9999100032 | 0.999910 | 0.99975 | 0.9998300128 | 0.999830 |
| 1 | 4 | v | 0.95633 | 0.99516966 | 0.99556782 | 0.9145 | 0.9903626 | 0.991155 | 0.99984 | 0.9998800062 | 0.999880 | 0.99968 | 0.9997600268 | 0.999760 |
Table 2. The Placement-dependent SFC Reliability for k = 1 vs. Backup SFCs r and SFC Length Ψ
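The tradeoff visible in Table 2 — components shared by all sub-SFCs act in series, while components exclusive to each of the 1 + r sub-SFC replicas act in parallel — can be sketched with a simplified independence model. This is only an illustration, not the paper's full analysis (which additionally models common roots and failure propagation), and all component reliabilities below are hypothetical placeholders:

```python
from math import prod

def sub_sfc_reliability(component_reliabilities):
    """Serial reliability of one sub-SFC: every component must work."""
    return prod(component_reliabilities)

def sfc_reliability(shared, disjoint, r):
    """Reliability of 1 primary + r backup sub-SFCs.

    shared   -- reliabilities of components common to all sub-SFCs (series)
    disjoint -- reliabilities of components each replica owns exclusively
                (the 1 + r replicas then act in parallel)
    """
    replica = sub_sfc_reliability(disjoint)
    # All shared components must work AND at least one replica must survive.
    return prod(shared) * (1 - (1 - replica) ** (1 + r))

# Hypothetical values: one shared DC; a rack, a server, and 4 VNFs per replica.
dc, rack, server, vnf = 0.9999, 0.9995, 0.999, 0.99
for r in (0, 1, 100):
    print(f"r={r}: {sfc_reliability([dc], [rack, server] + [vnf] * 4, r):.6f}")
```

Moving components between `shared` and `disjoint` mimics the role of Δ in Table 2: a larger unshared part adds more hardware that can fail per replica, but gives redundancy more independent replicas to work with.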
We next investigate the traditional 1+1 backup protection, i.e., r = 1. At first sight, the service reliability decreases with a decreasing disjointness degree, i.e., a decreasing number of backup component types, and with an increasing number of unshared components Δ. However, some placement strategies with a low disjointness degree provide higher SFC reliability than placements with a higher one. In Case 1 with an SFC length of Ψ = 8, the placement with disjointness degree 3 and Δ = 1 outperforms the placements with disjointness degree 4 and Δ ≥ 2. That is because a generic SFC utilizing placement strategies with a high disjointness degree and a large Δ involves many more hardware components that can fail, increasing the probability of service interruption. In contrast, an SFC placed with a high disjointness degree but Δ = 1 will most likely be interrupted due to VNF failures, as the number of racks and servers involved is low; considering the VNF components only, the lower reliability of the longer SFC chain cannot be sufficiently compensated by the backup DCs, racks, and servers. For Case 2, we observed that a placement of type "s" with a lower disjointness degree can outperform a placement based on VNF type (v), which can be explained by the fact that the latter involves more hardware components, all of which have to be available to maintain the service. Moreover, the placement strategies based on VNF type (v) provide the lowest SFC reliability for any value of r and Ψ. As a result, the number of backup components can generally be reduced without degrading the service reliability, and it is preferable to use a placement strategy of type "s."
We next study the impact of an increasing number of backup sub-SFCs on the resulting service reliability, where r = 100 backup sub-SFCs serves as a good approximation of an upper bound. A more detailed rule can be observed for both cases, which also covers the unprotected case:
For a given disjointness degree, SFC reliability decreases with an increasing number of unshared components Δ, and for a fixed value of Δ, SFC reliability increases with an increasing disjointness degree.
For instance, for a fixed Δ it can be seen for all r and Ψ that a higher disjointness degree results in higher SFC reliability. Furthermore, SFC reliability increases with increasing redundancy: for the placement with disjointness degree 4 and Δ = 1 in Case 1 (Ψ = 4), it grows from 0.95952 (r = 0) to 0.99951 (r = 1) and approaches 1 for r = 100. In contrast, for the placements with Δ = 1 and disjointness degree 2 or 1, the provided SFC reliability of "3" and "2" nines in Case 1, respectively, is almost independent of the value of r.
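The "→ 1" entries versus the plateaus in the r = 100 column illustrate a general limit: as r → ∞, reliability saturates at the joint reliability of the components shared by all sub-SFCs, since no amount of redundancy protects against a common-root failure. A toy illustration with arbitrary numbers:

```python
# Shared components (series) cap the benefit of adding replicas (parallel).
shared, replica = 0.99999, 0.95  # illustrative reliabilities
for r in (0, 1, 10, 100):
    reliability = shared * (1 - (1 - replica) ** (1 + r))
    print(r, reliability)
```

The printed values grow quickly with r but never exceed the shared-component reliability of 0.99999.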
We further investigate the placement strategy characterized by disjointness degree 4 and Δ = 1, which provides the highest service reliability, now setting k ≥ 1 primary sub-SFCs and component reliability as defined for Case 1. Table 3 shows the service reliability as a function of k, r, and Ψ. The first column contains the number of backup sub-SFCs normalized by the number of primary sub-SFCs, i.e., r/k. As expected, an increasing number of VNFs in the SFC, i.e., a larger Ψ, decreases SFC reliability. In contrast, an increasing number of primary sub-SFCs significantly increases the service reliability once backup sub-SFCs are provided (r ≥ 1). An SFC reliability of "six nines" can be reached with the configurations k = 4, r = 4 and k = 8, r = 4 for both Ψ = 4 and Ψ = 8, while a standard serial SFC (k = 1) requires r = 3 backup sub-SFCs, which means 300% redundancy (not shown for clarity of presentation). Thus, to increase service reliability and to decrease backup resources per sub-SFC, a generic SFC should include as many sub-SFCs as possible.
| r/k | Ψ=4, k=1 | Ψ=4, k=4 | Ψ=4, k=8 | Ψ=8, k=1 | Ψ=8, k=4 | Ψ=8, k=8 |
|---|---|---|---|---|---|---|
| 0 | 0.959529855054096 | 0.847683965207723 | 0.718568104870289 | 0.921720550240843 | 0.721767099608634 | 0.520947746077460 |
| 0.125 | – | – | 0.983696719311009 | – | – | 0.968371471691151 |
| 0.25 | – | 0.995273874413365 | 0.999384198321856 | – | 0.990697185787667 | 0.998774974539633 |
| 0.375 | – | – | 0.999981195954079 | – | – | 0.999962424513603 |
| 0.5 | – | 0.999893621626573 | 0.999999500094456 | – | 0.999787784003278 | 0.999999000321022 |
| 0.625 | – | – | 0.999999987999969 | – | – | 0.999999976000402 |
| 0.75 | – | 0.999997932451063 | 0.999999999734000 | – | 0.999995866449611 | 0.999999999468002 |
| 0.875 | – | – | 0.999999999994470 | – | – | 0.999999999988942 |
| 1 | 0.999512344397461 | 0.999999963313851 | 0.999999999999890 | 0.999029523012867 | 0.999999926631523 | 0.999999999999781 |
| 1.125 | – | – | 0.999999999999997 | – | – | 0.999999999999995 |
| 1.25 | – | 0.999999999389706 | 0.999999999999999 | – | 0.999999998779421 | 0.999999999999998 |
| 1.375 | – | – | 0.999999999999999 | – | – | 0.999999999999999 |
Table 3. Service Reliability for the Placement with Disjointness Degree 4 and Δ = 1 vs. the Number of Primary (k) and Backup (r) Sub-SFCs
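Under the simplifying assumption that all n = k + r sub-SFCs fail independently with an identical reliability p, the service survives if and only if at least k of them operate — a binomial tail, in the spirit of the reduced binomial theorem used in the analysis. A minimal sketch (p is an illustrative value, not the Case 1 figure):

```python
from math import comb

def at_least_k_of_n(p, k, r):
    """P(at least k of n = k + r independent sub-SFCs are operational)."""
    n = k + r
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.96  # illustrative per-sub-SFC reliability
for k, r in [(1, 3), (4, 4), (8, 4)]:
    print(f"k={k}, r={r}: {at_least_k_of_n(p, k, r):.9f}")
```

This reproduces the qualitative behavior of Table 3: for a fixed redundancy ratio r/k, more (and therefore smaller) sub-SFCs yield a sharper concentration of the survivor count and thus higher reliability.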
Finally, we investigate the SFC reliability as a function of placements that differ between primary and backup sub-SFCs, resulting in H = 2 different reliability classes. We use the configuration k = 4, r = 3, Ψ = 4, and component reliability as defined for Case 1. Table 4 shows the service reliability, whereby the first two columns and the two upper rows (disjointness degree and Δ) describe the placement strategies of the primary and backup sub-SFCs, respectively, as presented in Figure 4. Here, too, the general rule holds that SFC reliability decreases with an increasing number of unshared components Δ for a fixed disjointness degree. We show the results for the backup placement strategies with disjointness degrees 4 and 3, as these configurations provide the highest SFC reliability. However, the highest reliability of "five nines" is only reached when the primary sub-SFCs are placed with disjointness degree 4, i.e., with placements of type "s." Thus, the best placement strategy is to use less disjoint components for the placement of the VNFs of a certain sub-SFC and to maximally separate the sub-SFCs from each other, e.g., to place a whole sub-SFC in the same server and different sub-SFCs in different DCs.
(Backup placement per column is given as "disjointness degree/Δ.")

| Primary Disj. | Primary Δ | Backup 4/1 | 4/2 | 4/3 | 4/4 | 3/1 | 3/2 | 3/3 | 3/4 |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 0.999997932451 | 0.9999979318655 | 0.9999979318286 | 0.999997931825 | 0.99999641 | 0.99999641 | 0.99999641 | 0.9999961885402 |
| 4 | 2 | 0.999997931783 | 0.9999979317005 | 0.9999979316975 | 0.999997931697 | 0.99999631 | 0.99999631 | 0.99999631 | 0.9999961883993 |
| 4 | 3 | 0.999997931753 | 0.9999979316968 | 0.9999979316951 | 0.999997931695 | 0.99999630 | 0.99999630 | 0.99999630 | 0.9999961883957 |
| 4 | 4 | 0.999997931750 | 0.9999979316965 | 0.9999979316950 | 0.999997931694 | 0.99999629 | 0.99999629 | 0.99999629 | 0.9999961883954 |
| 3 | 1 | 0.999987936641 | 0.9999879360639 | 0.9999879360276 | 0.999987936024 | 0.99998641 | 0.99998641 | 0.99998641 | 0.9999861942812 |
| 3 | 2 | 0.999987935981 | 0.9999879359019 | 0.9999879358991 | 0.999987935898 | 0.99998631 | 0.99998631 | 0.99998631 | 0.9999861941428 |
| 3 | 3 | 0.999987935952 | 0.9999879358984 | 0.9999879358968 | 0.999987935896 | 0.99998630 | 0.99998630 | 0.99998630 | 0.9999861941393 |
| 3 | 4 | 0.999957936614 | 0.9999579365603 | 0.9999579365588 | 0.999957936558 | 0.99995630 | 0.99995630 | 0.99995630 | 0.9999561948534 |
| 2 | 1 | 0.999887979247 | 0.9998879787470 | 0.9998879787164 | 0.999887978713 | 0.99988646 | 0.99988646 | 0.99988646 | 0.9998862523920 |
| 2 | 2 | 0.999887978666 | 0.9998879786134 | 0.9998879786120 | 0.999887978611 | 0.99988636 | 0.99988636 | 0.99988636 | 0.9998862522773 |
| 2 | 3 | 0.999588042268 | 0.9995880422155 | 0.9995880422140 | 0.999588042213 | 0.99958642 | 0.99958642 | 0.99958642 | 0.9995863163972 |
| 2 | 4 | 0.999558054926 | 0.9995580548741 | 0.9995580548726 | 0.999558054872 | 0.99955643 | 0.99955643 | 0.99955643 | 0.9995563291076 |
| 1 | 1 | 0.998888476185 | 0.9988884761457 | 0.9988884761447 | 0.998888476144 | 0.99888699 | 0.99888699 | 0.99888699 | 0.9988869043083 |
| 1 | 2 | 0.995894806423 | 0.9958948063838 | 0.9958948063828 | 0.995894806382 | 0.99589332 | 0.99589332 | 0.99589332 | 0.9958932392572 |
| 1 | 3 | 0.995596067857 | 0.9955960678177 | 0.9955960678167 | 0.995596067816 | 0.99559459 | 0.99559459 | 0.99559459 | 0.9955945011612 |
| 1 | 4 | 0.995566200274 | 0.9955662002344 | 0.9955662002334 | 0.995566200233 | 0.99556472 | 0.99556472 | 0.99556472 | 0.9955646336249 |
Table 4. Service Reliability with H = 2 Different Reliability Classes in Configuration k = 4, r = 3, and Ψ = 4
The placements of the primary and backup sub-SFCs are given by the row and column headers (disjointness degree and Δ), respectively.
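With H = 2 reliability classes, the number of surviving sub-SFCs follows a Poisson-binomial rather than a binomial distribution, and the tail P(at least k up) can be computed by a small dynamic program over the per-sub-SFC reliabilities. The two class reliabilities below are hypothetical placeholders, not the Case 1 values:

```python
def at_least_k(probs, k):
    """P(at least k of the independent sub-SFCs with reliabilities `probs` are up)."""
    dist = [1.0]  # dist[j] = P(exactly j sub-SFCs up among those processed)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)   # this sub-SFC fails
            nxt[j + 1] += q * p     # this sub-SFC works
        dist = nxt
    return sum(dist[k:])

# k = 4 primary sub-SFCs in one class, r = 3 backups in another (hypothetical)
probs = [0.96] * 4 + [0.93] * 3
print(at_least_k(probs, k=4))
```

The same routine handles any number of classes, since only the per-sub-SFC reliability vector matters.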

5 Conclusion

We provided a novel reliability analysis of the so-called generic SFC, which jointly considers component sharing, component heterogeneity, and failure interdependency due to failure propagation. Our analysis is based on combinatorics and a reduced binomial theorem, allowing us to analyze placement-dependent SFC reliability, which has not been addressed previously. The analytical results, confirmed by simulations, showed tradeoffs between component heterogeneity, interdependency, disjointness, and sharing for SFC placement strategies over inter- and intra-DCNs under the assumption of failures of DCs, racks, servers, and VNFs. We showed that the high VNF disjointness provided by some placement strategies does not automatically lead to the highest SFC reliability and that not all placement strategies result in high reliability independently of the amount of redundancy. In contrast, a proper selection of the numbers of shared and unshared components in the DCN can significantly increase SFC reliability, which is indeed placement-dependent. Contrary to the common assumption that a maximally disjoint placement is the best way to increase reliability, we found that service reliability can also increase by increasing the number of shared components and reducing the number of common roots. We verified that the highest SFC reliability over any configuration is reached when each sub-SFC is allocated to a separate server, whereas VNFs of the same type, and thus each sub-SFC, are placed in different DCs.

References

[1]
Waseem Ahmed and Yong Wei Wu. 2013. A survey on reliability in distributed systems. J. Comput. Syst. Sci. 79, 8 (2013), 1243–1255. DOI: https://doi.org/10.1016/j.jcss.2013.02.006.
[2]
Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A scalable, commodity data center network architecture. SIGCOMM Comput. Commun. Rev. 38, 4 (Aug. 2008), 63–74. DOI: https://doi.org/10.1145/1402946.1402967.
[3]
Abdelhamid Alleg, Toufik Ahmed, Mohamed Mosbah, and Raouf Boutaba. 2020. Joint diversity and redundancy for resilient service chain provisioning. IEEE J. Select. Areas Commun. 38, 7 (2020), 1490–1504. DOI: https://doi.org/10.1109/JSAC.2020.2986867.
[4]
Yuan-Shun Dai, Yi Pan, and Xukai Zou. 2007. A hierarchical modeling and analysis for grid service reliability. IEEE Trans. Comput. 56, 5 (May 2007), 681–691. DOI: https://doi.org/10.1109/TC.2007.1034.
[5]
Yuan-Shun Dai, Bo Yang, Jack Dongarra, and Gewei Zhang. 2010. Cloud service reliability: Modeling and analysis. In Proceedings of the IEEE Pacific Rim International Symposium on Dependable Computing. 784–789.
[6]
W. Ding et al. 2017. Enhancing the reliability of services in NFV with the cost-efficient redundancy scheme. In Proceedings of the IEEE International Conference on Communications. 1–6. DOI: https://doi.org/10.1109/ICC.2017.7996840.
[7]
A. Engelmann, W. Bziuk, and A. Jukan. 2020. Bounding reliability in service function chaining. In Proceedings of the IEEE International Conference on Information, Communication and Electronic Technology (MIPRO).
[8]
A. Engelmann and A. Jukan. 2018. A reliability study of parallelized VNF chaining. In Proceedings of the IEEE International Conference on Communications. 1–6. DOI: https://doi.org/10.1109/ICC.2018.8422595.
[9]
A. Engelmann, A. Jukan, and R. Pries. 2019. On coding for reliable VNF chaining in DCNs. In Proceedings of the International Conference on the Design of Reliable Communication Networks. 83–90.
[10]
Z. Ye et al. 2016. Joint topology design and mapping of service function chains for efficient, scalable, and reliable network functions virtualization. IEEE Netw. 30, 3 (May 2016), 81–87. DOI: https://doi.org/10.1109/MNET.2016.7474348.
[11]
Z. Guo and Y. Yang. 2015. Exploring server redundancy in nonblocking multicast data center networks. IEEE Trans. Comput. 64, 7 (July 2015), 1912–1926. DOI: https://doi.org/10.1109/TC.2014.2346576.
[12]
Z. Guo and Y. Yang. 2015. On nonblocking multicast fat-tree data center networks with server redundancy. IEEE Trans. Comput. 64, 4 (Apr. 2015), 1058–1073. DOI: https://doi.org/10.1109/TC.2014.2315631.
[13]
Joel M. Halpern and Carlos Pignataro. 2015. Service Function Chaining (SFC) Architecture. RFC 7665. (Oct. 2015). DOI: https://doi.org/10.17487/RFC7665.
[14]
S. Herker, X. An, W. Kiess, S. Beker, and A. Kirstaedter. 2015. Data-center architecture impacts on virtualized network functions service chain embedding with high availability requirements. In Proceedings of the IEEE Globecom Workshops (GC Wkshps). 1–7. DOI: https://doi.org/10.1109/GLOCOMW.2015.7414158.
[15]
A. Hmaity, M. Savi, F. Musumeci, M. Tornatore, and A. Pattavina. 2016. Virtual network function placement for resilient service chain provisioning. In Proceedings of the International Workshop on Reliable Networks Design and Modeling. 245–252. DOI: https://doi.org/10.1109/RNDM.2016.7608294.
[16]
Per Hokstad and Marvin Rausand. 2008. Common Cause Failure Modeling: Status and Trends. Springer London, London, 621–640.
[17]
Anne Immonen and Daniel Pakkala. 2014. A survey of methods and approaches for reliable dynamic service compositions. Serv. Orient. Comput. Applic. 8, 2 (01 June 2014), 129–158. DOI: https://doi.org/10.1007/s11761-013-0153-3.
[18]
Gauri Joshi, Emina Soljanin, and Gregory Wornell. 2017. Efficient redundancy techniques for latency reduction in cloud systems. ACM Trans. Model. Perform. Eval. Comput. Syst. 2, 2 (Apr. 2017). DOI: https://doi.org/10.1145/3055281.
[19]
Wasiur R. KhudaBukhsh, Sounak Kar, Amr Rizk, and Heinz Koeppl. 2019. Provisioning and performance evaluation of parallel systems with output synchronization. ACM Trans. Model. Perform. Eval. Comput. Syst. 4, 1 (March 2019). DOI: https://doi.org/10.1145/3300142.
[20]
Defang Li, Peilin Hong, Kaiping Xue, and Jianing Pei. 2019. Availability aware VNF deployment in datacenter through shared redundancy and multi-tenancy. IEEE Trans. Netw. Serv. Manag. 16, 4 (2019), 1651–1664. DOI: https://doi.org/10.1109/TNSM.2019.2936505.
[21]
Hidefumi Nakamura et al. ETSI GS NFV-REL 003 V1.1.1: Network Functions Virtualisation (NFV); Reliability; Report on Models and Features for End-to-End Reliability. https://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/003/01.01.01_60/gs_NFV-REL003v010101p.pdf.
[22]
T. A. Nguyen, D. Min, E. Choi, and T. D. Tran. 2019. Reliability and availability evaluation for cloud data center networks using hierarchical models. IEEE Access 7 (2019), 9273–9313. DOI: https://doi.org/10.1109/ACCESS.2019.2891282.
[23]
Larry Peterson and Saurav Das. 2017. Trellis: CORD Network Infrastructure. (Oct. 29 2017). Retrieved from https://wiki.opencord.org/pages/viewpage.action?pageId=2557405.
[24]
L. Qu, C. Assi, K. Shaban, and M. Khabbaz. 2016. Reliability-aware service provisioning in NFV-enabled enterprise datacenter networks. In Proceedings of the International Conference on Network and Service Management. 153–159. DOI: https://doi.org/10.1109/CNSM.2016.7818411.
[25]
L. Qu, M. Khabbaz, and C. Assi. 2018. Reliability-aware service chaining in carrier-grade softwarized networks. IEEE J. Select. Areas Commun. 36, 3 (Mar. 2018), 558–573. DOI: https://doi.org/10.1109/JSAC.2018.2815338.
[26]
Paul Quinn, Uri Elzur, and Carlos Pignataro. 2018. Network Service Header (NSH). RFC 8300. (Jan. 2018). DOI: https://doi.org/10.17487/RFC8300.
[27]
J. Quittek et al. ETSI GS NFV 002 V1.2.1: Network Functions Virtualisation (NFV); Architectural Framework. https://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.02.01_60/gs_NFV002v010201p.pdf.
[28]
Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat. 2015. Jupiter rising: A decade of clos topologies and centralized control in Google’s datacenter network. In Proceedings of the SIGCOMM Conference.
[29]
Gurpreet Singh et al. ETSI GS NFV-REL 004 V1.1.1: Network Functions Virtualisation (NFV); Assurance; Report on Active Monitoring and Failure Detection. https://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/004/01.01.01_60/gs_NFV-REL004v010101p.pdf.
[30]
O. Soualah, M. Mechtri, C. Ghribi, and D. Zeghlache. 2017. A link failure recovery algorithm for virtual network function chaining. In Proceedings of the IFIP/IEEE Symposium on Integrated Network and Service Management (IM). 213–221. DOI: https://doi.org/10.23919/INM.2017.7987282.
[31]
Da Wang, Gauri Joshi, and Gregory W. Wornell. 2019. Efficient straggler replication in large-scale parallel computing. ACM Trans. Model. Perform. Eval. Comput. Syst. 4, 2 (April 2019). DOI: https://doi.org/10.1145/3310336.
[32]
Meng Wang, Bo Cheng, and Junliang Chen. 2020. Joint availability guarantee and resource optimization of virtual network function placement in data center networks. IEEE Trans. Netw. Serv. Manag. 17, 2 (2020), 821–834. DOI: https://doi.org/10.1109/TNSM.2020.2978910.
[33]
H. Xu and B. Li. 2014. TinyFlow: Breaking elephants down into mice in data center networks. In Proceedings of the IEEE International Symposium on Local and Metropolitan Area Networks. 1–6. DOI: https://doi.org/10.1109/LANMAN.2014.7028620.
[34]
Y. Xu and V. P. Kafle. 2017. Reliable service function chain provisioning in software-defined networking. In Proceedings of the 13th International Conference on Network and Service Management (CNSM). 1–4. DOI: https://doi.org/10.23919/CNSM.2017.8256022.

Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 6, Issue 3
September 2021, 98 pages
ISSN: 2376-3639
EISSN: 2376-3647
DOI: 10.1145/3492429

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 December 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 November 2020
Published in TOMPECS Volume 6, Issue 3

Author Tags

  1. NFV
  2. VNF
  3. SFC
  4. reliability
  5. availability
  6. combinatorial analysis
  7. common cause failures
  8. failure propagation
  9. parallel service function chaining
  10. reliable VNF placement

Funding Sources

  • European Commission
