research-article
Open access

A Combinatorial Reliability Analysis of Generic Service Function Chains in Data Center Networks

Published: 02 December 2021

Abstract

In data center networks, the reliability of a Service Function Chain (SFC)—an end-to-end service presented by a chain of Virtual Network Functions (VNFs)—is a complex and specific function of placement, configuration, and application requirements, both in hardware and software. Existing approaches to reliability analysis do not jointly consider multiple features of system components, including (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency. To this end, we develop a novel analysis of the service reliability of the so-called generic SFC, consisting of n = k + r sub-SFCs, whereby k ≥ 1 and r ≥ 0 are the numbers of arbitrarily placed primary and backup (redundant) sub-SFCs, respectively. Our analysis is based on combinatorics and a reduced binomial theorem—resulting in a simple approach, which, however, can be utilized to analyze rather complex SFC configurations. The analysis is practically applicable to various VNF placement strategies in arbitrary data center configurations and topologies, and can be effectively used for evaluation and optimization of reliable SFC placements.

1 Introduction

Reliability of complex systems can be defined as a probability that the system will successfully complete the processing of the intended service. As such, reliability analysis in system engineering is critically important for designing systems that are robust in case of failures of any system component. Within the framework of Network Function Virtualization (NFV), the reliability of Service Function Chain (SFC)—a chain of Virtual Network Functions (VNFs)—is a complex function of VNF placement, network configurations, and application requirements on SFC deployment, both in hardware and software. Generally, SFCs suffer from processing vulnerability, whereby failures caused by hardware or software can interrupt the entire service chain [1, 17]. To maintain the service in case of failures, the most common technique is based on redundant configurations [3, 18, 20, 32]. With redundant configurations, the failed processes can be redirected to backup components. However, even a full backup system may not yield 100% reliability if backup resources are placed in shared servers or racks. Moreover, SFCs can deploy replications to improve response times and flexibility of scaling, thus further complicating the calculation of reliability.
Previous approaches to reliability analysis cannot be generalized to SFCs, because they do not jointly consider several important aspects of system components in data center networks (DCNs): (i) heterogeneity, (ii) disjointness, (iii) sharing, (iv) redundancy, and (v) failure interdependency, the latter resulting in the so-called common cause failures and failure propagations. A joint consideration of all these features, including components' dependencies, is, however, of major practical relevance. For instance, when VNFs are placed in the same rack, they create dependencies, which implies that a failure of the rack causes a failure of all servers and VNFs inside it. Furthermore, all components, be it racks or servers, or servers with different hardware, exhibit different component reliability by design, as well as due to different lifetimes and levels of utilization. SFC reliability calculation based on individual system components, i.e., without these dependencies, is well studied, especially in the context of the so-called reliable SFC mapping problem [6, 10, 14, 15, 21]. References [11, 12] explored the role of server redundancy in data centers with server failures. In Reference [30], a scalable and efficient algorithm for VNF placement to address physical link failures was presented, whereas Reference [34] focused on disjointness while addressing server failures. References [24, 25] studied a joint optimization of SFC placement and flow routing under the assumption of VM failures.
Component disjointness and independence is a frequent assumption made to assess reliability independently of SFC and VNF placement. In such scenarios, the reliability calculation is straightforward. In practice, however, one rack can host multiple servers hosting multiple VNFs of the same SFC, which requires a different analytical approach. References [4, 5] address this problem: they provide a hierarchical topology model without backup protection to assess reliability based on queuing and graph theory and apply Bayesian analysis to predict service reliability. This analysis assumes that a service is reliable when all service components are available during service runtime, whereby the reliability calculation reduces to a multiplication of all component reliability values. The failure of a rack hierarchically propagates to the corresponding servers and VMs, which is also referred to as hierarchical failure propagation [16]. In Reference [22], a hierarchical modeling framework for tree-based data center networks was proposed, where servers, links, switches, and routers fail hierarchically. The resulting reliability model yields an approximate reliability, due to the underlying assumptions of component disjointness and independence. The assumption of component disjointness cannot, however, always be met in practice. In addition, SFC reliability is placement-dependent, as some VNFs can be hosted by different servers, racks, and data centers.
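Under the disjointness-and-independence assumption discussed above, the serial-chain calculation reduces to a product of component reliabilities. The following Python sketch illustrates this baseline; the component list and reliability values are illustrative assumptions, not taken from the article:

```python
def serial_reliability(component_reliabilities):
    """Reliability of a serial chain that fails if any single component fails:
    under independence, the product of the individual reliabilities."""
    result = 1.0
    for p in component_reliabilities:
        result *= p
    return result

# Example: a 3-VNF chain hosted on one server in one rack of one DC.
# The service is up only if DC, rack, server, and all three VNFs are up.
components = [0.9999, 0.999, 0.995, 0.99, 0.99, 0.99]  # DC, rack, server, VNF1..VNF3
print(serial_reliability(components))
```

Note that any shared component (here the DC, rack, and server) enters the product once for the whole chain, which is precisely why placement changes the result once sharing is modeled explicitly.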
Another practical feature to consider is splitting and parallel processing of the workload using multiple servers, a widely deployed technique to improve load balancing, response times, and flexibility of scaling in data centers [19, 31, 33]. Therefore, an SFC is not only a serial configuration of various processing workloads (VNFs) but also a system that can be fully parallelized to provide the end-to-end service in the form of replicas or backups. Hence, a generic SFC can be made of multiple so-called sub-SFCs running in parallel. Despite their practical relevance, such generic SFCs have been widely overlooked in the literature. In our previous studies [8, 9], we analyzed SFC reliability for cases of parallelized processing as a function of interdependent hardware (server) and software (VNF) component failures and VNF placement strategies. In Reference [7], we investigated SFC reliability as a function of the VNF placement strategy in inter- and intra-DCNs, taking into account failures of data centers, racks, servers, and VMs. We generalize and substantially extend our analytical approach from these papers and summarize the main contributions of this article as follows:
Reliability analysis of generic SFCs, defined as rank-ordered VNFs allocated to n = k + r sub-SFCs, with k ≥ 1 and r ≥ 0 being the numbers of primary and backup (redundant) sub-SFCs, respectively.
Joint consideration of the hierarchical, component-dependent placement of generic SFCs, with consideration of the level of components’ disjointness, interdependency, and sharing.
Analysis of hierarchical failure propagation over data centers, racks, servers, and VNFs, with derivations of the so-called Acceptable Component Failures and the corresponding amount of components available for utilization upon failures.
Analysis of the suitability of the model to derive best placement strategies according to reliability requirements in generic inter- and intra-data center network scenarios.
The rest of the article is organized as follows: In Section 2, we provide background and assumptions. Section 3 provides the analysis of SFC reliability. Section 4 discusses the numerical results. Section 5 concludes the article.

2 Preliminaries

In this section, we introduce the basic concepts related to SFC placement, define generic SFCs, discuss hierarchical component failures, and provide definitions used for modeling. All notations utilized here and throughout the article are summarized in Table 1.
Table 1.
k: number of primary sub-SFCs;
r: number of backup sub-SFCs;
n: total number of pre-reserved sub-SFCs, i.e., n = k + r;
Ψ: number of VNFs in an SFC and, thus, in each sub-SFC;
ψ: type of a VNF in an SFC, where 1 ≤ ψ ≤ Ψ;
number of hierarchy levels, i.e., different component types;
c: a certain component type;
highest value of shared component c;
H: total number of reliability classes;
ξ: reliability class of a sub-SFC and related components;
number of sub-SFCs of reliability class ξ;
number of components from the highest hierarchy of reliability class ξ;
number of components of type c inside components of type c − 1;
reliability of any shared component of type c utilized to host class ξ;
reliability of any unshared component of type c utilized to host class ξ and VNFψ;
number of component types used to disjointedly place VNFs of the same type;
Δ: number of component types used to disjointedly place VNFs of different types;
acceptable number of failures of a shared component of type c belonging to class ξ;
acceptable number of failures of an unshared component of type c used to host VNFψ of class ξ;
number of available components of type c and reliability class ξ;
number of failed components of type c and class ξ;
w: set of reliability classes hosted by a common root;
cw: type of the common root component, which hosts |w| reliability classes;
Φ: set of root components for a certain SFC placement;
logical function ensuring that the common roots of any type c of any class are considered only once in the calculation of SFC reliability;
number of available common roots of any type c and class;
acceptable number of failures of a common root of type c and class;
number of failed components of the lowest hierarchy level and a certain type, e.g., VNF1, due to failed components of the same class ξ at higher hierarchy levels;
placement-independent SFC reliability, if H ≥ 1 and …;
placement-independent SFC reliability, if H ≥ 1 and Ψ = 1;
R: placement-independent SFC reliability, if … and …;
placement-dependent SFC reliability, if H = 1 and …;
placement-dependent SFC reliability, if H = 1 and Ψ = 1;
placement-dependent SFC reliability, if H ≥ 1 and …;
placement-dependent SFC reliability, if H ≥ 1, Ψ = 1 and …;
placement-dependent SFC reliability, if H ≥ 1, Ψ = 1 and …;
placement-dependent SFC reliability, if H ≥ 1, … and ….
Table 1. Notation

2.1 SFC Placement

An SFC is defined as an ordered chain of Ψ virtual network functions (VNFs). VNFs are typically placed in Virtual Machines (VMs), which in turn are installed in networked servers in data center networks (DCNs), whereby a number of servers can be located in the same rack [2, 23, 27, 28]. Let us generalize the data center network architecture as a hardware fabric of switches, links, racks with multiple servers and multiple VMs per server, including core routers, Top-of-Rack (ToR) switches, forwarding switches building a switching fabric of DC, and Virtual Switch (vS) (Figure 1(a)). Core routers provide connectivity between geographically distant DCNs over wide area networks. ToR switches forward traffic within a rack. The switching fabric provides a connection between racks and to the core routers. Each server can host multiple VNFs with as many resources as required. Thus, the main task of the programmable hypervisor switch (denoted as vS) with a load balancer (LB) is to select the appropriate VNF instance as well as to provide connectivity to ToR.
Fig. 1.
Fig. 1. Composition, deployment, and placement of generic SFC in inter- and intra-DCNs: (a) composition and placement of generic SFC; (b) deployment and backup protection of generic SFC to ensure reliability.
Embedding the VNFs of an SFC within DCNs is a known challenge from the performance perspective. As illustrated in Figure 1(a), an SFC created from three VNF types, i.e., VNF1, VNF2, VNF3, can be placed in either one or multiple DCNs. Let us define the VNF type ψ, where 1 ≤ ψ ≤ Ψ, by the application implemented by this VNF, examples of which are firewall, Network Address Translation (NAT), Intrusion Detection System (IDS), or traffic shaper. As previously noted, the VNFs of various types are placed in an SFC in a rank-ordered fashion. For simplicity, and without loss of generality, we assume that the VNF type is defined by its index, i.e., ψ = 1 indicates VNF1, meaning that it is both the type and the order of the VNF in the chain. Inside a DCN, there are multiple options to place various types of VNFs. For instance, in DC4, different VNFs of an SFC can be placed either in the same or in different racks, in the same or different servers, and so on. For example, Rack1 in DC4 accommodates an entire SFC, which is distributed over servers S1, S2, and S3, and in Rack2, two SFCs can be allocated to three servers, whereby each server provides two VNFs of the same type. In contrast, VNF1, VNF2, and VNF3 in Rack3 (DC4) are all placed in one server, e.g., S2. In addition, the same SFC can be placed in DC1, DC2, and DC3, and so on.

2.2 Generic SFCs in DCN

As previously noted, we define a generic SFC of n≥ 1 sub-SFCs, whereby n = k + r. In its simplest form (k = 1, r = 0) a generic SFC corresponds to the traditional and most widely used serial SFC, as illustrated by the solid line in Figure 1(a). Without redundant resources, SFC reliability depends on the reliability of the underlying hardware and software components. If we now assume that another SFC (dashed line) is additionally used as a redundant resource or backup, then we get closer to the idea of a generic SFC, here with k = 1 and r = 1. The backup VNFs can replace any failed primary VNF of the same type. Similar to primary SFCs, also backup SFCs can fail during service runtime. As shown in References [21, 29], there are multiple different strategies for fast VNF failure detection and effective deployment of primary and backup VNFs. For the purpose of our analysis, however, we do not need to explicitly consider any specific backup deployment strategy. Instead, we assume that any SFC is available and functional during service runtime, if at least k≥ 1 out of n VNFs of each type are available, while at most r VNFs of each type can fail during service runtime.
Figure 1(a) can be used again to illustrate this, whereby a request for VNF1, VNF2, and VNF3 is now split into k = 2 primary sub-SFCs, i.e., r = 0, presented by solid and dashed lines. The resulting SFC is created over two sub-SFCs, one placed in DC4 in Rack1 and another in DC1, DC2, and DC3. The SFC is reliable/available only when both sub-SFCs are available during service runtime. It should be noted here that in a generic SFC, VNFs of the same type (e.g., all VNF1s) need to be synchronized in the case of stateful operation. To this end, an external state repository can be utilized to store the internal states of VNFs. Once the SFC workload is split between k sub-SFCs, each sub-SFC uses its own VNFs. This can be implemented with SFC Encapsulation, as proposed in References [13, 26]. From the practical perspective, a generic SFC can deploy traffic parallelism, SFC replicas, and network routing to provide resilient networking, all of which are widely deployed techniques. We note that each of these techniques can additionally increase the reliability; for instance, disjoint network paths can help various SFCs exhibit higher reliability. Again, our analysis does not make any particular assumption on routing and traffic splitting and merely calculates the likelihood that at least k ≥ 1 out of n VNFs of each type are available during service runtime, allowing at most r VNFs of each type to fail.
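The availability condition above (at least k of the n = k + r instances of each VNF type must survive) can be sketched numerically. The following Python fragment is a simplified, placement-independent illustration assuming independent, identically reliable VNF instances, an assumption that the rest of the article relaxes; the function names are ours, not the article's notation:

```python
from math import comb

def at_least_k_of_n(k, n, p):
    """P(at least k of n independent components with reliability p are up):
    the upper tail of a Binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def generic_sfc_reliability(k, r, p, num_types):
    """Placement-independent sketch of a generic SFC: each of the `num_types`
    (Psi) VNF types independently needs at least k of its n = k + r
    instances available during service runtime."""
    n = k + r
    return at_least_k_of_n(k, n, p) ** num_types
```

For k = 1, r = 1, the per-type term is 1 − (1 − p)², i.e., the classic 1:1 backup case; for r = 0 the expression collapses to the serial product p^(kΨ).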
Figure 1(b) illustrates a generic SFC with backup protection, where a network traffic request f with three VNF types (VNF1-VNF2-VNF3) consists of three workload flows (f1, f2, f3) and is served by a generic SFC with k = 3 primary sub-SFCs and r = 1 backup sub-SFC. Any backup VNFψ allocated to DC1, DC2, and DC3 is able to replace any primary VNF of the same type ψ, where 1 ≤ ψ ≤ 3. All primary sub-SFCs are placed in DC4, with the first and second sub-SFCs being placed in Rack5. All VNFs of sub-SFC f1 are placed in server S4. In contrast, all VNFs for f2 are distributed over different servers, i.e., S1, S2, and S3 in Rack5. The third sub-SFC for f3 is distributed over three different racks in DC4, so each VNF is placed in a different rack and server, i.e., VNF1, VNF2, and VNF3 in S1 (Rack 2), S3 (Rack 3), and S3 (Rack 4), respectively. In contrast, each backup VNF is placed in an individual DC, i.e., DC1, DC2, and DC3. In DC4, VNF1 and VNF2 of the primary sub-SFC for f2 can fail; for instance, VNF1 in S1 fails and server S2 fails in Rack 5. As a result, workload flow f2 is redirected toward DC1 and DC2 to be processed by the backup VNF1 and VNF2. If primary VNF3 of the sub-SFC for f3 also fails, for instance, due to the failure of Rack 4 in DC4, then sub-request f3 can be redirected to the backup VNF3 placed in DC3. Although there are two hardware failures and one VM failure, resulting in the loss of three different primary VNFs, the SFC remains available as long as the corresponding backup VNFs, i.e., VNF1, VNF2, and VNF3, are available.

2.3 Hierarchical Component Failures

Even though our analysis, as we show, does not depend on a specific data center network topology, it is important to recognize the hierarchical nature of data center network topologies, which is critical to failure propagation:
Data Center (DC). The failure of an entire data center can be caused by connectivity failures in wide area network or between core routers and forwarding switches, as well as by failures of all core routers. DC failure hence propagates as failure of all racks, servers, and VMs.
Rack (R). Rack failures can be caused by failures of forwarding switches and ToR switches, as well as of their links and connectivity. Any failure of a rack causes the failure of all servers and all VNFs in that rack.
Server (S). A server can fail and become unavailable in case of failure of any server’s hardware components, i.e., power supply, memory, and so on. A server can also fail due to a failed link between server and ToR or when vS fails resulting in unavailability of all VMs hosted by the server.
VM/Container. We assume that one VM or container is reserved for one VNF, indicating that a failure of VM or the container, e.g., due to misconfigurations or application software issues, causes failure of that VNF only.
In sum, a failure of a component in a higher level of topological hierarchy generally results in failures of all connected components in lower levels of the hierarchy.
Each component can be qualified by a component-dependent reliability or availability, which are temporal functions of the Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR), respectively [21]. In this article, since our focus is on the spatial interdependency and the related hierarchical failure propagation, we treat the temporal reliability (availability) of a single component as an input parameter. Let us illustrate this with a DCN presented as a hierarchical tree in Figure 2. The root node is the DC, which belongs to the highest hierarchy level; VNFs represent leaf nodes at the lowest hierarchy level. The links between nodes define the interdependency between components, whereas components of the same type within the same hierarchy level are independent of each other. The components of a lower hierarchy level do not impact components at a higher hierarchy level. For example, for SFCs placed in DC4, a failure of VNF1 will not affect the availability of VNF2 and VNF3 placed in R1 and servers S2 and S3, respectively (Figure 2). The availability of S1, R1, and DC4 will not be affected by the VNF1 failure either. In contrast, components of a higher hierarchy level impact the lower hierarchy components, e.g., a failure of R1 (level 2) in DC4 will result in failures of all servers (level 3) inside this rack and all VMs (level 4), i.e., failure of all VNFs (VNF1, VNF2, and VNF3). Thus, the failure of an entire sub-SFC needs to be considered. As shown in Figure 2, VNFs can be placed in shared components: the SFC allocated to R1 has two shared components (R1 and DC4), the VNFs of SFCs placed in R2 share R2 and DC4, and the SFCs hosted by S2 and S3 in R3 have a shared server (S2 or S3), R3, and DC4. As a result, a failure of any shared component results not only in failures of components from the lower hierarchy level but also in a failure of the entire sub-SFC, i.e., of Ψ VNFs of different types.
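The downward-only propagation just described can be captured by a simple tree model. The following Python sketch uses component names mirroring Figure 2; the class and tree structure are our own illustrative assumptions. Failing a leaf stays local, while failing a rack takes down its whole subtree:

```python
class Component:
    """A node in the DCN hierarchy tree (DC > rack > server > VNF)."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.failed = False

    def fail(self):
        # Hierarchical failure propagation: a failure reaches every
        # descendant, but never ancestors or siblings.
        self.failed = True
        for child in self.children:
            child.fail()

# DC4 -> R1 -> {S1, S2, S3} -> VNFs, mirroring the example in Figure 2.
vnf1, vnf2, vnf3 = Component("VNF1"), Component("VNF2"), Component("VNF3")
s1, s2, s3 = Component("S1", [vnf1]), Component("S2", [vnf2]), Component("S3", [vnf3])
r1 = Component("R1", [s1, s2, s3])
dc4 = Component("DC4", [r1])

vnf1.fail()                                   # a VNF failure stays local ...
assert not s1.failed and not r1.failed
r1.fail()                                     # ... a rack failure fails all servers and VNFs
assert all(c.failed for c in (s1, s2, s3, vnf2, vnf3))
```

The model is deliberately minimal: it encodes only the interdependency structure, while the temporal reliability of each component remains an input parameter, as stated above.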
Fig. 2.
Fig. 2. Placement of SFCs: Modeling of hierarchical component placements.
Our assumption is that the DCN topology exhibits a high level of connectivity, such as a leaf-spine topology [2, 23, 28], which in turn allows us to assume that the availabilities of DCs, racks, servers, and VMs are statistically independent of switch and link failures. For instance, we assume that a certain rack is available as long as at least one core router, one forwarding switch, the ToR switch, and at least one link between them are all available. This assumption, however, can lead to an overestimation of the service availability in DCNs with lower connectivity, such as a fat-tree topology [2, 14]. For example, DC and rack availability can depend on the availability of core routers, as the failure of a certain core router will lead to a failure of some forwarding switches and links, reducing the reliability of pods and racks. This limitation can be addressed by a slight modification of the hierarchical network model presented in Figure 2: instead of the DC, core routers can be used as the highest hierarchy level, and an additional hierarchy level (level 2) for the forwarding switches of the aggregation layer can be included in the model. In other words, more hierarchy levels would yield a more accurate reliability analysis for DCNs with lower connectivity. Without loss of generality, and to simplify further discussion as well as to limit the number of parameters and variables in our reliability analysis, we consider a highly connected DCN with four hierarchy levels.

2.4 Definitions and Parameters for VNF Placement

Let us assume four hierarchy levels in the DCN topology, in consideration of sharing, disjointness, interdependency, and heterogeneity of components. Each hierarchy level is described by a variable c, 1 ≤ c ≤ 4, which at the same time uniquely determines the component type in the DCN, i.e., DC, rack, server, and VM (container). For instance, at hierarchy level c = 1 the component type is defined as DC, while at hierarchy level c = 4 it is VM/container. Without loss of generality, all VNFs of a certain sub-SFC, i.e., each VNFψ, where 1 ≤ ψ ≤ Ψ, can be placed in the same component of a higher hierarchy level, e.g., in the same DC or same rack or same server, and, thus, all Ψ VNFs share this component.
Definition 1.
Any component c that is utilized by all VNFs of the same sub-SFC, i.e., by VNFs of different types, is referred to as a shared component. The number of unshared component types for a certain sub-SFC is then determined by the highest value of c, i.e., the highest hierarchy level, over all component types shared by the VNFs in a sub-SFC.
Since any VNF is placed in a separate virtual machine or container, i.e., disjointedly from any other VNFs, the highest value of a shared component is c = 3 (server), and the minimal number of unshared component types is then one, namely the VM level. Figure 3 represents a hierarchical model of the generic SFC shown earlier in Figure 1(b). In this example, a generic SFC consists of n = 4 sub-SFCs, whereby each sub-SFC has a different number of shared and unshared components. For instance, the whole sub-SFC serving f1 is placed in the same server, with VNF1, VNF2, and VNF3 sharing S4, R5, and DC4. In contrast, the sub-SFC placed in DC1, DC2, and DC3 has only unshared components, as its different VNF types (VNF1, VNF2, VNF3) are placed in different DCs, racks, servers, and VMs; in this case, there are no shared components. Generally, the type of a shared component indicates its lowest hierarchy level. To model a VNF placement strategy, we next introduce an additional parameter:
Fig. 3.
Fig. 3. Hierarchical structure of a generic SFC with three primary and one backup sub-SFCs from Figure 1(b), component and reliability classification of sub-SFCs based on placement strategy.
Definition 2.
Considering all sub-SFCs of a generic SFC, we define a parameter called the disjointness degree, used to indicate the number of different component types, i.e., hierarchy levels, that are utilized to separate the n VNFs of the same type ψ as they are placed rank-ordered.
When the VNFs of a certain type ψ, for instance, ψ = 3 meaning VNF3, of a generic SFC are placed in different DCs, racks, servers, and VMs, the disjointness degree is 4. In Figure 3, the SFC with four sub-SFCs placed in DC1, DC2, DC3, and DC4 has an overall disjointness degree of 2, as all VNFs of each type ψ in the generic SFC are placed in different servers and different VMs, cf. sub-SFC1 and sub-SFC2. When we now assume that the generic SFC presented in Figure 3 consists of sub-SFC3 and a backup sub-SFC only, which are placed in DC4 (R2, R3, R4) and in DC1, DC2, and DC3, respectively, the VNFs of a certain type ψ are placed in different DCs, resulting in a disjointness degree of 4.
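The disjointness degree can be computed mechanically from a placement. In the following Python sketch, the tuple representation of a placement (dc, rack, server, vm) is our own assumption, not the article's notation; a component is identified by its full path so that, e.g., two servers named "S1" in different racks count as different servers:

```python
def disjointness_degree(placements):
    """Number of hierarchy levels (DC, rack, server, VM) at which all
    instances of one VNF type are pairwise disjoint.
    `placements` is a list of (dc, rack, server, vm) tuples."""
    degree = 0
    for level in range(4):
        # Identify each component by its path from the root down to `level`.
        paths = [p[:level + 1] for p in placements]
        if len(set(paths)) == len(paths):  # all instances disjoint at this level
            degree += 1
    return degree
```

For example, two VNF instances in the same rack but on different servers yield a degree of 2 (server and VM levels), while instances in different DCs yield the maximum degree of 4, matching the cases discussed above.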
We consider component heterogeneity to reflect the fact that different component types, such as racks and servers, as well as components of the same type, e.g., VNFs, can have different component reliability. This assumption is of practical importance, as different component types provide different reliability by design, e.g., DCs are more reliable than VMs, and components of the same type can have different reliability due to different lifetimes and levels of utilization. Thus, we assume that the component reliability of any component c depends on the VNF type ψ allocated to this component. We distinguish the reliability of unshared and shared components accordingly (see Table 1). Considering component heterogeneity and the VNF placement strategy, we additionally assume that each (sub-)SFC builds a certain reliability class ξ.
Definition 3.
Each sub-SFC in a generic SFC belongs to a certain reliability class ξ, 1 ≤ ξ ≤ H, defined by its placement strategy, i.e., its shared component types and Δ, and by the reliability value of each used component of each hierarchy level c. Multiple sub-SFCs can belong to the same class ξ if and only if they are placed in the same fashion and use components characterized by the same reliability per hierarchy level.
Generally, there can be multiple components of type c inside a component of type c − 1 utilized to accommodate the reliability class ξ. In Figure 3, for example, a data center (DC4), a rack (R5), servers (S1, S2, S3), and one virtual machine per server (hosting VNF1, VNF2, VNF3) are required to place the sub-SFC of the corresponding reliability class. In real networks, multiple sub-SFCs can follow the same or different placement strategies and may share some components or be placed disjointedly from each other, as discussed above. In Figure 3, all four sub-SFCs belong to four different reliability classes, because each sub-SFC utilizes a different VNF placement strategy. Referring back to Figure 2, however, the basic placement strategy shown in Figure 3 for one reliability class is utilized by both sub-SFCs placed in DC4 and R3. When all shared components of type c = 3, i.e., both servers S2 and S3, as well as all unshared components, i.e., VNFs, have the same component reliability, respectively, both sub-SFCs will belong to the same reliability class. Thus, these two sub-SFCs that follow the same VNF placement strategy can belong to the same reliability class, when the corresponding component reliabilities per hierarchy level are equal, or to different reliability classes otherwise. In contrast, each unshared component of any type c and reliability class ξ utilized to allocate a certain VNF type ψ can have a different component-dependent reliability. As a result, there can be at most n reliability classes, whereby multiple sub-SFCs can belong to the same reliability class ξ. Depending on the placement strategy, multiple sub-SFCs of different reliability classes can be allocated to the same components; see the three sub-SFCs in Figure 3 that are all placed in DC4.
Definition 4.
The components at any hierarchy level c, whereby , which are jointly utilized by sub-SFCs of different reliability classes are referred to as common root components.
For instance, in Figure 3, the sub-SFCs placed in Rack5 have two common roots, i.e., R5 and DC4, while DC4 is the common root of three reliability classes. The backup sub-SFC is placed separately from the other reliability classes and does not have any common roots.
To describe the common root components of any type, i.e., hierarchy level, c and all reliability classes combined by such a component, we introduce a set Φ, where each element describes a component type cw together with the set w of indexes of the reliability classes hosted by this common root. Since all components of type c = 4 are always disjoint, i.e., different VMs, they cannot be common root components; a common root is thus a DC, rack, or server. Note that using component type c = 3, i.e., a server, as a common root will result in the same placement strategies for all sub-SFCs that utilize this root. Considering the example in Figure 3, the set of common roots contains R5, which hosts two reliability classes, and DC4, which hosts three reliability classes.

3 SFC Reliability Analysis

Without loss of generality, we define the reliability of a generic SFC as the probability that at least k ≥ 1 sub-flows can successfully traverse k out of n = k + r sub-SFCs, where r ≥ 0. Since each sub-SFC consists of Ψ different VNF types, at least k VNFs of each type ψ have to be available during service runtime, i.e., kΨ VNFs in total. As different sub-SFCs can belong to different reliability classes, we distinguish the component-dependent reliability of each shared and unshared component of type c utilized to accommodate VNFs of class ξ (see Table 1). Generally, the reliabilities of any data center, any rack, any server, and any VM can differ (heterogeneity). At the same time, only shared components of a certain type c utilized to host any sub-SFC belonging to a certain class ξ have an equal component-dependent reliability, where c relates to the hierarchy level, i.e., defines the component type, utilized for a placement of class ξ.
With r backup sub-SFCs, any component from any hierarchy level c and any reliability class ξ is allowed to fail as long as these failures result in a failure of at most r VNFs of any type. To assess the number of components that can fail without affecting the SFC, we introduce a parameter called Acceptable Component Failures (ACF) for common root, shared, and unshared components of type c. The ACF defines the number of failed hardware and software components of any type that does not lead to service interruption. Obviously, when no backup or other protection is applied, there is no ACF, and all primary components, i.e., DCs, racks, servers, and VNFs, must remain available/reliable during service runtime. The ACF strongly depends on the number of backup components and the placement of the VNFs of a certain reliability class inside DCs, racks, and servers. Additionally, we introduce a variable describing the number of remaining available components of a certain type c and reliability class ξ, which are not affected by failures of other components but can still fail during service runtime. Using these two parameters, we can derive the placement-independent and placement-dependent SFC reliability. Without loss of generality, SFC reliability can be generalized as the probability that, out of N available components of type c with component reliability R_c, at least N − ACF are available to maintain at least k sub-SFCs. This can be described by the well-known binomial formula, i.e., the sum over i from N − ACF to N of (N choose i) R_c^i (1 − R_c)^(N−i), where each summand is the probability mass function of the binomial distribution.
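As a numerical sketch of that binomial calculation, the following Python function (the names are ours, not the article's notation) evaluates the probability that at least N − ACF of N identical components remain available:

```python
from math import comb

def binomial_reliability(n_avail, acf, p):
    """P(at least n_avail - acf out of n_avail components with reliability p
    remain available), i.e., at most `acf` Acceptable Component Failures."""
    return sum(comb(n_avail, i) * p**i * (1 - p)**(n_avail - i)
               for i in range(n_avail - acf, n_avail + 1))
```

With acf = 0 the sum collapses to the single term p**n_avail, i.e., the unprotected serial case, while acf = n_avail makes the sum cover all outcomes and yields 1.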

3.1 Available Components and Acceptable Component Failures

The acceptable component failure (ACF) of shared , unshared , and common root components generally shows the bound of any summation in Binomial formula used for reliability calculation and is a function of available () and failed () components of type c from the same reliability class ξ and the failures of any other components of higher hierarchy level (, ), which cause failures of components of the lowest hierarchy level over all reliability classes.
As at most r VNFs of each type of any reliability class are allowed to fail, we need to define ACF for each reliability class and each component type over all reliability classes. This requires, however, a consideration of failure interdependency between components and reliability classes as, e.g., and , where each previously assumed failure reduces the acceptable failures of component and the amount of remaining available components of type c and class ξ, respectively.
Depending on the VNF placement strategy, different reliability classes can utilize the same common root components, while a failure of a common root will result in a complex failure propagation affecting multiple reliability classes, i.e., multiple sub-SFCs. A failure of one common root can generally cause a failure of more than r sub-SFCs and lead to service interruption. Thus, let us first derive ACF of the common root components of any reliability class from set , i.e., . Since there is only one common root component per reliability class, the amount of available roots and the ACF of the root can be either 0 or 1. Both depend on the number of VNFs of each type that will fail if a certain root fails, and on the number of already failed roots. Generally, the root component can only fail if it is available. A specific common root component is, however, only available when the other roots from the higher hierarchy have not previously failed, leading to the failure of , i.e., , where is the number of failed common roots . Then, the amount of available common roots is defined as
(1)
When , ACF of root can hypothetically be . However, to maintain the service, the total amount of failed VNFs due to the failure of root and other root components may not be larger than the number of backup VNFs, i.e., r. The amount of VNFs of a certain type belonging to a reliability class can be calculated as VNFs, which will fail with the failure of the root component type . Since there are reliability classes affected by the failure of root component of type , the total number of failed VNFs of a certain type of any reliability class , is determined as , where and . Since there can be other common roots from the higher hierarchy level that affect the root under consideration as well as other roots, we need to take into account all other VNF failures due to failures of all other roots. Thus, the amount of failed VNFs of a certain type over root components can be calculated as a function of any , i.e., , where , iff , and ensures that the failures of VNFs from the same reliability class are considered only once. The latter formulation can take into account any VNF placement strategy, where, for instance, a DC and a rack inside this DC represent two common roots for the same reliability classes, whereby the DC can combine more reliability classes and lead to the failure of the rack if it fails, as shown in Figure 3. Summarizing all constraints above, the ACF of the common roots is
(2)
where and , iff .
Since all components of a certain reliability class are placed separately from components of any other reliability class per definition, and only some common root components can be utilized to combine different classes, we need to additionally consider each reliability class ξ separately. The number of available components and the ACF are thereby functions of the VNF placement strategy and the amount of available and failed components of any type in reliability class ξ. As DCs are the components from the highest hierarchy level and independent of failures of any other component types, the amount of available DCs that are not common root components is equal to the amount of data centers required to accommodate sub-SFCs of reliability class ξ, i.e., . The failure of DCs causes a failure of racks, reducing the overall number of available racks as , where . Then, the remaining amount of servers after DC and rack failures is , where . After failures of DCs, racks, and servers, the remaining amount of VNFs of any type is , if . Since some reliability classes have a common root component , which impacts multiple reliability classes, i.e., any ξ, , we need to consider the availability and failure of common root components from a set Φ as for any . Thus, the amount of available primary and backup components of any component type of any reliability class ξ, after the failure of any component of higher hierarchy level, i.e., of type 1 to , is determined as follows
(3)
where the first case describes the number of the common root components of a certain type c and is the same for any reliability class ξ from set . However, when there is only one reliability class, H = 1, there are no common root components and the first case in Equation (3) will never be true. The other two cases describe the amount of available shared and unshared components.
As multiple VNFs of different types of a certain reliability class ξ can share some components, the number of failed components of type can vary as for these shared components and as for disjoint (unshared) components in reliability class ξ. Here, any out of components of type of reliability class ξ can fail without interrupting the end-to-end service. Generally, ACF, i.e., for a component shared by different VNF types and for an unshared component, is a function of the available components of type c and reliability class ξ, the amount of provided backup VNFs r, and the number of VNFs considered as already failed after components of any type c, , and any class ξ, failed. The amount of failed components of the lowest hierarchy level and a certain type, e.g., VNFs1, due to failure () of any component types , i.e., , caused by failure of component can be generalized as an iterative function:
(4)
where failures of any component from the higher hierarchy levels, from 1 to c, are taken into account. The failure of components of type is caused by failures of components of types 1 to c, i.e., , and reduces the amount of available backup VNFs r, which are generally allowed to fail, i.e., reduces ACF. The failure of any common root can be taken into account by Equation (4) with the following assumption: , if and .
Then, ACF of any shared component is either the total number of available VNFs provided by a reliability class ξ and placed inside component c, i.e., , or the total number of available backup VNFs r reduced by any failures of common roots and shared components of any type and any reliability class, whichever is lower. The resulting ACF shows how many components c of a reliability class ξ may fail while the service remains reliable. The amount of backup VNFs is reduced by the VNF failures due to failures of the components from the higher hierarchy levels, i.e., from 1 to c-1, of the same reliability class ξ, which can be calculated with Equation (4) as . Additionally, the number of available backup VNFs is reduced by VNF failures due to a failure of any shared component from any other reliability class, from class 1 to class . These failures can also be determined with Equation (4) as , where defines the lowest hierarchy level of a shared component related to the reliability class l. Thus, to define ACF of component , we need to take into account the remaining backup VNFs that can fail without service interruption; the amount of VNFs of class ξ that may fail is defined as . Since we are interested in ACF of component , which can be from any hierarchy level, i.e., , we need to normalize by the amount of VNFs allocated to component only, i.e., by . This is also the number of VNFs that will fail if one component fails. As a result, ACF for a component type shared by multiple VNFs of different types and a reliability class ξ can be derived as follows:
(5)
where the sum of all identifies all failures of component types from 1 to of reliability classes from 1 to and, finally, all prior failures of shared components. Obviously, if any prior reliability class has unshared components, then all Ψ VNFs as well as all related component types of this class l are disjoint and there are no shared components, i.e., .
Similarly, ACF of unshared components of any type and any reliability class ξ, i.e., , is a function of the remaining available components of type c from a reliability class ξ, the amount of remaining backup VNFs out of r reserved VNFs, and the number of VNFs considered as failed due to failures of common roots, shared components, and unshared components of different reliability classes. Thus, there is a need to define the amount of VNFs that may still fail without interrupting the service, i.e., the minimum between the total number of available VNFs provided by the reliability class ξ and placed inside component , i.e., , and the amount of remaining backup VNFs after failures of any other common roots, shared and unshared components from any other reliability classes. The number of backup VNFs r is already reduced by possible failures of shared and unshared components from hierarchy level 1 to of any reliability class l, . These failed VNFs from different reliability classes can be calculated with Equation (4) as . Some VNFs of reliability class ξ could fail due to failures of components from a higher hierarchy level belonging to the same class ξ. Their number can also be determined by Equation (4) as . Finally, any other reliability class l, , can affect ACF of class ξ if it utilizes some shared components that could fail, resulting in additional VNF failures. Thus, the amount of remaining backup VNFs of a certain type is determined as and defines the maximal number of VNFs that may fail. To calculate the amount of components that can fail resulting in the allowed VNF failures calculated above, the required normalization is performed with , similarly to Equation (5). This product determines the number of VNFs placed inside component . As a result, ACF for a component utilized for disjoint placement of Ψ different VNFs of a reliability class ξ is given as follows:
(6)
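The min-then-normalize structure common to the two ACF expressions can be illustrated with a small sketch. This is a deliberate simplification of Equations (5) and (6) for a single component type of a single reliability class; the function name, parameter names, and the flat `already_failed_vnfs` argument are our assumptions:

```python
def acf(avail_components: int, vnfs_per_component: int,
        r: int, already_failed_vnfs: int) -> int:
    """Acceptable Component Failures for one component type of one
    reliability class: how many such components may still fail without
    interrupting the service (illustrative simplification)."""
    vnfs_hosted = avail_components * vnfs_per_component  # VNFs at risk in this type
    backups_left = max(r - already_failed_vnfs, 0)       # backups not yet consumed
    # Normalize the tolerable VNF failures by the VNFs lost per component failure.
    return min(vnfs_hosted, backups_left) // vnfs_per_component

# 4 servers hosting 2 VNFs each, r = 5 backup VNFs, 1 VNF already failed:
print(acf(4, 2, 5, 1))  # 2 servers may fail (4 remaining backups // 2 per server)
```

The integer division reflects the normalization step in the text: a component failure always costs a whole batch of co-located VNFs.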

3.2 Placement-independent SFC Reliability

SFC reliability is independent of the VNF placement strategy when either (i) only one component type can fail, e.g., servers only, whereas a single VNF in the server does not fail, or (ii) all VNFs are placed in a fully disjoint manner, i.e., with maximal disjointness degree and maximal number of unshared components , e.g., each VNF is placed in a different DC. Without loss of generality, in case of fully disjoint VNF placement, the reliability of any VNF type ψ of reliability class ξ can be defined as and considered as a failure of one component, e.g., a VNF, which just contains multiple sub-components from 1 to . To this end, we follow the most common approach assuming component disjointness and failure of one component type c only. This enables us to significantly reduce the complexity of the reliability analysis, as there are no shared components and no failure interdependency, while the component reliability is only a function of the component-dependent reliability and ACF . Thus, we assume that components from a certain hierarchy level c, which contain one VNF only, can fail and all other components from any other hierarchy level , , where , have a reliability of 100%, i.e., . As a result, the end-to-end service reliability is independent of the VNF placement strategy. We extend this approach by an assumption of multiple reliability classes, whereby only components of type c of any reliability class ξ can fail. Then, the service is successful if at least k out of n components of type c and, thus, k VNFs of each type ψ from any reliability class ξ are available.
Lemma 1.
The placement-independent SFC reliability is a function of the number of reliability classes H, SFC length Ψ, ACF for any reliability class and the number of available components required to maintain k sub-SFCs, i.e.,
(7)
where and .
Proof.
Let us first assume that (i) only VMs (VNFs) can fail, i.e., c = 4, (ii) a generic SFC consists of Ψ = 1 VNF types, and (iii) all n VNFs have the same VNF reliability, i.e., there is only one reliability class (H = 1) and . Thus, the SFC reliability follows the parallel reliability model described by Binomial formula for calculation of the probability that at least k out of components are available, i.e., .
For instance, when the reliability of primary VNFs differs from the reliability of backup VNFs of a certain type ψ, such as , there are two reliability classes, H = 2. Then, the SFC reliability is determined as the probability that at least k VNFs over all VNFs of each type ψ do not fail and at least k VNFs of any type are available to serve k sub-flows. Since we first assumed an SFC with one VNF type, Ψ = 1, at least primary VNFs, where , and at least backup VNFs, where , have to be available, while backup VNFs can also fail without service interruption. Thus, ACF of primary and backup VNFs can be determined as and , respectively. Following the Binomial formula and, additionally, applying the serial reliability model, as VNFs from both reliability classes, i.e., at least k in total, have to be available at the same time, the SFC reliability is determined as , if Ψ = 1, , and H = 2.
It is obvious that this example can be extended to reliability classes, which would require us to extend the equation above with up to H summations, i.e., one per reliability class. Then, the ACF of any reliability class ξ can be calculated with Equation (6) and simplified as , where the number of failed VNFs over all reliability classes, i.e., from 1 to , is not larger than the number of backup VNFs provided. Thus, the SFC reliability for and is
(8)
where is a probability that there are at least k available out of n pre-reserved VNFs over all H reliability classes.
Next, let us assume that a generic SFC consists of different VNFs. Then, the SFC reliability is determined as a probability that at least k out of VNFs of each type ψ out of Ψ types do not fail and at least k sub-SFCs can be composed by VNFs of each type to serve k sub-flows. When all VNFs of the same type ψ have the same VNF reliability , i.e., c = 4 and , there is only one reliability class (H = 1, ) and SFC reliability is a probability that at least k VNFs of each type ψ, e.g., k VNF1, k VNF2, are available as described by above. The SFC reliability is then . Finally, the service reliability can be generalized as
(9)
Similarly, when there are H≥ 1 reliability classes and all VNFs of type ψ from the same reliability class ξ have the same VNF reliability , the overall service reliability is derived with Equations (8) and (9), as , which results in Equation (7).□
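The semantics of the proof — at most r = n - k VNF failures of each type summed over all H reliability classes, raised to the power of the Ψ types — can be sketched by convolving the per-class binomial failure distributions. This is an illustrative implementation of the lemma's logic, not the paper's code; all names are assumptions:

```python
from math import comb

def convolve(a, b):
    """Distribution of the sum of two independent discrete variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def placement_independent_reliability(k, class_sizes, class_rel, psi):
    """P(at least k of the n = sum(class_sizes) VNFs of each of the psi
    types survive), with per-class VNF reliabilities class_rel."""
    n = sum(class_sizes)
    r = n - k
    dist = [1.0]  # distribution of the total number of failed VNFs of one type
    for n_xi, p in zip(class_sizes, class_rel):
        pmf = [comb(n_xi, f) * (1 - p)**f * p**(n_xi - f) for f in range(n_xi + 1)]
        dist = convolve(dist, pmf)
    p_one_type = sum(dist[:r + 1])  # at most r failures of this VNF type
    return p_one_type ** psi        # all psi types must each keep k VNFs

# One class (H = 1), n = 2, k = 1, p = 0.9, psi = 1: 1 - 0.1^2 = 0.99
print(placement_independent_reliability(1, [2], [0.9], 1))
```

For H = 1 this collapses to the plain binomial survival function, matching the first step of the proof.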

3.3 Placement-dependent SFC Reliability

The SFC reliability is a function of the VNF placement strategy when we consider the amount of available and failed primary and backup DCs, racks, servers, and VNFs involved as well as their interdependency. Let us further simplify the notation as and for shared and unshared components, respectively, to reduce the size of some formulas below, where , , , and , . Our main assumption that all shared components from a hierarchical level c utilized to place sub-SFCs of reliability class ξ and all unshared components utilized to place VNFs of type ψ and class ξ have the same component reliability and , respectively, is still valid.

3.3.1 One Reliability Class.

There is only one reliability class, H = 1, when all n primary and backup sub-SFCs are placed in the same fashion and all shared components of each type c have the same component-dependent reliability, i.e., . In contrast, each unshared component of any type c can have a different component-dependent reliability, i.e., , . As we do not need to differentiate between reliability classes here, we omit the index ξ for clarity. To place all n VNFs of a certain type ψ, e.g., VNF1, primary and backup DCs, primary or backup racks inside any DC, i.e., in total primary and backup racks, and primary or backup servers inside any rack, i.e., in total primary and backup servers, and VNFs are required, whereby any server can host VNFs of the same type ψ. Similarly, the same amount of DCs, racks, and servers are utilized to place each other VNF type from the same sub-SFC, e.g., VNF2, ..., VNFΨ. Thus, the number of components from hierarchy level c utilized for SFC placement can be calculated as , if c = 1, or else as . However, some of these components, i.e., Δ unshared component types from the lower hierarchy level, are utilized for a certain VNF type ψ only; other component types are shared by VNFs of different types. Thus, we need to consider the VNF placement strategy and the resulting number of shared and unshared components by taking into account the components' interdependency.
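The cumulative component counting above (DCs, then racks per DC, then servers per rack, then VNFs per server) amounts to a running product over the hierarchy levels. A minimal sketch with assumed names:

```python
from math import prod

def components_at_level(per_parent, c):
    """Number of components of hierarchy level c (1 = DC, 2 = rack,
    3 = server, 4 = VNF) utilized for placement, where per_parent[i]
    is the number of level-(i+1) components placed per parent
    (per_parent[0] = total number of DCs). Illustrative sketch."""
    return prod(per_parent[:c])

# 2 DCs, 2 racks per DC, 3 servers per rack, 1 VNF per server:
per_parent = [2, 2, 3, 1]
print([components_at_level(per_parent, c) for c in (1, 2, 3, 4)])  # [2, 4, 12, 12]
```

The counts grow multiplicatively down the hierarchy, which is exactly why deeper placements expose more hardware components whose failures must be accounted for.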
Lemma 2.
When there is one reliability class, H = 1, the placement-dependent SFC reliability is a function of the amount of unshared components Δ, SFC length Ψ, ACF, and the amount of available components for any component type c, i.e.,
(10)
where , if .
Proof.
Let us first assume that SFC consists of Ψ = 1 VNF types, e.g., VNF1 only, resulting in no shared components. Moreover, all n = k + r primary and backup VNFs as well as all DCs, racks, and servers involved to place these n VNFs have the same component reliability p4, p1, p2, and p3, respectively. Then, the SFC reliability can be calculated with Binomial formula using the previously discussed in Section 3.2, whereby it is required to consider the availability of all component types involved , i.e., a minimal amount of DC, racks, and servers required to ensure availability of at least k out of n VNFs, i.e.,
(11)
To extend Equation (11) to an arbitrary SFC length, i.e., VNFs, we need to consider the VNF placement strategy and the resulting number of shared and unshared components Δ. Let us first assume that each VNF type ψ of a certain sub-SFC is allocated to different components of any hierarchy level c, , i.e., there are no shared components. Thus, any VNF type ψ is placed in a different DC, different rack, different server, and VM, resulting in unshared component types. Then, service reliability can be determined as a function of the number of unshared components Δ and the SFC reliability defined in Equation (11). Using the serial reliability model similar to Equation (9), the service reliability can be calculated as . However, for any specific placement strategy based either on SFC or VNF type, there can be shared components that accommodate a whole sub-SFC. When different VNF types, i.e., sub-SFCs, are placed in the same DC but different racks, servers, and VMs (), the failure of the DC, i.e., of the shared component, affects all Ψ VNFs of a certain sub-SFC and, thus, needs to be considered once and not separately for each of the Ψ VNF types. Thus, the SFC reliability can be calculated by separating the probability that there are at least available DCs required for the maintenance of at least k sub-SFCs, whereby the following is still valid: . Similarly, when different VNF types, i.e., sub-SFCs, are placed in the same DC and same rack, but different servers, resulting in unshared component types, there are two shared components, i.e., the DC and the rack. Thus, any failure of the DC or rack will affect all Ψ VNFs and needs to be considered once to calculate the probability that at least DCs and racks are available to build k sub-SFCs of Ψ VNFs each. Then, the SFC reliability is determined as . 
When only the VNFs themselves are disjoint (), i.e., the different VNF types are placed in the same server and, thus, in the same rack and same DC, i.e., three shared component types, a failure of a server, rack, or DC will lead to a failure of a whole sub-SFC, i.e., a failure of all Ψ VNFs. Thus, only the VNFs are independent of each other, resulting in an SFC reliability defined as . As a result, the number of summations outside and inside the brackets in the formulas for reliability calculation above is a function of the number of unshared (Δ) and shared () components. In other words, the summations for the shared components are outside the brackets, i.e., summations, and the Δ summations for component types whose failure impacts one VNF type ψ only are placed inside the brackets. Thus, without loss of generality, the SFC reliability for any value of Δ is determined by Equation (10).□
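As a sanity check of the serial/parallel structure in the proof, consider the special case where each sub-SFC occupies its own DC, rack, and server, with its Ψ VNFs co-located on that server. Sub-SFCs then fail independently, and the nested sums collapse to a serial model inside a sub-SFC combined with a binomial over sub-SFCs. An illustrative sketch under this assumption (names are ours):

```python
from math import comb

def sfc_reliability_disjoint(n, k, p_dc, p_rack, p_srv, p_vm, psi):
    """Special case: each of the n sub-SFCs occupies its own DC, rack,
    and server, with its psi VNFs co-located on that server. Sub-SFCs
    fail independently; the service needs >= k survivors."""
    p_sub = p_dc * p_rack * p_srv * p_vm**psi  # serial model inside one sub-SFC
    # parallel (binomial) model across the n independent sub-SFCs
    return sum(comb(n, a) * p_sub**a * (1 - p_sub)**(n - a)
               for a in range(k, n + 1))

# n = 2 sub-SFCs, k = 1, psi = 2, illustrative component reliabilities:
print(sfc_reliability_disjoint(2, 1, 0.9999, 0.999, 0.99, 0.99, 2))
```

The general Equation (10) reduces to exactly this form when every component type is shared within, and disjoint across, sub-SFCs.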

3.3.2 H-Reliability Classes.

We next consider a generic SFC with n primary and backup sub-SFCs, where any sub-SFC can belong to one out of H, , reliability classes. Then, an index ξ, , indicates the reliability class of any primary or backup component utilized to place the sub-SFCs from this reliability class ξ. Thus, any component type c utilized to host reliability class ξ and its component reliability are denoted as and for shared or for unshared components, respectively. Multiple sub-SFCs can belong to a certain reliability class ξ if these sub-SFCs are placed following a certain placement strategy and all involved component types have a certain component reliability value or each. Note that unshared components of hierarchy level c utilized for placement of a certain VNF type ψ of a certain SFC reliability class ξ have the same component reliability, i.e., , which can differ from the component reliability of unshared components used to place another VNF type from the same reliability class ξ, i.e., . Without loss of generality, there are VNFs per sub-SFC, an arbitrary number of involved component types ( types) to place these VNFs, and an arbitrary number of reliability classes (H for primary and backup sub-SFCs), whereby each reliability class has its own number of unshared components, . Thus, the service reliability needs to consider the placement strategy and the resulting number of unshared components of any reliability class ξ. Depending on the placement strategy, some sub-SFCs from different reliability classes can have common root components, e.g., are allocated to the same DC or the same rack. The failure of these common root components results in failures of multiple relevant component types and of whole sub-SFCs. All reliability classes combined by a common root and the related root component are collected in set , where is an index, . 
Any describes a component type c of the common root component, where w is a set of reliability classes accommodated by the common root component of type c, i.e., and . Thus, is the th set of reliability classes, which are allocated in component c from collection Φ. Then, the resulting expression for service reliability needs to take into account H≥ 1 reliability classes of SFCs of length and the shared and common root components utilized for allocation of sub-SFCs from any reliability class ξ, as stated in the following lemma.
Lemma 3.
When there are multiple reliability classes, H≥ 1, the placement-dependent SFC reliability is a function of the number of unshared components Δ, SFC length Ψ, ACF, and the number of available components for any component type c of a certain class ξ, whereby the shared and common root components utilized for allocation of sub-SFCs from a certain reliability class ξ need to be taken in consideration as follows:
(12)
where the function ensures that the common root components for any reliability class are considered only once by the first summation over , i.e.,
(13)
Proof.
First, we assume that the components from different reliability classes do not have any common root components, which corresponds to an inter-DCN placement of primary and backup sub-SFCs. Thus, the sub-SFCs of a certain reliability class ξ are placed disjointly from any other reliability class. The assumption that any SFC contains only one VNF, Ψ = 1, additionally relaxes the placement dependency, as there cannot be any shared components, i.e., . Then, service reliability can be determined based on Equations (8) and (11) as follows:
(14)
where the failure and availability of each component type and each reliability class are described by their own summation and , i.e., . Thus, all component types, i.e., from type 1 to type , are considered for each reliability class ξ, , and any summation determines the probability that the service will not fail in case of failures of any component type c of a reliability class ξ, whereby at most failures of each unshared component are allowed.
Let us next assume that some sub-SFCs from different reliability classes have common roots, while sub-SFCs consist of one VNF, i.e., Ψ = 1, avoiding component sharing by different VNF types. A failure of the common root components generally results in a failure of all component types of lower hierarchy levels utilized to place corresponding reliability classes and, thus, a failure of multiple sub-SFCs. Set Φ collects all reliability classes in set accommodated by the common root , which affects ACF and the number of available components of all associated classes, i.e., . Thus, to take into account the common roots for different reliability classes, Equation (14) for Ψ = 1 is modified as follows:
(15)
where , i.e., , and the function determined by Equation (13) ensures that the common root components for any reliability class are considered only once, i.e., by the first summation over .
Next, let us assume that the SFC consists of VNFs and there are no common root components, but some sub-SFCs are placed in the same shared component, i.e., all Ψ different VNF types are hosted in the same component. Thus, each reliability class ξ has its own number of unshared components , , which needs to be considered by the expression for the service reliability. Taking into account a reliability class ξ that provides unshared components, i.e., , and shared components, i.e., , due to a certain placement strategy, there are summations for each reliability class ξ, whereby the summations for shared components and the summations for unshared components should be separated similarly to Equation (10). Thus, the service reliability can be determined as follows:
(16)
where and describe ACF of a shared and unshared component type of any reliability class ξ, respectively.
When the number of unshared components is maximal, i.e., , all component types are disjoint and all related summations will be inside the brackets in Equation (16), i.e., . Similarly, in the case of , only components of type , i.e., VNFs, are disjoint. This results in only one summation per reliability class ξ inside the brackets. When , i.e., only component type 1 (DCs) is a shared component for all Ψ different VNF types of reliability class ξ, there is one summation outside the brackets, which describes the probability that there are enough available components of type c = 1 to maintain the service.
When VNFs from sub-SFCs of length are placed in such a way that there are shared components and common root components described by a set Φ, we can derive the service reliability with Equations (16) and (15). The resulting equation for SFC reliability takes into account H≥ 1 reliability classes of SFCs of length , their common root and shared components, as well as failures of any out of component types involved in the allocation of sub-SFCs from a certain reliability class ξ, i.e., , and represents Equation (12).
Finally, we can derive Equation (13), where needs to ensure that the common root components are considered only once by the first summation in Equations (12) and (15), and that any component type that is not a common root component is considered in the following summation for calculating the probability that there are enough available components of type to maintain the service. As a result, needs to be additionally considered when either the reliability class ξ of component type does not have any common root components, i.e., , or ξ has some common root components with other reliability classes, i.e., , but the considered component type c is not the common root, i.e., .□
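The case distinction in the proof can be sketched as an indicator function; the `(root_level, classes)` encoding of the set Φ and all names below are our assumptions:

```python
def delta(xi, c, common_roots):
    """Indicator sketched from the proof: component type c of reliability
    class xi gets its own summation only if it is not one of the class's
    common-root components (those are counted once, in the summation
    over the set Phi). common_roots: iterable of (root_level, classes)."""
    for root_level, classes in common_roots:
        if xi in classes and c == root_level:
            return 0  # c is the common root for class xi: already covered
    return 1  # no common roots for xi, or c is not the root

roots = [(1, {1, 2})]  # one DC (level 1) shared by reliability classes 1 and 2
print(delta(1, 1, roots), delta(1, 2, roots), delta(3, 1, roots))  # 0 1 1
```

Class 1's DC summation is suppressed (handled via Φ), while its lower-level component types and all component types of the root-free class 3 keep their own summations.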

4 Numerical Results and Discussion

This section provides numerical results that show the reliability of generic SFCs and verify the accuracy of the proposed analytical model. We investigate the placement-dependent SFC reliability considering all possible VNF placement strategies using racks, servers, and VMs to place VNFs over inter- and intra-DCNs, and show the impact of the number of sub-SFCs and backups, the SFC length, the component reliability, and the number of reliability classes on the resulting SFC reliability.
We present analytical results of placement-independent and placement-dependent service reliability calculated with Equation (10) for and and Equation (12) for all other placement strategies; ACF of shared, unshared, and common root components were determined with Equations (5), (6), and (2); the number of available components from any hierarchy level and the number of failed VNFs were calculated with Equations (3), (1), and (4), respectively. We verified the analytical model and the presented analytical results by Monte Carlo simulations. Since the simulation results overlapped with the analytical results (with 95% confidence interval), we present analytical results only, where and, in most cases, SFC reliability values are sorted in descending order.
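A Monte Carlo verification in the spirit described above can be sketched for a simplified, fully disjoint placement (each sub-SFC on its own DC, rack, and server, with its Ψ VNFs on that server). The parameter names and this simplified failure model are our assumptions, not the paper's simulator:

```python
import random

def mc_reliability(n, k, psi, rel, trials=100_000, seed=7):
    """Monte Carlo estimate: sub-SFC i survives iff its DC, rack, server,
    and all psi of its VMs survive; the service needs >= k survivors.
    rel = (p_dc, p_rack, p_srv, p_vm)."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        survivors = 0
        for _ in range(n):
            alive = all(rng.random() < p for p in rel[:3])       # DC, rack, server up
            alive = alive and all(rng.random() < rel[3] for _ in range(psi))  # VMs up
            survivors += alive
        ok += survivors >= k
    return ok / trials

print(mc_reliability(2, 1, 2, (0.9999, 0.999, 0.99, 0.99)))
```

Comparing such estimates (with their confidence intervals) against the closed-form values is the standard way to cross-check nested-sum reliability formulas.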
We first evaluate the placement and component-dependent reliability of generic SFCs. Assuming that all sub-SFCs follow the same placement strategy, we consider all 16 possible placement strategies over DCs, racks, servers, and VMs, as presented in Figure 4. Each placement strategy is characterized by different amounts of unshared components Δ and minimal number of protected component types . Some DCN components can host either multiple sub-SFCs or multiple VNFs of the same type. Thus, we distinguish between “SFC based placement,” i.e., the entire sub-SFC is placed in the same hardware components and VNFs of the same type are separated, and “VNF type based placement,” i.e., VNFs of the same type are placed together in any hardware component. The related placement strategy (Pl.) in the tables below is marked with v for “VNF type based placement” and with s for “SFC based placement.”
Fig. 4.
Fig. 4. All basic VNF placement strategies over inter- and intra-DCNs, when n = 2 and Ψ = 2.
The placement strategies described as (, ), (, ), (, ), and (, ) in Figure 4 present special cases, as they have features of both placement strategies, i.e., “v” and “s,” whereby each VNF of each type is placed in different DCs, same DC and disjoint racks, same rack and disjoint servers, and in the same server, respectively. In other words, sub-SFCs and VNFs of different types are not separated. While the number of unshared components Δ, the disjointness degree , the number of reliability classes, and the common root components collected in Φ are defined by the selected placement strategy, the component-dependent reliability , SFC length, and the number of primary and backup sub-SFCs (k and r) required to maintain and to protect the service have to be predefined and adapted to a certain use case. To be able to see the generic dependence of SFC reliability on the placement strategy, we assume for the numerical evaluation that component reliability is independent of reliability class and VNF type, i.e., . Thus, sub-SFCs can only belong to different reliability classes if they utilize different placement strategies, resulting in at most reliability classes. We then investigate different scenarios by defining the component reliability in two ways: (1) Case 1, a more practically relevant scenario, where , , , [14]; and (2) Case 2, an idealized scenario, where . We investigate the SFC reliability over primary sub-SFCs with VNFs and different numbers of backup sub-SFCs r.
Let us first assume that there is only one reliability class, H = 1, and one primary sub-SFC (k = 1) and evaluate the impact of Δ, , Ψ, and r on the resulting service reliability. Table 2 shows the SFC reliability for Case 1 and Case 2 as a function of different VNF placement strategies defined through Δ and and shown in Figure 4. Without backup protection (r = 0), the SFC reliability is independent of , since only one (sub-)SFC is deployed and no VNFs of the same type are utilized. However, as expected, for a fixed the SFC reliability decreases with an increasing number of unshared components Δ and increasing SFC length Ψ; this is because a larger Δ implies more components that can fail due to hardware issues. For a fixed Δ but different , the SFC reliability is identical. As expected, Case 2 provides higher SFC reliability due to the higher component reliability defined, but reflects the same functional dependency. Without any backup protection, the placement strategies resulting in provide the highest SFC reliability.
(C1 = Case 1, C2 = Case 2; "Disj." = disjointness degree)

| Disj. | Δ | Pl. | C1, Ψ=4, r=0 | r=1 | r=100 | C1, Ψ=8, r=0 | r=1 | r=100 | C2, Ψ=4, r=0 | r=1 | r=100 | C2, Ψ=8, r=0 | r=1 | r=100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | s | 0.95952 | 0.99951234 | → 1 | 0.9217 | 0.9990295 | → 1 | 0.99993 | 0.9999999963 | → 1 | 0.99989 | 0.9999999935 | → 1 |
| 4 | 2 | s | 0.95665 | 0.99950754 | → 1 | 0.9152 | 0.9990157 | → 1 | 0.99990 | 0.9999999948 | → 1 | 0.99982 | 0.9999999900 | → 1 |
| 4 | 3 | s | 0.95636 | 0.99950737 | → 1 | 0.9146 | 0.9990150 | → 1 | 0.99987 | 0.9999999939 | → 1 | 0.99975 | 0.9999999879 | → 1 |
| 4 | 4 | v, s | 0.95633 | 0.99950735 | → 1 | 0.9145 | 0.9990149 | → 1 | 0.99984 | 0.9999999936 | → 1 | 0.99968 | 0.9999999872 | → 1 |
| 3 | 1 | s | 0.95952 | 0.99950314 | 0.99998999 | 0.9217 | 0.9990210 | 0.999989 | 0.99993 | 0.9999899976 | 0.999989 | 0.99989 | 0.9999899956 | 0.999989 |
| 3 | 2 | s | 0.95665 | 0.99949841 | 0.99998999 | 0.9152 | 0.9990074 | 0.999989 | 0.99990 | 0.9999899967 | 0.999989 | 0.99982 | 0.9999899935 | 0.999989 |
| 3 | 3 | v, s | 0.95636 | 0.99949824 | 0.99998999 | 0.9146 | 0.9990067 | 0.999989 | 0.99987 | 0.9999899964 | 0.999989 | 0.99975 | 0.9999899928 | 0.999989 |
| 3 | 4 | v | 0.95633 | 0.99946825 | 0.99996000 | 0.9145 | 0.9989367 | 0.999920 | 0.99984 | 0.9999599970 | 0.999960 | 0.99968 | 0.9999199956 | 0.999920 |
| 2 | 1 | s | 0.95952 | 0.99941118 | 0.99989000 | 0.9217 | 0.9989366 | 0.999890 | 0.99993 | 0.9999799988 | 0.999980 | 0.99989 | 0.9999799976 | 0.999980 |
| 2 | 2 | v, s | 0.95665 | 0.99940702 | 0.99989000 | 0.9152 | 0.9989242 | 0.999890 | 0.99990 | 0.9999799985 | 0.999980 | 0.99982 | 0.9999799969 | 0.999980 |
| 2 | 3 | v | 0.95636 | 0.99910722 | 0.99959006 | 0.9146 | 0.9982252 | 0.999190 | 0.99987 | 0.9999499994 | 0.999950 | 0.99975 | 0.9999100004 | 0.999910 |
| 2 | 4 | v | 0.95633 | 0.99907725 | 0.99956007 | 0.9145 | 0.9981553 | 0.999120 | 0.99984 | 0.9999200012 | 0.999920 | 0.99968 | 0.9998400088 | 0.999840 |
| 1 | 1 | v, s | 0.95952 | 0.99849061 | 0.99889011 | 0.9217 | 0.9980912 | 0.998890 | 0.99993 | 0.9999699999 | 0.999970 | 0.99989 | 0.9999699995 | 0.999970 |
| 1 | 2 | v | 0.95665 | 0.99549813 | 0.99589643 | 0.9152 | 0.9911255 | 0.991918 | 0.99990 | 0.9999400011 | 0.999940 | 0.99982 | 0.9999000037 | 0.999900 |
| 1 | 3 | v | 0.95636 | 0.99519951 | 0.99559769 | 0.9146 | 0.9904319 | 0.991224 | 0.99987 | 0.9999100032 | 0.999910 | 0.99975 | 0.9998300128 | 0.999830 |
| 1 | 4 | v | 0.95633 | 0.99516966 | 0.99556782 | 0.9145 | 0.9903626 | 0.991155 | 0.99984 | 0.9998800062 | 0.999880 | 0.99968 | 0.9997600268 | 0.999760 |
Table 2. The Placement-dependent SFC Reliability for k = 1 vs. Backup SFCs r and SFC Length Ψ
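The tradeoff visible in Table 2 — components shared by all sub-SFCs act in series, while components exclusive to each of the 1 + r sub-SFC replicas act in parallel — can be sketched with a simplified independence model. This is only an illustration, not the paper's full analysis (which additionally models common roots and failure propagation), and all component reliabilities below are hypothetical placeholders:

```python
from math import prod

def sub_sfc_reliability(component_reliabilities):
    """Serial reliability of one sub-SFC: every component must work."""
    return prod(component_reliabilities)

def sfc_reliability(shared, disjoint, r):
    """Reliability of 1 primary + r backup sub-SFCs.

    shared   -- reliabilities of components common to all sub-SFCs (series)
    disjoint -- reliabilities of components each replica owns exclusively
                (the 1 + r replicas then act in parallel)
    """
    replica = sub_sfc_reliability(disjoint)
    # All shared components must work AND at least one replica must survive.
    return prod(shared) * (1 - (1 - replica) ** (1 + r))

# Hypothetical values: one shared DC; a rack, a server, and 4 VNFs per replica.
dc, rack, server, vnf = 0.9999, 0.9995, 0.999, 0.99
for r in (0, 1, 100):
    print(f"r={r}: {sfc_reliability([dc], [rack, server] + [vnf] * 4, r):.6f}")
```

Moving components between `shared` and `disjoint` mimics the role of Δ in Table 2: a larger unshared part adds more hardware that can fail per replica, but gives redundancy more independent replicas to work with.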
We next investigate the traditional 1+1 backup protection, i.e., r = 1. At first sight, the service reliability decreases with a decreasing disjointness degree, i.e., a decreasing number of backup component types, and with an increasing number of unshared components Δ. However, some placement strategies with a low disjointness degree provide higher SFC reliability than placements with a higher one. In Case 1 with an SFC length of Ψ = 8, the placement with disjointness degree 3 and Δ = 1 outperforms the placements with disjointness degree 4 and Δ ≥ 2. That is because a generic SFC utilizing placement strategies with a high disjointness degree and a large Δ involves many more hardware components that can fail, increasing the probability of service interruption. In contrast, an SFC placed with a high disjointness degree but Δ = 1 will most likely be interrupted due to VNF failures, as the number of racks and servers involved is low; considering the VNF components only, the lower reliability of the longer SFC chain cannot be sufficiently compensated by the backup DCs, racks, and servers. For Case 2, we observed that a placement of type "s" with a lower disjointness degree can outperform a placement based on VNF type (v), which can be explained by the fact that the latter involves more hardware components, all of which have to be available to maintain the service. Moreover, the placement strategies based on VNF type (v) provide the lowest SFC reliability for any value of r and Ψ. As a result, the number of backup components can generally be reduced without degrading the service reliability, and it is preferable to use a placement strategy of type "s."
We next study the impact of an increasing number of backup sub-SFCs on the resulting service reliability, where r = 100 backup sub-SFCs serves as a good approximation of an upper bound. A more detailed rule can be observed for both cases, which also covers the unprotected case:
For a given disjointness degree, SFC reliability decreases with an increasing number of unshared components Δ, and for a fixed value of Δ, SFC reliability increases with an increasing disjointness degree.
For instance, for a fixed Δ it can be seen for all r and Ψ that a higher disjointness degree results in higher SFC reliability. Furthermore, SFC reliability increases with increasing redundancy: for the placement with disjointness degree 4 and Δ = 1 in Case 1 (Ψ = 4), it grows from 0.95952 (r = 0) to 0.99951 (r = 1) and approaches 1 for r = 100. In contrast, for the placements with Δ = 1 and disjointness degree 2 or 1, the provided SFC reliability of "3" and "2" nines in Case 1, respectively, is almost independent of the value of r.
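The "→ 1" entries versus the plateaus in the r = 100 column illustrate a general limit: as r → ∞, reliability saturates at the joint reliability of the components shared by all sub-SFCs, since no amount of redundancy protects against a common-root failure. A toy illustration with arbitrary numbers:

```python
# Shared components (series) cap the benefit of adding replicas (parallel).
shared, replica = 0.99999, 0.95  # illustrative reliabilities
for r in (0, 1, 10, 100):
    reliability = shared * (1 - (1 - replica) ** (1 + r))
    print(r, reliability)
```

The printed values grow quickly with r but never exceed the shared-component reliability of 0.99999.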
We further investigate the placement strategy characterized by disjointness degree 4 and Δ = 1, which provides the highest service reliability, now setting k ≥ 1 primary sub-SFCs and component reliability as defined for Case 1. Table 3 shows the service reliability as a function of k, r, and Ψ. The first column contains the number of backup sub-SFCs normalized by the number of primary sub-SFCs, i.e., r/k. As expected, an increasing number of VNFs in the SFC, i.e., a larger Ψ, decreases SFC reliability. In contrast, an increasing number of primary sub-SFCs significantly increases the service reliability once backup sub-SFCs are provided (r ≥ 1). An SFC reliability of "six nines" can be reached with the configurations k = 4, r = 4 and k = 8, r = 4 for both Ψ = 4 and Ψ = 8, while a standard serial SFC (k = 1) requires r = 3 backup sub-SFCs, which means 300% redundancy (not shown for clarity of presentation). Thus, to increase service reliability and to decrease backup resources per sub-SFC, a generic SFC should include as many sub-SFCs as possible.
| r/k | Ψ=4, k=1 | Ψ=4, k=4 | Ψ=4, k=8 | Ψ=8, k=1 | Ψ=8, k=4 | Ψ=8, k=8 |
|---|---|---|---|---|---|---|
| 0 | 0.959529855054096 | 0.847683965207723 | 0.718568104870289 | 0.921720550240843 | 0.721767099608634 | 0.520947746077460 |
| 0.125 | – | – | 0.983696719311009 | – | – | 0.968371471691151 |
| 0.25 | – | 0.995273874413365 | 0.999384198321856 | – | 0.990697185787667 | 0.998774974539633 |
| 0.375 | – | – | 0.999981195954079 | – | – | 0.999962424513603 |
| 0.5 | – | 0.999893621626573 | 0.999999500094456 | – | 0.999787784003278 | 0.999999000321022 |
| 0.625 | – | – | 0.999999987999969 | – | – | 0.999999976000402 |
| 0.75 | – | 0.999997932451063 | 0.999999999734000 | – | 0.999995866449611 | 0.999999999468002 |
| 0.875 | – | – | 0.999999999994470 | – | – | 0.999999999988942 |
| 1 | 0.999512344397461 | 0.999999963313851 | 0.999999999999890 | 0.999029523012867 | 0.999999926631523 | 0.999999999999781 |
| 1.125 | – | – | 0.999999999999997 | – | – | 0.999999999999995 |
| 1.25 | – | 0.999999999389706 | 0.999999999999999 | – | 0.999999998779421 | 0.999999999999998 |
| 1.375 | – | – | 0.999999999999999 | – | – | 0.999999999999999 |
Table 3. Service Reliability for the Placement with Disjointness Degree 4 and Δ = 1 vs. the Number of Primary (k) and Backup (r) Sub-SFCs
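Under the simplifying assumption that all n = k + r sub-SFCs fail independently with an identical reliability p, the service survives if and only if at least k of them operate — a binomial tail, in the spirit of the reduced binomial theorem used in the analysis. A minimal sketch (p is an illustrative value, not the Case 1 figure):

```python
from math import comb

def at_least_k_of_n(p, k, r):
    """P(at least k of n = k + r independent sub-SFCs are operational)."""
    n = k + r
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.96  # illustrative per-sub-SFC reliability
for k, r in [(1, 3), (4, 4), (8, 4)]:
    print(f"k={k}, r={r}: {at_least_k_of_n(p, k, r):.9f}")
```

This reproduces the qualitative behavior of Table 3: for a fixed redundancy ratio r/k, more (and therefore smaller) sub-SFCs yield a sharper concentration of the survivor count and thus higher reliability.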
Finally, we investigate the SFC reliability as a function of placements that differ between primary and backup sub-SFCs, resulting in H = 2 different reliability classes. We use the configuration k = 4, r = 3, Ψ = 4, and component reliability as defined for Case 1. Table 4 shows the service reliability, whereby the first two columns and the two upper rows (disjointness degree and Δ) describe the placement strategies of the primary and backup sub-SFCs, respectively, as presented in Figure 4. Here, too, the general rule holds that SFC reliability decreases with an increasing number of unshared components Δ for a fixed disjointness degree. We show the results for the backup placement strategies with disjointness degrees 4 and 3, as these configurations provide the highest SFC reliability. However, the highest reliability of "five nines" is only reached when the primary sub-SFCs are placed with disjointness degree 4, i.e., with placements of type "s." Thus, the best placement strategy is to use less disjoint components for the placement of the VNFs of a certain sub-SFC and to maximally separate the sub-SFCs from each other, e.g., to place a whole sub-SFC in the same server and different sub-SFCs in different DCs.
(Backup placement per column is given as "disjointness degree/Δ.")

| Primary Disj. | Primary Δ | Backup 4/1 | 4/2 | 4/3 | 4/4 | 3/1 | 3/2 | 3/3 | 3/4 |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 0.999997932451 | 0.9999979318655 | 0.9999979318286 | 0.999997931825 | 0.99999641 | 0.99999641 | 0.99999641 | 0.9999961885402 |
| 4 | 2 | 0.999997931783 | 0.9999979317005 | 0.9999979316975 | 0.999997931697 | 0.99999631 | 0.99999631 | 0.99999631 | 0.9999961883993 |
| 4 | 3 | 0.999997931753 | 0.9999979316968 | 0.9999979316951 | 0.999997931695 | 0.99999630 | 0.99999630 | 0.99999630 | 0.9999961883957 |
| 4 | 4 | 0.999997931750 | 0.9999979316965 | 0.9999979316950 | 0.999997931694 | 0.99999629 | 0.99999629 | 0.99999629 | 0.9999961883954 |
| 3 | 1 | 0.999987936641 | 0.9999879360639 | 0.9999879360276 | 0.999987936024 | 0.99998641 | 0.99998641 | 0.99998641 | 0.9999861942812 |
| 3 | 2 | 0.999987935981 | 0.9999879359019 | 0.9999879358991 | 0.999987935898 | 0.99998631 | 0.99998631 | 0.99998631 | 0.9999861941428 |
| 3 | 3 | 0.999987935952 | 0.9999879358984 | 0.9999879358968 | 0.999987935896 | 0.99998630 | 0.99998630 | 0.99998630 | 0.9999861941393 |
| 3 | 4 | 0.999957936614 | 0.9999579365603 | 0.9999579365588 | 0.999957936558 | 0.99995630 | 0.99995630 | 0.99995630 | 0.9999561948534 |
| 2 | 1 | 0.999887979247 | 0.9998879787470 | 0.9998879787164 | 0.999887978713 | 0.99988646 | 0.99988646 | 0.99988646 | 0.9998862523920 |
| 2 | 2 | 0.999887978666 | 0.9998879786134 | 0.9998879786120 | 0.999887978611 | 0.99988636 | 0.99988636 | 0.99988636 | 0.9998862522773 |
| 2 | 3 | 0.999588042268 | 0.9995880422155 | 0.9995880422140 | 0.999588042213 | 0.99958642 | 0.99958642 | 0.99958642 | 0.9995863163972 |
| 2 | 4 | 0.999558054926 | 0.9995580548741 | 0.9995580548726 | 0.999558054872 | 0.99955643 | 0.99955643 | 0.99955643 | 0.9995563291076 |
| 1 | 1 | 0.998888476185 | 0.9988884761457 | 0.9988884761447 | 0.998888476144 | 0.99888699 | 0.99888699 | 0.99888699 | 0.9988869043083 |
| 1 | 2 | 0.995894806423 | 0.9958948063838 | 0.9958948063828 | 0.995894806382 | 0.99589332 | 0.99589332 | 0.99589332 | 0.9958932392572 |
| 1 | 3 | 0.995596067857 | 0.9955960678177 | 0.9955960678167 | 0.995596067816 | 0.99559459 | 0.99559459 | 0.99559459 | 0.9955945011612 |
| 1 | 4 | 0.995566200274 | 0.9955662002344 | 0.9955662002334 | 0.995566200233 | 0.99556472 | 0.99556472 | 0.99556472 | 0.9955646336249 |
Table 4. Service Reliability with H = 2 Different Reliability Classes in Configuration k = 4, r = 3, and Ψ = 4
The placements of the primary and backup sub-SFCs are given by the row and column headers (disjointness degree and Δ), respectively.
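With H = 2 reliability classes, the number of surviving sub-SFCs follows a Poisson-binomial rather than a binomial distribution, and the tail P(at least k up) can be computed by a small dynamic program over the per-sub-SFC reliabilities. The two class reliabilities below are hypothetical placeholders, not the Case 1 values:

```python
def at_least_k(probs, k):
    """P(at least k of the independent sub-SFCs with reliabilities `probs` are up)."""
    dist = [1.0]  # dist[j] = P(exactly j sub-SFCs up among those processed)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)   # this sub-SFC fails
            nxt[j + 1] += q * p     # this sub-SFC works
        dist = nxt
    return sum(dist[k:])

# k = 4 primary sub-SFCs in one class, r = 3 backups in another (hypothetical)
probs = [0.96] * 4 + [0.93] * 3
print(at_least_k(probs, k=4))
```

The same routine handles any number of classes, since only the per-sub-SFC reliability vector matters.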

5 Conclusion

We provided a novel reliability analysis of the so-called generic SFC, which jointly considers component sharing, component heterogeneity, and failure interdependency due to failure propagation. Our analysis is based on combinatorics and a reduced binomial theorem, allowing us to analyze placement-dependent SFC reliability, which has not been addressed previously. The analytical results, confirmed by simulations, showed tradeoffs between component heterogeneity, interdependency, disjointness, and sharing for SFC placement strategies over inter- and intra-DCNs under the assumption of failures of DCs, racks, servers, and VNFs. We showed that the high VNF disjointness provided by some placement strategies does not automatically lead to the highest SFC reliability and that not all placement strategies result in high reliability independently of the amount of redundancy. In contrast, a proper selection of the numbers of shared and unshared components in the DCN can significantly increase SFC reliability, which is indeed placement-dependent. Contrary to the common assumption that a maximally disjoint placement is the best way to increase reliability, we found that service reliability can also increase by increasing the number of shared components and reducing the number of common roots. We verified that the highest SFC reliability over any configuration is reached when each sub-SFC is allocated to a separate server, whereas VNFs of the same type, and thus each sub-SFC, are placed in different DCs.

References

[1]
Waseem Ahmed and Yong Wei Wu. 2013. A survey on reliability in distributed systems. J. Comput. Syst. Sci. 79, 8 (2013), 1243–1255. DOI: https://doi.org/10.1016/j.jcss.2013.02.006.
[2]
Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A scalable, commodity data center network architecture. SIGCOMM Comput. Commun. Rev. 38, 4 (Aug. 2008), 63–74. DOI: https://doi.org/10.1145/1402946.1402967.
[3]
Abdelhamid Alleg, Toufik Ahmed, Mohamed Mosbah, and Raouf Boutaba. 2020. Joint diversity and redundancy for resilient service chain provisioning. IEEE J. Select. Areas Commun. 38, 7 (2020), 1490–1504. DOI: https://doi.org/10.1109/JSAC.2020.2986867.
[4]
Yuan-Shun Dai, Yi Pan, and Xukai Zou. 2007. A hierarchical modeling and analysis for grid service reliability. IEEE Trans. Comput. 56, 5 (May 2007), 681–691. DOI: https://doi.org/10.1109/TC.2007.1034.
[5]
Yuan-Shun Dai, Bo Yang, Jack Dongarra, and Gewei Zhang. 2010. Cloud service reliability: Modeling and analysis. In Proceedings of the IEEE Pacific Rim International Symposium on Dependable Computing. 784–789.
[6]
W. Ding et al. 2017. Enhancing the reliability of services in NFV with the cost-efficient redundancy scheme. In Proceedings of the IEEE International Conference on Communications. 1–6. DOI: https://doi.org/10.1109/ICC.2017.7996840.
[7]
A. Engelmann, W. Bziuk, and A. Jukan. 2020. Bounding reliability in service function chaining. In Proceedings of the IEEE International Conference on Information, Communication and Electronic Technology (MIPRO).
[8]
A. Engelmann and A. Jukan. 2018. A reliability study of parallelized VNF chaining. In Proceedings of the IEEE International Conference on Communications. 1–6. DOI: https://doi.org/10.1109/ICC.2018.8422595.
[9]
A. Engelmann, A. Jukan, and R. Pries. 2019. On coding for reliable VNF chaining in DCNs. In Proceedings of the International Conference on the Design of Reliable Communication Networks. 83–90.
[10]
Z. Ye et al. 2016. Joint topology design and mapping of service function chains for efficient, scalable, and reliable network functions virtualization. IEEE Netw. 30, 3 (May 2016), 81–87. DOI: https://doi.org/10.1109/MNET.2016.7474348.
[11]
Z. Guo and Y. Yang. 2015. Exploring server redundancy in nonblocking multicast data center networks. IEEE Trans. Comput. 64, 7 (July 2015), 1912–1926. DOI: https://doi.org/10.1109/TC.2014.2346576.
[12]
Z. Guo and Y. Yang. 2015. On nonblocking multicast fat-tree data center networks with server redundancy. IEEE Trans. Comput. 64, 4 (Apr. 2015), 1058–1073. DOI: https://doi.org/10.1109/TC.2014.2315631.
[13]
Joel M. Halpern and Carlos Pignataro. 2015. Service Function Chaining (SFC) Architecture. RFC 7665. (Oct. 2015). DOI: https://doi.org/10.17487/RFC7665.
[14]
S. Herker, X. An, W. Kiess, S. Beker, and A. Kirstaedter. 2015. Data-center architecture impacts on virtualized network functions service chain embedding with high availability requirements. In Proceedings of the IEEE Globecom Workshops (GC Wkshps). 1–7. DOI: https://doi.org/10.1109/GLOCOMW.2015.7414158.
[15]
A. Hmaity, M. Savi, F. Musumeci, M. Tornatore, and A. Pattavina. 2016. Virtual network function placement for resilient service chain provisioning. In Proceedings of the International Workshop on Reliable Networks Design and Modeling. 245–252. DOI: https://doi.org/10.1109/RNDM.2016.7608294.
[16]
Per Hokstad and Marvin Rausand. 2008. Common Cause Failure Modeling: Status and Trends. Springer London, London, 621–640.
[17]
Anne Immonen and Daniel Pakkala. 2014. A survey of methods and approaches for reliable dynamic service compositions. Serv. Orient. Comput. Applic. 8, 2 (01 June 2014), 129–158. DOI: https://doi.org/10.1007/s11761-013-0153-3.
[18]
Gauri Joshi, Emina Soljanin, and Gregory Wornell. 2017. Efficient redundancy techniques for latency reduction in cloud systems. ACM Trans. Model. Perform. Eval. Comput. Syst. 2, 2 (Apr. 2017). DOI: https://doi.org/10.1145/3055281.
[19]
Wasiur R. KhudaBukhsh, Sounak Kar, Amr Rizk, and Heinz Koeppl. 2019. Provisioning and performance evaluation of parallel systems with output synchronization. ACM Trans. Model. Perform. Eval. Comput. Syst. 4, 1 (March 2019). DOI: https://doi.org/10.1145/3300142.
[20]
Defang Li, Peilin Hong, Kaiping Xue, and Jianing Pei. 2019. Availability aware VNF deployment in datacenter through shared redundancy and multi-tenancy. IEEE Trans. Netw. Serv. Manag. 16, 4 (2019), 1651–1664. DOI: https://doi.org/10.1109/TNSM.2019.2936505.
[21]
Hidefumi Nakamura et al. ETSI GS NFV-REL 003 V1.1.1: Network Functions Virtualisation (NFV); Reliability; Report on Models and Features for End-to-End Reliability. https://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/003/01.01.01_60/gs_NFV-REL003v010101p.pdf.
[22]
T. A. Nguyen, D. Min, E. Choi, and T. D. Tran. 2019. Reliability and availability evaluation for cloud data center networks using hierarchical models. IEEE Access 7 (2019), 9273–9313. DOI: https://doi.org/10.1109/ACCESS.2019.2891282.
[23]
Larry Peterson and Saurav Das. 2017. Trellis: CORD Network Infrastructure. (Oct. 29 2017). Retrieved from https://wiki.opencord.org/pages/viewpage.action?pageId=2557405.
[24]
L. Qu, C. Assi, K. Shaban, and M. Khabbaz. 2016. Reliability-aware service provisioning in NFV-enabled enterprise datacenter networks. In Proceedings of the International Conference on Network and Service Management. 153–159. DOI: https://doi.org/10.1109/CNSM.2016.7818411.
[25]
L. Qu, M. Khabbaz, and C. Assi. 2018. Reliability-aware service chaining in carrier-grade softwarized networks. IEEE J. Select. Areas Commun. 36, 3 (Mar. 2018), 558–573. DOI: https://doi.org/10.1109/JSAC.2018.2815338.
[26]
Paul Quinn, Uri Elzur, and Carlos Pignataro. 2018. Network Service Header (NSH). RFC 8300. (Jan. 2018). DOI: https://doi.org/10.17487/RFC8300.
[27]
J. Quittek et al. ETSI GS NFV 002 V1.2.1: Network Functions Virtualisation (NFV); Architectural Framework. https://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.02.01_60/gs_NFV002v010201p.pdf.
[28]
Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat. 2015. Jupiter rising: A decade of clos topologies and centralized control in Google’s datacenter network. In Proceedings of the SIGCOMM Conference.
[29]
Gurpreet Singh et al. ETSI GS NFV-REL 004 V1.1.1: Network Functions Virtualisation (NFV); Assurance; Report on Active Monitoring and Failure Detection. https://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/004/01.01.01_60/gs_NFV-REL004v010101p.pdf.
[30]
O. Soualah, M. Mechtri, C. Ghribi, and D. Zeghlache. 2017. A link failure recovery algorithm for virtual network function chaining. In Proceedings of the IFIP/IEEE Symposium on Integrated Network and Service Management (IM). 213–221. DOI: https://doi.org/10.23919/INM.2017.7987282.
[31]
Da Wang, Gauri Joshi, and Gregory W. Wornell. 2019. Efficient straggler replication in large-scale parallel computing. ACM Trans. Model. Perform. Eval. Comput. Syst. 4, 2 (April 2019). DOI: https://doi.org/10.1145/3310336.
[32]
Meng Wang, Bo Cheng, and Junliang Chen. 2020. Joint availability guarantee and resource optimization of virtual network function placement in data center networks. IEEE Trans. Netw. Serv. Manag. 17, 2 (2020), 821–834. DOI: https://doi.org/10.1109/TNSM.2020.2978910.
[33]
H. Xu and B. Li. 2014. TinyFlow: Breaking elephants down into mice in data center networks. In Proceedings of the IEEE International Symposium on Local and Metropolitan Area Networks. 1–6. DOI: https://doi.org/10.1109/LANMAN.2014.7028620.
[34]
Y. Xu and V. P. Kafle. 2017. Reliable service function chain provisioning in software-defined networking. In Proceedings of the 13th International Conference on Network and Service Management (CNSM). 1–4. DOI: https://doi.org/10.23919/CNSM.2017.8256022.

Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 6, Issue 3
September 2021, 98 pages
ISSN: 2376-3639
EISSN: 2376-3647
DOI: 10.1145/3492429

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 December 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 November 2020
Published in TOMPECS Volume 6, Issue 3

Author Tags

  1. NFV
  2. VNF
  3. SFC
  4. reliability
  5. availability
  6. combinatorial analysis
  7. common cause failures
  8. failure propagation
  9. parallel service function chaining
  10. reliable VNF placement

Funding Sources

  • European Commission
