Software">
2021COAZ4028
2021COAZ4028
2021COAZ4028
1
2
Abstract
Many advances have been made in recent years in the fields of virtualization, cloud computing, and network programmability. The rise of concepts such as Software Defined Networking (SDN) and Network Function Virtualization (NFV) has profoundly changed the way Internet Service Providers manage their offers. In parallel, over the last decade, secure public cloud platforms such as Amazon AWS and Microsoft Azure have become major players in the field. These new concepts enable cost reductions and faster innovation, which has led the industry to adopt these paradigms. All these changes also bring new challenges. While these networks have become sprawling and complex, they offer a greater diversity of services: testing them therefore becomes increasingly complicated and resource-intensive. To address this problem, we propose a new tool that combines emulation technologies and optimization techniques in order to distribute SDN/NFV emulations over private test-beds and public cloud platforms. Furthermore, cloud providers generally give users specific metrics in terms of CPU and memory resources to characterize their services, but tend to present only a high-level view of the maximum delay introduced by the network, without any specific value. This can be a problem when deploying delay-sensitive applications in the cloud, since users have no precise data on the subject. We propose a testing framework to monitor the network delay between the datacenters of cloud infrastructures. Finally, in the context of SDN/NFV networks, we exploit the centralized SDN logic to implement an optimal routing strategy in case of multiple link failures in the network. A test-bed environment has also been built to validate our propositions on different network topologies.
Keywords: Emulation, SDN, NFV, Mininet, Virtualization
Contents
1 Introduction
1.1 Context
1.2 Challenges
1.2.1 Network Emulation
1.2.2 Cloud Testing
1.2.3 Failure Recovery
1.3 Contributions
1.4 Outline
2 Background
2.1 Virtualization
2.1.1 Hypervisor-Based Virtualization
2.1.2 Container Virtualization
2.2 Network Function Virtualization (NFV)
2.3 Software Defined Networking (SDN)
2.3.1 SDN Architecture
2.3.2 Openflow
2.4 Service Function Chaining (SFC)
2.5 Test-beds
2.6 Cloud Environments
3 Distrinet
3.1 Introduction
3.2 Related Work
3.3 Distributed Mininet
3.3.1 Multi-Host Mininet Implementation
3.4 Distrinet Architecture
3.5 Experiments
3.5.1 Distrinet Core Performance Assessment
3.5.2 Tools Comparison
3.5.3 Experiments With High Load
3.6 Conclusions
4 Distributed Network Emulation
5 Cloud Measurement
5.1 Introduction
5.2 Related Work
5.3 Amazon Web Services Infrastructure
5.4 Implementation
5.5 Experiments
5.6 Conclusions
6 Failure Recovery
6.1 Introduction
6.2 Related Work
6.3 Optimization Approaches
6.3.1 Problem Statement And Notations
6.3.2 A Layered Network Model
6.3.3 Compact ILP Formulation
6.3.4 Column Generation Approach
6.3.5 Benders Decomposition Approach
6.3.6 The Min-Overflow Problem
6.4 Numerical Results
6.5 Implementation Perspectives
6.5.1 Implementation Options
6.5.2 Experimental Setup
6.5.3 Recovery Time
6.5.4 Operational Trade-Offs
6.6 Conclusion
7 Conclusion
Chapter 1
Introduction
1.1 Context
Networks are continuously growing and evolving, and recent advances in virtualization and orchestration are accelerating this evolution. 5G is the perfect example of how Internet Service Providers (ISPs) are changing their offers and managing access to their infrastructures via SDN and NFV technologies. Another example of the rise of virtualization technologies is cloud computing. It is estimated that the global cloud computing market will grow from US $371.4 billion in 2020 to US $832.1 billion by 2025, a Compound Annual Growth Rate of 17.5%.
The major players are Amazon Web Services (AWS) and Microsoft Azure, fol-
lowed by Google Cloud. The main advantage that virtualization technology provides is flexibility: creating a fully operational virtual machine on demand is easy and takes just a few seconds. This flexibility can reduce, for network operators and IT companies in general, the OPerating EXpense (OPEX) and CAPital EXpenditure (CAPEX) needed to deploy and operate infrastructures around the globe.
Before deploying any network, studies and tests must be performed to avoid congestion and to neither underestimate nor overestimate the network capacity. As the size and complexity of these networks grow, the amount of resources needed to emulate them increases. A straightforward solution is to scale the emulating host vertically in order to match the required resources. However, this cannot scale indefinitely, since a single machine will reach its limits at some point. The alternative is to scale the emulation horizontally across multiple hosts. This is not an easy task, since most network emulators are designed to run on a single host. Another limitation is that NFV networks can include multiple services designed to run in different virtual environments, which means that the emulator must provide a virtual environment for each service to deploy.
To solve this issue, we can use Virtual Machines (VM) or containers. Our
work aims to design a flexible SDN/NFV network emulator, easy to distribute
in private clusters, test-beds, and public clouds, and compatible with the main
SDN emulator available in the market (i.e., Mininet). This project is divided
into two different parts. The first part overviews the technical design decision
made to create the emulator’s architecture, describing the orchestration and
virtualization technologies, while the second part focuses on the algorithmic
problem raised when distributing a virtual network in a physical infrastructure.
We investigated and compared our tool with the ones available on the market, showing that our approach provides the most trustworthy results.
The network emulator is designed to work in a controlled environment, like a
test-bed, but it is also compatible with cloud environments.
Cloud providers sign a contract with their customers, called a Service Level Agreement (SLA), that specifies and ensures that a minimum service level is maintained.
It guarantees levels of reliability, availability, and responsiveness to systems and
applications. When describing the IaaS services offered to the customers, the
providers are particularly specific in terms of virtual RAM, virtual CPU, and
network bandwidth. Still, they tend to hide the value of the network delay in
the infrastructure. Cloud providers usually indicate that the datacenters composing the infrastructure are connected via high-speed optical networks, but they give no specific values. Applications are usually developed with
a monolithic architecture, i.e., built as a single unit with a database layer, a
back-end layer, and a client layer. With a monolithic strategy, it is difficult to scale such applications, which is why enterprises are moving from monolithic to microservice architectures. This can lead to synchronization problems, since the application no longer runs on a single machine but on multiple machines. If the delay between the instances is too high, the application cannot work properly. Furthermore, developers can decide to provide regional resiliency, meaning that the application is deployed in two different regions. If the provider does not share network delay information, how can developers decide where to deploy a delay-sensitive application?
We propose a Command Line Interface (CLI) tool to test the delay stability
in the cloud infrastructure. In particular, we analyze Amazon AWS. The tool is
open-source and publicly available. It is written in Python and uses the latest li-
braries to create and manage virtual machines inside a cloud infrastructure. The
idea behind the tool is to perform synchronized trace-routes between different
instances in different regions and availability zones. It can perform experiments
between two or more regions (multiregional scenario). Such tests are made to
verify the delay stability between different regions. We expect in this case that
the delay varies depending on the distance between the regions. The other tests
are performed within a single region (regional experiment). The experiments
monitor the delay in a specific region between two or more Availability Zones.
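For illustration only (this is not the tool's actual code), a minimal Python sketch of the underlying idea, assuming the peer instances and their addresses are already deployed and reachable, could look as follows:

import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical peer instances deployed in other regions/availability zones.
PEERS = {"eu-west-1": "203.0.113.10", "us-east-1": "198.51.100.20"}

def traceroute(target):
    """Run a single traceroute and return the per-hop RTT samples (in ms)."""
    out = subprocess.run(["traceroute", "-n", target],
                         capture_output=True, text=True).stdout
    return [float(ms) for ms in re.findall(r"([\d.]+) ms", out)]

# Launch the traceroutes in parallel so the measurements are roughly synchronized.
with ThreadPoolExecutor() as pool:
    results = dict(zip(PEERS, pool.map(traceroute, PEERS.values())))

for region, rtts in results.items():
    print(region, "RTT samples (ms):", rtts)

The actual tool goes further: it also creates the cloud environment, deploys the instances, synchronizes the runs, and aggregates the results.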
Even with the best resources and equipment, networks may fail. In the context of SDN and NFV networks, efficient algorithms once considered too hard to implement in legacy networks can now exploit the benefits of a centralized controller with the SDN paradigm. Networks are failure-prone and, usually, failures tend to be correlated. To model this correlation, we consider the network dimensioning problem with protection against Shared Risk Link Groups (SRLGs). An SRLG is a set of links inside the network that can fail simultaneously because they share a common underlying resource.
1.2 Challenges
1.2.1 Network Emulation
Emulating a network is often required before deploying it in a real environment.
As networks are growing in terms of size and complexity, emulating them be-
comes more and more difficult. In the context of SDN, Mininet is the primary
emulator used by the community. Since networks within the NFV paradigm
require more resources, emulating a network in a single machine can become
tricky. If the emulation tools do not provide a way to scale horizontally, the
single physical machine can exceed the resource limit in terms of CPU or RAM,
and the emulation can return results that are not aligned with the expected
ones. Multiple works are proposed to share the emulation in a distributed in-
frastructure like MaxiNet and Mininet Cluster Edition, but these tools have
some limitations. They do not consider, for instance, the physical topology and
the capabilities of the physical infrastructure. We aim to address such limi-
tations with our tool called Distrinet, and the new placement algorithms we
propose which are natively implemented on it. The main challenge is to make
the tool compatible with private clusters or public cloud platforms. The tool
has to provide a virtual connection between two or more virtual nodes. This
is a simple task to perform in private clusters since containerization technolo-
gies automatically manage overlay networks with multicast tunnels to connect
the instances. All the cloud providers do not allow multicast addresses in their
virtual network when using IaaS services. The emulation should be trustable,
which means that the emulation results should be in line with the results in
a real environment. The challenge is to distribute the emulation evenly in the
testing infrastructure, since only a single host or a single physical link overloaded
12 CHAPTER 1. INTRODUCTION
1.3 Contributions
This work lies within the SDN/NFV and cloud computing paradigms that have grown rapidly in recent years. We provide a high-level overview of the SDN framework and architecture, and we describe how it can be integrated with Network Function Virtualization (NFV) to provide Service Function Chaining (SFC), also known as a Network Service. Our first contribution is Distrinet, a distributed network emulator that extends Mininet in order to run on multiple physical machines and provide stronger isolation. We briefly explain the limitations of the emulators available on the market, then we run multiple tests showing that Distrinet
returns results aligned with the expected ones. The comparison is made on
different architectural and tool choices in order to build a tool compatible with
general Linux clusters (e.g., a Linux test-bed) and public cloud infrastructures
(e.g., Amazon Web Services). Another factor that we consider and study is
how to correctly distribute the experiment in a distributed environment. We propose three new algorithms and compare them with the ones used in the main existing tools, through extensive simulations: more than 75,000 experiments over heterogeneous and homogeneous networks. The results show that our algorithms return a feasible solution for almost all network types. We also made an extensive study of the performance degradation due to an overcommitment of the physical machines [Len+21], [Di +19a], [Di +19b], [Di +19d], [Di +19e], [Di +21c], [Di +21b].
Since many organizations are moving their applications to the cloud, for our
second contribution, we provide a tool that measures and analyzes the cloud
network delay. This tool performs experiments for multiregional and regional
environments. It automates all the processes from the creation of the cloud
environment and the deployment of the virtual instances to the configuration of
trace-route in the instances. The results are automatically retrieved and a map
is created and plotted with all the experiment info [Di +21a].
Our last contribution is a failure recovery strategy for SDN/NFV embedded
networks. Networks, in general, are failure-prone, and they can experience
multiple link failures at the same time. These failures can be related to each
other, and the network providers usually have all data enabling them to predict
which links may fail. We propose a global rerouting strategy in which, for
each failure situation, there is a new routing of all demands in the network
using the SDN technology. We test these solutions on different networks, and
the global rerouting solution consumes, on average, 40% less network bandwidth
compared to a dedicated path protection strategy. We also provide an emulation
framework with Mininet and OpenDayLight to validate the solution in a testing
environment [Tom+19a], [Tom+19b], [Tom+21].
List Of Publications
International Journal
[Di +21c] Giuseppe Di Lena, Andrea Tomassilli, Damien Saucez, Frédéric Giroire, Thierry Turletti, and Chidung Lac. "Distrinet: a Mininet Implementation for the Cloud". In: ACM Computer Communication Review (2021).
[Tom+21] A. Tomassilli, G. Di Lena, F. Giroire, I. Tahiri, D. Saucez, S. Perennes, T. Turletti, R. Sadykov, F. Vanderbeck, and C. Lac. "Design of Robust Programmable Networks with Bandwidth-optimal Failure Recovery Scheme". In: Computer Networks (2021). Elsevier.
International Conferences
[Di +19a] G. Di Lena, A. Tomassilli, D. Saucez, F. Giroire, T. Turletti, and C. Lac. "Mininet on steroids: exploiting the cloud for Mininet performance". In: 2019 IEEE 8th International Conference on Cloud Networking (CloudNet). 2019, pp. 1–3. doi: 10.1109/CloudNet47604.2019.9064129.
[Len+21] Giuseppe Di Lena, Andrea Tomassilli, Frédéric Giroire, Damien Saucez, Thierry Turletti, and Chidung Lac. "A Right Placement Makes a Happy Emulator: a Placement Module for Distributed SDN/NFV Emulation". In: Proceedings of IEEE International Conference on Communications (ICC). 2021.
[Tom+19b] A. Tomassilli, G. D. Lena, F. Giroire, I. Tahiri, D. Saucez, S. Perennes, T. Turletti, R. Sadykov, F. Vanderbeck, and C. Lac. "Bandwidth-optimal Failure Recovery Scheme for Robust Programmable Networks". In: 2019 IEEE 8th International Conference on Cloud Networking (CloudNet). 2019, pp. 1–6. doi: 10.1109/CloudNet47604.2019.9064126.
National Conferences
[Di +19d] Giuseppe Di Lena, Andrea Tomassilli, Damien Saucez, Frédéric Giroire, Thierry Turletti, and Chidung Lac. "Trust your SDN/NFV experiments with Distrinet". In: Journées Cloud (2019).
1.4 Outline
The Thesis continues as follows. In Chapter 2, we provide the concepts, and
background necessary for the rest of the work. In this chapter, we introduce the
virtualization and containerization concepts and how they are related to NFV
and SDN. We also provide an overview of the test-beds and the cloud environ-
ments, explain why they are important, and why more and more organizations are heavily investing in them.
In Chapter 3, we present our first contribution, introducing the main dis-
tributed network emulators present on the market, and comparing them with
our proposition Distrinet. This chapter analyzes different technical aspects of
Distrinet, which technologies are used to distribute the network experiments
and what are the advantages that Distrinet has, compared to the other tools.
In Chapter 4, we continue our study of the network emulation tools by ana-
lyzing more in depth the algorithms used to share the emulation in a distributed
environment. We introduce the Virtual Network Embedding Problem and the
algorithms used in the main emulators, and we propose three different algo-
rithms for Distrinet. A comparison with the algorithms proposed by the other
tools shows that our algorithms outperform them in every scenario.
In Chapter 5, we focus on cloud computing and we discuss the advantages
that it brings to the organizations. We present a tool for measuring the network
delay in different deployment scenarios, in a single region or in a multiregional
deployment. The tool is useful for anyone planning to move delay-sensitive applications from an on-premises environment to the cloud.
In Chapter 6, we focus on SDN/NFV embedded networks, and we propose
in particular a global rerouting strategy to recover all the flows in the network
in case of single or multiple link failures. The chapter analyzes in depth the optimization approaches used, such as ILP and Column Generation techniques. We finally evaluate the global rerouting approach experimentally, comparing it with the classi-
cal dedicated path protection, and showing that global rerouting consumes on
average 40% less bandwidth in multiple scenarios.
We conclude with Chapter 7 where we draw our conclusions, summarizing
all the problems faced and the contributions we present. A short discussion is
finally proposed on how the work can be improved and pursued.
Chapter 2
Background
2.1 Virtualization
Virtualization is one of the main building blocks of cloud computing and Service
Function Chaining. Virtualization includes all the processes to be performed in
order to create a software version of a physical environment, e.g., a virtual host
or a virtual network. The main component of virtualization technologies is the
hypervisor, also known as Virtual Machine Monitor (VMM). With the new iso-
lation functions present in the last versions of the Linux kernel, containerization
solutions have been developed. Containers provide similar isolated environments
without the hypervisor.
The hypervisor typically runs above the physical host and selects the avail-
able resources on the host for allocating them to the virtual environment. There
are different types of virtualization. A first approach is to divide them into stan-
dard virtualization (hypervisor-based) and container-based virtualization.
2.1.1 Hypervisor-Based Virtualization
In bare-metal virtualization (Fig. 2.1), the hypervisor runs directly on the physical host, while in hosted virtualization (Fig. 2.2) the hypervisor runs on top of the OS.
Full Virtualization
In Full Virtualization, the guest OS is not aware that it is in a virtualized
environment. Full Virtualization represents the first generation of solutions
that provide a virtual environment. The first company that provided a server
virtualization solution was IBM in 1966.
The host OS completely virtualizes the hardware. The guest OS can ex-
ecute commands just as for a regular physical host, but all the hardware is
created/simulated by the hypervisor. In this case, the binary files of the guest
OS remain unchanged. The guest OS runs as a user level process, and does
not have the same level of privileges as the OS that is hosting it on a physical
machine.
If the guest OS is not modified, it can try to execute some privileged instruc-
tions available only in the hosting OS (Fig. 2.3). In this case, the hypervisor
catches the privileged instruction and replaces the non-virtualizable instructions
with a new sequence of instructions that have the intended effect on the virtual
hardware, while the user-level application is executed directly on the physical machine to improve performance (Fig. 2.4).
Paravirtualization
In Paravirtualization, the guest OS is aware that it is hosted in a virtual environ-
ment, so it has an installed driver specifically for the virtualization environment
in which it is running. In this case, the source code of the guest OS is modified.
This allows optimizations in the virtualization process. The guest OS does not execute privileged instructions as in Full Virtualization, but instead calls the instructions provided by the hypervisor layer. The hypervisor layer can then access the real resources in the underlying hardware and fully exploit them (Fig. 2.5).
(Figures 2.3–2.5: privilege-ring diagrams contrasting full virtualization, where the hypervisor translates the OS requests, with paravirtualization, where the guest OS calls the hypervisor directly.)
2.1.2 Container Virtualization
Containers rely on Linux kernel features such as namespaces and Cgroups. Cgroups (control groups) limit the resources available to a group of processes: for instance, the memory Cgroup ensures that the memory used by a set of processes does not increase over a certain limit, while the net_prio Cgroup provides a system to prioritize the network traffic of different sets of processes. The last function to analyze is the chroot command provided in the Linux environment. Chroot (a contraction of "change root") changes the apparent root directory for the current running process and its children. Figs. 2.7 and 2.8 present a high-level overview of the differences between VMs and containers. The VM has to virtualize the entire guest OS stack, while the container only needs to create the namespaces and the Cgroup that separate the processes of each guest. Container images are lighter, as there is no need to ship an entire kernel: containers use the kernel running on the physical host. Each container image only needs a small part of the OS in order to work on the host. As indicated earlier, it is only possible to create Linux containers in this way, since other OSs do not provide namespaces and Cgroups.
(Figures 2.7 and 2.8: software stacks of hypervisor-based virtualization, with full guest OSs such as Ubuntu 18.04 and Windows 10 above the hypervisor layer, versus containers, where guests such as Ubuntu 18.04 and Debian 9 share the host Linux kernel and are separated by namespaces and Cgroups.)
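As a simple illustration of these kernel primitives (a sketch assuming a Linux host with iproute2 and root privileges, unrelated to any specific tool), a network namespace can be created and inspected as follows:

import subprocess

def run(*cmd):
    # Thin wrapper that raises an exception if a command fails.
    subprocess.run(cmd, check=True)

run("ip", "netns", "add", "demo")                       # new isolated network namespace
run("ip", "link", "add", "veth0", "type", "veth",
    "peer", "name", "veth1")                            # virtual Ethernet pair
run("ip", "link", "set", "veth1", "netns", "demo")      # move one end into the namespace
run("ip", "netns", "exec", "demo", "ip", "link")        # inside: only lo and veth1 are visible
run("ip", "netns", "delete", "demo")                    # clean up

Container runtimes such as LXC or Docker combine these namespaces with Cgroups and a chroot-like root filesystem to provide the isolation described above.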
2.2 Network Function Virtualization (NFV)
By decoupling network functions from their underlying hardware, NFV pro-
vides more flexible networking functionalities. Indeed, the virtualized functions can be created on demand, without installation overhead, and they can be reconfigured remotely, without human intervention. Network operators can thus dynamically deploy network functions faster on the same underlying physical infrastructure. This approach also brings benefits in terms of cost reduction.
Fig. 2.9 shows the idea behind the NFV concept. The functions can be virtual-
ized with hypervisor-based or container-based virtualization.
(Figure 2.9: the NFV concept, with functions such as a firewall, load balancer, VPN server, and router virtualized on COTS hardware.)
2.3 Software Defined Networking (SDN)
The key idea of SDN is to decouple the control plane from the data plane and to centralize the control plane. With this approach, the controller has a global
overview of the network and can perform routing decisions based on the full
network topology and the different metrics in the switches and the links. For
example, the controller can decide to reroute flows in a different path if there is
an overloaded link.
2.3.1 SDN Architecture
(Figures 2.10–2.12: SDN architecture, with the application layer on top, the control plane in the middle, and the infrastructure layer at the bottom, connected through the NorthBound and SouthBound APIs.)
The NorthBound API connects the controller with the application layer, and is usually realized through the REST APIs of SDN controllers.
2.3.2 Openflow
As introduced above, Openflow can be used for interfacing the SDN controller
with programmable switches. Openflow describes a standard specification for the communication between the controller and the switches, and for the structure of the flow tables installed in the switches. Each flow-table entry contains, among other fields:
• Priority: it complements the match fields. When two or more rules installed in the Openflow table match a flow, the priority defines which rule is applied. If the matching rules have the same priority, the protocol does not specify how to choose among them, and each vendor can implement a different policy (an example is given after this list).
• Counters: they keep track of the number of packets and the amount of traffic that matched the flow.
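As an illustration (a hedged sketch assuming an Open vSwitch bridge named s1 managed with the ovs-ofctl utility, not a fragment of any specific controller), two overlapping rules with different priorities can be installed and their counters inspected as follows:

import subprocess

# Low-priority catch-all rule for IP traffic.
subprocess.run(["ovs-ofctl", "add-flow", "s1",
                "priority=10,ip,actions=NORMAL"], check=True)
# Higher-priority rule: traffic to 10.0.0.2 is forwarded on port 2 instead.
subprocess.run(["ovs-ofctl", "add-flow", "s1",
                "priority=200,ip,nw_dst=10.0.0.2,actions=output:2"], check=True)
# dump-flows reports, for each rule, the counters (n_packets, n_bytes) mentioned above.
print(subprocess.run(["ovs-ofctl", "dump-flows", "s1"],
                     capture_output=True, text=True).stdout)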
2.4 Service Function Chaining (SFC)
A Service Function Chain (SFC) steers traffic through an ordered sequence of service functions, creating a Service Function Path (SFP). Services are built to satisfy particular business needs and must satisfy policies that define operational characteristics and access control/security. An SFC defines the required functions and the corre-
sponding order that must be applied to the packets belonging to a specific data
flow. Thanks to the dynamic function provisioning of NFV and the centralized
control of SDN, a SDN/NFV network is able to simplify the service chain de-
ployment and provisioning by making the process easier and cheaper, enabling
a flexible and dynamic deployment of network functions.
The European Telecommunications Standards Institute (ETSI) defines SFC
as a Network Service, an offer provided by an operator that is delivered using
one or more service functions. An example of SFC can be found in Fig. 2.13,
where the traffic should follow the order firewall, load balancer and Intrusion
Detection System (IDS). In traditional networks, all these functions run on
dedicated physical boxes. The adoption of SFC simplifies this concept, since the
functions can be created, destroyed or modified on demand on general purpose
servers. The network intelligence can be easily managed by an SDN controller,
and correctly interfaced with the underlying infrastructure architecture.
The SDN part is responsible for forwarding the traffic to the correct VNFs (Virtual Network Functions), improving the security and isolation of the network. The NFV technology is responsible for creating and managing the VNFs, and for monitoring and handling them in case of errors or failures, e.g., link failures. In this work, VNF is used interchangeably with Service Function (SF).
2.5 Test-beds
With the ever growing complexity of networks, researchers have to rely on test-
beds to be able to fully assess the quality of their propositions. Emulation is
an essential step in the evaluation cycle of network applications and protocols.
It provides a fully controlled and duplicable environment to test real proto-
cols/applications in a reproducible way, without any modification.
It is interesting for test-bed infrastructures to provide network emulation ca-
pabilities in their environments in order to run large-scale scenarios with realistic
and accurate results. Test-beds, essential for researchers, provide a controlled environment which avoids interference from the outside world. In
our work, we heavily rely on Grid’5000 for our experiments. Federated platforms
such as Fed4Fire test-beds provide a uniform interface for running emulation
scenarios, including hybrid ones that involve more than one test-bed.
2.6 Cloud Environments
Cloud computing gives great flexibility to its clients, since all the services that the cloud provider offers are usually on-demand. Clients no longer have to guess the capacity they need, and thus avoid purchasing too many or too few resources, i.e., wasting money or facing downtime due to infrastructure upgrades. The cloud can scale with their business needs, with no long-term contracts. The use of virtual infrastructure thus increases speed and agility.
Cloud providers propose three different models for cloud computing (IaaS, PaaS, SaaS), which will be described later. They also provide a newer type of design called serverless architecture, which scales on demand. As the clients do not have to spend money running and maintaining datacenters, their organization can focus only on the problems to be solved, without having to manage the infrastructure, which is maintained and updated by the cloud provider.
The last aspect to take into account is the global scaling. A few years
ago, it was not possible for a small organization to provide a global service immediately, since an infrastructure had to be set up around the world with datacenters in different locations, at huge cost. With cloud computing, clients can
easily deploy their applications in multiple regions around the world through a
few clicks. This allows the clients to provide low latency and a better experience
for their customers at minimal costs.
Cloud computing provides convenient, on-demand access to a potentially unlimited pool of computing resources. As mentioned previously, cloud providers offer three service models, compared below with the traditional on-premises approach (Fig. 2.14):
• On Premises: the user is responsible for maintaining the entire infras-
tructure, the user has to manage the physical servers and the network
(configuration and upgrade). The user is also responsible for designing
failure recovery strategies in case of server or network failures, and backup
of the data. All these management tasks have a huge cost, and if the user
has to scale the infrastructure globally around the world, the only way to
do it is to deploy new datacenters in the regions. For all these reasons,
companies should carefully consider cloud platforms.
• IaaS (Infrastructure as a Service): the user can manage the virtual or physical servers inside the cloud provider infrastructure. As shown in the figure, the cloud provider is responsible for managing the physical infrastructure and for providing a virtualization layer that isolates each user environment. The provider is responsible for protecting the data and the execution of the virtual instances from unauthorized access. The figure also shows that the provider manages the physical servers and the network components of the infrastructure, and is responsible for the maintenance of the host OS and the hypervisor. The user never has access to the hypervisor layer, since resources are shared between users. The client is responsible for all other layers in the application stack (e.g., Amazon EC2, Microsoft Azure), i.e., the client has to maintain the guest OS and the applications running on it. The user is also responsible for the security of the virtual network and the management of the virtual firewall, as well as for the keys used to access the instances.
• PaaS (Platform as a Service): the cloud provider takes care of all the
layers that manage the IaaS, and it is also responsible for the guest OS
and the platform hosting it (e.g., Google App Engine, GitHub). The client
just uses the platform without worrying about the resources and policies
needed to test its applications.
• SaaS (Software as a Service): the provider manages the entire stack, including the application itself, and the client simply consumes the software as a service.
Besides the service models, there are also different deployment models:
• Public cloud: the cloud provider is public and everyone has access to its services (e.g., Amazon Web Services, Microsoft Azure, Google Cloud).
• Hybrid cloud: the client can decide to deploy its infrastructure partly on
the public cloud, and partly on a private cloud.
• Private cloud: some big players in the market can decide to build their own
cloud infrastructure. There are different choices for this type of scenario,
e.g., using OpenStack or VMware solutions.
Chapter 3
Distrinet
3.1 Introduction
Modern networks have become so complex and implementation-dependent that it is now impossible to rely solely on models or simulations to study them. On one
hand, models are particularly interesting to determine the limits of a system,
potentially at very large scale or to reason in an abstract way to conceive effi-
cient networks. On the other hand, simulations are pretty handy for studying
the general behavior of a network or for getting high confidence about the ap-
plicability of new concepts. However, these methods do not faithfully account
for implementation details. To this end, emulation is more and more used to
evaluate new networking ideas. The advantage of emulation is that the exact
same code as the production one can be used and tested in rather realistic cases
helping to understand fine-grained interactions between software and hardware.
However, emulation is not the reality and it often needs to deal with scalability
issues for large and resource-intensive experiments. As it relies on real software
that implements all the details of the system to emulate, all three methods are
used and complement each other.
When it comes to Software Defined Networking (SDN), Mininet [LHM10] is
by far the most popular emulator. The success of Mininet comes from its ability
to emulate potentially large networks on one machine, thanks to lightweight
virtualization techniques and a simple yet powerful API. Mininet was designed
to run on one single machine, which can be a limiting factor for experiments
with heavy memory or processing capacity needs. A solution to tackle this issue
is to distribute the emulation over multiple machines. However, Mininet single-
machine design assumes that all resources are shared and directly accessible from
each component of an experiment. Unfortunately, when multiple machines are
used to run an experiment, this assumption does not hold anymore and the way
Mininet is implemented has to be revised.
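To give a feeling of this API, a minimal topology script (a generic sketch that requires a standard Mininet installation and root privileges, not one of the experiments of this chapter) looks as follows:

from mininet.net import Mininet
from mininet.topo import Topo
from mininet.link import TCLink

class SmallTopo(Topo):
    def build(self):
        s1 = self.addSwitch('s1')
        for i in (1, 2):
            h = self.addHost(f'h{i}')
            self.addLink(h, s1, bw=10)   # 10 Mbps emulated link (TCLink)

net = Mininet(topo=SmallTopo(), link=TCLink)
net.start()
net.pingAll()                            # connectivity check between the vHosts
net.stop()

Distrinet keeps this API, but the virtual nodes behind it may be placed on several physical machines.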
In this chapter, we present Distrinet [Di +19c], an extension of Mininet
implemented to allow distributed Mininet experiments, leveraging the resources of multiple physical machines.
3.2 Related Work
When evaluating a new networking solution, the goal is to obtain results which are as close as possible to the outcome in real conditions. De-
pending on the situation, three different approaches can be used: simulation,
emulation or experimentation on test-beds. These approaches are complemen-
tary and, usually, several of them are required before the real deployment of a
new solution.
Emulation allows testing the performance of real applications over a virtual network. A first frequently used tool to emulate networks is the Open vSwitch
(OVS) software switch [ovs20]. To build a virtual network, virtual switches
(vSwitches) can be connected with virtual interfaces, through GRE or VXLAN
tunnels. To emulate virtual hosts (vHosts), one can use containerization tools
(e.g., LXC [Can19] or Docker [Mer14]) or full virtualization tools (e.g., Vir-
tual Box [Wat08]). An example of a general-purpose testbed is FITS (Future Internet Testbed with Security) [Mor+14], where the authors present the design and implementation of a facility for experimenting with solutions for the next-generation Internet. FITS is built using Xen [Xen21] and OpenFlow [Sga+13].
In [Zha+19] the authors evaluate the performances of state-of-the-art software
switches: OVS-DPDK, snabb, BESS, FastClick, VPP and netmap VALE.
Graphical Network Simulator-3 (GNS3) [EA15] is a software tool that emulates routers and switches in order to create virtual networks through a GUI. It can be used to
emulate Cisco routers and supports a variety of virtualization tools such as
QEMU, KVM, and Virtual Box to emulate the vHosts.
Mininet [LHM10] is the most common Software Defined Networking (SDN)
emulator. It allows to emulate an SDN network composed of hundreds of vHosts
and vSwitches on a single host. Mininet is easy to use and its installation is
simple. As we show in Sec. 3.4, it is possible to create a network with dozens
of vSwitches and vHosts in just a few seconds. Mininet is very efficient to
emulate network topologies as long as the resources required for the experiments
do not exceed the ones that a single machine can offer. If the physical resources are exceeded, the results might not be aligned with those of a real scenario.
The tools closest to ours are Maxinet [Wet+14] and Mininet Cluster Edition
(Mininet CE[Pro19]). They allow to distribute Mininet on a cluster of nodes.
Maxinet creates different Mininet instances in each physical node of the cluster,
and connects the vSwitches between different physical hosts with GRE tunnels.
Mininet CE directly extends Mininet in order to distribute the vNodes and the vLinks over a set of machines via GRE or SSH tunnels. Containernet [PKR16] extends Mininet to support Docker containers. By default, it cannot distribute the emulation over different nodes, but it can be combined with Maxinet or Mininet CE to support such an option and provide better vNode isolation. One of the main problems is that Maxinet and Containernet
do not consider the properties of the physical infrastructure (we will focus on
this limitation in the next chapter) in which they are running, e.g., a machine
or a link in the physical network may be overloaded during the emulation. With
Distrinet, we combine the concepts of Maxinet and Containernet to provide a
distributed Mininet implementation which takes into account the characteristics
of the logical and physical topologies.
While the Maxinet approach makes it possible to increase the scalability
Table 3.1: Supported features in the different tools for distributed emulation.
First, emulated nodes must be isolated to ensure the correctness of the ex-
periments, even when the hosts supporting the experiments are heterogeneous.
To obtain these guarantees, virtualization techniques (full or container-based)
have to be employed. Similarly, traffic encapsulation is needed such that the
network of the experiment can run on any type of infrastructure.
To start and manage experiments, an experimentation control plane is nec-
essary. This control plane allows managing all the emulated nodes and links of the experiment, regardless of where they are physically hosted.
(Figure: the Distrinet experimentation control plane, in which the mn process drives the remote nodes N1 and N2 through SSH sessions, pipes, and pseudo-terminals, possibly via a jump host.)
In Mininet, network nodes and links are created sequentially. The sequen-
tial approach is not an issue in Mininet where interactions are virtually in-
stantaneous. However, a sequential approach is not appropriate in Distrinet,
since nodes are deployed from LXC images and because every interaction with
a node is subject to network delays. For this reason, in Distrinet, the node deployment and setup calls are made concurrent with the Asynchronous I/O (asyncio) library of Python.
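The following sketch illustrates this concurrency pattern (with placeholder host names and a simulated remote call, not Distrinet's actual deployment code):

import asyncio

async def deploy_node(host, image="ubuntu"):
    # In Distrinet this step would be a remote (SSH/LXC) operation; here it is
    # only simulated so that the sketch stays self-contained.
    await asyncio.sleep(0.1)
    return f"{image} deployed on {host}"

async def main():
    hosts = ["pm1", "pm2", "pm3"]        # hypothetical physical machines
    # All deployments are started at once and awaited together.
    print(await asyncio.gather(*(deploy_node(h) for h in hosts)))

asyncio.run(main())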
Figure 3.2: Distrinet general architecture. Figure 3.3: Amazon VPC configuration.
3.5 Experiments
To evaluate Distrinet, we considered three types of experiments. The first one measures the overhead, in terms of execution time, that the tools introduce to allow the distribution of the emulation.
The second experiment compares the network capabilities of Mininet CE, Maxinet,
and Distrinet. The last experiment shows the behavior of a resource-intensive emulation inside a single physical host and in a distributed environment. For the evaluation, we used Amazon Web Services (AWS) [Ama20e] and
Grid’5000 [Bal+13]. In Grid’5000, we used the Gros cluster where the hosts are
all equipped with one Intel Xeon Gold 5220 (18 multi-threaded cores per CPU
at 2.20 GHz, i.e., 36 vCores) and 96 GB of RAM.
Table 3.3: Maximum throughput (in Mbps) ± standard deviation over 10 ex-
periments - the virtual links are bounded at 1 Gbps.
(Figure: a linear topology h1, h2, h3, h4, ..., hn.)
An iperf connection is established between the first and last hosts for different linear topologies. For the star topology, there is an iperf
connection between 5 pairs of hosts. Table 3.3 summarizes the results over 10
experiments for each virtual topology. The physical environment is composed
of one or two machines completely dedicated to the experiments in Grid’5000.
The vLink capacities are set to 1 Gbps. The second and the third columns
compare Mininet and Distrinet while running on a single machine. We can see
that the maximum throughput measured with iperf is very close between our
tool and Mininet. The next set of experiments compares Mininet CE, Maxinet,
and Distrinet while distributing the virtual network on two physical hosts - the
Maximum Transmission Unit (MTU) for the physical interfaces has been set at
1600 bytes to avoid fragmentation. To be fair in the comparison, the distribu-
tion of the virtual nodes is done using the Maxinet placement algorithm with
all the tools, since Maxinet is the only one that has restrictions on the place-
ment (i.e., Maxinet does not allow to place a vSwitch and a vHost on different
physical machines if they are connected through a vLink).
As mentioned in Sec. 3.1, Mininet CE does not support TCLinks for vLinks.
(Figure 3.6: Distrinet running in a single host, with the CPU overcommitment percentage for each topology; execution time in seconds versus linear topology size N.)
We ran two sets of experiments: the first set distributed between one and 100 physical machines, the second set running different topologies on a single physical machine.
The incomplete isolation of Mininet nodes prevents Hadoop from running properly. Hence, for the single-host case we use Distrinet, since its network performance is very similar to Mininet's (Tab. 3.3).
3.6 Conclusions
To overcome the limitations of running resource-intensive experiments on a single machine, we proposed Distrinet, which extends Mininet so that experiments can be distributed over an arbitrary number of hosts.
We have shown the cost of adapting an application designed to run on a single machine so that it runs on multiple hosts while remaining compatible with the original. Our implementation is flexible, and experiments can be run on a single Linux host, a cluster of Linux hosts, or even in the Amazon EC2 cloud. Distrinet automatically provisions hosts and launches Amazon instances, so that experimenters do not need to know the details of the infrastructure used for their experiment. It is compatible with Mininet and its source code is available at https://distrinet-emu.github.io. In the next chapter, we study how to correctly distribute the emulation in a physical environment.
Chapter 4
Distributed Network
Emulation
4.1 Introduction
The complexity of networks has greatly increased in recent years. Networks currently rely massively on software frameworks and virtualization, and their performance has become implementation-dependent.
Using a single machine for emulation is thus limiting when handling resource-intensive experiments, e.g., those needing heavy memory, processing, or input/output, or specific hardware, for instance when emulating networks with virtual network functions or artificial intelligence algorithms.
To tackle this issue, distributed emulation tools were proposed: Maxinet [Wet+14],
Mininet Cluster Edition [Pro19], Distrinet [Dis19; Di +19c]. These tools were
presented in the previous chapter. They allow running experiments of various types on a large number of machines with different hardware configurations. In
this chapter, we focus more specifically on the methodology used to distribute
the experiments with each tool.
Carrying out distributed emulation raises several challenges. First, when facing an experiment, is there a need to distribute it? In other words, how do we know whether the experiment exceeds the capacity of a single node? Then, if it does, over how many nodes and on which nodes should it be distributed? If it has to be distributed onto m machines, how should the experiment be executed on these machines?
Actually, a networking experiment can be seen as a virtual network or a graph
with node and link demands in terms of CPU, memory, network capacity, etc.
A fundamental problem that arises in this context is how to map virtual nodes
and links to a physical network topology, while minimizing a certain objective
function without exceeding the available resources.
Existing tools have placement modules that partially answer these questions.
Mininet Cluster Edition implements three simple algorithms (Round Robin,
Random, and Switch Bin [Pro19]), while Maxinet uses an algorithm from METIS [met20], a library for graph partitioning. However, these placement methods have several important limitations.
Firstly, they do not take into account the nodes' resources and the links' capacities. This means that they do not verify whether node or link capacities are exceeded.
Consequently, experiments may run with congested links and overloaded nodes,
leading to unreliable results.
Secondly, they do not minimize the number of machines required as they use
all machines at their disposal. This is especially important for public clusters,
where physical resources are shared, as policy rules might lead to a large waiting
time to obtain the needed resources.
To solve these limitations, we studied placement algorithms to map an ex-
periment onto a set of physical machines (e.g., in private test-beds or clusters).
The experimental infrastructure topology is taken into account. The goal is to
provide a mapping such that the physical resources of the nodes (i.e., processing
and memory) and links (i.e., capacity) are not exceeded. This is important to ensure a trustworthy emulation. The combination of node and link constraints makes
finding a feasible solution a difficult task. Indeed, by using a small number of
physical nodes, we might exceed their physical resources, while by using too
many physical nodes, we may exceed the available rate of the physical links if
the virtual nodes are not placed correctly. Our objective consists in minimizing
the number of reserved machines to run an experiment, motivated by the fact that scientific clusters such as Grid'5000 [Gri20] require reserving a group of machines before running an experiment [Vic+13], and reserving too many may lead to usage policy violations or to a long waiting time to obtain the needed resources.
The problem can be seen as a variant of a Virtual Network Embedding (VNE)
problem. However, only exact methods based on linear programming were pro-
posed to deal with it in the literature, and such solutions do not scale well and have long execution times for the large networks that are our target in this work. This motivates the need for fast algorithms that can provide near-optimal solutions.
We propose new tailored placement algorithms and compare them with the
ones used in existing tools. We built a placement module for distributed emu-
lators to solve efficiently this problem in practice. This module first decides if
the experiment has to be distributed. Then, given a pool of available machines,
it computes the deployment using the minimum number of machines to run
the experiments in such a way that physical resources are not exceeded. The
placement module can be used with any emulator. However, to test it in the
wild, we integrated it in Distrinet. Through this approach, the experiment is
automatically distributed over several nodes using the optimal allocation.
To summarize, our contributions are as follows:
- We build a placement module for distributed emulators with all the al-
gorithms implemented. The placement module can currently be used in
Distrinet, but the algorithms may potentially be integrated in any tool.
- We compare our algorithms with the ones implemented in existing tools
using extensive simulations. We show that they succeed in ensuring that
no link or node capacity is exceeded, while the same experiments running
with other tools would lead to resource overload.
- We then carry out experiments in a private cluster with the goal of evaluating the impact of such resource overload on the emulation. We show that overloading a link, the CPU, or the memory may lead, respectively, to important drops in measured bandwidth, increases in execution time, and emulation crashes.
The rest of this chapter is organized as follows. In Sec. 4.2, we review
the related works on placement methods to carry out distributed emulation.
We then formally state the problem and propose algorithms to deal with it in
Sec. 4.3. We evaluate the algorithms against existing placement modules with
extensive simulations in Sec. 4.4 and with experimentation in Sec. 4.5. Last, we
conclude and present future work on placement algorithms in Sec. 4.6.
(Figure: an example virtual topology of vSwitches S1–S7 and vHosts H1–H4 with link capacities, and its mapping onto two physical hosts, Host 0 and Host 1.)
In all these algorithms, the physical infrastructure is not taken into account.
This means that a physical link or a physical machine can be overloaded and
become a bottleneck for the emulation without the user being notified. In fact,
we show that the existing placement solutions behave well when the physical infrastructure is a homogeneous environment. However, when the physical
environment is heterogeneous (different types of machines or a complex physical
network), they often return solutions with overloaded resources.
Moreover, the existing solutions do not evaluate the minimum number of
machines needed to run the experiment (and in particular if the experiment has
to be distributed). These solutions use all the machines put at their disposal
by the user. On the contrary, our placement module provides to the user the
smallest number of physical hosts in order to run the experiment without any
overloaded physical resource.
4.3.2 Algorithms
We propose three algorithms to tackle this problem. Our algorithms are able
to provide near-optimal solutions in a reasonable computation time. Solutions
are compared with the optimal ones computed using an ILP approach.
The first two algorithms, k-balanced and DivideSwap, have two phases.
Firstly, virtual nodes are mapped into the physical topology and, secondly,
physical paths are found to map virtual links. The third proposed algorithm,
GreedyPartition, mixes both nodes and links mapping.
Homogeneous Case
If the substrate nodes within the cluster are homogeneous in terms of physical
resources (or if there is a subset of homogeneous nodes from the entire cluster),
an assignment strategy may consist in carrying out a partition of the tasks
to be done by the physical machines while minimizing the network tasks that
would be necessary to be done. We refer to this algorithm as k-balanced.
Similarly to [Gir+19], we use as a subroutine an algorithm for the k-balanced partitioning problem. Given an edge-capacitated graph and an integer k ≥ 2, the goal is to partition the graph vertices into k parts of equal size, so as to minimize the total capacity of the cut edges (i.e., edges between different parts). The problem is NP-hard even for k = 2 [GJS74]. k-balanced solves a k-partitioning problem for k = 1, ..., min(|N^S|, |N^V|), and tests the feasibility of the computed mapping m : N^V → N^S of virtual nodes on the substrate network. The smallest k for which a feasible k-partitioning exists is the output of the algorithm. The corresponding pseudo-code is given in Algorithm 1.
The best known approximation factor for the k-balanced partitioning problem is due to Krauthgamer et al. [KNS09] and is O(√(log n log k)), with n being the number of nodes in the virtual network. Nevertheless, as their algorithm is based on semi-definite programming and would lead to long execution times, we use instead the O(log n) approximation algorithm described in [ST97]. The main idea consists in recursively solving a Minimum Bisection Problem. To this end, we use the Kernighan and Lin heuristic [KL70].
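The following sketch shows this recursive-bisection idea using the Kernighan-Lin heuristic available in the networkx library (a toy illustration on a random graph, not the code used in our evaluation):

import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def recursive_bisection(graph, parts):
    # Split the graph into `parts` (a power of two) balanced pieces by
    # recursively applying the Kernighan-Lin minimum-bisection heuristic.
    if parts <= 1 or graph.number_of_nodes() <= 1:
        return [set(graph.nodes)]
    left, right = kernighan_lin_bisection(graph, weight="weight", seed=0)
    return (recursive_bisection(graph.subgraph(left), parts // 2) +
            recursive_bisection(graph.subgraph(right), parts // 2))

virtual = nx.random_regular_graph(3, 16, seed=1)   # toy virtual topology
for i, part in enumerate(recursive_bisection(virtual, 4)):
    print(f"part {i}: {sorted(part)}")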
Algorithm 1 k-balanced
1: Input: Virtual network G^V, Substrate network G^S.
2: Output: a mapping of virtual nodes to substrate nodes m : N^V → N^S.
3: for k = 1, 2, ..., min(|N^S|, |N^V|) do
4:   sol ← Compute an approximate solution of the k-balanced partitioning problem for G^V.
5:   if sol is feasible (see Sec. 4.3.2) then return sol
6:   end if
7: end for
8: return ∅
Algorithm 2 DivideSwap
1: Input: Virtual network G^V, Substrate network G^S.
2: Output: a mapping of virtual nodes to substrate nodes m : N^V → N^S.
3: for k = 1, 2, ..., min(|N^S|, |N^V|) do
4:   Divide the nodes of N^V into k balanced subsets V_1, V_2, ..., V_k.
5:   Take a random sample of k physical nodes P_1, P_2, ..., P_k.
6:   Assign the nodes in V_j to P_j for j = 1, ..., k.
7:   for i = 1, 2, ..., NSWAPS do
8:     Choose at random two nodes u, v ∈ N^V assigned to two distinct physical nodes.
9:     If swapping u and v decreases the cut weight, swap them in sol and update the cut weight.
10:   end for
11:   if sol is feasible (see Sec. 4.3.2) then return sol.
12:   end if
13: end for
14: return ∅
Algorithm 3 GreedyPartition
1: Input: Virtual network G^V, Substrate network G^S.
2: Output: a mapping of virtual nodes to substrate nodes m : N^V → N^S.
3: T ← Compute a bisection tree of G^V.
4: for j = 1, 2, ..., |N^S| do
5:   Select the j most powerful machines P_1, ..., P_j.
6:   Perform a BFS on the bisection tree T.
7:   for each node v of T do
8:     if ∃ P ∈ {P_1, ..., P_j} with enough resources to host the virtual nodes in v then
9:       v is assigned to P
10:       Remove v and the subtree rooted at v from T
11:     end if
12:   end for
13:   if T is empty then return sol
14:   end if
15: end for
16: return ∅
General Case
Link Mapping
DivideSwap and k-balanced first assign virtual nodes to physical nodes, and then check the feasibility of the mapping by trying to allocate each virtual link whose endpoints are assigned to two different substrate nodes onto a substrate path. This problem is solved differently according to the structure of the physical substrate network.
Tree Topologies
Even if the substrate network is assumed to be a tree, there are still decisions
to be made in terms of how the network interfaces of a substrate compute node
should be used by the virtual nodes. Indeed, if we allow the traffic associated to
a single virtual link to be sent using more than one network interface (e.g., 50%
on eth0 and 50% on eth1), then multiple links between a compute node and a
switch can be considered as a single link with an associated rate which corre-
sponds to the sum of rates on all the node network interfaces. As a consequence,
once the mapping between virtual and substrate nodes has been selected, check-
ing if a substrate compute node has enough resources on its interfaces to send or
receive a given set of rates can be done exactly in polynomial time with a BFS or
Depth First Search (DFS) visit in time O(|N S | + |LS |), as there exists only one
path between each source and destination pair. Conversely, if the virtual link
rate to be supported can be mapped on only a single network interface, then the
situation can be reduced to the Bin Packing Problem and is thus NP-hard. To
deal with it, we use the First-fit decreasing heuristic which has been shown to
be 11/9-approximated for this problem [Joh+74] (i.e., it guarantees an allocation using at most (11/9) OPT + 1 bins, with OPT being the optimal number of bins).
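For illustration, a minimal sketch of the First-Fit Decreasing packing of virtual link rates onto a node's interfaces follows; the interface capacities and rates are made-up values.

def first_fit_decreasing(rates, nic_capacities):
    """Pack virtual link rates (Mbps) onto network interfaces with First-Fit Decreasing.

    Returns a mapping {rate index -> NIC index}, or None if some rate does not fit.
    """
    residual = list(nic_capacities)            # remaining capacity per NIC
    assignment = {}
    # Consider the rates in non-increasing order (the "decreasing" part).
    for idx in sorted(range(len(rates)), key=lambda i: rates[i], reverse=True):
        for nic, free in enumerate(residual):
            if rates[idx] <= free:             # first NIC where the rate fits
                residual[nic] -= rates[idx]
                assignment[idx] = nic
                break
        else:
            return None                        # infeasible with this heuristic
    return assignment

# Example: three 1 Gbps interfaces, virtual link rates in Mbps.
print(first_fit_decreasing([800, 500, 400, 300, 200, 100], [1000, 1000, 1000]))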
General Topologies
In this case, we also need to distinguish two situations according to the desired
strategy for mapping a virtual link of the virtual network to a physical path in
the substrate network. If path splitting is supported by the substrate network,
then the problem can be solved in polynomial time by using a multi-commodity
flow algorithm [Yu+08]. On the other hand, if path splitting is not supported,
then the situation can be reduced to the Unsplittable Flow Problem, which is
NP-hard [KS97]. In such a case, we use the following approach. We consider the virtual links in non-increasing order. Given the remaining capacities, we find the shortest path in the residual network, in which we remove all links with an available rate smaller than the rate to be mapped. If we succeed in finding a physical path for all the virtual links to be mapped (i.e., between nodes assigned to distinct physical machines), then the problem is considered feasible.
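The following sketch illustrates this sequential mapping under assumed data structures: virtual links are taken in non-increasing order of rate, substrate links without enough residual capacity are pruned, and a shortest substrate path is computed for each. The function and attribute names are ours.

import networkx as nx

def map_virtual_links(substrate, demands):
    """Map virtual links (node pairs with a rate) onto substrate paths, unsplittably.

    `substrate` is an undirected graph whose edges carry a 'capacity' attribute.
    `demands` is a list of (src, dst, rate) tuples between substrate nodes.
    Returns {demand -> path} or None if some demand cannot be routed.
    """
    residual = {(u, v): data["capacity"] for u, v, data in substrate.edges(data=True)}
    paths = {}
    # Consider virtual links in non-increasing order of their rate.
    for src, dst, rate in sorted(demands, key=lambda d: d[2], reverse=True):
        # Keep only links with enough available rate left.
        usable = [(u, v) for (u, v), cap in residual.items() if cap >= rate]
        view = substrate.edge_subgraph(usable)
        try:
            path = nx.shortest_path(view, src, dst)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return None                        # no feasible path in the residual network
        for u, v in zip(path, path[1:]):
            key = (u, v) if (u, v) in residual else (v, u)
            residual[key] -= rate
        paths[(src, dst, rate)] = path
    return paths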
ILP Approach
As previously mentioned, we compare our approximation algorithms with the optimal solution computed using an ILP approach. The goal of the ILP is to minimize the number of hosts used for the mapping. As before, G^V = (N^V, E^V) and G^S = (N^S, E^S) denote the virtual network and the substrate (physical) network. A tuple (u, v) ∈ E^V represents the virtual link between u, v ∈ N^V, while (i, j, d) ∈ E^S represents the physical link between i, j ∈ N^S via the device d. We denote by c_j and m_j the virtual cores and the memory required by the virtual node j ∈ N^V, while C_i and M_i represent the maximum cores and memory available in the physical node i ∈ N^S. For the links, r_{uv} represents the bandwidth required by the virtual link (u, v) ∈ E^V, while R_{ijd} represents the maximum bandwidth of the physical link (i, j, d) ∈ E^S. We denote by δ(i) the set of neighbours of the node i ∈ N^S, and by α(i, j) the set of devices for the tuple (i, j) with i, j ∈ N^S.
Variables:
• x_i ∈ {0, 1}: boolean variable equal to 1 iff there is at least one virtual node assigned to the physical node i ∈ N^S, 0 otherwise
• y_{ji} ∈ {0, 1}: boolean variable equal to 1 iff the virtual node j ∈ N^V is assigned to the physical node i ∈ N^S, 0 otherwise
• l^{uv}_{ijd} ∈ {0, 1}: boolean variable equal to 1 iff the physical link (i, j, d) ∈ E^S is used by the virtual link (u, v) ∈ E^V, 0 otherwise
Objective (4.1): minimization of the number of physical nodes needed to embed the virtual network.

    \min \sum_{i \in N^S} x_i    (4.1)

A machine is used if at least one virtual node is mapped on it (4.2):

    x_i \ge y_{ji}    \forall i \in N^S, \forall j \in N^V    (4.2)

Assignment of virtual nodes to physical nodes (4.3):

    \sum_{i \in N^S} y_{ji} = 1    \forall j \in N^V    (4.3)

Given a virtual link and a physical machine, the rate that goes out from the physical machine to a device (4.8), or that comes into the physical machine from a device (4.9), is at most 1:

    \sum_{j \in \delta(i)} \sum_{d \in \alpha(i,j)} l^{uv}_{ijd} \le 1    \forall (u, v) \in E^V, \forall i \in N^S    (4.8)

    \sum_{j \in \delta(i)} \sum_{d \in \alpha(i,j)} l^{uv}_{jid} \le 1    \forall (u, v) \in E^V, \forall i \in N^S    (4.9)
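To make the formulation concrete, the following is a small PuLP sketch of the node-assignment part of the model - objective (4.1) and constraints (4.2)-(4.3), together with simple CPU and memory capacity constraints. The link variables l^{uv}_{ijd} and constraints (4.8)-(4.9) are omitted, and all host and virtual node sizes are made-up values; this is an illustration, not the exact model used in the experiments.

import pulp

# Toy instance: 3 physical hosts (C_i cores, M_i GB), 4 virtual nodes (c_j cores, m_j GB).
hosts = {"h1": (32, 128), "h2": (24, 48), "h3": (8, 24)}
vnodes = {"v1": (4, 8), "v2": (2, 4), "v3": (8, 16), "v4": (1, 2)}

prob = pulp.LpProblem("node_mapping", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", hosts, cat="Binary")                      # host i is used
y = pulp.LpVariable.dicts("y", [(j, i) for j in vnodes for i in hosts], cat="Binary")

prob += pulp.lpSum(x[i] for i in hosts)                                  # objective (4.1)
for i in hosts:
    for j in vnodes:
        prob += x[i] >= y[(j, i)]                                        # constraint (4.2)
for j in vnodes:
    prob += pulp.lpSum(y[(j, i)] for i in hosts) == 1                    # constraint (4.3)
for i, (C, M) in hosts.items():                                          # CPU/memory capacities
    prob += pulp.lpSum(vnodes[j][0] * y[(j, i)] for j in vnodes) <= C
    prob += pulp.lpSum(vnodes[j][1] * y[(j, i)] for j in vnodes) <= M

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (j, i), var in y.items():
    if var.value() == 1:
        print(f"{j} -> {i}")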
Figure 4.4: Performances of the heuristics and ILP solver for a K-Fat Tree - value of the solution found (a) and time needed to find the solution (b).

[Figure 4.5: the Rennes cluster, with Parapide (8 vCPU, 24 GB RAM), Parapluie (24 vCPU, 48 GB RAM), Paravance (32 vCPU, 128 GB RAM), and Parasilo (32 vCPU, 128 GB RAM) servers interconnected by 10 Gbps and 40 Gbps links.]
We consider scenarios with homogeneous and heterogeneous virtual and physical topologies. Scenarios with homogeneous virtual and physical infrastructures are the most favorable for simple placement algorithms. Scenarios with homogeneous physical infrastructures should also be the most favorable ones for the placement modules of the existing tools, as they do not take the physical infrastructure into account. We show that our algorithms outperform them even in this scenario. The heterogeneous scenario represents a more complex case that shows the importance of taking the capacities of the physical infrastructure into consideration.
Physical Topologies
The first one is a simple star topology corresponding to the one of the Gros clus-
ter in Grid’5000 [Bal+13]. We use 20 physical machines, each equipped with an
Intel Gold 5220 (Cascade Lake-SP, 2.20 GHz, 1 CPU/node, 18 cores/CPU) and
96 GB of RAM. The machines are connected by a single switch with 25 Gbps
links. The second is represented in Fig. 4.5: this infrastructure is made of a
subset of 25 hosts of the Rennes cluster in Grid’5000. There are four types of
servers with different numbers of cores and memory sizes: Parapide (2 servers with 8 cores and 24 GB of RAM); Parapluie (8 servers with 24 cores and 48 GB of RAM); Parasilo (5 servers with 32 cores and 128 GB of RAM); Paravance (10 servers with 32 cores and 128 GB of RAM), for a total of 25 machines. The servers are interconnected using a small network with 4 switches and links with capacities of 1, 10, or 40 Gbps.
Virtual Topologies
We use two different families of virtual topologies: Fat Trees and Random. We
choose the first one as it is a traditional family of datacenter topologies: this
corresponds to the homogeneous scenario. Indeed, Fat Trees present symmetries
and all servers are usually similar.
- Links' rates: the rate associated with each virtual link (1, 10, 50, 100, 200, 500, or 1,000 Mbps).
For the Random networks, the requirements associated with the virtual nodes and virtual links may be different from one node or link to another.
In this section, we extensively study the performance of the different placement algorithms. To this end, we consider more than 70,000 test instances, corresponding to the mapping of each generated virtual topology onto the two physical topologies presented above.
While the existing placement algorithms always return solutions, as they do not take node and link capacity constraints into account, this is not the case for our algorithms, which make sure that resources are not overloaded. To assess the impact of this difference, we first analyze the cases where feasible solutions are found. We then study the cases where physical constraints are not respected. Finally, we discuss how the algorithmic choices translate into the number of physical hosts needed to run the experiments.
[Figures: overcommitment distributions (log scale) per placement algorithm (Metis, Random, SwitchBin, RoundRobin).]
We now study how severely overloaded links and nodes can appear in the computed solutions.
Figs. 4.6–4.11 take into account the virtual instances for which an overloaded
solution is returned (in terms of CPU, memory, or link overcommitment) for all
the Mininet Cluster and Maxinet algorithms, using the different clusters.
The overcommitment is the estimation (in %) of how much the CPU, mem-
ory, or bandwidth is assigned in excess to a physical node or a physical link.
From the percentages in the plots, we can observe a strong difference in terms
of feasible solutions returned for the two different clusters.
If we focus our attention on the link overcommitment (Figs. 4.6 and 4.9), we can observe that less than 1% of the returned solutions lead to overloaded links in the Gros cluster. This is expected, as the physical topology is a simple star. Conversely, in the Rennes cluster, a high percentage of the solutions lead to overloaded links: from 21.7% for Metis to almost 70% for Random, and 51% and 57% for SwitchBin and RoundRobin, respectively. Moreover, for such instances, the overcommitment is significant: 4 or 5 links are 100% overloaded on
average for Random, SwitchBin, and RoundRobin. The overload factor may reach several hundred percent, and even more than 1,000% for some instances with Random. The overcommitment is lower for Metis, the median case being 2 links with a 40% overload. This better behavior is explained by the fact that Metis uses a graph partitioning algorithm that minimizes the cut between partitions placed on different machines.
Considering now CPU overcommitment (Figs. 4.7 and 4.10), we see that it is frequent, both for the Gros and Rennes clusters. RoundRobin is the algorithm that handles CPU resources best: only 5.85% of its returned solutions are overloaded, compared to 35%, 43%, and 55% for Metis, Random, and SwitchBin, respectively. The explanation is that RoundRobin tries to distribute the load evenly over the physical hosts. In Rennes, Metis behaves slightly better than Random and SwitchBin, with fewer overloaded nodes. Memory overcommitment rarely happens in Gros (see Figs. 4.8 and 4.11). In Rennes, from 18% to 35% of the instances are overloaded. Metis is the best algorithm in this case.
Focusing on Metis, we can see that it returns solutions with more link overloading.
This set of instances does not contain many large topologies, as the less efficient placement algorithms were not able to find solutions for them.
As expected, the proposed algorithms use far fewer physical hosts. For the Gros cluster (Fig. 4.12), the general tendency is that GreedyPartition uses between 1 and 13 hosts (median 3) when emulating a vFT, and between 2 and 11 hosts (median 5) when emulating a vRD. METIS uses a minimum of 4 instances and a maximum of 20 instances (medians are 4 and 12). Round Robin (medians are 13 and 14), Random (medians are 7 and 20), and Switch Bin (medians are 5 and 15) use in general more hosts than Metis.
The differences are even more important for the heterogeneous topology, Rennes, especially in terms of the maximum number of hosts used (Fig. 4.13). The numbers of hosts used by our algorithms are in general lower (e.g., between 1 and 7 hosts for GreedyPartition), while the ones for the existing algorithms are higher (e.g., between 4 and 24 hosts for METIS, and between 7 and 26 for Round Robin). Indeed, the same virtual topology is harder to place on a heterogeneous physical topology than on a homogeneous one. So, the set of topologies for which
all algorithms find a solution is smaller for the Rennes topology and contains smaller virtual topologies on average. Our algorithms thus find solutions using a lower number of hosts, while the other algorithms have difficulty mapping them efficiently onto the heterogeneous physical topology. Again, GreedyPartition is the best algorithm in terms of resource usage.
[Figures 4.12 and 4.13: number of physical hosts used per placement algorithm (GreedyPartition, DivideSwap, k-balanced, Metis, Random, SwitchBin, RoundRobin), for vFatTree and vRandom topologies.]
We now focus on the solutions that do not overload physical hosts or links. The results are reported in Table 4.2 for vFT and vRD topologies for the Gros cluster, and in Figs. 4.14, 4.15, and 4.16. We tested 525 different vFT instances and 4,500 vRD topologies. For the vFT, which is the most favorable scenario for the existing algorithms (homogeneous virtual and physical topologies), the first observation is that METIS, Round Robin, Random, and Switch Bin are only able to find feasible (with no node or link overcommitment) solutions with the same number of physical hosts for 59.42%, 33.52%, 96.38%, and 32.0% of the instances, respectively. The percentage is low for METIS and
very low for Random and Switch Bin. For Round Robin, the results are good in this scenario because each physical machine has the same capabilities and each vNode has the same requirements. In the homogeneous case, and when considering the number of hosts found by our algorithm, if the Round Robin strategy returns an unfeasible solution, it is only due to link overcommitment. As the link capacities are very high in Gros (25 Gbps), this explains why Round Robin is able to solve 96.38% of the instances. If we now consider the more complex scenario with vRD, the percentage of instances solved by the existing algorithms drops to values between 1.4% and 11.82%. This shows the ability of GreedyPartition to find efficient solutions even in hard cases. Note that this advantage would be even more important when doing experiments on heterogeneous clusters.
We now analyze the solutions returned when no solution satisfying node and link constraints could be found. The goal is to see if such solutions are "close" to being feasible, in the sense that only one node or link is slightly overloaded. Link, CPU, and memory overcommitments are reported in Figs. 4.14, 4.15, and 4.16, respectively. Random, Round Robin, and Switch Bin do not
return close solutions when they have some overloaded links. Indeed, they have multiple overloaded links (with medians of 4 and 5), with an overcommitment between 2% and 200% and median values of 60%, 42%, and 68%, respectively. For Metis, the link overcommitment is lower and involves a single link overloaded by around 10%. The overcommitment in terms of CPU tends to be similar across the algorithms. The number of overloaded hosts is between 1 and 10 (with a median of 3 or 4), and the median overload is between 20% and 90%. The maximum overload reaches more than 100% for three of the four algorithms.
[Figure: emulated topology with client nodes c1, c2, c3, c4, ... and server nodes s1, s2, s3, s4, ...]
The experiments were performed using Grid'5000 [Bal+13] on two different clusters (Gros and Rennes). We perform three kinds of experiments: bandwidth-, CPU-, and memory-intensive experiments, discussed in Secs. 4.5.1, 4.5.2, and 4.5.3, respectively. We show the very strong impact of placement on bandwidth usage and, thus, on emulation reliability, as well as its impact on crashes and on increased emulation completion time due to lack of memory or CPU overloading.
Figure 4.18: Network experiment, Gros cluster, vFatTree K=6.
Homogeneous Case
The first experiment shows the bandwidth measured in the Gros cluster (see
Sec. 4.4) with 20 physical nodes, virtualizing a vFT with K=6. The physical
network in this cluster is a simple star topology with 25 Gbps links connecting all physical nodes to the central switch. The achieved throughput values for all placement solutions are summarized in Fig. 4.18 by means of boxplots representing the distributions over all client/server pairs. Since each virtual link has a capacity of 500 Mbps, the maximum iperf speed is slightly lower than this value (as with Mininet). We observe that, in this simple case, all the algorithms return a solution that does not overload any physical link or any physical machine. The emulation works as expected for each of the studied placement solutions.
[Figure: throughput (Mbps) per placement algorithm, with the expected value indicated.]
Heterogeneous Case
We now consider the heterogeneous (yet simple) physical topology of the Rennes cluster (Fig. 4.5). Results using this experimental platform are illustrated in Figs. 4.19 and 4.20. The experiments show how the emulation can return unexpected results when the placement of the virtual nodes does not take link rates into account. As this topology is not homogeneous like the Gros cluster, finding a good placement is significantly harder, as discussed in Sec. 4.4.
The first test in the Rennes cluster consists in emulating a vFT topology with K=4 using all the algorithms. In this case, the emulation creates a vFT with 16 vHosts and 20 vSwitches (Fig. 4.17). Fig. 4.19 reports the bandwidth performance of a vFT (with K=4) obtained for the different placement algorithms when the emulation is performed in the Rennes cluster. As we can observe, the bandwidth results using GreedyPartition, k-balanced, DivideSwap, and Metis are the ones expected from the emulation, while the results obtained by the other algorithms, Random and RoundRobin, are far from the expected ones. Indeed, some of the links are overloaded in the placements returned by the latter algorithms. This means that the paths of two demands are using the same links. For these links, the throughput drops to 120 Mbps and 200 Mbps, respectively.
Figure 4.20: Network experiment, Rennes cluster, vFatTree K=6.
When running a larger emulation for a vFT with K=6, we can observe in Fig. 4.20 that k-balanced and DivideSwap do not find a feasible solution, and that the results returned by Metis and SwitchBin are no longer trustworthy. The measured throughput on some overloaded links falls to very low values, between 20 and 100 Mbps, to be compared with the expected 478 Mbps. On the contrary, the emulation distributed with GreedyPartition returns exactly the expected results, showing the efficiency and reliability of the proposed algorithm.
[Figure 4.21: Hadoop Job Completion Time (seconds) and probability to crash, per placement algorithm.]
Fig. 4.21 reports the Hadoop Job Completion times for each placement al-
gorithm. The boxplots provide their distribution over 10 experiments. The best performances are those of GreedyPartition and RoundRobin. The reason is that their solutions do not overload any physical machine or link. On the contrary, Metis and SwitchBin return the same placement, overloading 8 physical machines in terms of CPU by 50% (48 vCores are assigned to 8 hosts with 32 cores available). The impact of this overcommitment on the job completion time is an increase of around 40%, as seen in the figure. Random returns a different placement for each experiment. For most of them, at least one physical machine hosts 3 virtual hosts, leading to a CPU overcommitment of 100%, which explains why Random has the highest job completion time.
Figure 4.23: Memory (no swap) experiment, Gros cluster, vFatTree K=4.
Crashes occur mostly when using Random (80% of the experiments did not succeed). This is due to a bad assignment of resources made by the algorithm (often 3 vHosts were assigned to the same physical host, leading to a memory overcommitment of 56%).
4.6 Conclusions
In this chapter, we propose a placement module for tools enabling distributed network emulation. Indeed, large scale or resource intensive emulations have to be distributed over several physical machines.
Chapter 5
Cloud Measurement
5.1 Introduction
Cloud computing provides convenient on-demand access to a potentially unlim-
ited pool of computing resources. In recent years, cloud computing resources
have become cheaper and more powerful, with the growing interest of compa-
nies around the world. Cloud computing brings several advantages: a small
company has little or no capital expenditure (CAPEX) when launching a new
product, instead of having to invest in datacenters and servers before knowing
how much these resources are going to be used. Cloud computing is flexible
and on-demand. This means that the users can consume computing resources and pay only for the amount of resources they actually consume. There is no need to spend money on running and maintaining a datacenter, since only the cloud providers have access to the physical equipment.
At a high level, cloud providers offer three service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
UDP, and ICMP) using different source ports and the default destination port.
The study does not take into account how the delay and the path change at different times of the day, as our study does. The work in [Arn+20] covers more generally the major cloud providers and the main Tier-1 ISPs: this paper shows how Internet traffic is increasingly generated and transmitted over private interconnections between cloud providers, thus bypassing the Tier-1 ISPs.
Migration of applications to the cloud is non-trivial. One of the first works
trying to automate the process is CloudGenius in 2012 [MR12]. CloudGenius
automates the decision-making process using models for web server migration
to the cloud, taking into consideration various constraints of the application,
like QoS, image types, costs, etc. Since 2012, the number of services and com-
plexity have exploded, and the cloud providers tend not to disclose the details
on network performance. The customers thus suffer from highly variable and
unpredictable network performance [MP12], [Pal+19]. [Gar07] evaluates AWS
computing and networking virtual resources' performance, analyzing their limitations and APIs. The work evaluates the AWS security features, the DNS, the Simple Queue Service (SQS) [Ama20g], and EC2 services.
In [Ber+13], the authors evaluate the AWS network performance through the
passive analysis of network traffic collected from a single university campus and
three large Points of Presence (PoP) of an Italian nationwide ISP for 60 days. The main services tested are EC2, S3 [Ama20f], and CloudFront [Ama20b], the
Content Distribution Network that Amazon built to provide services closer to
its customers.
In [Per+15a] and [Per+15b], the authors analyzed the cloud performance in
Amazon AWS and Microsoft Azure, with more than 800 hours of experimen-
tation. In particular, they studied the maximum network throughput achievable between two VMs deployed on Microsoft Azure and how it changes over time across different scenarios.
Another problem is to define which network performance metric should be considered when evaluating cloud infrastructures. Such research is done
in [MP12], where the authors review the proposals for providing performance
guarantees within cloud networks.
A metric to be considered when evaluating cloud performances is the effective
bandwidth assigned by the cloud provider. The authors in [HX18] experimented on public clouds with Available Bandwidth Estimation Tools (ABETs). They show that the tools do not correctly estimate the available bandwidth of public clouds. The main reasons are the rate limitations of network virtualization, the packet sizes of the generated traffic, and the overhead of VM scheduling when virtual instances are deployed on the physical servers.
An interesting scenario where it is critical to have low delay is cloud gaming.
The users do not need a powerful machine or a console to play as the providers
use cloud processing to run the game. In this case, the network delay cannot exceed a few milliseconds for the game to remain playable. While playing, the users constantly send input commands to the game, which is running far from the user's location. The gaming service has to send back the output of the game in
subnet, the routing table should be configured before the connection (Fig. 5.1).
By default, the VPC does not provide any way to connect the instances to
the external network. The Virtual Internet Gateway is a highly scalable
virtual device managed entirely by AWS that allows traffic to go from the VPC
to the outside world, and vice-versa. Once created, the routing table should be
updated to redirect traffic to the subnets.
Two other main services are Network Access Control List (NACL) and
Security Group (SG). NACL is a stateless firewall associated with a subnet.
By default, the rule is to DENY the traffic. In order to enable the traffic in the
subnet, the ALLOW rule associated with a specific flow should be appended to
the NACL. An SG is similar to a NACL: it is a stateful firewall associated with an interface, which can be a virtual interface of an instance or of another virtual device that has an IP in the VPC, e.g., a virtual load balancer.
There are three different types of addresses in AWS: private, public, and Elastic IP. An Elastic IP is a public address that can be conveniently attached to,
and detached from, a virtual interface at any time. Once created and attached,
Elastic IP is not released, even if the user deletes the instance or the interface.
The address should be explicitly released by the user when it is not needed
anymore.
In some scenarios, the user needs to connect instances or services between
different VPCs. This can be achieved, of course, using the public IPs of the
interfaces. This solution, however, comes with various side effects. First of all, the traffic goes through the public Internet, which is a risk in terms of security and privacy. The other issue is that an application that does not need a public IP, but has to communicate with another service or instance in another VPC, is then forced to have one. To solve this issue, AWS proposes the VPC peering service [Ama20h]. With VPC peering, it is possible to connect two or more VPCs. The instances in the peering can connect to each other via their private IPs, as if they were in the same network. Amazon ensures that the traffic going via the peering never leaves the AWS network, increasing security and performance.
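As an illustration of what such a peering setup involves, here is a minimal Boto3 sketch for two regions; the VPC and route-table IDs, regions, and CIDR blocks are placeholders, and error handling (including waiting for the peering to reach the pending-acceptance state) is omitted.

import boto3

# Placeholder identifiers: replace with real VPC/route-table IDs and regions.
REQUESTER = {"region": "eu-west-1", "vpc": "vpc-aaaa1111", "rtb": "rtb-aaaa1111", "cidr": "10.0.0.0/16"}
ACCEPTER = {"region": "eu-central-1", "vpc": "vpc-bbbb2222", "rtb": "rtb-bbbb2222", "cidr": "10.1.0.0/16"}

ec2_req = boto3.client("ec2", region_name=REQUESTER["region"])
ec2_acc = boto3.client("ec2", region_name=ACCEPTER["region"])

# 1) Request the peering from the requester side, pointing at the remote VPC.
peering = ec2_req.create_vpc_peering_connection(
    VpcId=REQUESTER["vpc"],
    PeerVpcId=ACCEPTER["vpc"],
    PeerRegion=ACCEPTER["region"],
)["VpcPeeringConnection"]
pcx_id = peering["VpcPeeringConnectionId"]

# 2) Accept it from the accepter side (in practice, wait for pending-acceptance first).
ec2_acc.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# 3) Update both route tables so private traffic goes through the peering.
ec2_req.create_route(RouteTableId=REQUESTER["rtb"],
                     DestinationCidrBlock=ACCEPTER["cidr"],
                     VpcPeeringConnectionId=pcx_id)
ec2_acc.create_route(RouteTableId=ACCEPTER["rtb"],
                     DestinationCidrBlock=REQUESTER["cidr"],
                     VpcPeeringConnectionId=pcx_id)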
The last important concept to describe before presenting CloudTrace is Elastic Compute Cloud (EC2). EC2 is a service that provides secure and resizable compute capacity in the cloud [Ama20d]. EC2 provides a large number of instance types, and the user can choose the one that best fits the targeted scenario. Some instances are compute-optimized, with high-performance processors ideally used for scientific modeling and High-Performance Computing (HPC). The memory-optimized instances are able to process large data sets in memory: this type of instance is well suited for in-memory databases (e.g., Redis). There are also GPU-optimized instances for machine learning workloads, and many other instance types [Ama20c]. For our experiments, we use two different architectures (Figs. 5.1 and 5.2). Fig. 5.1 describes a regional infrastructure composed of two subnets created in two different AZs within a single VPC, i.e., in a single region. As explained before, the Internet Gateway is responsible for allowing connectivity with the external Internet. An instance is created in each subnet, and the flow table is stored in a single routing table serving the two subnets. The instances
can connect to each other via their public IPs (an Elastic IP is attached to each interface of the instances), or via their private IPs since they are in the same VPC.
In Fig. 5.2, the infrastructure is spread between two different regions. The architecture is similar, and the instances can still use the public IPs to connect to each other like in the regional environment, but by default they cannot use the private IPs. In this case, a VPC peering connection between the two VPCs needs to be created to allow the instances to use their private IPs and connect as if they were in the same network.
CloudTrace generates the infrastructure in both cases, with two or more AZs in the case of regional experiments, and two or more regions in the case of multi-regional experiments. Our implementation choices are discussed in the next section.
5.4 Implementation
CloudTrace is open-source and compatible with Linux and macOS, while it can be installed on Windows via Docker. The tool is built using LiteSQL, Ansible, Paris-Traceroute, and the Boto3 API for automatic deployment in Amazon AWS. The basic idea is to have a CLI that takes as input the regions and the type of experiment that the user wants to run, and creates an environment performing multiple traceroutes between the virtual instances.
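For illustration, the following condensed Boto3 sketch creates a reduced version of the single-region environment described above (a VPC, one subnet, an Internet gateway, a default route, and one instance). The AMI ID, key name, and CIDRs are placeholders, and this is a sketch of the deployment steps, not CloudTrace's actual code.

import boto3

REGION, AMI, KEY = "eu-west-2", "ami-xxxxxxxx", "my-key"   # placeholders
ec2 = boto3.client("ec2", region_name=REGION)

# VPC and a subnet in one Availability Zone.
vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
subnet_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24",
                              AvailabilityZone=f"{REGION}a")["Subnet"]["SubnetId"]

# Internet gateway plus a default route, so the instance is reachable from outside.
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rtb_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rtb_id, SubnetId=subnet_id)

# One instance that will later run the traceroute script.
instance = ec2.run_instances(ImageId=AMI, InstanceType="t2.micro", KeyName=KEY,
                             MinCount=1, MaxCount=1, SubnetId=subnet_id)
print(instance["Instances"][0]["InstanceId"])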
If the user selects the option to use VPC peering, a peering connection between all the VPCs is created and all the route tables are updated, in order to forward the traffic between remote VPCs via the private IPs (Fig. 5.2).
All the information needed is saved in the client database (e.g., public IPs, VPC IDs, regions, instance IDs). CloudTrace parses all this information and creates for each experiment an Ansible inventory (a file that describes the login information and the addresses of the instances). For each experiment, Ansible runs a standard playbook, a YAML file that lists all the requirements the instances have to install and all the commands that each instance has to execute before running the experiment, e.g., updating the kernel. All public IPs of the instances are saved in the client database. CloudTrace also creates a host configuration file for Ansible.
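As a toy example, an inventory of this kind could be generated from the saved public IPs as follows; the INI layout shown is a common Ansible inventory convention, not necessarily CloudTrace's exact format, and the addresses, user, and key path are made up.

from pathlib import Path

def write_inventory(experiment_id, public_ips, user="ubuntu", key_file="~/.ssh/id_rsa"):
    """Write a simple INI-style Ansible inventory listing the experiment instances.

    `public_ips` maps an instance name (e.g., its region) to its public address.
    """
    lines = ["[cloudtrace]"]
    for name, ip in public_ips.items():
        lines.append(f"{name} ansible_host={ip} "
                     f"ansible_user={user} ansible_ssh_private_key_file={key_file}")
    path = Path(f"inventory_{experiment_id}.ini")
    path.write_text("\n".join(lines) + "\n")
    return path

# Example with made-up addresses, one instance per region.
print(write_inventory(42, {"eu-west-1": "3.248.10.20", "eu-central-1": "18.184.30.40"}))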
The process described above summarizes the creation of the experiment, but the experiment does not start by default. Once all the requirements are installed, the environment is ready. The user can easily start the traceroute experiment with the start command followed by the ID of the experiment. Once the user decides to start the experiment, the tool creates a traceroute script for each instance. This script runs Paris-traceroute on the default source and destination ports towards all the other instances in the experiment. For example, if the user creates a multiregional experiment with 4 regions, each instance has to perform 3 traceroutes in parallel (one towards each other instance). If the user decides to run the experiment via the public IPs, the traceroute script uses the public IPs of the instances; otherwise, it uses the private ones. The scripts are copied onto the virtual instances on AWS. Ansible uses SSH to connect to the instances, and the login configuration is saved in the inventory file of the experiment.
After copying the files to the remote instances, Ansible configures crontab to run the script every minute and to save the output in a specified directory on the machine. The process is run in parallel using the Ansible --forks option. At any moment, the user can retrieve the traceroutes from the instances using the retrieve data command. This command does not stop the experiment on the instances, so the user can retrieve new data without interrupting the experiment. Ansible compresses the data on the remote instances before sending them back to the client, in order to decrease the retrieval time (this is also done in parallel).
Once the data are retrieved on the client, the user can analyze them manually or use CloudTrace for a simple analysis of the data.
CloudTrace can also plot a simple and interactive network map that describes
how the delay varies around the globe. The map is created using Plotly, Folium,
and Dash Python libraries. Fig. 5.3 is an example of the delays obtained between
3 regions.
5.5 Experiments
We present multiple experiments, tracing the network performance during one month in Amazon AWS. The tests are made in 3 AWS regions: Frankfurt (eu-central-1), Dublin (eu-west-1), and London (eu-west-2). The experiment ran from 04/11/2020 to 04/12/2020, and we collected in total more than two million traceroutes.
Regional Experiments. We did these first experiments in all the Avail-
ability Zones of the three regions chosen above. The regional experiment creates
a virtual instance in each Availability Zone and performs traceroutes between
them. The first region we take into consideration is Frankfurt, with the traffic
going from eu-central-1a to eu-central-1b (Fig. 5.4) and eu-central-1b to eu-
central-1c (Fig. 5.5). In this first experiment, we present the result using the
private IPs to perform the traceroute.
As we can see in Fig. 5.4, the average delay during the day is stable. The
black line indicates the daily rolling mean, while the blue line shows the hourly
rolling mean. The delay is regular in both Availability Zones. In the first
plot, we can see that the average daily delay increases in the week from 20 November to 26 November. The reason could be that the Availability Zone was more heavily used in this period. Unfortunately, we do not have access to the AWS data to see how crowded the region was during this period. In the last experiment, we will see how to relate the delay to the AWS Spot instance prices.
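As an illustration, once the traceroute RTTs are parsed into a timestamped pandas Series, the hourly and daily rolling means shown in the plots can be computed as follows; the sample values below are made up.

import pandas as pd

# Assume `samples` is a list of (timestamp, rtt_ms) pairs parsed from the traceroutes.
samples = [("2020-11-04 00:01:00", 0.92), ("2020-11-04 00:02:00", 0.95),
           ("2020-11-04 00:03:00", 1.10)]          # made-up values
rtt = pd.Series({pd.Timestamp(t): v for t, v in samples}).sort_index()

hourly = rtt.rolling("1h").mean()   # hourly rolling mean (blue line in the plots)
daily = rtt.rolling("1D").mean()    # daily rolling mean (black line in the plots)
print(daily.tail())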
Table 5.1: Multiregional delay with confidence intervals (99%) with public and
private IPs
Table 5.2: Frankfurt delay with confidence interval for public and private IPs
regions with the private IP assigned by AWS. Table 5.1 shows the confidence
intervals (at 99%) for all the multiregional experiments performed in the en-
tire month. As we can see, the delay using private IPs is higher, and also less
stable, than for public IPs. This behavior is also repeated in all the regional
experiments performed. For example, Table 5.2 shows the delay with confidence intervals in the Frankfurt region. We think that these differences are due to the additional layer added inside the AWS network and to the checks that each packet has to pass in order to circulate inside the AWS network. For example, to prevent IP spoofing, the AWS network does not allow a virtual instance to send a packet inside a network if the packet's source address is different from the one assigned to the instance. These types of checks can slow down the traffic and increase the delay.
Delay vs. spot instance prices. We first need to explain the concept of a spot instance. The AWS clusters are not always full. There are periods when the
Fig. 5.8 shows the delay between eu-central-1c and eu-central-1a Availability
Zones. We can see that from 5 November to 13 November, the spot price increased and became almost twice as expensive. This means that in this period the Availability Zone was more crowded than usual; it returned to the standard price around 23 November. In this period, we can see that the delay was not affected by the increased demand from the customers. Conversely, in Fig. 5.9, we can see that in the London region (between eu-west-2c and eu-west-2a), the average delay is higher from November 10 to November 13, while the spot prices do not change. This suggests that there is no correlation between the spot prices and the delay measured in the Availability Zone.
5.6 Conclusions
In this chapter, we analyzed and presented the main components of the AWS
IaaS service. AWS provides hundreds of services to its customers. We focused
on the main EC2 components (like VPCs, subnets, Virtual Internet gateway,
etc.) that companies and organizations use to move their infrastructures to the
cloud.
We focused mainly on the networking performance of the global infrastructure. We studied the network delay stability in multiple deployment scenarios. Companies that are moving delay-sensitive applications to AWS are interested in the response and delay time of the network for a high availability (multiregional) configuration or for a regional configuration with multiple Availability Zones. We implemented a simple yet powerful tool for the community to measure the networking performance and automatically plot statistics, together with a map of the analyzed network.
CloudTrace is publicly accessible at https://github.com/Giuseppe1992/
CloudTrace.
We performed multiple experiments on 3 different AWS regions using basic EC2 instances, and we monitored the performance of the network for one month. Interested readers can easily download and install the tool, and test the networking performance in other regions. The deployment in the cloud and the experiment part are separated into different modules; CloudTrace can thus easily be extended to be compatible with Microsoft Azure and Google Cloud. These extensions would help to compare performance across the various cloud providers.
Chapter 6
Failure Recovery
6.1 Introduction
More than ever, data networks have demonstrated their central role in the world
economy, but also in the well-being of humanity that needs fast and reliable
networks. In parallel, with the emergence of Network Function Virtualization
(NFV) and Software Defined Networking (SDN), efficient network algorithms
considered too hard to put into practice in the past now have a second chance. In this context, as new networks will be deployed and
current ones get significant upgrades, it is thus time to rethink the network
dimensioning problem with protection against Shared Risk Link Group (SRLG)
failures.
In this chapter, we consider a path-based protection scheme with a global
rerouting strategy in which, for each failure situation, there may be a new
routing of all the demands. Our optimization task is to minimize the needed
amount of bandwidth. After discussing the hardness of the problem, we develop
two scalable mathematical models that we handle using both Column Gener-
ation and Benders Decomposition techniques. Through extensive simulations
on real-world IP network topologies and on randomly generated instances, we
show the effectiveness of our methods: they lead to savings of 40% to 48% of
the bandwidth to be installed in a network to protect against failures compared
to traditional schemes. Finally, our implementation in OpenDaylight demon-
strates the feasibility of the approach, and its evaluation with Mininet shows
that technical implementation choices may have a dramatic impact on the time
needed to reestablish the flows after a failure takes place.
As introduced before, NFV is an emerging approach in which network functions such as firewalls, load balancing, and content filtering are no longer executed by proprietary hardware appliances but, instead, can run on general-purpose servers located in small cloud nodes and can be instantiated on demand [Han+15]. Network flows are often required to be processed by an ordered sequence of network functions. Moreover, different customers may have different requirements on this sequence of functions. In parallel, SDN allows operators to control and manage networks more efficiently. Indeed, with SDN and its control-data plane decoupling, routing decisions can be made using a logically centralized approach. This paves the way for a broadening of perspective in terms of fault management [FM17].
Chu et al. [Chu+15] consider a hybrid SDN network and propose a method
to design the network in such a way that fast failure recovery from any single
link failure is achieved. Their proposal consists in redirecting the traffic of
the failed link from the routers to SDN switches through pre-configured IP
tunnels. Next hops are pre-configured before the failures take place, and the set
of candidate recovery paths for different affected destinations is chosen by the
SDN controller in such a way that the maximal link utilization after redirecting
the recovery traffic through these paths is minimized. In [Qiu+17], the authors build a high-performance control plane for path computation using multiple controllers, while in [Qiu+19] they use fast SDN rerouting to reduce the time needed to compute new paths in case of link failures, minimizing the delay of the new paths.
Suchara et al. [Suc+11] propose a joint architecture for both failure recovery
and traffic engineering. Their architecture uses multiple pre-configured paths
between each pair of edge routers. In the event of a failure, the failover is made
on the least congested path that ensures connectivity. Besides, Sgambelluri et
al. [Sga+13] propose a controller–based fault recovery solution that uses Open-
Flow’s Fast Failover Group Tables to quickly select a pre-configured backup
path in case of link failure.
The idea of using a set of pre-configured multiple backup network configu-
rations is not new. For instance, in [Kva+06; KCG07], the authors propose a
pre-configured proactive IP recovery scheme that makes use of multiple routing
backup configurations as a method for fast recovery. The main idea is to create
a small set of backup routing configurations to be used in the case of a single
link or node failure. Since the backup configurations are kept in the routers,
it is necessary to reduce their number to avoid requiring the routers to store a
significant amount of state information.
In [Vas+18; VNT20; Vas+20], the authors use SRLG for modelling in order
to enhance the preparedness of a given network to natural disasters, or regional
failures. For example, a regional SRLG aims to characterize a failure damaging
the network only in a bounded geographical area. The authors in [Ass+20]
propose a Mixed Integer Linear Program (MILP) for an elastic optical network
protection with SRLG group and dedicated protection mechanism.
Herein, we take to the extreme the idea of multiple routing configurations by
allowing a completely different routing in response to an SRLG failure situation.
Unlike the above studies, our aim is to provide a bandwidth-optimal mechanism
to design a reliable network. Our work revisits optimization techniques and
protection schemes introduced to design survivable optical networks in a large
corpus of studies of the 90s - see for example [GMS95; KM05; Sto06] for surveys
or reference books. We generalize the results to layered graphs. Indeed, besides
guaranteeing the recovery, our proposed approach also takes into consideration
the Service Function Chain (SFC) requirement of the flows. Moreover, we show
with the use of experiments that global rerouting, which was studied in the
96 CHAPTER 6. FAILURE RECOVERY
past just as a lower bound, becomes today a viable and very efficient protection
solution thanks to SDN control. It allows us to effectively study the right number of NFVI-enabled nodes, in terms of costs and acceptable QoS levels.
Table 6.1 summarizes the most important works to be compared to our
proposal. In particular, we check if the works consider optimal (bandwidth)
dimensioning for protection against SRLGs, handle the VNFs of the network
services, implement fast rerouting upon failures using SDN, propose scalable
solutions through decomposition methods, or approximation algorithms.
Context and problem solved         [Chu+15] [Suc+11] [Ass+20] [Taj+19] [Sga+13] [Kva+06] [KCG07] [VNT20] [BSY18] Our proposal
  Survivable network design           ✓        ✓        ✓        ✓        ✓        ✓        ✓       ✓       ✗        ✓
  Fast rerouting with SDN             ✓        ✗        ✗        ✓        ✓        ✗        ✗       ✗       ✓        ✓
  Network services and NFV            ✗        ✗        ✗        ✓        ✗        ✗        ✗       ✗       ✗        ✓
  SRLG                                ✗        ✓        ✓        ✗        ✗        ✗        ✗       ✓       ✗        ✓
Theoretical methods and results
  Optimal bandwidth                   ✓        ✓        ✓        ✗        ✗        ✗        ✗       ✗       ✓        ✓
  Decomposition methods               ✗        ✗        ✗        ✗        ✗        ✗        ✗       ✗       ✓        ✓
  Approximation algorithms            ✓        ✓        ✗        ✓        ✗        ✓        ✓       ✓       ✗        ✓
Implementation results
  Implementation via emulation        ✗        ✗        ✗        ✗        ✓        ✗        ✗       ✗       ✗        ✓
(✓ = Yes, ✗ = No)
Table 6.1: Summary of related work. The table analyzes the context and the
methods used in the different works.
The proposed optimization methods are said to be scalable in the sense that they allow solving much larger instances than the classic compact ILP formulation for the same problem.
Symbol           Description
G                Undirected graph
V                Set of nodes in the graph G
E                Set of links in the graph G
r ∈ R            Set of links that share a common physical resource
R                Set of SRLGs
d ∈ D            Single demand, given by (s_d, t_d, bw_d, C_d)
D                Set of demands
s_d              Source of the traffic
t_d              Destination of the traffic
bw_d             Required units of bandwidth
C_d              Ordered sequence of network functions
ℓ(d)             Length of the SFC for a demand d
V^vnf ⊆ V        Set of NFVI nodes
G^L(d)           Layered graph for demand d
π                Service path
Π^r_d            Service function paths for a demand d in the SRLG failure event r
a^π_{uv}         Number of times link (u, v) is used in the service path π
y^{d,r}_π        Boolean value indicating whether demand d uses path π as a service path in the SRLG failure event r
x_{uv}           Bandwidth allocated to link (u, v)
φ_{(ui,vj)}      Boolean value indicating whether the flow is forwarded on link ((u, i), (v, j)) of G^L(d)
α^r_d, β^r_{uv}  Dual values relative to constraints (6.8) and (6.9)
λ_d(μ)           Length of the shortest path for demand d, with respect to link metrics μ
Figure 6.1: The multigraph resulting from the reduction. G = (V, E) is the multigraph with V = {s, t} and E = {e_i, i = 1, ..., |S|}; all the edges e_1, e_2, ..., e_{|S|} have s and t as endpoints.
(see Fig. 6.1 and Example 1). For each C' ⊆ C, we add a failure scenario r_{C'} = E \ C' to R, corresponding to the edges that cannot be used in that failure situation. Finally, we add to D a demand d with s and t as source and destination, respectively, and with a bandwidth requirement equal to 1. The goal now consists in finding a path for each of the failure scenarios r ∈ R, minimizing the capacity that needs to be deployed. The total capacity needed to satisfy d in each of the failure situations does not exceed c if and only if there exists a hitting set of cardinality not greater than c. The proposition follows immediately from the fact that Hitting Set is NP-hard [ADP80] and cannot be approximated within a factor of ln |S| [DS14], unless P=NP.
Example 1 (Reduction). Consider the following instance of the Hitting Set
Problem: S = {1, 2, 3, 4, 5} and C = {C1 , C2 , C3 , C4 , C5 } with C1 = {1, 2, 3, 4},
C2 = {1, 4, 5}, C3 = {2, 5}, C4 = {2, 3, 5}, and C5 = {1, 4}. The multigraph
resulting from the reduction of the instance is a multigraph with 5 edges (noted
1, 2, 3, 4, and 5) linking two nodes s and t corresponding to the elements in S.
The corresponding instance of the Global Rerouting problem has a single
demand d with source s and destination t, an empty SFC, and requiring a single
unit of bandwidth to be sent. It has 5 SRLG events R = {r1 , r2 , r3 , r4 , r5 }, as
many as the number of subsets in C, with r1 = E \ C1 = {5}, r2 = E \ C2 =
{2, 3}, r3 = E \ C3 = {1, 3, 4}, r4 = E \ C4 = {1, 4}, and r5 = E \ C5 = {2, 3, 5}.
Solving the Global Rerouting problem boils down to finding a set of edges
(of minimum cardinality) for which at least an edge will be available over all
failure scenarios: {4, 5}. If we install a capacity of 1 for these two edges, we ob-
tain a survivable network of minimum cost to protect against the 5 considered
SRLG events. Note that this is equivalent to finding a minimum cardinality
Hitting Set for the original instance, as the sets C1 , C2 , C3 , C4 , C5 correspond
to the edges available in each failure scenario.
Figure 6.2: The layered network G^L(d) associated with a demand d such that s_d = u_1, t_d = u_3, and C_d = (f_1, f_2), with G = (V, E) being a triangle network. We assume f_1 installed on node u_1 and f_2 installed on nodes u_1 and u_3. The source and destination nodes of G^L(d) are u_{1,0} and u_{3,2}, respectively; they are drawn with dashed lines. Two possible service paths that satisfy d are drawn in red and blue.
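As an illustration of this construction, the sketch below builds the layered graph of Fig. 6.2 with NetworkX: one copy of G per SFC position, plus a layer-crossing edge at every NFVI node where the next function of the chain is installed. The function and variable names are ours and only meant to convey the idea, not the implementation described later.

import networkx as nx

def build_layered_graph(G, sfc, installed):
    """Build G^L(d) for a demand whose SFC is `sfc` (a list of function names).

    `installed[f]` is the set of NFVI nodes on which function f is available.
    Nodes of the layered graph are pairs (u, layer).
    """
    L = nx.DiGraph()
    n_layers = len(sfc) + 1
    for layer in range(n_layers):
        for u, v in G.edges():
            # Each physical link is usable (in both directions) inside every layer.
            L.add_edge((u, layer), (v, layer))
            L.add_edge((v, layer), (u, layer))
    for layer, function in enumerate(sfc):
        for u in installed[function]:
            # Crossing to the next layer means the traffic was processed by `function` at u.
            L.add_edge((u, layer), (u, layer + 1))
    return L

# The triangle example of Fig. 6.2: f1 installed on u1, f2 on u1 and u3.
G = nx.cycle_graph(["u1", "u2", "u3"])
GL = build_layered_graph(G, ["f1", "f2"], {"f1": {"u1"}, "f2": {"u1", "u3"}})
print(nx.has_path(GL, ("u1", 0), ("u3", 2)))   # True: a service path exists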
For each link, the capacity that needs to be installed is max_{r ∈ R} x^r_{uv}, in order to guarantee the recovery for every failure scenario or SRLG in R. We thus minimize the sum over all links in (6.1).
Flow conservation constraints (6.2), (6.3), (6.4): for each demand d ∈ D and SRLG failure event r ∈ R,

    \sum_{(v,j) \in \omega((s_d,0))} \varphi^{d,r}_{(s_d 0, vj)} = \sum_{(v,j) \in \omega((t_d,\ell(d)))} \varphi^{d,r}_{(t_d \ell(d), vj)} = 1    (6.2)

    \sum_{(v,j) \in \omega((u,i))} \varphi^{d,r}_{(ui,vj)} \le 2    for each node (u, i) of G^L(d) other than (s_d, 0) and (t_d, ℓ(d))    (6.3)

    \sum_{(v',j') \in \omega((u,i)) \setminus \{(v,j)\}} \varphi^{d,r}_{(ui,v'j')} \ge \varphi^{d,r}_{(ui,vj)}    for each link ((u, i), (v, j)) of G^L(d)    (6.4)

Equation (6.2) ensures that the path of demand d starts at the source node (s_d, 0) in layer 0 and ends at the destination node (t_d, ℓ(d)) in layer ℓ(d), with ℓ(d) the length of the SFC of the demand d. Equation (6.3) forces the paths not to use the same link of the layered graph twice, in order to avoid loops. The flow conservation of each path (i.e., a path arriving at an intermediate node has to leave this node) is a result of Equation (6.4). Note that this constraint is an inequality; however, we have equality for an optimal solution as a consequence of the minimization objective.

Unavailable links in an SRLG failure event (6.5): for each r ∈ R,

    \sum_{d \in D} \sum_{(u,v) \in r} \sum_{k=0}^{\ell(d)} \varphi^{d,r}_{(uk,vk)} = 0    (6.5)

Equation (6.5) ensures that a path cannot use a link involved in SRLG r when considering the failure scenario r.
Bandwidth utilization in an SRLG failure event (6.6): for each SRLG set r ∈ R and link (u, v) ∈ E,

    \sum_{d \in D} \sum_{k=0}^{\ell(d)} bw_d \cdot \varphi^{d,r}_{(uk,vk)} \le x^r_{uv}    (6.6)

Equation (6.6) guarantees that the bandwidth usage on each link (which is the sum of the bandwidth usage over all layers on the link) is lower than or equal to the link capacity.

One service path for each demand and SRLG failure event (6.8): for all d ∈ D and r ∈ R,

    \sum_{\pi \in \Pi^r_d} y^{d,r}_\pi \ge 1    (6.8)

Inequality (6.8) ensures that at least one path is selected for each pair demand/SRLG event. Note that a single path is selected in optimal solutions due to the minimization objective.

Bandwidth utilization (6.9): for all (u, v) ∈ E and r ∈ R,

    x_{uv} \ge \sum_{d \in D} \sum_{\pi \in \Pi^r_d} bw_d \cdot a^\pi_{uv} \cdot y^{d,r}_\pi    (6.9)
Given its very large number of variables, Column Generation is an efficient tech-
nique to handle the above linear integer programming model. One starts with
a limited set of variables in a so-called Restricted Master Problem (RMP). At
each iteration, the RMP is solved. The dual values associated with the constraints are used to generate new paths with negative reduced cost, and the associated variables are added to the RMP, which may improve the current solution. This process is repeated until no more columns can be added to the RMP,
i.e., no more columns with negative reduced cost exist. We refer to [DDS06] for
more details regarding this technique.
Pricing Problem
The pricing subproblem is solved independently for each demand d and SRLG failure event r, and it returns a service path π. It consists in finding a minimum cost service path in the layered graph, where the weight of a link is defined according to the dual values of the associated constraint.
Variables:
• φ^{d,r}_{(ui,vj)} ∈ {0, 1}, equal to 1 if the flow is forwarded on link ((u, i), (v, j)) of G^L(d).
Let α^r_d ≥ 0 and β^r_{uv} ≥ 0 be the dual values relative to constraints (6.8) and (6.9), respectively. The service path reduced cost for a given demand d and an SRLG r can be written as:

    \min \; -\alpha^r_d + bw_d \cdot \sum_{(u,v) \in E} \sum_{k=0}^{\ell(d)} \beta^r_{uv} \cdot \varphi_{(uk,vk)}

The first term is a constant for each request, and the second term corresponds to a summation over the links of the network. Therefore, the objective function (6.10) becomes:

    \min \sum_{(u,v) \in E} \sum_{k=0}^{\ell(d)} \beta^r_{uv} \cdot \varphi_{(uk,vk)}    (6.10)
Thus, for each request and for each failure situation, the pricing subproblem corresponds to a weighted shortest-path problem in the layered graph. In a given SRLG failure situation r, and for all the demands d ∈ D, the weight of a link ((u, i), (v, j)) of G^L(d) is defined to be β^r_{uv} if i = j, and 0 otherwise. Either one of these paths leads to a negative reduced cost column, or the current master solution is optimal for the unrestricted program. In the former case, the new configurations found are added iteratively to the RMP. In the latter case, the solution z*_{LP} of the linear relaxation of the RMP is optimal. Convergence of the basic Column Generation procedure suffers from dual oscillations, as the number of constraints (6.9) is large. To improve the convergence and reduce the fluctuations in the dual variables, we use the piecewise linear penalty function stabilization described in [Pes+18]. Associated with the optimal solution of the linear relaxation of the RMP, for each demand d and SRLG failure situation r, there is a set of service paths identified by all the variables y^{d,r}_π with value greater than 0. These service paths guarantee the minimum cost in terms of the bandwidth required to ensure the recovery in the splittable flow case. However, if we restrict our attention to the unsplittable flow case, we have to select only one service path for each demand and SRLG failure situation. The problem now consists in making this choice while reducing the overflow introduced in the network. One possible way consists in changing the domain of the variables in the last RMP from continuous to integer and using an ILP solver. We refer to this strategy as MasterILP.
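To make the pricing step concrete, here is a small sketch that prunes the links of the SRLG r from a layered graph (whose nodes are (node, layer) pairs, as in the earlier illustrative sketch), weights intra-layer edges with β^r_{uv}, and computes a shortest service path with Dijkstra. The names are ours; this is not the actual implementation.

import networkx as nx

def price_service_path(GL, source, target, beta, failed_links):
    """Solve the pricing subproblem (6.10) on the layered graph GL.

    `beta[(u, v)]` are the dual values of constraints (6.9); layer-crossing edges cost 0.
    `failed_links` is the SRLG r: its links are unusable in every layer.
    """
    failed = {frozenset(l) for l in failed_links}
    H = GL.copy()
    for (u, i), (v, j) in list(H.edges()):
        if i == j and frozenset((u, v)) in failed:
            H.remove_edge((u, i), (v, j))            # link belongs to the SRLG r

    def weight(a, b, _):
        (u, i), (v, j) = a, b
        if i != j:
            return 0.0                               # crossing a layer is free
        return beta.get((u, v), beta.get((v, u), 0.0))

    length, path = nx.single_source_dijkstra(H, source, target, weight=weight)
    return length, path

The reduced cost of the returned path is then bw_d · length − α^r_d; the corresponding column is added to the RMP only if this value is negative.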
The Benders Decomposition approach separates the first stage capacity assignments on one hand, and the second stage routing decisions on the other hand. The master problem is expressed in terms of the x_{uv} variables. It takes the following form.
Objective (6.11): minimization of the bandwidth used in the network,

    \min \sum_{(u,v) \in E} x_{uv}    (6.11)
where the latter constraints are known as metric inequalities [CCG09]. They can be separated in polynomial time by solving an LP. Hence, they can be handled in a lazy way by dynamic generation, which allows solving the problem using a cutting plane algorithm. These cuts are iteratively added to the master problem until the difference between the lower bound, corresponding to the solution of the master problem, and the upper bound, corresponding to the solution of the subproblems, falls under a fixed value ε.
The Benders separation subproblem is solved given the link bandwidth vector x. This capacity assignment is globally feasible (for the splittable problem) if and only if, for each vector μ = {μ_{uv} ≥ 0 : (u, v) ∈ E} and for each SRLG failure situation r ∈ R, the inequality

    \sum_{(u,v) \in E} \mu_{uv} \cdot \delta_{uv,r} \cdot x_{uv} \ge \sum_{d \in D} \lambda_d(\mu) \cdot bw_d    (6.13)

holds, where δ_{uv,r} ∈ [0, 1] is the available portion of link (u, v) under scenario r, and λ_d(μ) is the length of the shortest path for demand d with respect to the link metrics μ.
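As an illustration, the following sketch checks inequality (6.13) for a single metric vector μ and a single failure scenario r, assuming for simplicity that a failed link is completely unavailable (δ_{uv,r} ∈ {0, 1}). The data structures are ours and not those of the actual implementation.

import networkx as nx

def violates_metric_inequality(G, x, mu, demands, failed_links):
    """Return the violation of (6.13) for SRLG r (> 0 means a cut must be added).

    `x[(u, v)]` are the master capacities, `mu[(u, v)]` the link metrics,
    `demands` is a list of (s_d, t_d, bw_d), and `failed_links` the links in r.
    """
    failed = {frozenset(l) for l in failed_links}
    H = nx.Graph()
    for (u, v), metric in mu.items():
        if frozenset((u, v)) not in failed:          # delta_{uv,r} = 0 for failed links
            H.add_edge(u, v, weight=metric)
    lhs = sum(m * x[e] for e, m in mu.items() if frozenset(e) not in failed)
    rhs = 0.0
    for s, t, bw in demands:
        # lambda_d(mu): shortest path length on the surviving links.
        # (A NetworkXNoPath here means the demand cannot be protected under r at all.)
        rhs += bw * nx.shortest_path_length(H, s, t, weight="weight")
    return rhs - lhs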
Associated with the optimal solution of the master problem, we have the optimal link capacities in the splittable flow case, as in the Column Generation case. The main difference lies in the fact that we do not have the selected paths. We thus have to find a path for each demand and failure situation, trying to minimize the overflow with respect to the solution found in the splittable flow case.
IterILP Algorithm
MinOverflowILP:
Input: a graph G = (V, E), an SRLG set r ∈ R, and a capacity function c̃.
Remark: if the objective function is equal to |E|, all demands can be routed within the capacity function c̃: no link is overloaded and no additional capacity is needed for the scenario of the SRLG r.
Bandwidth utilization for the SRLG set r (6.18): for each link (u, v) ∈ E,

    \sum_{d \in D} \sum_{k=0}^{\ell(d)} bw_d \cdot \varphi^{d,r}_{(uk,vk)} \le x^r_{uv}    (6.18)

Equation (6.18) guarantees that the bandwidth usage on each link is lower than, or equal to, the link capacity.
Definition of the overflow: the overflow ratio of link (u, v) ∈ E, o^r_{uv}, is defined as 1 if x^r_{uv} ≤ c̃_{uv}, and as (x^r_{uv} − c̃_{uv})/c̃_{uv} otherwise, i.e., o^r_{uv} = max((x^r_{uv} − c̃_{uv})/c̃_{uv}, 1). This gives inequalities (6.19) and (6.20): for each link (u, v) ∈ E,

    o^r_{uv} \ge \frac{x^r_{uv} - \tilde{c}_{uv}}{\tilde{c}_{uv}}    (6.19)

    o^r_{uv} \ge 1    (6.20)

Note that the minimization of the objective function implies equality with the maximum for an optimal solution.
While this strategy leads to good results, it may not scale well, since we have to solve an ILP for each SRLG failure scenario.
Note that, contrary to the classical version of the problem, we do not have hard capacity constraints to respect while computing an integer routing. Herein, the goal is to route all the demands while reducing the increase in terms of capacity over each of the links (i.e., the overflow) with respect to the free capacities already available in the network.
Proposition 2. The Min Overflow Problem is APX-hard (and thus NP-hard) and cannot be approximated within a factor of 1 + 3/320, unless P=NP.
Let c∗uv be the optimal capacity of an edge (u, v) in the splittable flow case.
After having computed a fractional flow, we have associated to each demand
d ∈ D a set consisting of n(d) ≥ 1 paths Pd = {Pd,i : i = 1, ..., n(d)}. Each path
Pn(d)
Pd,i is associated to a multiplier 0 ≤ λd,i ≤ 1 such that i=1 λd,i = 1 which
gives the amount of flow λd,i · bwd routed on Pd,i . Let λd,i (uv) be the fraction of
108 CHAPTER 6. FAILURE RECOVERY
flow routed on the edge (u, v) by a demand d. Note that for each edge (u, v), we
Pn(d)
have d∈D i=1 bwd · λd,i (uv) ≤ c∗uv since by hypothesis these capacities are
P
feasible for the splittable flow case. In order to find an unsplittable solution, we
use a rounding–based heuristic referred to as Randomized Rounding, which
assigns to a demand d a path Pd,i with probability λd,i . We consider now the
impact in terms of load on an edge (u, v). Let fuv be the flow on (u, v) at the
end of the rounding procedure. Clearly, for each edge (u, v), E(fuv ) ≤ c∗uv holds.
Let Ouv be the overflow on the edge (u, v) defined as max(0, fuv − c∗uv ). We
denote by P0 (uv) = P[fuv = 0] the probability that the edge (u, v) is not used.
$$E[O_{uv}] = P_0(uv) \cdot 0 + (1 - P_0(uv))\left(E[f_{uv} \mid f_{uv} > 0] - c^*_{uv}\right) = (1 - P_0(uv))\, E[f_{uv} \mid f_{uv} > 0] - c^*_{uv}(1 - P_0(uv))$$
Moreover,
$$E[f_{uv}] = P_0(uv) \cdot 0 + (1 - P_0(uv))\, E[f_{uv} \mid f_{uv} > 0], \qquad E[f_{uv} \mid f_{uv} > 0] = \frac{E[f_{uv}]}{1 - P_0(uv)}.$$
We can therefore bound the expected overflow of a link (u, v).
$$E[O_{uv}] = E[f_{uv}] - c^*_{uv}(1 - P_0(uv)) = P_0(uv)\, c^*_{uv} - (c^*_{uv} - E[f_{uv}]) \le P_0(uv)\, c^*_{uv}$$
Let us now consider the probability P0 (uv) that an edge is not used after the
randomized rounding. Given an edge (u, v), we define Puv to be the paths that
contain (u, v) as an edge.
$$P_0(uv) = \prod_{P_{d,i} \in \mathcal{P}_{uv}} (1 - \lambda_{d,i})$$
The probability for an edge not to be selected is maximized when all $\lambda_{d,i}$ are equal (i.e., $\lambda_{d,i} = \frac{1}{|\mathcal{P}_{uv}|}$ for all $P_{d,i} \in \mathcal{P}_{uv}$). Thus,
$$P_0(uv) = \left(1 - \frac{1}{\rho\,|\mathcal{P}_{uv}|}\right)^{|\mathcal{P}_{uv}|}$$
where $\rho$ is defined to be $\frac{E[f_{uv}]}{c^*_{uv}}$. This gives an upper bound for the possible value of $P_0(uv)$. Indeed,
$$P_0(uv) \le \lim_{n\to\infty} \left(1 - \frac{1}{\rho n}\right)^{n} = \frac{1}{e^{\rho}}.$$
The function is minimized with $\rho = 1$. We thus get
$$E[O_{uv}] \le \frac{1}{e^{\rho}}\, c^*_{uv} - (c^*_{uv} - E[f_{uv}]) \le c^*_{uv}\left(\frac{1}{e^{\rho}} - (1 - \rho)\right) \le \frac{1}{e}\, c^*_{uv} \approx 0.37\, c^*_{uv}.$$
By using the Markov inequality, the probability that the obtained solution has a cost larger than $1.37(1 + \epsilon)$ is at most $\frac{1}{1+\epsilon}$. The overflow resulting from the execution of the randomized rounding can be checked in polynomial time. If the overflow exceeds the factor of $(1 + \frac{1}{e}) + \epsilon$, another trial may be necessary in order to find a solution below this value. The number of trials depends on the chosen value for $\epsilon$. For instance, if we set $\epsilon = \frac{1}{10}$, we need an average of 10 trials in order to find a solution with cost not greater than 1.507 (= 1.37 + 0.137) times
the optimal fractional one. As we have just shown, the problem of minimizing
the overflow can be approximated efficiently for a single scenario. The proposed
scheme consists in a randomized rounding to be performed according to the value
of the splittable flow solution. We may extend Randomized Rounding to the
case of multiple scenarios by simply solving the scenarios in an iterative fashion.
At each iteration, an SRLG r ∈ R is considered. First, a fractional capacitated
multicommodity flow is solved. Then, a $(1 + \frac{1}{e} + \epsilon)$–approximated integer solution
is found using the Randomized Rounding procedure described in Algorithm 6.
The overflow introduced (if any) by the procedure is then added. We refer to
this method as Iterative Randomized Rounding - see Algorithm 5 for the
pseudo-code of our proposed algorithm.
[Algorithm 5, Iterative Randomized Rounding (final steps): repeat the drawing until the overflow is at most $(1 + \frac{1}{e} + \epsilon)|E|$, then return $c_i$, the capacities of the drawn integral routing.]

Instead of using an ILP to minimize the overflow, we use a $(1 + \frac{1}{e} + \epsilon)$–approximation algorithm. We show its effectiveness in the numerical results below.
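As a concrete illustration, here is a minimal sketch (our own simplification, not Algorithm 5 itself) of the single-scenario rounding step: each demand draws one of its fractional paths with probability $\lambda_{d,i}$, and the resulting overflow with respect to the splittable-flow capacities $c^*$ is evaluated. In the thesis' algorithm the drawing is repeated until the overflow falls below the $(1 + \frac{1}{e} + \epsilon)$ threshold; here we simply keep the best of a fixed number of draws.

```python
# Hedged sketch of the Randomized Rounding step for one scenario.
import random

def randomized_rounding(demands, c_star, trials=10):
    """Draw one path per demand according to the fractional multipliers and
    return the drawing with the smallest total overflow over `trials` draws.

    demands : dict {d: {"bw": float,
                        "paths": [(lambda_di, [edge list of P_di]), ...]}}
    c_star  : dict {edge: optimal splittable-flow capacity}
    """
    best = None
    for _ in range(trials):
        load = {e: 0.0 for e in c_star}
        choice = {}
        for d, data in demands.items():
            # Demand d uses path P_{d,i} with probability lambda_{d,i}.
            weights = [lam for lam, _ in data["paths"]]
            _, path = random.choices(data["paths"], weights=weights)[0]
            choice[d] = path
            for e in path:
                load[e] += data["bw"]
        # Overflow O_uv = max(0, f_uv - c*_uv) on every edge.
        overflow = sum(max(0.0, load[e] - c_star[e]) for e in c_star)
        if best is None or overflow < best[1]:
            best = (choice, overflow)
    return best
```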
Data Sets
We conduct experiments on three real-world topologies from SNDlib [Orl+10]:
polska (12 nodes, 18 links, and 66 demands), pdh (11 nodes, 34 links, and 24
demands) and nobel-germany (17 nodes, 26 links, and 121 demands). For these
networks, we use the traffic matrices provided with the data set. No informa-
tion is available about the SRLGs for these networks. Thus, the collection of
network failures R for these instances contains single edge failures.
We also conduct experiments on randomly generated instances of different sizes
and with different SRLGs. We build our synthetic instances using a simi-
lar method to the one in [KKV05]. We generate two networks in which we
place nodes in a unit square. In each of them, we add links according to the
Waxman model [Wax88]. The probability of having a link $(u, v)$ is defined as $\alpha \exp\left(\frac{-dist(u,v)}{\beta L}\right)$, where $dist(u, v)$ is the Euclidean distance from node $u$ to node $v$, $L$ is the maximum distance between two nodes, and $\alpha, \beta$ are real parameters
in the range [0, 1]. One of the two networks represents the logical IP network,
i.e., IP routers and IP links, while the other represents the underlying optical network, i.e., cross-connects and fibers.

Figure 6.3: Time of the solution found by the ILP and by our proposed methods as a function of the number of demands.

Each IP node is mapped to the closest
optical cross-connect and each IP link (u, v) is mapped onto the shortest path
between u and v in the physical network.
All IP links using the same physical link are associated to an SRLG. In addition,
we add an SRLG for each undirected link. Demands are generated using the
model described in [FT02]. The model considers the distance factor $\exp\left(\frac{-dist(u,v)}{2L}\right)$
between two nodes u and v. As a result, the load of the demands between close pairs of nodes is higher than between pairs of nodes that are far apart.
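A minimal sketch of this instance generation (our own, with illustrative parameter values) could look as follows; node placement, the Waxman link probability $\alpha \exp(-dist(u,v)/(\beta L))$, and the demand distance factor $\exp(-dist(u,v)/(2L))$ follow the description above.

```python
# Hedged sketch: synthetic topology and demand generation in the spirit of
# the instances described above. Parameter values are illustrative only.
import math, random

def waxman_graph(n, alpha=0.4, beta=0.4):
    # Nodes placed uniformly at random in the unit square.
    pos = {v: (random.random(), random.random()) for v in range(n)}
    dist = lambda u, v: math.dist(pos[u], pos[v])
    L = max(dist(u, v) for u in pos for v in pos if u != v)
    # Waxman model: link (u, v) exists with probability alpha*exp(-d/(beta*L)).
    edges = [(u, v) for u in pos for v in pos if u < v
             and random.random() < alpha * math.exp(-dist(u, v) / (beta * L))]
    return pos, edges, L

def demand_loads(pos, L, base=10.0):
    # Demand between u and v scaled by the distance factor exp(-d/(2L)),
    # so close pairs of nodes carry a higher load than distant ones.
    dist = lambda u, v: math.dist(pos[u], pos[v])
    return {(u, v): base * math.exp(-dist(u, v) / (2 * L))
            for u in pos for v in pos if u < v}
```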
Finally, the chain of each demand is composed of 3 to 6 functions uniformly
chosen at random from a set of 10 functions. Each NFVI node can run up to 6
network functions. Hence, a node may not be allowed to run all the network functions. As in [HJG18], locations are chosen according to their be-
tweenness centrality, an index of the importance of a node in the network: it
is the fraction of all shortest paths between any two nodes that pass through a
given node. Experiments have been conducted on an Intel Xeon E5520 with 24
GB of RAM.
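A sketch of this selection with networkx is given below; the exact procedure and tie-breaking used for the experiments may differ.

```python
# Hedged sketch: choosing NFVI node locations by betweenness centrality.
import networkx as nx

def pick_nfvi_nodes(G, k):
    """Return the k nodes with the highest betweenness centrality, i.e., the
    nodes traversed by the largest fraction of all shortest paths."""
    bc = nx.betweenness_centrality(G)
    return sorted(bc, key=bc.get, reverse=True)[:k]
```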
Figure 6.4: Value of the solution found by the ILP and by our proposed methods
as a function of the number of demands.
the Waxman random networks are 22, 40, 53, and 70, respectively. The first
column compares the Column Generation (ColGen) and the Benders Decompo-
sition [Ben62] (Benders) techniques to find a fractional solution based on which
the heuristics find an integer solution. The Column Generation technique appears to be faster in finding the optimal solution $z^*_{LP}$: on the largest considered network, wxm40, it only takes 22 minutes to find an optimal solution, while Benders
would require more than one hour. The remaining three columns refer to our
optimization methods. For each method, we present both the time needed to
find a solution $\tilde z_{ILP}$, as well as the ratio $\epsilon = \frac{\tilde z_{ILP} - z^*_{LP}}{z^*_{LP}}$ with respect to the optimal fractional solution $z^*_{LP}$. $\epsilon$, called accuracy in the following, gives an upper bound on the maximum overflow to pay in excess with respect to the optimal integer solution $z^*_{ILP}$, since the optimal integer solution may be larger than the fractional one.
Both MasterILP and IterILP allow finding near-optimal solutions. As the size
of the network increases, we begin to observe the limits of the IterILP approach,
as it solves an ILP for each scenario. Although MasterILP demonstrates a bet-
ter scalability and a very high accuracy, for larger networks we have a tradeoff
between the time to find the solution and the quality of the solution found.
Indeed, for wxm40, IterRR only takes 2 minutes to find a good solution with an
accuracy of about 9%, while MasterILP requires 27 minutes to find a solution
with an accuracy of 2.2%.
NFVI nodes are expensive to both purchase and maintain (e.g., hardware,
software licenses, energy consumption, and maintenance). If, on one hand,
over-provisioning corresponds to undue extra costs, on the other hand, under-
provisioning may result in poor service to users and in Service Level Agreement
(SLA) violations. It is thus necessary to find the right trade-off in terms of
NFVI nodes in the network design phase.
Bandwidth overhead. In Fig. 6.5, we compare the overhead in terms of band-
width needed in the network by the global rerouting scheme and Dedicated Path
Protection with respect to the bandwidth needed in the unprotected case. For
Dedicated Path Protection, we compute for each demand two SRLG-disjoint
paths, i.e., two paths such that no link on one path has a common risk with any
link on the other path. In doing so, we set bandwidth minimization as the optimization objective. With an increasing number of NFVI nodes in the network,
the required bandwidth decreases. However, the overhead with respect to the
unprotected case tends to remain constant. Indeed, if with global rerouting we
only need from 30 to 60% more bandwidth, with dedicated path protection we
may need almost three times more bandwidth to guarantee the recovery.
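For illustration, a naive two-step heuristic for such a pair of SRLG-disjoint paths is sketched below; the thesis computes these paths through an optimization model with bandwidth minimization, whereas a greedy scheme like this one may fail on trap topologies even when a disjoint pair exists.

```python
# Hedged sketch (illustrative only): greedy computation of two SRLG-disjoint
# paths for one demand. srlg_of is assumed to be keyed by one orientation of
# each undirected link.
import networkx as nx

def srlg_disjoint_pair(G, src, dst, srlg_of):
    """srlg_of: dict {(u, v): set of SRLG identifiers of that link}."""
    primary = nx.shortest_path(G, src, dst)
    used_srlgs = set()
    for u, v in zip(primary, primary[1:]):
        used_srlgs |= srlg_of[(u, v)] if (u, v) in srlg_of else srlg_of[(v, u)]
    # Remove every link sharing a risk with the primary path, then reroute.
    H = G.copy()
    for (u, v), risks in srlg_of.items():
        if risks & used_srlgs and H.has_edge(u, v):
            H.remove_edge(u, v)
    try:
        backup = nx.shortest_path(H, src, dst)
    except nx.NetworkXNoPath:
        backup = None   # the greedy heuristic failed for this demand
    return primary, backup
```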
(a) Pdh.
(b) Nobel-germany.
Figure 6.5: Bandwidth overhead comparison of the Global Rerouting (GR) and
Dedicated Path Protection (DP) schemes with respect to the No-Protection
scenario (NP) for pdh and Nobel-germany networks. Labels on top of the bars
indicate the overhead with respect to the unprotected case.
Network        z*_LP               MasterILP         IterILP           IterRR
               ColGen    Benders   time     ε        time     ε        time     ε
pdh            22s       32s       11mn     4%       1mn      4.82%    40s      12.7%
polska         15s       18s       40s      0.22%    1mn      0.1%     20s      1.4%
nb-germany     35s       1mn       40s      0.17%    4mn      0.06%    30s      3.2%
wxm10          10s       5s        50s      0.3%     40s      1%       10s      5.5%
wxm20          40s       2mn       1mn      0.6%     4mn      0.6%     30s      2.7%
wxm30          3mn       16mn      6mn      0.2%     21mn     0.9%     1mn      4.5%
wxm40          22mn      >1h       27mn     2.2%     >1h      -        2mn      9.2%

Table 6.3: Numerical results for the proposed optimization models. The first column refers to the time needed to find the optimal fractional solution $z^*_{LP}$; we set a maximum time limit of 1h. The other columns refer to the proposed methods to obtain an integer solution $\tilde z_{ILP}$. For each method, we show the additional time needed and the quality of the solution found, expressed as the ratio $\epsilon = \frac{\tilde z_{ILP} - z^*_{LP}}{z^*_{LP}}$.
Figure 6.6: Distribution for the number of distinct paths for each demand. Boxes
are defined by the first and third quartiles. Ends of the whiskers correspond to
the first and ninth deciles. The median value is drawn in red.
Paths’ delays. In Fig. 6.7, we show the impact of the number of NFVI nodes
on the path latency distribution and compare them with the ones calculated
using shortest paths on the layered network. As expected, we see that the
number of hops decreases as the number of NFVI nodes increases.
Indeed, the more NFVI nodes there are in a network, the easier it is to find nearby NFVI nodes that can perform some of the required network functions. Another result is that the paths computed using our method are
almost as short (in terms of number of hops) as the shortest paths.
Number of paths. In our considered protection scheme, a demand may be
rerouted on a different path in each of the possible SRLG failure situations.
Even though our optimization models do not impose constraints on the number
of distinct paths for a demand, the experimental results indicate that their
number tends to be small in practice.
In Fig. 6.6, we show the distribution for the number of distinct paths of the
demands for our considered networks. The number of distinct paths increases
with the size of the network and tends to stay within the range (5, 10) for most
of them. For instance, for wxm40 we may have potentially 71 distinct paths to
be used for a demand, one for each of the possible SRLG failure scenarios plus
one for the case without failure. However, as the results show, in such a case,
50% of the demands would use no more than 12 distinct paths.
(a) Pdh.
(b) Nobel-germany.
Figure 6.7: Hops distribution of the backup paths computed by the global
rerouting scheme (GR) compared with the shortest paths (SP) for pdh and
nobel-germany networks.
Figure 6.8: Switch update operations for the full and delta protection scheme implementation options in SDN networks upon one link failure. In the full protection scheme, the tables inside the switches are completely overwritten by the new tables. In the delta protection scheme, the controller only modifies the rules that change in the failure scenario.
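To make the difference concrete, here is a small sketch (with hypothetical forwarding tables, not taken from the implementation) counting the per-switch update operations of the two options.

```python
# Hedged sketch: per-switch update operations for the "full" and "delta"
# options, given the forwarding tables before and after a failure
# (dicts mapping a match key to an action), as illustrated in Fig. 6.8.
def update_counts(table_before, table_after):
    # Full: the whole table is rewritten.
    full = len(table_after)
    # Delta: only rules that change (modified, added, or removed) are touched.
    keys = set(table_before) | set(table_after)
    delta = sum(1 for k in keys if table_before.get(k) != table_after.get(k))
    return full, delta

# Example for one switch: only the rule for destination d changes.
before = {"d": "S", "e": "E"}
after  = {"d": "N", "e": "E"}
print(update_counts(before, after))   # -> (2, 1)
```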
Figure 6.9: Switch update operations for the notification protection scheme implementation option in SDN networks upon one link failure. The rules are preinstalled in the switches, in a different table for each failure scenario, and table 0 points to the no-failure table in the normal state. In a failure situation, the controller updates the pointer in table 0 to use the table assigned to the failure scenario.
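A hedged sketch of how a controller application might perform this single update per switch through OpenDaylight's RESTCONF Northbound API is given below; the resource path, JSON layout, credentials, and flow identifier are assumptions based on the standard openflowplugin model and may differ between OpenDaylight versions and from the thesis implementation.

```python
# Hedged sketch: updating table 0 so that it points to the flow table of the
# current failure scenario (the "notification" option). The RESTCONF path and
# JSON body below are assumptions and vary across controller versions.
import requests

ODL = "http://127.0.0.1:8181"           # assumed controller address
AUTH = ("admin", "admin")                # assumed credentials

def point_table0_to(switch_id, scenario_table):
    url = (f"{ODL}/restconf/config/opendaylight-inventory:nodes/"
           f"node/{switch_id}/flow-node-inventory:table/0/flow/goto-scenario")
    flow = {"flow": [{
        "id": "goto-scenario",
        "table_id": 0,
        "priority": 100,
        "match": {},                      # match-all rule in table 0
        "instructions": {"instruction": [
            {"order": 0, "go-to-table": {"table_id": scenario_table}}]}}]}
    return requests.put(url, json=flow, auth=AUTH,
                        headers={"Content-Type": "application/json"})

# Upon a given link failure (say scenario table 5), one rule update per switch:
# for sw in ("openflow:1", "openflow:2", "openflow:3"):
#     point_table0_to(sw, 5)
```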
Figure 6.10: Recovery time (ms), on a logarithmic scale, of the Full, Delta, Notification, and Ideal implementation options for Global Rerouting (GR) and Dedicated Path Protection (DP) for the polska network.
The notification option significantly outperforms the other options. The ideal imple-
mentation also shows that the tools used to implement the protection scheme
have a significant impact on the recovery time as, all things considered, our ideal
is just a way of implementing the notification option without a controller. Actu-
ally, a significant fraction of the recovery time in OpenDaylight implementations
is caused by the usage of the Northbound API. All implementation options offer
sub-second recovery time for the considered network.
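As an example of how such recovery times can be estimated in an emulated test-bed, here is a rough sketch using Mininet; this is our own illustration, and the probing interval, failed link, and packet counts are assumptions rather than the thesis' measurement methodology.

```python
# Hedged sketch: estimating recovery time in Mininet by counting lost probes
# after a link failure.
from mininet.net import Mininet

def measure_recovery(net: Mininet, h_src, h_dst, link=("s1", "s2"),
                     interval=0.01, count=2000):
    """Send fast pings from h_src to h_dst, fail a link mid-stream, and
    derive a rough recovery time from the number of lost probes."""
    h1, h2 = net.get(h_src), net.get(h_dst)
    # Background ping stream; output is collected in a temporary file.
    h1.cmd(f"ping -i {interval} -c {count} {h2.IP()} > /tmp/ping.txt &")
    net.configLinkStatus(link[0], link[1], "down")   # inject the failure
    h1.cmd("wait")                                   # wait for ping to finish
    out = h1.cmd("tail -2 /tmp/ping.txt")
    # The packet-loss summary reported by ping, multiplied by the probing
    # interval, gives a rough lower bound on the recovery time.
    return out
```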
Figs. 6.10 and 6.14 show that there is a direct link between the number
of changes to be performed on the switches and the recovery time in Polska
network. The same observation applies to the wx10 network as highlighted by
Figs. 6.11 and 6.15. Figs. 6.14 and 6.15 report, for each switch, the maxi-
mum number of flow table changes observed expressed in number of flow entries
for the three OpenDaylight implementation options. Dedicated path protection
has a recovery time similar to that of global rerouting when the full implementation is used. This is because the number of flows to install on switches is similar between dedicated path protection and global rerouting. Similarly, no performance difference can be observed when the notification-based implementation is used, as the controller only has to update one rule per switch. On the contrary,
if updates are performed with the delta implementation, then dedicated path
protection converges faster than global rerouting, as it requires much fewer path
changes.
Figure 6.11: Recovery time (ms), on a logarithmic scale, of the Full, Delta, Notification, and Ideal implementation options for Global Rerouting (GR) and Dedicated Path Protection (DP) for the wxm10 network.
Figure 6.12: Comparison of the flow table sizes of various implementation op-
tions for Global Rerouting (GR) and Dedicated Path Protection (DP) for the
polska network.
Figure 6.13: Comparison of the flow table sizes of various implementation op-
tions for Global Rerouting (GR) and Dedicated Path Protection (DP) for the
wxm10 network.
Figure 6.14: Comparison of the number of flow table changes of various imple-
mentation options for Global Rerouting (GR) and Dedicated Path Protection
(DP) for the polska network.
Figure 6.15: Comparison of the number of flow table changes of various imple-
mentation options for Global Rerouting (GR) and Dedicated Path Protection
(DP) for the wxm10 network.
Based on the recovery time, one would recommend deploying the notification op-
tion. However, the reduction of the recovery time comes at the cost of increasing
flow table sizes on switches as shown in Fig. 6.12 and Fig. 6.13. Actually, these
figures report, for each switch, the maximum observed flow table size expressed
in number of flow entries for the three OpenDaylight implementation options
in both networks. The full option minimizes the number of entries, as it only requires storing the flow table for the current routing case. The delta option
consumes slightly more space than the full one, as the flow table always con-
tains the “no-failure” scenario flow table and the additional flow entries needed
to circumvent the current failure. Finally, the notification option has signifi-
cantly larger flow tables (one order of magnitude more), as flow tables always
contain all the potential failure scenarios, which may prevent it from being used in practice on low-end switches or in large networks.
As the robustness of the controller is an orthogonal problem that must be addressed by all SDN solutions, and since it has already been largely studied [Zha+18], it is not considered here.
6.6 Conclusion
In this chapter, we studied the network dimensioning problem with protection
against a Shared Risk Link Group (SRLG) failure in the light of network virtu-
alization. We considered a path-protection method based on a global rerouting
strategy, which makes the protection method optimal in terms of bandwidth.
We proposed algorithms, relying on the Column Generation and Benders Decomposition techniques, to compute the backup paths for the demands. We validated them with simulations on real-world and on randomly generated network topologies and workloads. The simulations show that our algorithms can reach near-optimal solutions in a short time, even for large instances. Finally, we proposed a real implementation of our approach in OpenDaylight and showed the applicability of the global rerouting protection method when SDN is used.
The experimental results in Mininet show that our solution provides sub-second
recovery times, but the way it is implemented may greatly impact the amount
of signaling traffic exchanged. In our evaluations, the recovery phase requires
only a few tens of milliseconds for the fastest implementation, compared to a
few hundreds of milliseconds for the slowest one.
Chapter 7

Conclusion and Future Work
of the virtual network could lead to results that are not trustworthy, or to a complete crash of the test-bed.
In Chapter 5, we considered the cloud computing topic. Organizations that
are planning to move their applications into the cloud should first check that
the cloud network will support these applications. In on-premises datacenters, the delay is usually low, while cloud infrastructures do not provide a specific value for the delay within a region or between two different regions. To monitor
the delay within the cloud infrastructure, we implemented CloudTrace, a simple tool that automates the deployment and the configuration of multiregional and
regional Virtual Private Clouds. We first showed the main services proposed
by Amazon Web Services to provide Infrastructure as a Service deployment.
After introducing the individual components, we described how to combine them to provide a multiregional and regional environment. CloudTrace automatically
creates the environments on the regions specified by the user, then manages all
the virtual instances, updating them and installing all the requirements. The
user can run the experiment after the deployment. CloudTrace synchronizes Paris-Traceroute to run every minute. Results can be retrieved by the user at any moment, without stopping the experiments. Finally, CloudTrace analyzes the received data and builds a map from all the collected measurements.
In our last contribution (Chapter 6), we investigated different problems and
proposed new solutions in order to minimize the bandwidth used in the net-
work. The challenge is to design a network, considering a path-based protection
scheme with a global rerouting strategy. In the worst case, using a global rerout-
ing strategy, we may have a new routing of all the flows. This means that the
controller has to send many updates to the switches. We extensively experimented with the new approach, comparing it with the dedicated path protection scheme. For the experimental test-bed, we used Mininet to emulate the networks and OpenDaylight as the controller. We showed that there is no significant increase in the rule update time with either strategy. However, global rerouting consumes on average 40% less bandwidth than dedicated path protection. The experiments ran on real-world network topologies
and randomly generated instances. We also presented three different implemen-
tation strategies. We showed that the technical implementation choices greatly
impact the time needed to reestablish the flows after a failure occurs.
Bibliography
[KK98] George Karypis and Vipin Kumar. “Multilevel algorithms for multi-
constraint graph partitioning”. In: SC’98: Proceedings of the 1998
ACM/IEEE Conference on Supercomputing. IEEE. 1998, pp. 28–
28 (cit. on p. 47).
[KKV05] Srikanth Kandula, Dina Katabi, and Jean-Philippe Vasseur. “Shrink:
A tool for failure diagnosis in IP networks”. In: Proceedings of the
2005 ACM SIGCOMM workshop on Mining network data. ACM.
2005, pp. 173–178 (cit. on pp. 94, 110).
[KL70] Brian W Kernighan and Shen Lin. “An efficient heuristic proce-
dure for partitioning graphs”. In: Bell system technical journal
49.2 (1970), pp. 291–307 (cit. on p. 51).
[KM05] Hervé Kerivin and A Ridha Mahjoub. “Design of survivable net-
works: A survey”. In: Networks: An International Journal 46.1
(2005), pp. 1–21 (cit. on p. 95).
[KNS09] Robert Krauthgamer, Joseph Naor, and Roy Schwartz. “Parti-
tioning graphs into balanced components”. In: Proceedings of the
twentieth annual ACM-SIAM symposium on Discrete algorithms.
SIAM. 2009, pp. 942–949 (cit. on p. 51).
[KS97] Stavros G Kolliopoulos and Clifford Stein. “Improved approxima-
tion algorithms for unsplittable flow problems”. In: Proceedings
38th Annual Symposium on Foundations of Computer Science.
IEEE. 1997, pp. 426–436 (cit. on p. 54).
[Kva+06] Amund Kvalbein, Audun Fosselie Hansen, Stein Gjessing, and
Olav Lysne. “Fast IP network recovery using multiple routing con-
figurations”. In: Proceedings of IEEE INFOCOM. 2006 (cit. on
pp. 95, 96).
[Len+21] Giuseppe Di Lena, Andrea Tomassilli, Frédéric Giroire, Damien
Saucez, Thierry Turletti, and Chidung Lac. “A Right Placement
Makes a Happy Emulator: a Placement Module for Distributed
SDN/NFV Emulation”. Proceedings of IEEE International Con-
ference on Communications (ICC). 2021 (cit. on p. 13).
[LHM10] Bob Lantz, Brandon Heller, and Nick McKeown. “A Network in
a Laptop: Rapid Prototyping for Software-defined Networks”. In:
Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics
in Networks. ACM, 2010. doi: 10.1145/1868447.1868466 (cit. on
pp. 31, 33, 119).
[Li+17] Z. Li, M. Kihl, Q. Lu, and J. A. Andersson. “Performance Over-
head Comparison between Hypervisor and Container Based Vir-
tualization”. In: 2017 IEEE 31st International Conference on Ad-
vanced Information Networking and Applications (AINA). 2017,
pp. 955–962. doi: 10.1109/AINA.2017.79 (cit. on p. 80).
[LM15] S.-I. Lee and Myung M.-K. Shin. “A self-recovery scheme for ser-
vice function chaining”. In: International Conference on Informa-
tion and Communication Technology Convergence (ICTC). 2015,
pp. 108–112 (cit. on p. 94).
[Mar+14] Athina Markopoulou, Gianluca Iannaccone, Supratik Bhattacharyya,
Chen-Nee Chuah, and Christophe Diot. “Characterization of fail-
ures in an IP backbone”. In: Proceedings of IEEE INFOCOM,
2004. 2014 (cit. on p. 94).
[McK+08] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar,
Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan
Turner. “OpenFlow: enabling innovation in campus networks”. In:
ACM SIGCOMM Computer Communication Review 38.2 (2008),
pp. 69–74 (cit. on p. 94).
[Med+15] J. Medved, R. Varga, A. Tkacik, and K. Gray. “OpenDaylight:
Towards a Model-Driven SDN Controller architecture”. In: Pro-
ceedings of IEEE WoWMoM 2014. 2015 (cit. on p. 120).
[Mel+11] Márcio Melo, Jorge Carapinha, Susana Sargento, Luis Torres, Phuong
Nga Tran, Ulrich Killat, and Andreas Timm-Giel. “Virtual net-
work mapping–an optimization problem”. In: Springer MONAMI.
2011 (cit. on p. 49).
[Mer14] Dirk Merkel. “Docker: lightweight linux containers for consistent
development and deployment”. In: vol. 2014. 239. 2014, p. 2 (cit.
on p. 33).
[met20] metis. Metis documentation. http://glaros.dtc.umn.edu/gkhome/metis/metis/overview. 2020 (cit. on pp. 46, 48).
[Mic20] Microsoft. Project xCloud website. https://www.xbox.com/en-GB/xbox-game-streaming/project-xcloud. 2020 (cit. on p. 82).
[Min16] Mininet. Mininet Cluster Edition TCLink discussion. https://mailman.stanford.edu/pipermail/mininet-discuss/2016-July/007005.html. 2016 (cit. on p. 34).
[Mor+14] Igor M. Moraes, Diogo M.F. Mattos, Lyno Henrique G. Ferraz,
Miguel Elias M. Campista, Marcelo G. Rubinstein, Luı́s Henrique
M.K. Costa, Marcelo D. de Amorim, Pedro B. Velloso, Otto Carlos
M.B. Duarte, and Guy Pujolle. “FITS: A flexible virtual network
testbed architecture”. In: Computer Networks 63 (2014). Special
issue on Future Internet Testbeds Part II, pp. 221–237. issn: 1389-
1286. doi: https://doi.org/10.1016/j.bjp.2014.01.002.
url: http://www.sciencedirect.com/science/article/pii/
S1389128614000036 (cit. on p. 33).
[MP12] Jeffrey C Mogul and Lucian Popa. “What we talk about when
we talk about cloud network performance”. In: ACM SIGCOMM
Computer Communication Review 42.5 (2012), pp. 44–48 (cit. on
p. 81).