Distributed System Lecture 1

DISTRIBUTED SYSTEMS
Lecture 1 – Introduction to distributed systems
Pan Hui
pan.hui@helsinki.fi
Huber Flores
huber.flores@helsinki.fi
Helsinki, Finland, 2018.

2
Agenda
• Concept and definition
– What is a distributed system
• Modelling
• Given the model
– How to reason about correctness

3
Concept and definition
WHAT IS A DISTRIBUTED SYSTEM?

4
Examples of distributed systems

A few examples of distributed systems are:
• Google – for big data processing
• Facebook – for content sharing
• Skype – for voice and video
• BitTorrent (P2P Network) – for sharing any file
• Sensor Networks – for monitoring of artifacts
• Vehicular networking – for traffic analysis
• Surveillance systems – for security

5
What is a distributed system?

• Abstract view: A network of nodes/processes

6

• Definition 1 (from Van Steen)
– A distributed system is a collection of autonomous
computing elements that appears to its users as a single
coherent system.
• Definition 2 (from Gosh)
– It is a network of processes.
– The nodes are processes,
and the edges are communication
channels.
– The processes (or computers) coordinate their activities, so
that users perceive it as a single, integrated computing
service with a well-defined goal.

7

• A set of nodes, connected by a network, which
appear to its users as a single coherent system
(from Haridi)

8

9
Coherent system, integrated service

• End user cannot tell where the computation is performed
• End user (or the application) does not care where data is
stored
• Replication is hidden
• Geographic distribution
• Goals
– Sharing resources
– Openness
– Scalability
– Availability

10
Why to study distributed systems?

• Facts:
– It is hard to find systems that do not work in a
distributed manner
• Fault tolerance, reliability and better performance
– Demand for resources has increased dramatically,
a single computer cannot cope with it
• Internet services
– Our society works in a networked fashion

11

12

13

14
Distributed transparency
Transparency Description
Access Hide differences in data representation and how an object
is accessed
Location Hide where an object is located
Relocation Hide that an object may be moved to another location
while in use
Migration Hide that an object may move to another location
Replication Hide that an object is replicated
Concurrency Hide that an object may be shared by several independent
users
Failure Hide the failure and recovery of an object
[source]: van Steen & Tannenbaum

15
Distributed computing structure

• Distributed computing relies on
inter-process communication,
which involves the various layers
of networking.
– Message passing
• Distributed computing helps
create simple abstractions for
these layers to facilitate program
writing.
– E.g. message, packet, datagram,
frame

16
Network layers
• Message from sender
goes through all the
layers on its way
• Protocols/
functionality
described on each
layer separately
• Abstraction!!!

17
Architecture layers
• Different layers –
different abstractions
• Assumptions about
the lower layer’s
behaviour
• A process in each
layer

18
Critical challenges
• Knowledge is local – no single central control
point
• Clocks are not synchronized
• No globally shared address space
• Topology and routing – Everything is dynamic
• Scalability
• Processes and links fails
– Fault tolerance, e.g., CODA File System
19
Common issues
• Leader election
• Mutual exclusion
• Time synchronization
• Distributed snapshot
• Reliable multicast
• Replica management
• Consensus

20
Implementation
• Real network
– Cellular network, 3G, LTE, 5G
• Simulation
– Programming languages, e.g., Python, Java, Spark,
Erlang, and so on
– Multi-core, multi-thread, shared-memory
• Cloud
– Virtualization (on the fly and on demand)

21
Hands-on session
• Execute python routines to communicate two
processes
– https://gist.github.com/huberflores/6a5ecee3ef4
920d16b4c0cb1c737bb6f

22
MODELLING

23
Models
• We will reason about distributed systems by
relying on models. There are many dimensions of
variability in distributed systems. Examples
– Type of processors
– inter-process communication mechanisms
– Timing assumptions
– Failure classes
– Security features, etc.

24
Models
• Models are simple
abstractions that help
to overcome the
variability --
abstractions that
preserve the essential
features, but hide the
implementation details
and simplify writing
distributed algorithms
for problem solving

25
A classification

26
Modelling communication
• System topology is a graph
G = (V, E), where V = set of
nodes (sequential
processes) E = set of edges
(links or channels,
bi/unidirectional).
• Four types of actions by a

process:
– internal action
– input action
– communication action
– output action

27
Reliable channel
• Axiom1: message m sent
⇔ message m received
• Axiom2: propagation
delay arbitrary, but finite
FIFO channel also:

• Axiom3: m1 sent before
m2 ⇒ m1 received before
m2 NOTE: Axioms are for
the model only
not for real systems!

28
More about channels

• Reliable channel is easy for correctness proof of a
model, but real channels are not reliable
• How to make sure that the channel is empty?
– Bound (=finite) propagation delay -> simply wait
– Unbound propagation delay ??
– FIFO Channel -> send a special message
• What if the channel happened to contain that special
message
– Unbound, non-FIFO channel????
• Nobody knows how to do this!!
• NOTE: Not a formal impossibility proof

29
Relationships among models

• Different models for different purposes
– Communication models
– Failure models
– Performance models (not on this course)
–…
• Two models are equivalent when 1-1
correspondence between objects and
operations of the models

30
Weak vs. strong models

• One object (or operation) Examples
of a strong model = More • High level language is
than one simpler objects
(or simpler operations) of stronger than assembly
a weaker model. language.
• Often, weaker models • Asynchronous is weaker
are synonymous with than synchronous
fewer restrictions. (communication).
• One can add layers • Bounded delay is stronger
(additional restrictions) than unbounded delay
to create a stronger (channel).
model from weaker one.

31
Model transformations
“Can model X be
• Stronger models implemented using
– Simplify reasoning, but model Y?” is an
– Need extra work to implement interesting question
in computer science.
• Weaker models
– Are easier to implement
– Have closer relationship with the real world, but
– Might be difficult or impossible to proove correct
• Common transformations (weaker to stronger)
– Non-FIFO to FIFO channel
– Message passing to shared memory
– Non-atomic to atomic broadcast

32
Non-FIFO to FIFO channel

Sender process P Receiver process Q
var i : interger var k : interger {initially 0}
{initially 0} buffer : buffer [0.. ∞] of msg
{initially 0k: buffer[k]=null}
repeat {store}
receive m[i], i from P;
repeat
store m[i] into buffer[i];
send m[i], i to Q;
{deliver}
i++;
while buffer[k] null do begin
forever
deliver content of buffer[k];
NOT GOOD: buffer[k]:=null; k++;
Needs unbounded buffer & end
unbounded sequence no forever
33
Other classifications
• Reactive vs transformational systems
– Reactive system always ready to react to requests
or changes
– Transformational (nonreactive) system reaches a
fixed point (like termination) after which no
further changes
• Named vs anonymous systems
– Anonymous when the algorithms do not consider
name or identifiers of processes.

34
Common complexity measures

• Space complexity
– How much space is needed per process to run an algorithm?
(measured in terms of N, the size of the network)
• Time complexity
– What is the max. time (number of steps) needed to complete
the execution of the algorithm?
• Message complexity
– How many message are exchanged to complete the execution of
the algorithm?
• Bit complexity (not so common)
– How may bits are transmitted when the algorithm runs.
Messages might be of arbitrary size

35
Example of complexity: Multicasting in
hypercube
• k-cube, with N=2k processes
• k is number of edges
connected to each node
• Multicast – send msg to all
• Node 0 starts the multicast
by sending the value to its neighbors
• Every node forwards the message to nodes
with higher id than itself
• Multicast finished when no node forwards
messages any more
36
Hypercube multicast using message
passing model
• x[i] value of x in process i
• N(i) neighbors of process i
• Start when p0 send x[i]
{Program for process i>0}
receive msg m {m cointains the value};
if m is received for the first time
then x[i] := m.value; Message
send x[i] to each node in {j ∈ N(i): j>i} complexity |E|
else discard m - Size of edge set
end if = number of edges

37
Hypercube multicast using state-
reading model
• x[i] value of x in process i
• N(i) neighbors of process i
• Now processes can read
the states of its neighbors
• P0 initiates by setting x[0]:=ν

while ∃j ∈ N(i): (j<i) ∧ (x[i]≠x[j]) do x[i] := x[j] end while
• To reduce the number of updates, allow update only
when all lowed-number neighbors have identical value
•while
. ∀j,k ∈ N(i): j<i ∧ k<i, x[j]=x[k]) ∧ x[i] ≠ x[j] do x[i] := x[j] end while
38
QUESTIONS?

39
Your tasks
• For next lecture session: (From Van Steen)
– Read Chapter 2 Architectural Styles
– Start reading Chapter 3 Processes
• Install Python in your computer

• Introduce a third process in the hands-on
exercise. The new process should append a
new word.
• Register to the course in webOODI, if you have
not yet done it

40
SEE YOU IN NEXT SESSION! 

Distributed System Lecture 1

Uploaded by

Copyright:

Available Formats

Distributed System Lecture 1

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Distributed System Lecture 1

Uploaded by

Copyright:

Available Formats

DISTRIBUTED SYSTEMS

Lecture 1 – Introduction to distributed systems

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Concept and definition

WHAT IS A DISTRIBUTED SYSTEM?

Helsinki, Finland, 2018.

Examples of distributed systems

Helsinki, Finland, 2018.

What is a distributed system?

Helsinki, Finland, 2018.

What is a distributed system?

Helsinki, Finland, 2018.

What is a distributed system?

Helsinki, Finland, 2018.

What is a distributed system?

Helsinki, Finland, 2018.

Coherent system, integrated service

Helsinki, Finland, 2018.

Why to study distributed systems?

Helsinki, Finland, 2018.

Why to study distributed systems?

Helsinki, Finland, 2018.

Why to study distributed systems?

Helsinki, Finland, 2018.

Why to study distributed systems?

Helsinki, Finland, 2018.

[source]: van Steen & Tannenbaum

Helsinki, Finland, 2018.

Distributed computing structure

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

• Four types of actions by a

Helsinki, Finland, 2018.

FIFO channel also:

Helsinki, Finland, 2018.

More about channels

Helsinki, Finland, 2018.

Relationships among models

Helsinki, Finland, 2018.

Weak vs. strong models

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

Non-FIFO to FIFO channel

Helsinki, Finland, 2018.

Common complexity measures

Helsinki, Finland, 2018.

Helsinki, Finland, 2018.

{Program for process i>0}

Helsinki, Finland, 2018.