
Advanced Distributed Systems


UNIT 1 ADVANCED DISTRIBUTED SYSTEMS

A distributed system consists of hardware and software components located in a network of computers that communicate and coordinate their actions only by passing messages.
Implications of distributed systems

Concurrency: components execute in concurrent processes that read and update shared resources; this requires coordination.
No global clock: the absence of a global clock makes coordination difficult (ordering of events); see the sketch after this list.
Independent failure of components: partial failure and incomplete information.
Unreliable communication: loss of connections and messages; message bit errors.
Insecure communication: possibility of unauthorized recording and modification of messages.
Expensive communication: communication between computers usually has less bandwidth, higher latency, and higher cost than communication between independent processes on the same computer.
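
As a hedged illustration (not part of the original notes), the sketch below shows one standard way to order events without a global clock: a Lamport logical clock, in which each process keeps a counter and adjusts it on every message exchange.

    # Minimal Lamport logical clock sketch (illustrative; names are my own).
    class LamportClock:
        def __init__(self):
            self.time = 0

        def local_event(self):
            # Any local event advances the clock.
            self.time += 1
            return self.time

        def send_event(self):
            # Attach the current timestamp to an outgoing message.
            self.time += 1
            return self.time

        def receive_event(self, msg_timestamp):
            # On receipt, jump ahead of the sender's timestamp if necessary.
            self.time = max(self.time, msg_timestamp) + 1
            return self.time

    # Two processes exchanging one message: the receive is ordered after the send.
    p1, p2 = LamportClock(), LamportClock()
    ts = p1.send_event()          # ts == 1
    print(p2.receive_event(ts))   # prints 2, so the receive comes later in logical time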

Goals of distributed systems

Resource sharing: the possibility of using available resources anywhere.
Openness: an open distributed system can be extended and improved incrementally; this requires publication of component interfaces and standard protocols for accessing those interfaces.
Scalability: the ability to serve more users and provide acceptable response times as the amount of data increases.
Fault tolerance: maintain availability even when individual components fail.
Heterogeneity: allow different networks and hardware, operating systems, programming languages, and implementations by different developers.

Types of distributed systems


1. Distributed Computing Systems
Used for high performance computing tasks
Cluster computing systems
Grid computing systems
2. Distributed Information Systems
Systems mainly for management and integration of business functions
Transaction processing systems
Enterprise Application Integration
3. Distributed Pervasive (or Ubiquitous) Systems
Mobile and embedded systems
Home systems
Sensor networks

Designing distributed systems does not come for free. Several challenges need to be overcome in order to build an ideal system. The challenges in distributed systems are:

Heterogeneity
This term refers to the diversity of distributed systems in terms of hardware, software, platform, etc. Modern distributed systems will likely span different:

Hardware devices: computers, tablets, mobile phones, embedded devices, etc.

Operating systems: Microsoft Windows, Linux, macOS, UNIX, etc.

Networks: local networks, the Internet, wireless networks, satellite links, etc.

Programming languages: Java, C/C++, Python, PHP, etc.

Roles: different software developers, designers, and system managers.

Transparency
Distributed system designers must hide the complexity of the system as much as they can. Adding abstraction layers is particularly useful in distributed systems. When users hit Search on google.com, they never notice that their query goes through a complex process before Google shows them a result. Transparency in distributed systems takes several forms, such as access, location, and failure transparency.

Openness
If the well-defined interfaces of a system are published, it is easier for developers to add new features or replace sub-systems in the future. Example: Twitter and Facebook have APIs that allow developers to build their own software on top of these platforms.

Concurrency
A distributed system is usually a multi-user environment. In order to maximize concurrency, resource-handling components should anticipate that they will be accessed by competing users. Concurrency is a tricky challenge: we must prevent the system state from becoming inconsistent when users compete to view or update data.
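
As a minimal sketch of this point (the counter and thread names are illustrative, not from the original text), the Python snippet below shows two competing updaters of a shared resource kept consistent with a lock:

    # Illustrative only: protecting shared state from competing concurrent updates.
    import threading

    balance = 0
    lock = threading.Lock()

    def deposit(amount, times):
        global balance
        for _ in range(times):
            with lock:                # without the lock, updates can be lost
                balance += amount

    # Two "users" competing to update the same resource.
    t1 = threading.Thread(target=deposit, args=(1, 100_000))
    t2 = threading.Thread(target=deposit, args=(1, 100_000))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(balance)  # always 200000 with the lock; may be less without it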

Security
Every system must adopt strong security measures. Distributed systems often deal with sensitive information, so secure mechanisms must be in place.

Scalability
Distributed systems must remain scalable as the number of users increases.
Scalability has three dimensions:

Size: the number of users and resources to be handled. The associated problem is overloading.

Geography: the distance between users and resources. The associated problem is communication reliability.

Administration: as the size of a distributed system increases, many parts of the system need to be controlled. The associated problem is administrative mess.

Resilience to Failure

Distributed systems involve many collaborating components (hardware, software, communication), so there is a high possibility of partial or total failure.

Architectural Models
How are responsibilities distributed between system components and how are these
components placed?
Client-server model
The system is structured as a set of processes, called servers, that offer services to users, called clients.
The client-server model is usually based on a simple request/reply protocol, implemented with send/receive primitives or using remote procedure calls (RPC) or remote method invocation (RMI):
- the client sends a request (invocation) message to the server asking for some service;
- the server does the work and returns a result (e.g. the data requested) or an error code if the work could not be performed.
A server can itself request services from other servers; in this new relation, the server itself acts as a client.
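
The following is a minimal sketch of that request/reply exchange using Python sockets; the address, port, and echo service are arbitrary illustrative choices rather than anything prescribed by the notes:

    # Minimal request/reply client-server sketch over TCP (illustrative only).
    import socket
    import threading
    import time

    HOST, PORT = "127.0.0.1", 50007   # arbitrary example address

    def server():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((HOST, PORT))
            srv.listen()
            conn, _ = srv.accept()
            with conn:
                request = conn.recv(1024).decode()     # receive the request message
                reply = ("echo: " + request).encode()  # do the "work"
                conn.sendall(reply)                    # return the result

    threading.Thread(target=server, daemon=True).start()
    time.sleep(0.2)   # crude wait for the server to start listening

    # Client: send a request and block until the reply arrives.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"what time is it?")
        print(cli.recv(1024).decode())   # e.g. "echo: what time is it?"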

Peer-to-peer
All processes (objects) play similar roles.
Processes (objects) interact without any particular distinction between clients and servers.
The pattern of communication depends on the particular application.
A large number of data objects are shared; any individual computer holds only a small part of the application database.
Processing and communication loads for access to objects are distributed across many computers and access links.
This is the most general and flexible model.

Examples include distributed computing, file sharing, distributed storage, communication, and real-time media streaming. Ideally, there is no centralized entity to control, organize, administer, or maintain the entire system. Instead, these functions are divided among and supported by all peers. Peers cooperate by sharing resources such as storage, CPU cycles, network bandwidth, and data. A number of benefits are obtained by adopting the P2P paradigm to implement distributed applications.
These benefits include:
(1) Improved scalability by aggregating resources from peers and reducing the reliance on centralized servers.
(2) Cost-effectiveness by utilizing already-deployed resources and eliminating the need for expensive infrastructure.
(3) Ease of deployment by performing all processing at the end systems.
Peer-to-Peer Applications
According to the software architecture model described in Section 1.2, P2P applications are
built on top of P2P substrates. The P2P substrate provides file lookup and peer management
services to the P2P application. Many distributed applications can leverage the P2P paradigm.
In this section, we present different categories of applications that either have been proposed
in the literature, or have been deployed in the real world. We do not intend to provide
exhaustive coverage of all possible P2P applications.
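
The notes do not fix a particular lookup mechanism, but as an illustrative sketch of what a P2P substrate's file lookup can look like, the snippet below hashes peers and file keys onto the same identifier space (a consistent-hashing style design, assumed only for this example) so any peer can compute which peer is responsible for a given file:

    # Illustrative consistent-hashing style lookup (one possible substrate design).
    import hashlib
    from bisect import bisect_right

    def ident(name, space=2**16):
        # Map a peer address or a file key to a point on the identifier ring.
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % space

    peers = sorted(ident(p) for p in ["peer-a:4000", "peer-b:4000", "peer-c:4000"])

    def responsible_peer(file_key):
        # The file is assigned to the first peer clockwise from its identifier.
        point = ident(file_key)
        idx = bisect_right(peers, point) % len(peers)
        return peers[idx]

    print(responsible_peer("lecture-notes.pdf"))  # identifier of the responsible peer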

File Sharing
File sharing is the simplest and most widely deployed application in P2P systems. A file-sharing application uses the P2P substrate to discover peers who have a requested file. Once one or more supplying peers have been found, connections are established between the supplier(s) and the requester. The application does little more than store and provide files to requesting peers.
Media Streaming and High-bandwidth Content Distribution
In file-sharing P2P applications, a client has to download the entire file before it can start using it. Consider, for example, a one-hour movie recorded at 1 Mb/s being downloaded by a client with an in-bound bandwidth of 1.5 Mb/s. Ignoring all protocol overhead and retransmissions, the client will have to wait for 40 minutes before it can start watching the movie! Given that most of the content distributed over current P2P systems consists of multimedia files [17], P2P media streaming applications have been receiving increasing attention in the research community [40]. Real-time streaming applications start playing out the requested movie after a short (e.g., on the order of seconds) waiting period.
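
The 40-minute figure follows directly from the numbers in the example; the short calculation below (illustrative only) makes the arithmetic explicit:

    # Start-up delay when the whole file must be downloaded before playback.
    movie_duration_s = 60 * 60        # one hour of video
    encoding_rate_mbps = 1.0          # recorded at 1 Mb/s
    download_rate_mbps = 1.5          # client's in-bound bandwidth

    file_size_mb = movie_duration_s * encoding_rate_mbps    # 3600 Mb of data
    download_time_s = file_size_mb / download_rate_mbps     # 2400 s
    print(download_time_s / 60)       # 40.0 minutes before playback can start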
File and Storage Systems
Distributed file systems provide logical functions similar to those provided by a centralized
file server, but they are constructed from physically distributed peers.
Some problems with client-server:
- Centralisation of service leads to poor scaling.
- Limitations: the capacity of the server and the bandwidth of the network connecting the server.
Peer-to-peer tries to solve some of the above:
- It distributes shared resources widely.
- It shares computing and communication loads.
Problems with peer-to-peer:
- High complexity, due to the need to cleverly place individual objects, retrieve the objects, and maintain a potentially large number of replicas.

Introduction to interprocess communications


Interprocess communication (IPC) is the set of tools provided by the OS to allow processes that do not share common memory segments to communicate with each other.
UNIX pipes are one of these tools: a pipe is an instance of message passing.
Message passing comprises two basic primitives:
send(destination, this_msg, msg_length);
receive(source, a_msg, &how_long);
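
Since UNIX pipes are cited above as an instance of message passing, the following sketch (in Python rather than the C-style pseudocode above, and purely illustrative) shows a parent process sending a message to a child process over a pipe:

    # UNIX pipe as a simple message-passing channel (illustrative sketch; UNIX only).
    import os

    read_fd, write_fd = os.pipe()      # create the pipe: one read end, one write end
    pid = os.fork()

    if pid == 0:
        # Child process: plays the role of receive(source, a_msg, ...).
        os.close(write_fd)
        msg = os.read(read_fd, 1024)
        print("child received:", msg.decode())
        os._exit(0)
    else:
        # Parent process: plays the role of send(destination, this_msg, msg_length).
        os.close(read_fd)
        os.write(write_fd, b"hello over the pipe")
        os.close(write_fd)
        os.waitpid(pid, 0)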

External data representation and marshalling
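
This section has no body in this extract. As a hedged illustration of the topic, the sketch below marshals a simple record into a flat external byte representation and unmarshals it again, using Python's struct module as a stand-in for agreed formats such as CDR or XDR; the record layout is an assumption made only for this example:

    # Marshalling a (person_id, balance, name) record into a byte stream and back.
    import struct

    def marshal(person_id, balance, name):
        name_bytes = name.encode("utf-8")
        # Network byte order: unsigned int, double, unsigned short length, then the bytes.
        header = struct.pack("!IdH", person_id, balance, len(name_bytes))
        return header + name_bytes

    def unmarshal(data):
        person_id, balance, name_len = struct.unpack_from("!IdH", data)
        offset = struct.calcsize("!IdH")
        name = data[offset:offset + name_len].decode("utf-8")
        return person_id, balance, name

    wire = marshal(42, 99.5, "Alice")   # bytes suitable for sending in a message
    print(unmarshal(wire))              # (42, 99.5, 'Alice')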

The Pub/Sub Model


The pub/sub communication model defines three different roles for entities in the system.
Publishers are sources that inject their information using publication messages. Subscribers
are information sinks and act as consumers of publications that were produced by publishers.
For this purpose, a subscriber issues a subscription message to specify the type of publications
that it would like to consume. Publishers and subscribers are collectively considered to be
clients for the pub/sub middleware which is then responsible for delivering published
messages to subscribers by taking their subscription interests into account. Typically, this
involves forwarding of publications through a number of intermediary nodes in an overlay
network. These forwarding nodes collectively realize the pub/sub service and are referred to
as service providers. In the pub/sub model, service providers store clients' subscriptions and use them to determine which publications must be delivered to which subscribers. This has the
advantage of eliminating the need for publishers or subscribers to be consciously aware of one
another.

Based on the expressiveness of the language used to represent subscribers' interests, pub/sub systems are commonly classified into two types, namely topic-based and content-based [53]. Simply put, both topic-based and content-based pub/sub systems allow a subscriber to issue subscriptions that declare filtering constraints on the produced publication messages. Only publications that satisfy these constraints are delivered to a subscriber. These publications are said to match the client's subscription. Topic-based pub/sub systems support simple constraints that are based upon a predefined set of topics, whereas content-based systems allow richer constraints over the attribute values carried in publications.
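
As an illustrative sketch (not taken from the notes), the snippet below contrasts the two subscription styles: a topic-based subscription matches on a predefined topic name, while a content-based subscription evaluates constraints over a publication's attributes:

    # Illustrative matching logic for topic-based vs. content-based subscriptions.

    publication = {"topic": "stocks", "symbol": "ACME", "price": 101.25}

    # Topic-based: the subscriber names one of a predefined set of topics.
    def matches_topic(pub, subscribed_topic):
        return pub["topic"] == subscribed_topic

    # Content-based: the subscriber declares constraints over publication attributes.
    def matches_content(pub, constraints):
        return all(predicate(pub.get(attr)) for attr, predicate in constraints.items())

    print(matches_topic(publication, "stocks"))   # True
    print(matches_content(publication,
                          {"symbol": lambda s: s == "ACME",
                           "price": lambda p: p is not None and p > 100.0}))  # True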

Dependability in Pub/Sub Systems


Several aspects of dependable operation in pub/sub systems are described below.
Reliability: Reliable publication delivery concerns assurances provided by the pub/sub
implementation regarding successful delivery of individual publications. For example, highly
sensitive stock market information must be delivered to interested traders as failure to meet
this requirement can potentially lead to lost trading opportunities and cause financial loss. As
a result, reliability of the pub/sub system in this application scenario is of great importance.
Ordered delivery: Ordering guarantees concern assurances regarding the order of
successive publications that are delivered to the subscribers. For example, if the pub/sub
system provides a total ordering guarantee, then all traders will receive stock quotes in the exact
same order. This may be useful in order to ensure an even and unbiased playing field for all
competing traders. Alternatively, the system may provide causal ordering such that the
precedence relationships between messages are preserved. This can be useful to study how
traders react to delivery of market news, for instance.
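
One common way to realize such a total ordering guarantee, shown here only as a hedged sketch rather than anything prescribed by the notes, is to have the service provider stamp each publication with a global sequence number and have each subscriber deliver publications strictly in sequence order:

    # Illustrative total ordering via a single sequencer at the service provider.
    import itertools
    import heapq

    sequence = itertools.count(1)          # global sequence numbers: 1, 2, 3, ...

    def publish(message):
        # The provider stamps every publication before forwarding it.
        return (next(sequence), message)

    class OrderedSubscriber:
        def __init__(self):
            self.expected = 1
            self.pending = []              # min-heap of out-of-order publications

        def receive(self, stamped):
            heapq.heappush(self.pending, stamped)
            delivered = []
            # Deliver only when the next expected sequence number has arrived.
            while self.pending and self.pending[0][0] == self.expected:
                delivered.append(heapq.heappop(self.pending)[1])
                self.expected += 1
            return delivered

    sub = OrderedSubscriber()
    p1, p2 = publish("quote A"), publish("quote B")
    print(sub.receive(p2))   # [] -- holds "quote B" until "quote A" arrives
    print(sub.receive(p1))   # ['quote A', 'quote B'] -- delivered in total order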
Recovery from failures: Distributed applications are commonly composed of fault-prone
processes and networking components which may cease to operate at any point in time or
become disconnected from one another. For example, service providers may instantaneously
crash at any time or the machines that they run on may be unplugged suddenly. Likewise,
communication links may be unreliable and experience long periods of disconnections.
Occurrence of such failures in an unprepared pub/sub system can significantly hinder its
operation and even permanently disrupt its availability. To recover from such failure
scenarios, the system must have built-in recovery mechanisms that ensure such disruptions are
temporary and do not impact the operation of the system in the long run. In a distributed
pub/sub system, recovery typically involves amending the pub/sub overlay (i.e., maintaining
connectivity among service providers despite failures) and updating routing tables of service
providers accordingly in order to set up new forwarding paths in the network (i.e., in order to
re-route publications).

Cloud computing
In the simplest terms, cloud computing means storing and accessing
data and programs over the Internet instead of your computer's hard
drive. The cloud is just a metaphor for the Internet. It goes back to the
days of flowcharts and presentations that would represent the gigantic
server-farm infrastructure of the Internet as nothing but a puffy, white
cumulonimbus cloud, accepting connections and doling out information
as it floats.
What cloud computing is not about is your hard drive. When you store
data on or run programs from the hard drive, that's called local storage
and computing. Everything you need is physically close to you, which
means accessing your data is fast and easy, for that one computer, or
others on the local network. Working off your hard drive is how the
computer industry functioned for decades; some would argue it's still
superior to cloud computing, for reasons I'll explain shortly.

For it to be considered "cloud computing," you need to access your data or your programs over the Internet, or at the very least, have that data synchronized with other information over the Web. In a big business, you may know all there is to know about what's on the other side of the connection; as an individual user, you may never have any idea what kind of massive data-processing is happening on the other end. The end result is the same: with an online connection, cloud computing can be done anywhere, anytime.
Common Cloud Examples
The lines between local computing and cloud computing sometimes get
very, very blurry. That's because the cloud is part of almost everything
on our computers these days. You can easily have a local piece of
software (for instance, Microsoft Office 365) that utilizes a form of cloud computing for storage.
Some other major examples of cloud computing you're probably using:
Google Drive : This is a pure cloud computing service, with all the
storage found online so it can work with the cloud apps: Google Docs,
Google Sheets, and Google Slides. Drive is also available on more
than just desktop computers; you can use it on tablets like
the iPad or on smartphones, and there are separate
apps for Docs and Sheets, as well. In fact, most of Google's services
could be considered cloud computing: Gmail, Google Calendar, Google
Maps, and so on.
Apple iCloud : Apple's cloud service is primarily used for online storage,
backup, and synchronization of your mail, contacts, calendar, and
more. All the data you need is available to you on your iOS, Mac OS,
or Windows device (Windows users have to install the iCloud control
panel). Naturally, Apple won't be outdone by rivals: it offers cloud-based versions of its word processor (Pages), spreadsheet (Numbers),
and presentations (Keynote) for use by any iCloud subscriber. iCloud is
also the place iPhone users go to utilize the Find My iPhone feature
that's all important when the phone goes missing.
Hybrid services like Box, Dropbox, and SugarSync all say they work in
the cloud because they store a synced version of your files online, but
most also sync those files with local storage. Synchronization to allow
all your devices to access the same data is a cornerstone of the cloud
computing experience, even if you do access the file locally.
