Cluster Computing
Clusters Classification..1
Based
Clusters Classification..2
Based on Workstation/PC Ownership
Clusters Classification..3
Based on Node Architecture..
Clusters Classification..4
Based on Node OS:
Linux Clusters (Beowulf)
Solaris Clusters (Berkeley NOW)
NT Clusters (HPVM)
AIX Clusters (IBM SP2)
SCO/Compaq Clusters (Unixware)
Digital VMS Clusters, HP Clusters, ..
Clusters Classification..5
Based on node components architecture & configuration (processor arch, node type: PC/Workstation.. & OS: Linux/NT..):
Homogeneous Clusters: all nodes have a similar configuration.
Heterogeneous Clusters: nodes are based on different processors and run different OSes.
Clusters Classification..6a
[Figure: clusters classified by scale — uniprocessor, workgroup, department, campus, enterprise, public metacomputing — along three dimensions: (1) technology, (2) platform (SMP, cluster, MPP), and (3) network.]
Contents
What is Middleware?
What is Single System Image?
Benefits of Single System Image
SSI Boundaries
SSI Levels
Relationship between Middleware Modules
Strategy for SSI via OS
Solaris MC: An example OS supporting SSI
Cluster Monitoring Software
What is Middleware?
An interface between user applications and the cluster hardware and OS platform. Middleware packages support each other at the management, programming, and implementation levels.
Middleware layers include the Single System Image (SSI) layer and the system availability infrastructure (checkpointing, recovery), both discussed below.
What is Single System Image (SSI)?
A single system image is the illusion, created by software or hardware, that a collection of computing elements appears as a single computing resource. SSI makes the cluster appear like a single machine to the user, to applications, and to the network: complete transparency. A cluster without an SSI is not a cluster.
Benefits of Single System Image
Usage of system resources transparently
Improved reliability and higher availability
Simplified system management
Single File Hierarchy: xFS, AFS, Solaris MC Proxy
Single Control Point: management from a single GUI
Single virtual networking
Single memory space: DSM (Distributed Shared Memory)
Single Job Management: Glunix, Codine, LSF
Single User Interface: like a workstation/PC windowing environment (CDE in Solaris/NT); it may also use Web technology
Single I/O Space: any node can access any peripheral or disk device without knowledge of its physical location.
Single Process Space: any process on any node can create processes cluster-wide, and they communicate through signals, pipes, etc., as if they were on a single node.
Checkpointing and Process Migration: saves process state and intermediate results in memory or to disk to support rollback recovery when a node fails; process migration (PM) supports load balancing (see the sketch below).
Reduction in the risk of operator errors
Users need not be aware of the underlying system architecture to use these machines effectively
SSI Levels
SSI is a computer-science notion of levels of abstraction (a house is at a higher level of abstraction than walls, ceilings, and floors). SSI can be provided at the hardware level, the operating system (kernel) level, or the application/subsystem level.
Beowulf (CalTech and NASA) - USA
CCS (Computing Centre Software) - Paderborn, Germany
Condor - University of Wisconsin-Madison, USA
DJM (Distributed Job Manager) - Minnesota Supercomputing Center, USA
DQS (Distributed Queuing System) - Florida State University, USA
EASY - Argonne National Lab, USA
HPVM (High Performance Virtual Machine) - UIUC & now UCSB, USA
far - University of Liverpool, UK
Gardens - Queensland University of Technology, Australia
Generic NQS (Network Queuing System) - University of Sheffield, UK
NOW (Network of Workstations) - Berkeley, USA
NIMROD - Monash University, Australia
PBS (Portable Batch System) - NASA Ames and LLNL, USA
PRM (Prospero Resource Manager) - Univ. of Southern California, USA
QBATCH - Vita Services Ltd., USA
With Solaris MC:
The cluster becomes a scalable, modular computer
Users and administrators see a single large machine
Runs existing applications
Easy to administer
New goal: use computer clusters for general-purpose computing; support existing customers and applications.
Solaris MC extends standard Solaris. Solaris MC makes the cluster look like a single machine: a global file system, global process management, and global networking.
Applications
Ideal for: Web servers, file servers, databases, and interactive services
Preserves investment in existing applications
Modular servers with a low entry-point price and low cost of ownership
Easier system administration
Solaris could become a preferred platform for clustered systems
A prototype cluster of SPARCstations connected by a Myrinet network runs an unmodified commercial parallel database, a scalable Web server, and parallel make.
Advantages of Solaris MC
Existing applications are binary-compatible
Same kernel, device drivers, etc.
As portable as base Solaris: will run on SPARC, x86, and PowerPC
Solaris MC details
A private Solaris kernel per node provides reliability
Object-oriented system with well-defined interfaces
Solaris MC components
[Figure: Solaris MC architecture — applications and the system call interface on each node; the file system, processes, networking, and C++ object framework extend across the other nodes.]
Components:
Object and communication support
High availability support
PXFS global distributed file system
Process management
Networking
Object Orientation
Solaris MC uses:
IDL: a better way to define interfaces
CORBA object model: a better RPC (Remote Procedure Call)
C++: a better C
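To make the object-orientation point concrete, here is a hedged C++ sketch of the proxy/servant style that an IDL-defined, CORBA-like object model enables: an interface declared once, implemented by a servant on the owning node, and invoked through a proxy that could forward the call to another node. All names are illustrative, not Solaris MC's actual interfaces.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// What an IDL compiler conceptually emits: an abstract interface that says
// nothing about where the object lives (illustrative, not real Solaris MC code).
class FileServer {
public:
    virtual ~FileServer() = default;
    virtual std::string read_block(int block_no) = 0;
};

// Servant: the real implementation on the node that owns the file.
class LocalFileServer : public FileServer {
public:
    std::string read_block(int block_no) override {
        return "data for block " + std::to_string(block_no);
    }
};

// Proxy: what a client on another node would hold. In a CORBA-style system
// this would marshal the call and ship it over the cluster interconnect;
// here it simply delegates locally to keep the sketch self-contained.
class FileServerProxy : public FileServer {
    std::shared_ptr<FileServer> target_;
public:
    explicit FileServerProxy(std::shared_ptr<FileServer> t) : target_(std::move(t)) {}
    std::string read_block(int block_no) override {
        // marshal(block_no); send(); wait_for_reply();  <- the remote case
        return target_->read_block(block_no);
    }
};

int main() {
    auto servant = std::make_shared<LocalFileServer>();
    FileServerProxy proxy(servant);
    std::printf("%s\n", proxy.read_block(7).c_str());
    return 0;
}
```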
High availability
If a node fails, the remaining nodes continue running. Better than an SMP. A requirement for the mission-critical market.
Failure notifications are delivered to servers and clients; a group membership protocol detects node failures.
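A minimal sketch of how a heartbeat-based membership service could detect failed nodes and trigger notifications; this is an assumed mechanism for illustration, not the actual Solaris MC protocol.

```cpp
#include <cstdio>
#include <map>

// Hypothetical heartbeat-based membership: each node periodically reports in;
// a node that misses too many intervals is declared failed and removed.
class Membership {
    std::map<int, int> missed_;                       // node -> missed heartbeats
    static constexpr int kMaxMissed = 3;
public:
    void add_node(int node) { missed_[node] = 0; }
    void heartbeat(int node) { missed_[node] = 0; }
    // Called once per heartbeat interval by the membership service.
    void tick() {
        for (auto it = missed_.begin(); it != missed_.end(); ) {
            if (++it->second > kMaxMissed) {
                std::printf("node %d failed; notifying servers and clients\n",
                            it->first);
                it = missed_.erase(it);               // drop it from the group
            } else {
                ++it;
            }
        }
    }
};

int main() {
    Membership m;
    m.add_node(1); m.add_node(2);
    for (int interval = 0; interval < 5; ++interval) {
        m.heartbeat(1);                               // node 2 has gone silent
        m.tick();
    }
    return 0;
}
```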
PXFS global distributed file system
Single-system image of the file system; the backbone of Solaris MC. Provides coherent access and caching of files and directories.
Implements the vnode/virtual file system interface (VFS) externally and uses object communication internally.
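A rough sketch of what coherent client-side caching involves, assuming a hypothetical block-granularity protocol in which the node owning a file tracks which nodes cache each block and invalidates them on writes; PXFS's real protocol is more elaborate.

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>

// The owning node remembers which nodes may cache each block and
// invalidates those copies before accepting a write (illustrative only).
class BlockOwner {
    std::map<int, std::string> blocks_;          // block number -> contents
    std::map<int, std::set<int>> cachers_;       // block number -> caching nodes
public:
    std::string read(int node, int block) {
        cachers_[block].insert(node);            // remember who may cache it
        return blocks_[block];
    }
    void write(int block, const std::string& data,
               void (*invalidate)(int node, int block)) {
        for (int node : cachers_[block])
            invalidate(node, block);             // keep remote caches coherent
        cachers_[block].clear();
        blocks_[block] = data;
    }
};

static void drop_cached_copy(int node, int block) {
    std::printf("invalidate block %d on node %d\n", block, node);
}

int main() {
    BlockOwner owner;
    owner.read(/*node=*/2, /*block=*/0);
    owner.write(0, "new contents", drop_cached_copy);  // node 2 is invalidated
    return 0;
}
```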
Process management
Process creation, waiting, and exiting (including remote execution)
Global process identifiers, groups, and sessions
Signal handling
procfs (/proc)
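One simple way a global process identifier could be formed is to pack the node number into the high bits of a 64-bit id, so any node can route signals to the owning node. The encoding below is purely illustrative, not the scheme Solaris MC actually uses.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative global PID: high 32 bits name the node, low 32 bits hold the
// node-local PID, so the owner of any process can be found immediately.
using GlobalPid = std::uint64_t;

GlobalPid make_global_pid(std::uint32_t node, std::uint32_t local_pid) {
    return (static_cast<GlobalPid>(node) << 32) | local_pid;
}

std::uint32_t node_of(GlobalPid g)      { return static_cast<std::uint32_t>(g >> 32); }
std::uint32_t local_pid_of(GlobalPid g) { return static_cast<std::uint32_t>(g); }

// A cluster-wide kill() would route the signal to the owning node.
void cluster_kill(GlobalPid g, int sig) {
    std::printf("send signal %d to pid %u on node %u\n",
                sig, local_pid_of(g), node_of(g));
}

int main() {
    GlobalPid g = make_global_pid(/*node=*/3, /*local_pid=*/4711);
    cluster_kill(g, 15);
    return 0;
}
```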
Networking goals
A single-machine view to customers: access the cluster through a single network address; multiple network interfaces are supported but not required.
Scalable design: protocol and network application processing can run on any node; parallelism provides high server performance.
Networking: Implementation
Packets are routed between the network device and the correct node. Efficient, scalable, and supports parallelism. Supports multiple protocols with existing protocol stacks.
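A toy C++ sketch of the packet-routing idea: the node that owns the network device hashes each incoming connection and forwards the packet to the node handling that connection, so one cluster address fans out across all nodes. The hashing scheme and names are assumptions, not the actual implementation.

```cpp
#include <cstdint>
#include <cstdio>

// A packet is identified here only by its connection endpoints.
struct Packet {
    std::uint32_t src_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
};

// Same connection always maps to the same node (illustrative hash).
int pick_node(const Packet& p, int num_nodes) {
    std::uint32_t h = p.src_ip
                    ^ (static_cast<std::uint32_t>(p.src_port) << 16)
                    ^ p.dst_port;
    return static_cast<int>(h % num_nodes);
}

// What the node owning the network device would do for each arriving packet.
void deliver(const Packet& p, int num_nodes) {
    std::printf("packet for port %u forwarded to node %d\n",
                p.dst_port, pick_node(p, num_nodes));
}

int main() {
    deliver({0x0a000001, 40001, 80}, /*num_nodes=*/4);
    deliver({0x0a000002, 40002, 80}, 4);
    return 0;
}
```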
Status
A 4-node, 8-CPU prototype connected by Myrinet has been demonstrated, with:
Object and communication infrastructure
Global file system (PXFS) with coherency and caching
Networking: TCP/IP with load balancing
Global process management (ps, kill, exec, wait, rfork, /proc)
Monitoring tools
Cluster membership protocols
Demonstrated applications
Commercial parallel database
Scalable Web server
Parallel make
Timesharing
Summary of Solaris MC
Clusters are likely to be an important market. Solaris MC preserves customer investment in Solaris: it uses the standard Solaris kernel, is familiar to customers, and looks like a single machine. Ease of administration and use. Clusters are ideal for important applications: Web server, file server, databases, interactive services. A state-of-the-art object-oriented distributed implementation, designed for high availability.
NOW @ Berkeley
Design & implementation of a higher-level system:
Global OS (Glunix)
Parallel file systems (xFS)
Fast communication (HW for Active Messages)
Application support
Overcoming technology shortcomings:
Fault tolerance
System management
NOW goal: faster for parallel AND sequential
[Figure: NOW software layers — parallel applications built on Active Messages.]
Active Messages
Key idea: associate a small user-level handler directly with each message. The sender injects the message directly into the network; the handler executes immediately upon arrival, pulling the message out of the network and integrating it into the ongoing computation, or sending a reply. No buffering (beyond transport), no parsing, no allocation, primitive scheduling.
Every message contains in its header the address of a user-level handler, which is executed immediately at user level. There is no receive-side buffering of messages. Supports protected multiprogramming of a large number of users onto finite physical network resources. Active message operations, communication events, and threads are integrated in a simple and cohesive model. Provides naming and protection.
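A minimal single-process C++ sketch of the dispatch idea: the message names a user-level handler, and the receive path runs it immediately instead of buffering the data. The layout and names are illustrative, not the Berkeley Active Messages API.

```cpp
#include <cstdio>

// The "message": it carries the index of a user-level handler plus a small
// payload, so the receive path knows exactly what to do on arrival.
struct ActiveMessage {
    int handler_id;          // which handler to run on arrival
    double payload;          // small argument carried in the message
};

static double accumulator = 0.0;

// User-level handlers: pull the data straight into the ongoing computation.
void add_handler(const ActiveMessage& m)   { accumulator += m.payload; }
void print_handler(const ActiveMessage& m) { std::printf("got %f\n", m.payload); }

using Handler = void (*)(const ActiveMessage&);
static Handler handler_table[] = { add_handler, print_handler };

// What the NIC interrupt / polling loop would do on arrival: no buffering,
// no parsing beyond the header, just dispatch to the named handler.
void on_arrival(const ActiveMessage& m) {
    handler_table[m.handler_id](m);
}

int main() {
    on_arrival({0, 3.5});    // "sender" injects; handler runs immediately
    on_arrival({0, 1.5});
    on_arrival({1, accumulator});
    return 0;
}
```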
[Figure: Active Messages — on each node, the handler moves arriving data directly into the data structures of the ongoing primary computation.]
Revolutionary (MPP style): write new programs from scratch using MPP languages, compilers, and libraries.
Porting: port programs from mainframes, supercomputers, and MPPs.
Evolutionary: take a sequential program and use
1) Network RAM: first use the memory of many computers to reduce disk accesses; if not fast enough, then
2) Parallel I/O: use many disks in parallel for accesses not in the file cache; if not fast enough, then
3) Parallel program: change the program until it sees enough processors to be fast.
=> Large speedup without a fine-grain parallel program.
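To illustrate the first evolutionary step, here is a hedged C++ sketch of the Network RAM idea: before paying for a disk access, check whether another node already holds the block in its idle memory. The lookup functions and maps are stand-ins, not a real NOW interface.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Stand-ins: local file cache and the idle memory of peer nodes.
std::map<int, std::string> local_cache;
std::map<int, std::string> remote_memory;

std::string read_from_disk(int block) {
    std::printf("slow disk read for block %d\n", block);
    return "disk data";
}

std::string read_block(int block) {
    auto hit = local_cache.find(block);
    if (hit != local_cache.end())
        return hit->second;                      // fastest: local cache
    auto remote = remote_memory.find(block);
    if (remote != remote_memory.end()) {
        local_cache[block] = remote->second;     // faster than disk: fetch over the LAN
        return remote->second;
    }
    std::string data = read_from_disk(block);    // last resort: disk
    local_cache[block] = data;
    return data;
}

int main() {
    remote_memory[7] = "cached on a peer node";
    read_block(7);     // satisfied from remote memory, no disk access
    read_block(8);     // falls through to disk
    return 0;
}
```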
Clusters Revisited
Summary