Cluster Computing
Clusters Classification..1
Based
Clusters Classification..2
Based on Workstation/PC Ownership
Clusters Classification..3
Based on Node Architecture..
Clusters Classification..4
Based on Node OS:
Linux Clusters (Beowulf)
Solaris Clusters (Berkeley NOW)
NT Clusters (HPVM)
AIX Clusters (IBM SP2)
SCO/Compaq Clusters (Unixware)
Digital VMS Clusters, HP Clusters, ..
Clusters Classification..5
Based on node components architecture & configuration (processor arch, node type: PC/Workstation.. & OS: Linux/NT..):
Homogeneous Clusters: all nodes have a similar configuration.
Heterogeneous Clusters: nodes are based on different processors and run different OSes.
Clusters Classification..6a
[Figure: clusters classified by scale — uniprocessor, workgroup, department, campus, enterprise, public metacomputing — along three dimensions: (1) technology, (2) platform (SMP, cluster, MPP), and (3) network.]
Contents
What is Middleware?
What is Single System Image?
Benefits of Single System Image
SSI Boundaries
SSI Levels
Relationship between Middleware Modules
Strategy for SSI via OS
Solaris MC: An example OS supporting SSI
Cluster Monitoring Software
What is Middleware?
An interface between user applications and the cluster hardware and OS platform. Middleware packages support each other at the management, programming, and implementation levels.
Middleware layers include the Single System Image (SSI) layer and the system availability infrastructure (checkpointing, recovery), both discussed below.
What is Single System Image (SSI)?
A single system image is the illusion, created by software or hardware, that a collection of computing elements appears as a single computing resource. SSI makes the cluster appear like a single machine to the user, to applications, and to the network: complete transparency. A cluster without an SSI is not a cluster.
Benefits of Single System Image
Usage of system resources transparently
Improved reliability and higher availability
Simplified system management
Single File Hierarchy: xFS, AFS, Solaris MC Proxy
Single Control Point: management from a single GUI
Single virtual networking
Single memory space: DSM (Distributed Shared Memory)
Single Job Management: Glunix, Codine, LSF
Single User Interface: like a workstation/PC windowing environment (CDE in Solaris/NT); it may also use Web technology
Single I/O Space: any node can access any peripheral or disk device without knowledge of its physical location.
Single Process Space: any process on any node can create processes cluster-wide, and they communicate through signals, pipes, etc., as if they were on a single node.
Checkpointing and Process Migration: saves process state and intermediate results in memory or to disk to support rollback recovery when a node fails; process migration (PM) supports load balancing (see the sketch below).
Reduction in the risk of operator errors
Users need not be aware of the underlying system architecture to use these machines effectively
SSI Levels
SSI is a computer-science notion of levels of abstraction (a house is at a higher level of abstraction than walls, ceilings, and floors). SSI can be provided at the hardware level, the operating system (kernel) level, or the application/subsystem level.
Beowulf (CalTech and NASA) - USA
CCS (Computing Centre Software) - Paderborn, Germany
Condor - University of Wisconsin-Madison, USA
DJM (Distributed Job Manager) - Minnesota Supercomputing Center, USA
DQS (Distributed Queuing System) - Florida State University, USA
EASY - Argonne National Lab, USA
HPVM (High Performance Virtual Machine) - UIUC & now UCSB, USA
far - University of Liverpool, UK
Gardens - Queensland University of Technology, Australia
Generic NQS (Network Queuing System) - University of Sheffield, UK
NOW (Network of Workstations) - Berkeley, USA
NIMROD - Monash University, Australia
PBS (Portable Batch System) - NASA Ames and LLNL, USA
PRM (Prospero Resource Manager) - Univ. of Southern California, USA
QBATCH - Vita Services Ltd., USA
With Solaris MC:
The cluster becomes a scalable, modular computer
Users and administrators see a single large machine
Runs existing applications
Easy to administer
New goal: use computer clusters for general-purpose computing; support existing customers and applications.
Solaris MC extends standard Solaris. Solaris MC makes the cluster look like a single machine: a global file system, global process management, and global networking.
Applications
Ideal for: Web servers, file servers, databases, and interactive services
Preserves investment in existing applications
Modular servers with a low entry-point price and low cost of ownership
Easier system administration
Solaris could become a preferred platform for clustered systems
A prototype cluster of SPARCstations connected by a Myrinet network runs an unmodified commercial parallel database, a scalable Web server, and parallel make.
Advantages of Solaris MC
Existing applications are binary-compatible
Same kernel, device drivers, etc.
As portable as base Solaris: will run on SPARC, x86, and PowerPC
Solaris MC details
A private Solaris kernel per node provides reliability
Object-oriented system with well-defined interfaces
Solaris MC components
[Figure: Solaris MC architecture — applications and the system call interface on each node; the file system, processes, networking, and C++ object framework extend across the other nodes.]
Components:
Object and communication support
High availability support
PXFS global distributed file system
Process management
Networking
Object Orientation
Solaris MC uses:
IDL: a better way to define interfaces
CORBA object model: a better RPC (Remote Procedure Call)
C++: a better C
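To make the object-orientation point concrete, here is a hedged C++ sketch of the proxy/servant style that an IDL-defined, CORBA-like object model enables: an interface declared once, implemented by a servant on the owning node, and invoked through a proxy that could forward the call to another node. All names are illustrative, not Solaris MC's actual interfaces.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// What an IDL compiler conceptually emits: an abstract interface that says
// nothing about where the object lives (illustrative, not real Solaris MC code).
class FileServer {
public:
    virtual ~FileServer() = default;
    virtual std::string read_block(int block_no) = 0;
};

// Servant: the real implementation on the node that owns the file.
class LocalFileServer : public FileServer {
public:
    std::string read_block(int block_no) override {
        return "data for block " + std::to_string(block_no);
    }
};

// Proxy: what a client on another node would hold. In a CORBA-style system
// this would marshal the call and ship it over the cluster interconnect;
// here it simply delegates locally to keep the sketch self-contained.
class FileServerProxy : public FileServer {
    std::shared_ptr<FileServer> target_;
public:
    explicit FileServerProxy(std::shared_ptr<FileServer> t) : target_(std::move(t)) {}
    std::string read_block(int block_no) override {
        // marshal(block_no); send(); wait_for_reply();  <- the remote case
        return target_->read_block(block_no);
    }
};

int main() {
    auto servant = std::make_shared<LocalFileServer>();
    FileServerProxy proxy(servant);
    std::printf("%s\n", proxy.read_block(7).c_str());
    return 0;
}
```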
High availability
If a node fails, the remaining nodes continue running. Better than an SMP. A requirement for the mission-critical market.
Failure notifications are delivered to servers and clients; a group membership protocol detects node failures.
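A minimal sketch of how a heartbeat-based membership service could detect failed nodes and trigger notifications; this is an assumed mechanism for illustration, not the actual Solaris MC protocol.

```cpp
#include <cstdio>
#include <map>

// Hypothetical heartbeat-based membership: each node periodically reports in;
// a node that misses too many intervals is declared failed and removed.
class Membership {
    std::map<int, int> missed_;                       // node -> missed heartbeats
    static constexpr int kMaxMissed = 3;
public:
    void add_node(int node) { missed_[node] = 0; }
    void heartbeat(int node) { missed_[node] = 0; }
    // Called once per heartbeat interval by the membership service.
    void tick() {
        for (auto it = missed_.begin(); it != missed_.end(); ) {
            if (++it->second > kMaxMissed) {
                std::printf("node %d failed; notifying servers and clients\n",
                            it->first);
                it = missed_.erase(it);               // drop it from the group
            } else {
                ++it;
            }
        }
    }
};

int main() {
    Membership m;
    m.add_node(1); m.add_node(2);
    for (int interval = 0; interval < 5; ++interval) {
        m.heartbeat(1);                               // node 2 has gone silent
        m.tick();
    }
    return 0;
}
```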
PXFS global distributed file system
Single-system image of the file system; the backbone of Solaris MC. Provides coherent access and caching of files and directories.
Implements the vnode/virtual file system interface (VFS) externally and uses object communication internally.
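A rough sketch of what coherent client-side caching involves, assuming a hypothetical block-granularity protocol in which the node owning a file tracks which nodes cache each block and invalidates them on writes; PXFS's real protocol is more elaborate.

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>

// The owning node remembers which nodes may cache each block and
// invalidates those copies before accepting a write (illustrative only).
class BlockOwner {
    std::map<int, std::string> blocks_;          // block number -> contents
    std::map<int, std::set<int>> cachers_;       // block number -> caching nodes
public:
    std::string read(int node, int block) {
        cachers_[block].insert(node);            // remember who may cache it
        return blocks_[block];
    }
    void write(int block, const std::string& data,
               void (*invalidate)(int node, int block)) {
        for (int node : cachers_[block])
            invalidate(node, block);             // keep remote caches coherent
        cachers_[block].clear();
        blocks_[block] = data;
    }
};

static void drop_cached_copy(int node, int block) {
    std::printf("invalidate block %d on node %d\n", block, node);
}

int main() {
    BlockOwner owner;
    owner.read(/*node=*/2, /*block=*/0);
    owner.write(0, "new contents", drop_cached_copy);  // node 2 is invalidated
    return 0;
}
```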
Process management
Process creation, waiting, and exiting (including remote execution)
Global process identifiers, groups, and sessions
Signal handling
procfs (/proc)
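One simple way a global process identifier could be formed is to pack the node number into the high bits of a 64-bit id, so any node can route signals to the owning node. The encoding below is purely illustrative, not the scheme Solaris MC actually uses.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative global PID: high 32 bits name the node, low 32 bits hold the
// node-local PID, so the owner of any process can be found immediately.
using GlobalPid = std::uint64_t;

GlobalPid make_global_pid(std::uint32_t node, std::uint32_t local_pid) {
    return (static_cast<GlobalPid>(node) << 32) | local_pid;
}

std::uint32_t node_of(GlobalPid g)      { return static_cast<std::uint32_t>(g >> 32); }
std::uint32_t local_pid_of(GlobalPid g) { return static_cast<std::uint32_t>(g); }

// A cluster-wide kill() would route the signal to the owning node.
void cluster_kill(GlobalPid g, int sig) {
    std::printf("send signal %d to pid %u on node %u\n",
                sig, local_pid_of(g), node_of(g));
}

int main() {
    GlobalPid g = make_global_pid(/*node=*/3, /*local_pid=*/4711);
    cluster_kill(g, 15);
    return 0;
}
```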
Networking goals
A single-machine view to customers: access the cluster through a single network address; multiple network interfaces are supported but not required.
Scalable design: protocol and network application processing can run on any node; parallelism provides high server performance.
Networking: Implementation
Packets are routed between the network device and the correct node. Efficient, scalable, and supports parallelism. Supports multiple protocols with existing protocol stacks.
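A toy C++ sketch of the packet-routing idea: the node that owns the network device hashes each incoming connection and forwards the packet to the node handling that connection, so one cluster address fans out across all nodes. The hashing scheme and names are assumptions, not the actual implementation.

```cpp
#include <cstdint>
#include <cstdio>

// A packet is identified here only by its connection endpoints.
struct Packet {
    std::uint32_t src_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
};

// Same connection always maps to the same node (illustrative hash).
int pick_node(const Packet& p, int num_nodes) {
    std::uint32_t h = p.src_ip
                    ^ (static_cast<std::uint32_t>(p.src_port) << 16)
                    ^ p.dst_port;
    return static_cast<int>(h % num_nodes);
}

// What the node owning the network device would do for each arriving packet.
void deliver(const Packet& p, int num_nodes) {
    std::printf("packet for port %u forwarded to node %d\n",
                p.dst_port, pick_node(p, num_nodes));
}

int main() {
    deliver({0x0a000001, 40001, 80}, /*num_nodes=*/4);
    deliver({0x0a000002, 40002, 80}, 4);
    return 0;
}
```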
Status
A 4-node, 8-CPU prototype connected by Myrinet has been demonstrated, with:
Object and communication infrastructure
Global file system (PXFS) with coherency and caching
Networking: TCP/IP with load balancing
Global process management (ps, kill, exec, wait, rfork, /proc)
Monitoring tools
Cluster membership protocols
Demonstrated applications
Commercial parallel database
Scalable Web server
Parallel make
Timesharing
Summary of Solaris MC
Clusters are likely to be an important market. Solaris MC preserves customer investment in Solaris: it uses the standard Solaris kernel, is familiar to customers, and looks like a single machine. Ease of administration and use. Clusters are ideal for important applications: Web server, file server, databases, interactive services. A state-of-the-art object-oriented distributed implementation, designed for high availability.
NOW @ Berkeley
Design & implementation of a higher-level system:
Global OS (Glunix)
Parallel file systems (xFS)
Fast communication (HW for Active Messages)
Application support
Overcoming technology shortcomings:
Fault tolerance
System management
NOW goal: faster for parallel AND sequential
[Figure: NOW software layers — parallel applications built on Active Messages.]
Active Messages
Key idea: associate a small user-level handler directly with each message. The sender injects the message directly into the network; the handler executes immediately upon arrival, pulling the message out of the network and integrating it into the ongoing computation, or sending a reply. No buffering (beyond transport), no parsing, no allocation, primitive scheduling.
Every message contains in its header the address of a user-level handler, which is executed immediately at user level. There is no receive-side buffering of messages. Supports protected multiprogramming of a large number of users onto finite physical network resources. Active message operations, communication events, and threads are integrated in a simple and cohesive model. Provides naming and protection.
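A minimal single-process C++ sketch of the dispatch idea: the message names a user-level handler, and the receive path runs it immediately instead of buffering the data. The layout and names are illustrative, not the Berkeley Active Messages API.

```cpp
#include <cstdio>

// The "message": it carries the index of a user-level handler plus a small
// payload, so the receive path knows exactly what to do on arrival.
struct ActiveMessage {
    int handler_id;          // which handler to run on arrival
    double payload;          // small argument carried in the message
};

static double accumulator = 0.0;

// User-level handlers: pull the data straight into the ongoing computation.
void add_handler(const ActiveMessage& m)   { accumulator += m.payload; }
void print_handler(const ActiveMessage& m) { std::printf("got %f\n", m.payload); }

using Handler = void (*)(const ActiveMessage&);
static Handler handler_table[] = { add_handler, print_handler };

// What the NIC interrupt / polling loop would do on arrival: no buffering,
// no parsing beyond the header, just dispatch to the named handler.
void on_arrival(const ActiveMessage& m) {
    handler_table[m.handler_id](m);
}

int main() {
    on_arrival({0, 3.5});    // "sender" injects; handler runs immediately
    on_arrival({0, 1.5});
    on_arrival({1, accumulator});
    return 0;
}
```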
[Figure: Active Messages — on each node, the handler moves arriving data directly into the data structures of the ongoing primary computation.]
Revolutionary (MPP style): write new programs from scratch using MPP languages, compilers, and libraries.
Porting: port programs from mainframes, supercomputers, and MPPs.
Evolutionary: take a sequential program and use
1) Network RAM: first use the memory of many computers to reduce disk accesses; if not fast enough, then
2) Parallel I/O: use many disks in parallel for accesses not in the file cache; if not fast enough, then
3) Parallel program: change the program until it sees enough processors to be fast.
=> Large speedup without a fine-grain parallel program.
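To illustrate the first evolutionary step, here is a hedged C++ sketch of the Network RAM idea: before paying for a disk access, check whether another node already holds the block in its idle memory. The lookup functions and maps are stand-ins, not a real NOW interface.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Stand-ins: local file cache and the idle memory of peer nodes.
std::map<int, std::string> local_cache;
std::map<int, std::string> remote_memory;

std::string read_from_disk(int block) {
    std::printf("slow disk read for block %d\n", block);
    return "disk data";
}

std::string read_block(int block) {
    auto hit = local_cache.find(block);
    if (hit != local_cache.end())
        return hit->second;                      // fastest: local cache
    auto remote = remote_memory.find(block);
    if (remote != remote_memory.end()) {
        local_cache[block] = remote->second;     // faster than disk: fetch over the LAN
        return remote->second;
    }
    std::string data = read_from_disk(block);    // last resort: disk
    local_cache[block] = data;
    return data;
}

int main() {
    remote_memory[7] = "cached on a peer node";
    read_block(7);     // satisfied from remote memory, no disk access
    read_block(8);     // falls through to disk
    return 0;
}
```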
Clusters Revisited
Summary