Arkom 13-40275
PROCESSOR
Lecturers:
1. NOVERA ISTIQOMAH, M.T. (NVO)
2. MUHAMMAD FARIS RURIAWAN, M.T. (FRW)
1. Learning Outcomes
3. Symmetric Multiprocessors
6. Clusters
8. Vector Computation
LEARNING OUTCOMES
MULTIPLE PROCESSOR
ORGANIZATIONS
TYPES OF PARALLEL PROCESSORS
• Single instruction, single data (SISD) stream: A single processor executes a
single instruction stream to operate on data stored in a single memory.
Uniprocessors fall into this category.
• Single instruction, multiple data (SIMD) stream: A single machine
instruction controls the simultaneous execution of a number of processing
elements on a lockstep basis. Each processing element has an associated data
memory, so that instructions are executed on different sets of data by
different processors.
• Multiple instruction, single data (MISD) stream: A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This structure is not commercially implemented.
• Multiple instruction, multiple data (MIMD) stream: A set of processors
simultaneously execute different instruction sequences on different data sets.
SMPs, clusters, and NUMA systems fit into this category.
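Flynn's taxonomy above classifies machines along two axes: how many instruction streams and how many data streams execute concurrently. A minimal sketch of that classification (the function name and numeric encoding are assumptions for illustration):

```python
# Map Flynn's two axes (instruction streams, data streams) onto the
# four categories described above.
def flynn_category(instruction_streams, data_streams):
    single_i = instruction_streams == 1
    single_d = data_streams == 1
    if single_i and single_d:
        return "SISD"   # uniprocessor
    if single_i:
        return "SIMD"   # lockstep processing elements (e.g., vector units)
    if single_d:
        return "MISD"   # not commercially implemented
    return "MIMD"       # SMPs, clusters, NUMA systems

assert flynn_category(1, 1) == "SISD"
assert flynn_category(1, 8) == "SIMD"
assert flynn_category(4, 4) == "MIMD"
```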
PARALLEL PROCESSOR ARCHITECTURES
SYMMETRIC MULTIPROCESSORS
(SMP)
CHARACTERISTICS
1. There are two or more similar processors of comparable capability.
2. These processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
3. All processors share access to I/O devices, either through the same
channels or through different channels that provide paths to the same
device.
4. All processors can perform the same functions (hence the term
symmetric).
5. The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels.
ADVANTAGES
• Performance: If the work to be done by a computer can be
organized so that some portions of the work can be done in
parallel, then a system with multiple processors will yield greater
performance than one with a single processor of the same type
(Figure 17.3).
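The performance gain described above is bounded by how much of the work can actually run in parallel. Amdahl's law (a standard estimate, not stated on the slide) makes this concrete:

```python
# Amdahl's law: the speedup from N processors is limited by the
# fraction of the work that must remain serial.
def speedup(parallel_fraction, n_processors):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with 16 processors, a program that is only 90% parallel
# speeds up by well under 16x.
print(speedup(0.9, 16))  # → 6.4
```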
HARDWARE SOLUTIONS
1. DIRECTORY PROTOCOLS
• Collect and maintain information about where copies of lines reside.
• A centralized controller is responsible for keeping the state information up to date.
• Every local action that can affect the global state of a line must be reported to the central controller.
• Effective in large-scale systems that involve multiple buses or some other complex interconnection scheme.
2. SNOOPY PROTOCOLS
• Distribute the responsibility for maintaining cache coherence among all of the cache controllers in a multiprocessor.
• Suited to a bus-based multiprocessor, because the shared bus provides a simple means for broadcasting and snooping.
• Two basic approaches: write invalidate and write update (or write broadcast).
• Performance depends on the number of local caches and the pattern of memory reads and writes.
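The two bookkeeping styles can be sketched as follows. This is a toy model (class and method names are assumptions for illustration): the directory keeps a central record of sharers and knows exactly whom to invalidate, while a snoopy controller watches a broadcast of every write and acts locally.

```python
class Directory:
    """Central table recording which caches hold each line."""
    def __init__(self):
        self.sharers = {}                     # line -> set of cache ids

    def on_read(self, cache_id, line):
        self.sharers.setdefault(line, set()).add(cache_id)

    def on_write(self, cache_id, line):
        # The central controller knows exactly which caches to invalidate.
        invalidated = self.sharers.get(line, set()) - {cache_id}
        self.sharers[line] = {cache_id}
        return invalidated

class SnoopyCache:
    """Each controller snoops the shared bus and acts locally."""
    def __init__(self, cache_id):
        self.cache_id, self.lines = cache_id, set()

    def load(self, line):
        self.lines.add(line)

    def snoop_write(self, writer_id, line):
        if writer_id != self.cache_id:        # write-invalidate policy
            self.lines.discard(line)
```

For example, after caches 0 and 1 both read line 100, a write by cache 0 makes the directory report that cache 1 must be invalidated; a snoopy cache drops the line itself on seeing the broadcast.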
HARDWARE SOLUTIONS
3. The MESI Protocol
The data cache includes two status bits per tag, so that each line
can be in one of four states:
• Modified: The line in the cache has been modified (different
from main memory) and is available only in this cache.
• Exclusive: The line in the cache is the same as that in main
memory and is not present in any other cache.
• Shared: The line in the cache is the same as that in main
memory and may be present in another cache.
• Invalid: The line in the cache does not contain valid data.
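The four states above form a small state machine. A toy sketch of the main MESI transitions for a single cache line (simplified: it omits write-backs and bus transactions, and the event names are assumptions):

```python
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def next_state(state, event):
    # Key transitions only; unknown (state, event) pairs keep the state.
    transitions = {
        (INVALID,   "read_miss_no_sharers"): EXCLUSIVE,
        (INVALID,   "read_miss_sharers"):    SHARED,
        (INVALID,   "local_write"):          MODIFIED,
        (EXCLUSIVE, "local_write"):          MODIFIED,  # silent upgrade
        (EXCLUSIVE, "snoop_read"):           SHARED,
        (SHARED,    "local_write"):          MODIFIED,  # invalidates others
        (SHARED,    "snoop_write"):          INVALID,
        (MODIFIED,  "snoop_read"):           SHARED,    # after write-back
        (MODIFIED,  "snoop_write"):          INVALID,
    }
    return transitions.get((state, event), state)

s = next_state(INVALID, "read_miss_no_sharers")  # -> E
s = next_state(s, "snoop_read")                  # -> S
assert s == SHARED
```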
MULTITHREADING AND CHIP
MULTIPROCESSORS
IMPLICIT & EXPLICIT
1. Process: An instance of a program running on a computer. A
process embodies two key characteristics:
— Resource ownership: A process includes a virtual address space to hold the process image; the process image is the collection of program, data, stack, and attributes that define the process.
— Scheduling/execution: The execution of a process follows
an execution path (trace) through one or more programs. A
process has an execution state (Running, Ready, etc.) and a
dispatching priority and is the entity that is scheduled and
dispatched by the operating system.
3. Process switch: An operation that switches the processor
from one process to another, by saving all the process control
data, registers, and other information for the first and
replacing them with the process information for the second.
4. Thread: A dispatchable unit of work within a process. It
includes a processor context (which includes the program
counter and stack pointer) and its own data area for a stack
(to enable subroutine branching). A thread executes
sequentially and is interruptible so that the processor can turn
to another thread.
5. Thread switch: The act of switching processor control from
one thread to another within the same process. Typically, this
type of switch is much less costly than a process switch.
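The definitions above can be illustrated with threads sharing one process's address space (a minimal Python sketch; the worker function and shared list are assumptions for illustration):

```python
import threading

# Threads within one process share the same address space, so this
# list is visible to every worker. A process switch, by contrast,
# would swap out the entire memory image and register context.
results = []
lock = threading.Lock()

def worker(n):
    with lock:                 # serialize access to the shared data
        results.append(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(results) == [0, 1, 4, 9]
```

Switching between these threads preserves the shared address space, which is why a thread switch is cheaper than a full process switch.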
PRINCIPAL APPROACHES
1. Interleaved multithreading/fine-grained multithreading. The processor deals with
two or more thread contexts at a time, switching from one thread to another at
each clock cycle. If a thread is blocked because of data dependencies or memory
latencies, that thread is skipped and a ready thread is executed.
2. Blocked multithreading/coarse-grained multithreading. The instructions of a
thread are executed successively until an event occurs that may cause delay, such
as a cache miss. This event induces a switch to another thread. This approach is
effective on an in-order processor that would stall the pipeline for a delay event
such as a cache miss.
3. Simultaneous multithreading (SMT): Instructions are simultaneously issued from
multiple threads to the execution units of a superscalar processor. This combines
the wide superscalar instruction issue capability with the use of multiple thread
contexts.
4. Chip multiprocessing: the entire processor is replicated on a single chip and each
processor handles separate threads. The advantage of this approach is that the
available logic area on a chip is used effectively without depending on ever-
increasing complexity in pipeline design.
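Approaches 1 and 2 can be contrasted with a toy scheduling simulation (the traces and the "MISS" marker are assumptions, not real hardware behavior): interleaved multithreading switches thread every cycle, while blocked multithreading switches only on a delay event.

```python
# Thread traces are lists of instruction labels; "MISS" marks a
# delay event such as a cache miss.
def interleaved(traces):
    pending = [list(t) for t in traces]
    order, i = [], 0
    while any(pending):
        if pending[i % len(pending)]:
            order.append(pending[i % len(pending)].pop(0))
        i += 1                          # round-robin: new thread each cycle
    return order

def blocked(traces):
    pending = [list(t) for t in traces]
    order, i = [], 0
    while any(pending):
        if not pending[i]:
            i = (i + 1) % len(pending)
            continue
        instr = pending[i].pop(0)
        order.append(instr)
        if instr == "MISS":             # delay event forces a thread switch
            i = (i + 1) % len(pending)
    return order

t0, t1 = ["A1", "MISS", "A2"], ["B1", "B2"]
```

Running both on the same traces shows the difference: interleaving alternates threads regardless of stalls, while the blocked scheduler stays on one thread until it hits the miss.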
CLUSTERS
DEFINITION
• An alternative to symmetric multiprocessing as an approach to providing high performance and high availability; particularly attractive for server applications.
• A group of interconnected, whole computers working together as a unified computing resource that can create the illusion of being one machine.
CONFIGURATION
METHODS
ARCHITECTURE
COMPARED TO SMP
(Comparison table: Clusters vs. SMP)
TERMS
• Uniform memory access (UMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor to all regions of memory is the same. The access times experienced by different processors are the same.
• Nonuniform memory access (NUMA): All processors have access to
all parts of main memory using loads and stores. The memory access
time of a processor differs depending on which region of main memory
is accessed. The last statement is true for all processors; however, for
different processors, which memory regions are slower and which are
faster differ.
• Cache-coherent NUMA (CC-NUMA): A NUMA system in which cache
coherence is maintained among the caches of the various processors.
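The UMA/NUMA distinction can be captured in a toy access-time model (the latency values and node layout are illustrative assumptions): UMA latency is uniform regardless of placement, while NUMA latency depends on whether the memory region is local to the accessing node.

```python
LOCAL_NS, REMOTE_NS = 100, 300   # illustrative latencies, not real figures

def uma_latency(cpu_node, memory_node):
    return LOCAL_NS              # uniform for every processor and region

def numa_latency(cpu_node, memory_node):
    # Local accesses are fast; accesses to another node's memory are slower.
    return LOCAL_NS if cpu_node == memory_node else REMOTE_NS

assert uma_latency(0, 1) == uma_latency(1, 1)
assert numa_latency(0, 0) < numa_latency(0, 1)
```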
ORGANIZATION
VECTOR COMPUTATION
APPROACHES
Thank you