HPC Chapter 1
High Performance Computing

Levels of Parallelism
In computing, parallelism refers to the use of multiple processors or cores to perform
computations simultaneously, thereby reducing the time required to complete a task. There are
several levels of parallelism in computer systems, including:

1.1. Instruction-level parallelism: the simultaneous execution of multiple
instructions within a single processor. This is achieved through techniques
such as pipelining, where the different stages of a processor's pipeline
execute different instructions at the same time.
1.2. Thread-level parallelism: the execution of multiple threads within a
single process. Threads are separate execution paths that can be scheduled
independently by the operating system, so several threads can run
simultaneously on different processors or cores.
1.3. Data-level parallelism: the simultaneous execution of the same operation
on multiple data elements. This is commonly used in applications such as
graphics processing and scientific computing, where large amounts of data
can be processed in parallel.
1.4. Task-level parallelism: the execution of multiple independent tasks
simultaneously. This is commonly used in distributed computing systems,
where multiple computers work together to solve a large computational
problem.

Overall, parallelism is a powerful technique that can be used to speed up
computations and improve the performance of computer systems.
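As a small illustration of thread-level and data-level parallelism, the sketch below uses a pool of worker threads to apply the same operation to different data elements concurrently (the `square` function and the data are made up for the example):

```python
# Thread-level parallelism sketch (illustrative): a pool of worker threads
# applies the same operation to different data elements concurrently.
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def threaded_squares(data, workers=4):
    # The executor schedules each call onto one of the worker threads;
    # map() preserves the input order in the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, data))

print(threaded_squares(range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The same structure with a process pool instead of a thread pool gives data-level parallelism across cores.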

Models of parallel computing / Parallelism Models

SIMT

1. SIMT stands for Single Instruction Multiple Thread, a parallel computing
model used in modern graphics processing units (GPUs). In this model, a
large number of threads are executed in parallel on a GPU, with each thread
executing the same instruction on different data.
2. The SIMT model is similar to the SIMD (Single Instruction Multiple Data)
model used in some other parallel computing architectures, but with some
important differences.
3. In the SIMD model, a single thread of control applies one instruction to
multiple data elements in lockstep, whereas in the SIMT model, many
independent threads each execute the same instruction on their own data.
4. The SIMT model is particularly well-suited for data-parallel
computations, such as those commonly found in graphics processing,
scientific simulations, and machine learning.
5. The SIMT model is also used in general-purpose parallel computing
frameworks, such as CUDA (Compute Unified Device Architecture), developed by
NVIDIA.
6. In the SIMT model, the threads are organized into groups called warps,
which are executed in parallel on the GPU. Each warp consists of a fixed
number of threads, typically 32, and all threads in a warp execute the same
instruction.
7. This allows the GPU to achieve high parallelism and efficiency, as long
as the computation can be parallelized and the memory access patterns are
optimized.
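Real SIMT code is written as a GPU kernel (for example in CUDA); as a rough plain-Python sketch of the idea, the loop below mimics how every thread in a 32-wide warp executes the same instruction, each selecting its own data element via its thread index:

```python
# Plain-Python sketch of the SIMT execution model (illustrative only; actual
# SIMT code would be a CUDA kernel). Every "thread" runs the same
# instruction, selecting its own element via its thread index.
WARP_SIZE = 32  # typical warp width on NVIDIA GPUs

def simt_add(a, b):
    out = [0] * len(a)
    # Process the data one warp at a time; within a warp, every thread
    # executes the same instruction on a different data element.
    for warp_start in range(0, len(a), WARP_SIZE):
        for tid in range(warp_start, min(warp_start + WARP_SIZE, len(a))):
            out[tid] = a[tid] + b[tid]  # same instruction, different data
    return out

print(simt_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

On a real GPU the inner loop disappears: the 32 threads of a warp run that one instruction simultaneously in hardware.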
SPMD

1.1. SPMD stands for Single Program Multiple Data. It is a parallel
computing model where a single program is executed by multiple processors on
different data.
1.2. In the SPMD model, each processor executes the same program, but
operates on different data.
1.3. The SPMD model is commonly used in parallel computing environments such
as clusters, grids, and supercomputers.
1.4. In this model, the program is written as a single sequential program,
which is then divided into parallel tasks that can be executed
simultaneously by multiple processors, each operating on its own portion of
the data.
1.5. The SPMD model is well-suited for parallelizing applications that can
be divided into independent tasks, such as scientific simulations and data
analytics.
1.6. It allows for the efficient use of multiple processors to speed up
computations and achieve better performance.
1.7. In the SPMD model, the processors may communicate with each other
through message passing, shared memory, or other mechanisms, depending on
the specific implementation.
1.8. The SPMD model is one of the most widely used parallel computing
models, and is supported by many parallel programming languages and
frameworks, such as MPI (Message Passing Interface) and OpenMP (Open
Multi-Processing).
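Real SPMD codes are typically written with MPI; as a minimal sketch of the pattern using only the Python standard library, every worker below runs the *same* function, and its rank determines which slice of the data it operates on:

```python
# SPMD sketch (illustrative; production SPMD codes typically use MPI).
# Every worker runs the same program function; its rank selects the data.
from multiprocessing import Pool

DATA = list(range(16))
NUM_WORKERS = 4

def spmd_program(rank):
    # Same program for every worker; the rank picks out a different
    # interleaved slice of the shared input data.
    chunk = DATA[rank::NUM_WORKERS]
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    with Pool(NUM_WORKERS) as pool:
        partials = pool.map(spmd_program, range(NUM_WORKERS))
    print(sum(partials))  # 1240, the same as sum(x*x for x in range(16))
```

In MPI the rank would come from `MPI_Comm_rank` and the final reduction from `MPI_Reduce`; the program structure is identical.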

Data Flow Models

1.1. Data flow models are a type of parallel computing model that focuses on
the flow of data through a system, rather than on the flow of control. In
data flow models, data is represented as a series of tokens that are passed
between computational nodes.
1.2. A data flow model is a diagrammatic representation of the flow and
exchange of information within a system.
1.3. Data flow models are used to graphically represent the flow of data in
an information system by describing the processes involved in transferring
data from input to file storage and report generation.
1.4. The basic idea is that the program is represented as a network of
processing elements, each of which performs a particular function on its
inputs to produce its output.
1.5. One important advantage of data flow models is their ability to exploit
parallelism in a program.
1.6. By breaking a program down into a network of processing elements, data
flow models make it easier to identify which computations can be executed
concurrently and which must be executed sequentially.
1.7. This allows for more efficient use of computing resources, resulting in
faster computation.
1.8. There are several different types of data flow models in parallel
computing, including static data flow, dynamic data flow, and token flow.
1.9. In static data flow, the program's structure is determined at compile
time, while in dynamic data flow, the structure is determined at runtime.
Token flow models use tokens to represent data flowing through the system
and to synchronize computations.
1.10. Data flow models are widely used in parallel computing for a variety
of applications, including image processing, signal processing, and
scientific computing.
1.11. They provide a flexible and efficient way to represent computations in
parallel, and can be implemented on a variety of hardware platforms,
including multicore processors, GPUs, and FPGAs.
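The firing rule at the heart of data flow models can be sketched in a few lines: a node "fires" (computes its output) as soon as all of its input tokens have arrived, with no central flow of control. The `Node` class and the `(a + b) * c` graph below are made up for the example:

```python
# Minimal dataflow sketch (illustrative): a node fires as soon as all of its
# input tokens are present, so independent nodes could fire in parallel.
class Node:
    def __init__(self, func, num_inputs):
        self.func = func
        self.num_inputs = num_inputs
        self.inputs = {}      # port index -> token value
        self.output = None

    def receive(self, port, token):
        self.inputs[port] = token
        if len(self.inputs) == self.num_inputs:   # all tokens present
            args = [self.inputs[i] for i in range(self.num_inputs)]
            self.output = self.func(*args)        # the node "fires"
        return self.output

# Build the dataflow graph for (a + b) * c
adder = Node(lambda x, y: x + y, 2)
multiplier = Node(lambda x, y: x * y, 2)

adder.receive(0, 2)
s = adder.receive(1, 3)          # adder fires: 2 + 3 = 5
multiplier.receive(0, s)
print(multiplier.receive(1, 4))  # multiplier fires: 5 * 4 = 20
```

Note that the multiplier could have received its `c` token before the adder fired; execution order is driven purely by token availability.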

Demand-driven Computation
1. Demand-driven computation is a programming paradigm that focuses on
computing only what is necessary, rather than computing everything in
advance.
2. It is a paradigm of computing where the computation is initiated or
performed in response to a specific request or demand.
3. In parallel computing, this approach is particularly useful in situations
where the data is too large to fit in memory, or when the computation
involves complex algorithms that require a lot of processing power.
4. In demand-driven computation, the focus is on the data rather than on the
algorithm or program as a whole. A computation is performed only when its
input data is available and its result is demanded, and it stops when the
output is produced.
5. In demand-driven parallel computing, the computation is divided into
smaller units of work, called tasks. Each task is executed by a separate
processing unit, such as a CPU core or a GPU.
6. Tasks are scheduled dynamically based on their dependencies and the
availability of resources.
7. This allows the computation to adapt to changing conditions, such as
variations in input data, without wasting resources on unnecessary
computations. This approach is also called task-based parallelism.
8. Another approach is data-driven parallelism, which focuses on the
parallel execution of data operations. In this approach, data elements are
distributed among processing units, and each unit performs operations on the
data it holds.
9. Overall, demand-driven computation in parallel computing is a useful
paradigm for handling large or complex data sets and creating responsive
applications. It can lead to faster computation times and better utilization
of resources, as idle processors can be used to handle other tasks rather
than being left unused.
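The "compute only what is demanded" idea is easy to see with lazy evaluation: in the sketch below, a generator produces values only when the consumer asks for them, so work that is never demanded is never performed (the `expensive` function and the `computed` log are made up for the example):

```python
# Demand-driven sketch (illustrative): values are computed lazily, only when
# a result is actually requested by the consumer.
computed = []  # records which inputs real work was actually done for

def expensive(n):
    computed.append(n)   # track that a computation happened
    return n * n

def lazy_squares():
    # A generator performs each computation only when next() demands it.
    n = 0
    while True:
        yield expensive(n)
        n += 1

gen = lazy_squares()
first_three = [next(gen) for _ in range(3)]
print(first_three)  # [0, 1, 4]
print(computed)     # [0, 1, 2] -- only the demanded values were computed
```

In a demand-driven parallel runtime, the same principle applies at the task level: a task is scheduled onto a processing unit only when some consumer needs its result.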

Architectures in parallel computing

Parallel computing architectures refer to the different ways in which
processors are organized and connected to perform computation
simultaneously.

1. N-wide superscalar architectures
1.1. N-wide superscalar architectures are a type of parallel computing
architecture that enables the execution of multiple instructions in parallel
by using multiple execution units, such as arithmetic logic units (ALUs) and
floating-point units (FPUs), that can operate independently.
1.2. Specifically, an N-wide superscalar architecture can execute up to N
instructions simultaneously, where N is the number of parallel execution
units in the processor.
1.3. It allows the execution of multiple instructions in a single clock
cycle.
1.4. In an N-wide superscalar architecture, multiple instructions are
fetched and decoded in parallel, and then dispatched to multiple execution
units for parallel execution.
1.5. This enables the processor to achieve higher instruction throughput,
which can result in significant performance improvements compared to
traditional scalar architectures that execute one instruction at a time.
1.6. In addition, N-wide superscalar architectures can be combined with
other parallel computing techniques, such as pipelining and multithreading,
to further increase instruction throughput and achieve even higher levels of
parallelism.
1.7. This is particularly useful for applications that require intensive
computation or large data sets, as it allows for faster processing and
improved efficiency.
1.8. However, N-wide superscalar architectures have some significant
drawbacks: they are more complex and difficult to design, and they require a
large amount of power and resources to operate efficiently.
1.9. Additionally, not all programs can take advantage of the parallelism
offered by superscalar architectures, so the benefits are not always
guaranteed.
1.10. Overall, N-wide superscalar architectures are a powerful tool for
achieving high performance through parallel instruction execution,
particularly for applications with high levels of instruction-level
parallelism.

2. Multi-core
2.1. Multi-core architecture is a type of parallel computing architecture
that places multiple processors, or cores, on a single physical CPU chip.
2.2. Each core is capable of executing instructions independently, allowing
multiple instructions to be executed simultaneously. These cores can work
together to perform multiple tasks at the same time, which is known as
parallel computation.
2.3. This enables the processor to achieve higher instruction throughput and
improve performance by distributing the workload among the various cores and
executing multiple instructions simultaneously.
2.4. In a multi-core architecture, the cores typically share a common cache,
memory, and other hardware resources, which can result in resource
contention and bottlenecks if not properly managed.
2.5. Each core can be assigned a different subtask, allowing for faster
completion of the overall task. This is especially useful for
computationally intensive tasks such as data processing, scientific
simulations, and machine learning.
2.6. With multi-core processors, a computer can perform tasks such as
running multiple applications, handling complex calculations, or rendering
high-quality graphics, all at the same time.
2.7. Multi-core technology has led to significant improvements in computer
performance and has enabled the creation of more powerful and versatile
devices, such as smartphones, tablets, and gaming consoles.
2.8. Software techniques such as parallel algorithms and thread-level
parallelism are also used to take advantage of the parallelism in multi-core
architectures. For example, multi-threaded applications can exploit the
multiple cores by executing different threads in parallel on different
cores.
2.9. Overall, multi-core architecture is an important technique in parallel
computing for achieving high performance through parallel execution.
However, it also requires careful design and optimization to ensure
efficient execution and minimize potential performance bottlenecks.
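Assigning each core a different subtask, as in 2.5 above, can be sketched with a process pool: the work is split into one chunk per worker process, each chunk is summed independently, and the partial results are combined (the chunking scheme here is made up for the example):

```python
# Multi-core sketch (illustrative): split a large sum into one chunk per
# worker process, sum each chunk on a separate core, then combine.
from concurrent.futures import ProcessPoolExecutor
import os

def chunk_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=None):
    workers = workers or os.cpu_count() or 2
    step = -(-n // workers)  # ceiling division: size of each chunk
    chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
    # Each chunk is summed by a separate worker process (separate core).
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(1_000_000))  # 499999500000 == sum(range(1_000_000))
```

Because the chunks share no data, there is no contention; when tasks do share caches or memory, as noted in 2.4, the speedup is smaller.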

3. Multi-threaded
3.1. Multi-threaded architecture is a type of parallel computing
architecture that involves the use of multiple threads within a single
process or program.
3.2. Each thread is an independent sequence of instructions that can be
executed simultaneously, allowing multiple instructions to run concurrently
on separate processor cores or CPUs.
3.3. This enables the processor to achieve higher instruction throughput and
improve performance by executing multiple instructions simultaneously.
3.4. It leverages the power of multiple processors to execute tasks more
quickly.
3.5. In a multi-threaded architecture, each thread is responsible for a
specific portion of the computation, and the threads communicate with each
other to share data and coordinate their work.
3.6. The threads can be executed in parallel, allowing for faster overall
execution times.
3.7. In a multi-threaded program, each thread can perform a separate task
simultaneously, which can result in significant performance improvements
over single-threaded programs.
3.8. Multi-threading can also take advantage of multi-core processors, as
each thread can be assigned to a separate core; this is a common way of
exploiting the available parallelism.
3.9. This can lead to even greater performance improvements, especially for
computationally intensive tasks such as video encoding, scientific
simulations, data processing, and image and video processing. It is
particularly useful in applications that require heavy computational power
and involve large datasets.
3.10. Multi-threaded architectures can also be combined with other parallel
computing techniques, such as multi-core architectures and distributed
computing, to further increase performance and achieve even higher levels of
parallelism.
3.11. Overall, multi-threaded architecture is an important technique in
parallel computing for achieving high performance through parallel
execution. However, it also requires careful design and optimization to
ensure correct behavior, efficient execution, and optimal performance.
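The pattern described above, where each thread handles a portion of the computation and the threads coordinate when updating shared data, can be sketched as follows (the worker function and the lock-protected `total` are made up for the example):

```python
# Multi-threaded sketch (illustrative): worker threads within one process
# each sum a portion of the data, coordinating through a lock when they
# update the shared result.
import threading

total = 0
lock = threading.Lock()

def worker(numbers):
    global total
    partial = sum(numbers)   # each thread computes its own portion
    with lock:               # coordinate access to the shared result
        total += partial

data = list(range(100))
threads = [threading.Thread(target=worker, args=(data[i::4],))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 4950 == sum(range(100))
```

Without the lock, two threads could read and write `total` at the same time and lose an update; this is the "correct behavior" concern mentioned in 3.11.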
