
Parallel Processing Chapter - 2: Basics of Architectural Design


Parallel Processing

Chapter -2
Basics of Architectural Design

Dr. Basant Tiwari


basanttiw@gmail.com
Department of Computer Science, Hawassa University
Basics of Architectural Design
1. Microprocessor Architecture
A. CPU Design
B. Addressing Mode
C. Instruction Set
2. Introduction to RISC processors
A. RISC Architecture Characteristics
B. Advantages of RISC architecture
C. Disadvantages of RISC architecture
3. Characteristics of RISC processors
4. RISC Vs CISC
5. Classification of Instruction Set Architectures
6. Performance measurements
Microprocessor Architecture
• A microprocessor is the central processing unit (CPU) of a computer, where
processing of program instructions and data occurs. A basic computer consists
of a microprocessor, external memory, and input and output devices.

• The working ability of a CPU depends on the instruction set architecture
(ISA) upon which it is designed.

• The Instruction Set Architecture (ISA) is the part of the processor that is
visible to the programmer or compiler writer. The ISA serves as the boundary
between software and hardware.

• In general, an ISA defines the supported data types, the registers, the hardware
support for managing main memory, fundamental features (such as
the memory consistency, addressing modes, and virtual memory), and
the input/output model.
CPU Design
• Processor design is the engineering design task of creating a
microprocessor, i.e. a component of computer hardware.
• The design process involves choosing an instruction set and a certain
execution paradigm (e.g. VLIW, CISC or RISC) and results in a
microarchitecture.
• The finished design is fabricated as a die, which is bonded onto a chip
carrier. This chip carrier is then soldered onto a printed circuit board (PCB).
Addressing Modes

• Addressing modes are the ways in which an instruction accesses its data.

• Depending upon the type of instruction applied, addressing modes are of
various types, such as direct mode, where the data itself is accessed, and
indirect mode, where the location (address) of the data is accessed first.
Instruction Set
• It is a group of instructions that can be
given to the computer.
• These instructions direct the computer in
terms of data manipulation.
• A typical instruction consists of two parts:
Opcode and Operand.
• Opcode or operational code is the
instruction applied. It can be loading data,
storing data etc.
• Operand is the memory location, register, or data
upon which the instruction (Opcode) is applied.
• Little Minion Computer Instruction Set is
shown in the figure.
INTRODUCTION TO RISC
• The aim of computer architects is to design computers which are cheaper
and more powerful than their predecessors.
• A cheaper computer has:
• Low hardware manufacturing cost.
• Low programming cost: a scalable, portable architecture that keeps
debugging costs low.

• RISC (Reduced Instruction Set Computer) is used in portable devices due
to its power efficiency.
• RISC is a type of microprocessor architecture that uses a highly optimized
set of instructions.
• Pipelining is one of the key features of RISC. It is performed by
overlapping the execution of several instructions in a pipeline fashion,
which gives RISC a performance advantage over CISC.
• RISC processors use simple instructions, each of which executes within one
clock cycle.
RISC ARCHITECTURE CHARACTERISTICS
• RISC architecture uses simple instructions.
• RISC supports a few simple data types and synthesizes complex data types
from them.
• RISC uses simple addressing modes and fixed-length instructions, which
makes pipelining easier.
• Most RISC instructions execute in one clock cycle.
• RISC has a large number of registers, which reduces the number of
interactions with memory.
• RISC uses pipelining, which overlaps the execution of several instructions
so that they are processed more efficiently.
• The reduced instruction set needs fewer transistors.
• RISC processors commonly use the Harvard memory model, with separate
instruction and data memories.
• A compiler converts each high-level language statement into a sequence of
these simple machine instructions.
Advantages of RISC architecture

• RISC instructions are simple machine instructions.
• RISC instructions are hardwired to speed up execution.
• There are very few instructions in a RISC instruction set.
• RISC instructions have simple addressing modes.
• RISC instructions execute faster because most instructions operate on
processor registers, so there is no need to access memory for each
instruction.
• RISC instructions are easy to pipeline because all instructions are of fixed
size, and the opcode and operands are located in the same position in the word.
• RISC processors aim to execute one instruction per clock cycle.
Disadvantages of RISC architecture
• The performance of a RISC processor depends largely on the programmer or
compiler.
• Converting CISC code to RISC code increases its size; this is termed code
expansion.
• Each RISC instruction is simpler, but more instructions are required to
perform an operation than on CISC, so the program becomes longer.
• The machine instructions are hardwired in RISC, so modifying any
instruction is costly.
• RISC has difficulty handling complex instructions and complex addressing
modes.
• RISC instructions do not allow direct memory-to-memory transfer; Load and
Store instructions are required to do so.
RISC Performance
• Achieving higher performance is the driving force behind the introduction of every new
architecture or system organization.

• There are several ways to achieve performance:
• technology advances,
• better machine organization,
• better architecture, and
• the optimization and improvements in compiler technology.

• The work that each instruction of the RISC machine performs is simple and straightforward.
Thus, the time required to execute each instruction can be shortened and the number of
machine cycles reduced.

• Typically the instruction execution is divided into five stages, and as soon as processing
of one stage is finished, the machine proceeds with the next stage, and so on.

• Typically those five pipeline stages are:
• IF – Instruction Fetch
• ID – Instruction Decode
• EX – Execute
• MA – Memory Access
• WB – Write Back


RISC Performance Contd…
• The goal of RISC is to achieve an execution rate of one cycle per instruction
(CPI = 1.0), which would be the case when no interruptions occur in the
pipeline.
• The instructions and addressing modes in a RISC architecture are carefully
selected and tailored to the most frequently used operations, in a way that
results in the most efficient execution of the RISC pipeline.
• On average, code written for RISC consists of more instructions than
code written for CISC. The typical trade-off between RISC and
CISC can be expressed in the total time required to execute a certain task:
Time (task) = I × C × P × T0
• I = No. of Instructions / Task
• C = No. of Cycles / Instruction
• P = No. of Clock Periods / Cycle (usually P = 1)
• T0 = Clock Period (ns)
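The trade-off in the formula above can be tried out numerically. The following is only a sketch: the function name and the instruction counts, CPI values, and clock period are hypothetical figures chosen for illustration, not measurements of any real processor.

```python
def task_time_ns(instructions, cycles_per_instruction, clock_period_ns,
                 periods_per_cycle=1):
    """Time(task) = I x C x P x T0, returned in nanoseconds."""
    return (instructions * cycles_per_instruction
            * periods_per_cycle * clock_period_ns)


# Hypothetical CISC-like machine: fewer instructions, more cycles each.
cisc_ns = task_time_ns(instructions=1000, cycles_per_instruction=4.0,
                       clock_period_ns=10)

# Hypothetical RISC-like machine: more instructions, close to one cycle each.
risc_ns = task_time_ns(instructions=1300, cycles_per_instruction=1.2,
                       clock_period_ns=10)

print("CISC-like task time (ns):", cisc_ns)
print("RISC-like task time (ns):", risc_ns)
```

With these assumed numbers the larger instruction count (I) of the RISC-like machine is more than offset by its lower cycles per instruction (C).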
RISC Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the
RISC instruction set. The 5 stages of the RISC pipeline and their respective
operations are:
Stage 1 (Instruction Fetch)
• In this stage, the CPU reads the instruction from the memory address held in the
program counter.
Stage 2 (Instruction Decode)
• In this stage, the instruction is decoded and, in the second half of the stage, the register
file is accessed to get the operands, which are loaded into the accumulator register.
Stage 3 (Instruction Execute)
• In this stage, ALU operations are performed and the result is loaded into the ALU output
register.
Stage 4 (Memory Access)
• In this stage, memory operands are read from or written to the memory address specified
by the instruction.
Stage 5 (Write Back)
• In this stage, the computed or fetched value is written back to the register specified in
the instruction.
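To see how the five stages overlap, the short sketch below builds a timing chart for an ideal pipeline. It assumes one cycle per stage and no stalls or branches, which is a simplification; the function names are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage active in cycle c + 1.

    Assumes an ideal pipeline: one cycle per stage, no stalls.
    """
    total = len(stages) + num_instructions - 1
    chart = []
    for i in range(num_instructions):
        row = ["--"] * total
        for s, name in enumerate(stages):
            # Instruction i enters stage s in cycle i + s (0-based).
            row[i + s] = name
        chart.append(row)
    return chart


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed by an ideal pipeline: fill time plus one per extra instruction."""
    return num_stages + num_instructions - 1


for i, row in enumerate(pipeline_chart(4), start=1):
    print(f"I{i}: " + " ".join(row))
print("pipelined cycles:", total_cycles(4))
print("unpipelined cycles:", 4 * len(STAGES))
```

Once the pipeline is full, one instruction completes every cycle, which is where the target of CPI = 1.0 comes from.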
RISC Pipeline Contd…
• Instruction pipelining is often used to enhance performance. In RISC
machines most of the operations are register-to-register. Therefore, the
instructions can be executed in two phases:
• F: Instruction Fetch to get the instruction.
• E: Instruction Execute on register operands and store the results in
register.
• In general, the memory access in RISC is performed through LOAD and
STORE operations. For such instructions the following steps may be needed:
• F: Instruction Fetch to get the instruction
• E: Effective address calculation for the desired memory operand
• D: Memory to register or register to memory data transfer through
bus.
RISC Pipeline Example
• Let us explain pipelining in RISC with a sample program. Take the
following program (R indicates register).
• LOAD RA (Load from memory location A)
• LOAD RB (Load from memory location B)
• ADD RC ,RA , RB (RC = RA + RB )
• SUB RD, RA, RB (RD = RA - RB )
• MUL RE, RC, RD (RE = RC X RD )
• STOR RE (Store in memory location C)
• Return to main.

Sequential Execution of Instructions (WITHOUT PIPELINE)
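The sequential timing can be approximated with a short sketch. It assumes that every phase takes one time slot, that LOAD and STOR use the three phases F, E, D and register instructions use the two phases F, E, and that "Return to main" is treated as a two-phase instruction; that last point is an assumption, since the original timing figure is not available here.

```python
# Phases per instruction class, following the F/E and F/E/D descriptions.
PHASES = {"register": ["F", "E"], "memory": ["F", "E", "D"]}

PROGRAM = [
    ("LOAD RA", "memory"),
    ("LOAD RB", "memory"),
    ("ADD RC, RA, RB", "register"),
    ("SUB RD, RA, RB", "register"),
    ("MUL RE, RC, RD", "register"),
    ("STOR RE", "memory"),
    ("Return to main", "register"),  # assumption: treated as two phases
]


def sequential_schedule(program):
    """Return (instruction, phase, time_slot) tuples with no overlap at all."""
    schedule = []
    slot = 1
    for text, kind in program:
        for phase in PHASES[kind]:
            schedule.append((text, phase, slot))
            slot += 1
    return schedule


schedule = sequential_schedule(PROGRAM)
total_slots = schedule[-1][2]
for text, phase, slot in schedule:
    print(f"slot {slot:2d}: {phase}  {text}")
print("total time slots:", total_slots)
```

Under these assumptions the program needs 17 time slots without pipelining, which is the baseline that the pipelined schemes try to improve on.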


RISC Pipeline Example

• The following figure shows a simple pipelining scheme for the same program, in
which the F and E phases of two different instructions are performed
simultaneously. This scheme speeds up the execution rate of the sequential scheme.

Two Way Pipelined Timing


• Please note that the pipeline above is not running at its full capacity, for the
following reasons:
• We are assuming a single-port memory, so only one memory access is allowed at a
time. Fetch and data transfer operations therefore cannot occur at the same time, which
is why time slots 3, 5, etc. contain a blank.
• The last instruction is an unconditional jump. After this instruction, the
next instruction of the calling program will be executed. Although it is not visible in this
example, a branch instruction interrupts the sequential flow of instruction execution and
thus causes inefficiencies in the pipelined execution.
• This pipeline can be improved simply by allowing two memory accesses at a time. The
modified pipeline would then be:

Three-way Pipelining Timing


RISC ADDRESSING MODES
• RISC instructions have simple addressing modes. The RISC addressing modes are
discussed below, one by one.
• Immediate addressing mode: This addressing mode explicitly specifies the operand in
the instruction. For example:
Add R4, R2, #200 -> R4 = R2 + 200
• The above instruction adds 200 to the content of R2 and stores the result in R4.
• Register addressing mode: This addressing mode names the registers holding the
operands.
Add R3, R3, R4 -> R3 = R3 + R4
• The above instruction adds the content of register R4 to the content of register R3 and
stores the result in R3.
• Absolute addressing mode: This addressing mode names a memory
location in the instruction. It is used to access global variables declared in the program, for example:
Integer A, B, SUM;
• This declaration allocates memory for the variables A, B, and SUM.
RISC ADDRESSING MODES
• Register Indirect addressing mode: This addressing mode names a register that
holds the address of the actual operand. It is similar to pointers in a high-level
language (HLL).
Load R2, (R3) -> R2 = content of the memory location whose address is in R3
• This instruction loads register R2 with the content of the memory location whose
address is held in register R3.
• Index addressing mode: This addressing mode names a register in the instruction;
adding a constant to the content of that register gives the address of the actual operand.
It is similar to array indexing in an HLL.
Load R2, 4(R3) -> R2 = content of the memory location at address R3 + 4
• This instruction loads register R2 with the content of the location obtained
by adding 4 to the content of register R3.
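The five addressing modes can be mimicked with a toy register file and memory. This is only an illustrative sketch: the register contents, memory contents, the address bound to the name A, and the instruction text "Load R2, A" are all hypothetical values made up for the example.

```python
registers = {"R2": 10, "R3": 100, "R4": 7}
memory = {100: 55, 104: 66}   # hypothetical memory contents by address
symbols = {"A": 100}          # hypothetical address bound to the name A


def immediate(value):
    """Operand is the constant written in the instruction itself."""
    return value


def register(name):
    """Operand is the content of the named register."""
    return registers[name]


def absolute(symbol):
    """Operand is in the memory location named in the instruction."""
    return memory[symbols[symbol]]


def register_indirect(name):
    """The register holds the address of the operand (like a pointer)."""
    return memory[registers[name]]


def indexed(offset, name):
    """Address of the operand = register content + constant (like an array)."""
    return memory[registers[name] + offset]


results = {
    "Add R4, R2, #200": register("R2") + immediate(200),
    "Add R3, R3, R4": register("R3") + register("R4"),
    "Load R2, A": absolute("A"),
    "Load R2, (R3)": register_indirect("R3"),
    "Load R2, 4(R3)": indexed(4, "R3"),
}

for instruction, value in results.items():
    print(f"{instruction:18s} -> {value}")
```

Each entry is evaluated against the same initial register values, so the instructions do not affect one another in this sketch.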
CISC Vs. RISC
1. CISC: Large number of instructions, from 120 to 350.
   RISC: Relatively fewer instructions, less than 100.
2. CISC: Employs a variety of data types and a large number of addressing modes.
   RISC: Relatively fewer addressing modes.
3. CISC: Variable-length instruction formats.
   RISC: Fixed-length instructions, usually 32 bits, with an easy-to-decode instruction format.
4. CISC: Instructions manipulate operands residing in memory.
   RISC: Mostly register-register operations. The only memory access is through explicit
   LOAD/STORE instructions.
5. CISC: Number of Cycles Per Instruction (CPI) varies from 1-20 depending upon the
   instruction.
   RISC: CPI is one, as it uses pipelining. The pipeline in RISC is optimized because of
   simple instructions and instruction formats.
6. CISC: Number of GPRs varies from 8-32, but no support is available for parameter
   passing and function calls.
   RISC: A large number of GPRs are available, primarily used as global registers and as a
   register-based “procedural call stack”; thus, optimized for structured programming and
   parameter passing.
7. CISC: Microprogrammed Control Unit.
   RISC: Hardwired Control Unit.


SYSTEM ATTRIBUTE TO PERFORMANCE
• In computing, computer performance is the amount of useful work accomplished
by a computer system. Outside of specific contexts, computer performance is
estimated in terms of accuracy, efficiency and speed of executing computer
program instructions.
• Clock Rate: The CPU of today’s digital computer is driven by a clock with a
constant cycle time (τ, in nanoseconds). The inverse of the cycle time is the clock
rate or frequency (f = 1/τ, usually quoted in megahertz).
• Instruction Count (Ic): The size of the program is determined by the instruction
count, in terms of the number of machine instructions to be executed in the
program.
• CPI (Cycles Per Instruction): For a particular instruction, the CPI equals the number of
clock periods (cycles) required for its execution. The CPI is therefore an
important parameter for measuring the time needed to execute each
instruction.
• For a given instruction set, we can calculate an average CPI over all instruction
types.
SYSTEM ATTRIBUTE TO PERFORMANCE
• Execution Time: Let Ic be the number of instructions in a given program, or the
instruction count. The CPU time (T, in seconds/program) needed to execute the program
is estimated by:
T = Ic × CPI × τ, where CPI = cycles per instruction and τ = time of one cycle.
We know that
C = Ic × CPI = total number of cycles required to execute the full program, and
C = T / τ = total execution time / time for one cycle.

• The execution of an instruction requires going through a cycle of events, involving
the instruction fetch, decode, operand(s) fetch, execution, and store results. In this
cycle, only the instruction decode and execution phases are carried out in the CPU.
• The remaining three operations may require access to the memory. We define a
memory cycle as the time needed to complete one memory reference. Usually, a
memory cycle is k times the processor cycle τ. The value of k depends on the speed
of the memory technology and processor-memory interconnection scheme used.
SYSTEM ATTRIBUTE TO PERFORMANCE
• The CPI of an instruction type can be divided into two component terms
corresponding to the total processor cycles and memory cycles needed to
complete the execution of the instruction. We can rewrite the above equation as:

T = Ic × (P + m × k) × τ
where P = number of processor cycles required for instruction
decode and instruction execution,
m = number of memory accesses required, and
k = ratio between memory cycle and processor cycle.
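Both forms of the execution-time equation can be checked with a few lines of code. The sketch below uses hypothetical values (one million instructions, a 25 ns cycle, P = 1, m = 0.5, k = 2) purely to show that the two forms agree when CPI = P + m × k; the function names are illustrative.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """T = Ic x CPI x (time of one cycle), in seconds."""
    return instruction_count * cpi * cycle_time_s


def cpu_time_with_memory(instruction_count, p, m, k, cycle_time_s):
    """T = Ic x (P + m x k) x (time of one cycle), in seconds."""
    return instruction_count * (p + m * k) * cycle_time_s


# Hypothetical program: 1,000,000 instructions on a 40-MHz clock (25 ns cycle).
CYCLE_TIME_S = 25e-9

t_simple = cpu_time(1_000_000, cpi=2.0, cycle_time_s=CYCLE_TIME_S)
t_split = cpu_time_with_memory(1_000_000, p=1, m=0.5, k=2,
                               cycle_time_s=CYCLE_TIME_S)

print(f"T from CPI:          {t_simple:.4f} s")
print(f"T from P, m and k:   {t_split:.4f} s")
```

Splitting CPI into P and m × k makes it visible how a slower memory (larger k) lengthens the execution time even when the processor itself is unchanged.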
SYSTEM ATTRIBUTE TO PERFORMANCE
• MIPS (Million Instructions Per Second): The processor speed is often measured
in terms of million instructions per second (MIPS). We simply call it the MIPS
rate of a given processor.
• It should be emphasized that the MIPS rate varies with a number of
factors, including the clock rate (f), the instruction count (Ic), and the CPI of the given
machine, as defined below:

MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) = (f × Ic) / (C × 10^6)

where T is the CPU time in seconds and C = Ic × CPI is the total number of cycles.

• Throughput (Wp): the number of programs executed per unit time, in programs/second.

Wp = f / (Ic × CPI) OR Wp = (MIPS × 10^6) / Ic
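The MIPS rate and throughput follow directly from the clock rate, instruction count, and CPI. The sketch below uses the standard relations MIPS = f / (CPI × 10^6) and Wp = f / (Ic × CPI); the function names and the sample figures (40 MHz, CPI of 2, one million instructions) are assumptions chosen only for illustration.

```python
def mips_rate(clock_hz, cpi):
    """MIPS = f / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


def throughput(clock_hz, instruction_count, cpi):
    """Wp = f / (Ic x CPI), in programs per second."""
    return clock_hz / (instruction_count * cpi)


# Hypothetical machine and program.
CLOCK_HZ = 40e6
CPI = 2.0
INSTRUCTION_COUNT = 1_000_000

mips = mips_rate(CLOCK_HZ, CPI)
wp = throughput(CLOCK_HZ, INSTRUCTION_COUNT, CPI)

# The second form of the throughput formula gives the same value.
wp_from_mips = mips * 1e6 / INSTRUCTION_COUNT

print(f"MIPS rate:  {mips:.2f}")
print(f"Throughput: {wp:.2f} programs/second")
```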
Numerical problem -1
• Problem 1.1: A 40-MHz processor was used to execute a benchmark
program with the following instruction mix and clock cycle counts:

Instruction type      Instruction count   Clock cycle count
Integer arithmetic    45000               1
Data transfer         32000               2
Floating point        15000               2
Control transfer      8000                2

• Determine the effective CPI, MIPS rate, and execution time for this
program.
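One possible way to work Problem 1.1 is sketched below. It takes the effective CPI as total cycles divided by total instructions, then applies MIPS = f / (CPI × 10^6) and T = total cycles / f; the variable names are illustrative.

```python
CLOCK_HZ = 40e6  # 40-MHz processor

# instruction type -> (instruction count, clock cycles per instruction)
MIX = {
    "Integer arithmetic": (45000, 1),
    "Data transfer": (32000, 2),
    "Floating point": (15000, 2),
    "Control transfer": (8000, 2),
}

instruction_count = sum(count for count, _ in MIX.values())
total_cycles = sum(count * cycles for count, cycles in MIX.values())

effective_cpi = total_cycles / instruction_count
mips = CLOCK_HZ / (effective_cpi * 1e6)
execution_time_s = total_cycles / CLOCK_HZ

print(f"Effective CPI  = {effective_cpi:.2f}")
print(f"MIPS rate      = {mips:.2f}")
print(f"Execution time = {execution_time_s * 1e3:.3f} ms")
```

The program has 100,000 instructions taking 155,000 cycles, so the effective CPI is 1.55, the MIPS rate is about 25.8, and the execution time is 3.875 ms.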
Numerical problem -2
• Problem : A workstation uses a 15-MHz processor with a claimed 10-MIPS
rating to execute a given program mix. Assume a one-cycle delay for each
memory access.
(a) What is the effective CPI of this computer?
(b) Suppose the processor is being upgraded with a 30-MHz clock.
However, the speed of the memory subsystem remains unchanged, and
consequently two clock cycles are needed per memory access. If 30% of
the instructions require one memory access and another 5% require two
memory accesses per instruction, what is the performance of the upgraded
processor with a compatible instruction set and equal instruction counts in
the given program mix?
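A possible approach to this problem is sketched below. It assumes that the original CPI already includes one cycle for each memory access, so the upgrade adds one extra cycle per access; this reading of the problem is an assumption, and the names used are illustrative.

```python
def mips_rate(clock_hz, cpi):
    """MIPS = f / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


# (a) Effective CPI of the original 15-MHz, 10-MIPS workstation.
old_cpi = 15e6 / (10 * 1e6)

# (b) After the upgrade each memory access costs 2 cycles instead of 1,
#     i.e. one extra cycle per access (assumption stated above).
extra_cycles_per_access = 2 - 1
new_cpi = (old_cpi
           + 0.30 * 1 * extra_cycles_per_access   # 30%: one memory access
           + 0.05 * 2 * extra_cycles_per_access)  # 5%: two memory accesses

new_mips = mips_rate(30e6, new_cpi)

print(f"(a) effective CPI  = {old_cpi:.2f}")
print(f"(b) upgraded CPI   = {new_cpi:.2f}")
print(f"(b) upgraded MIPS  = {new_mips:.2f}")
```

Under this assumption the original CPI is 1.5, the upgraded CPI is 1.9, and the upgraded processor delivers about 15.8 MIPS, so doubling the clock rate does not double the performance.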
Numerical problem - 3
• Problem : Consider the execution of an object code with 200,000 instructions
on a 40-MHz processor. The program consists of four major types of
instructions. The instruction mix and the number of cycles (CPI) needed for
each instruction type are given below based on the result of a program trace
experiment:
Instruction type                   CPI   Instruction mix
Arithmetic and logic               1     60%
Load/store with cache hit          2     18%
Branch                             4     12%
Memory reference with cache miss   8     10%

• Determine the effective CPI, and MIPS rate for this program.
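One way to work this problem is sketched below: the effective CPI is the mix-weighted average of the per-type CPI values, and the MIPS rate is f / (CPI × 10^6). The execution time is not asked for in the problem and is included only as an extra check; the variable names are illustrative.

```python
CLOCK_HZ = 40e6
INSTRUCTION_COUNT = 200_000

# (instruction type, CPI, fraction of the instruction mix)
MIX = [
    ("Arithmetic and logic", 1, 0.60),
    ("Load/store with cache hit", 2, 0.18),
    ("Branch", 4, 0.12),
    ("Memory reference with cache miss", 8, 0.10),
]

effective_cpi = sum(cpi * fraction for _, cpi, fraction in MIX)
mips = CLOCK_HZ / (effective_cpi * 1e6)
execution_time_s = INSTRUCTION_COUNT * effective_cpi / CLOCK_HZ

print(f"Effective CPI  = {effective_cpi:.2f}")
print(f"MIPS rate      = {mips:.2f}")
print(f"Execution time = {execution_time_s * 1e3:.1f} ms")
```

The weighted sum is 0.60 + 0.36 + 0.48 + 0.80 = 2.24 cycles per instruction, which gives a MIPS rate of about 17.86.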
THANKS &
HAPPY LEARNING...
