Unit III - Basic Processing Unit
Slide Courtesy : “Computer Organization” by Carl Hamacher, Zvonko Vranesic, Safwat Zaky
Overview
[Figure 7.1: datapath of the processor. Blocks shown: PC, MAR, MDR, IR, instruction decoder and control logic, Y, Select MUX with constant 4, ALU (inputs A and B; functions Add, Sub, ..., XOR; carry-in), Z, registers R0 to R(n-1), and TEMP. Address lines and data lines connect MAR and MDR to the memory bus. Textbook page 413.]
Note: MDR has two inputs and two outputs.
Figure 7.2. Input and output gating for the registers in Figure 7.1 (signals shown: Riout, Yin, Select, Zin, Zout; the MUX chooses between Y and the constant 4 as ALU input A).
Register Transfers
• All operations and data transfers are controlled by the processor clock.
Figure 7.3. Input and output gating for one register bit (a D flip-flop connected to the bus and controlled by Riin, Riout, and Clock).
Performing an Arithmetic or Logic Operation
Figure 7.4. Connection and control signals for register MDR.
Fetching a Word from Memory
[Figure: timing diagram for a memory read, showing Clock, MR, MDRinE, and Data.]
• Add (R3), R1
• Fetch the instruction
• Fetch the first operand (the contents of the
memory location pointed to by R3)
• Perform the addition
• Load the result into R1
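The four actions above can be traced with a small register-level simulation. This is only a sketch: it assumes the seven-step control sequence of Figure 7.6 (steps 1 to 4 and 7 are decoded here from the microinstruction bit patterns given in these notes), a 4-byte instruction, and a memory that responds immediately, so WMFC is modelled as a plain read. The function name `run_add_r3_r1`, the register values, and the memory contents are made up for illustration.

```python
# Minimal single-bus datapath trace for Add (R3), R1 (illustrative sketch).
def run_add_r3_r1(regs, memory):
    """Return the register state after executing Add (R3), R1."""
    r = dict(regs)  # work on a copy so the caller's dict is unchanged

    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    r["MAR"] = r["PC"]
    r["Z"] = r["PC"] + 4
    # Step 2: Zout, PCin, Yin, WMFC (memory is assumed to answer at once)
    r["PC"] = r["Z"]
    r["Y"] = r["Z"]
    r["MDR"] = memory[r["MAR"]]
    # Step 3: MDRout, IRin (instruction fetch complete)
    r["IR"] = r["MDR"]
    # Step 4: R3out, MARin, Read (start fetching the operand)
    r["MAR"] = r["R3"]
    # Step 5: R1out, Yin, WMFC
    r["Y"] = r["R1"]
    r["MDR"] = memory[r["MAR"]]
    # Step 6: MDRout, SelectY, Add, Zin (perform the addition)
    r["Z"] = r["Y"] + r["MDR"]
    # Step 7: Zout, R1in, End (load the result into R1)
    r["R1"] = r["Z"]
    return r


# Hypothetical initial state: instruction at address 100, operand at 200.
regs = {"PC": 100, "R1": 7, "R3": 200,
        "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0}
memory = {100: "Add (R3), R1", 200: 35}

final = run_add_r3_r1(regs, memory)
print(final["R1"], final["PC"])  # 42 104
```

Steps 1 to 3 are the instruction fetch, steps 4 and 5 fetch the operand, step 6 adds, and step 7 writes the result back.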
Execution of a Complete Instruction
Add (R3), R1
Step  Action
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
[Figure: datapath with an incrementer, PC, register file, constant 4, MUX, ALU (inputs A and B, output R), instruction decoder, IR, MDR, and MAR.]
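[Figure: control unit organization. A decoder/encoder block takes IR, external inputs, and condition codes and produces the control signals.]
[Figure: separate decoding and encoding. A step decoder produces T1, T2, ..., Tn; the instruction decoder takes IR and produces INS1, INS2, ..., INSm; the encoder combines these with the external inputs and condition codes to produce the control signals, including Run and End.]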
• Zin = T1 + T6 · ADD + T4 · BR + …
Figure 7.12. Generation of the Zin control signal for the processor in Figure 7.1 (inputs: T1, T4 gated with Branch, T6 gated with Add).
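The Zin expression is a plain sum of products, so it can be written directly as a Boolean function. The sketch below covers only the three terms shown on the slide; the terms hidden behind the "…" are not modelled, and the function name `z_in` and the string instruction names are assumptions for illustration.

```python
def z_in(step, instruction):
    """Zin = T1 + T6.ADD + T4.BR (the slide's remaining terms are omitted)."""
    t1 = step == 1                                # asserted in T1 for every instruction
    t6_add = step == 6 and instruction == "ADD"   # T6 AND ADD
    t4_br = step == 4 and instruction == "BR"     # T4 AND BR
    return t1 or t6_add or t4_br


# Zin is active in step 6 of an Add, but not in step 6 of a Branch.
print(z_in(6, "ADD"), z_in(6, "BR"))
```

Each product term corresponds to one AND gate in the figure, and the final `or` corresponds to the OR gate that drives Zin.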
Generating End
[Figure: generation of the End control signal from timing signals T4, T5, and T7.]
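[Figure: block diagram of a complete processor. The processor contains an instruction cache, a data cache, and a bus interface; the bus interface connects over the system bus to main memory and input/output.]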
Microinstruction bit patterns (1 = control signal active):
Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1
Figure 7.6. Control sequence for execution of the instruction Add (R3), R1.
Overview
• Control store
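[Figure: basic microprogrammed control unit. IR feeds the starting address generator, which loads the clocked µPC; the µPC addresses the control store, which outputs the control word (CW).]
Note: one function cannot be carried out by this simple organization.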
• The previous organization cannot handle the situation when the control unit is
required to check the status of the condition codes or external inputs to
choose between alternative courses of action.
• Use conditional branch microinstruction.
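[Figure: microprogrammed control unit with conditional branching. IR and the condition codes feed the starting and branch address generator, which loads the clocked µPC; the µPC addresses the control store, which outputs the control word (CW).]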
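How a conditional branch microinstruction changes the flow through the control store can be sketched with a tiny sequencer. Everything here is illustrative: the control-store contents, the addresses (0 to 3 for the fetch, 25 to 27 for a branch-if-negative routine), the dictionary layout, and the function name `run_one_instruction` are assumptions, not the organization's actual encoding.

```python
# Hypothetical control store: address -> microinstruction.
CONTROL_STORE = {
    0: {"signals": ["PCout", "MARin", "Read", "Select4", "Add", "Zin"]},
    1: {"signals": ["Zout", "PCin", "Yin", "WMFC"]},
    2: {"signals": ["MDRout", "IRin"]},
    3: {"goto_routine": True},           # jump to the routine selected by IR
    25: {"branch_if": ("N", 0, 0)},      # if N == 0, branch to address 0
    26: {"signals": ["Offsetout", "SelectY", "Add", "Zin"]},
    27: {"signals": ["Zout", "PCin", "End"]},
}


def run_one_instruction(store, routine_start, flags):
    """Return the control-store addresses visited for one instruction."""
    upc, trace = 0, []
    while True:
        micro = store[upc]
        trace.append(upc)
        if "End" in micro.get("signals", ()):
            return trace                 # End: the next fetch restarts at 0
        if micro.get("goto_routine"):
            upc = routine_start          # starting address comes from IR
        elif "branch_if" in micro:
            flag, value, target = micro["branch_if"]
            if flags[flag] == value:
                if target == 0:
                    return trace         # back to the fetch of the next instruction
                upc = target
            else:
                upc += 1                 # condition false: fall through
        else:
            upc += 1                     # default: µPC increments


print(run_one_instruction(CONTROL_STORE, 25, {"N": 1}))  # branch taken path
print(run_one_instruction(CONTROL_STORE, 25, {"N": 0}))  # branch not taken
```

With N = 1 the routine runs through addresses 25, 26, and 27 and updates the PC; with N = 0 the microinstruction at address 25 returns straight to the fetch, which is the choice between alternative courses of action that a plain incrementing µPC cannot make.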
Microinstruction field encodings:
F1                 F2                F3                F4                  F5
0000: No transfer  000: No transfer  000: No transfer  0000: Add           00: No action
0001: PCout        001: PCin         001: MARin        0001: Sub           01: Read
0010: MDRout       010: IRin         010: MDRin        ...                 10: Write
0011: Zout         011: Zin          011: TEMPin       1111: XOR
0100: R0out        100: R0in         100: Yin          (16 ALU functions)
0101: R1out        101: R1in
0110: R2out        110: R2in
0111: R3out        111: R3in
1010: TEMPout
1011: Offsetout
F6 (1 bit), F7 (1 bit), F8 (1 bit)
What is the price paid for this scheme?
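The field encodings can be exercised with a small decoder. This is a sketch under stated assumptions: it covers fields F1 to F5 only (the slide does not list the meanings of F6 to F8), it includes only the F4 codes that the slide names, and the space-separated input format and the function name `decode` are invented for illustration.

```python
# Lookup tables copied from the F1..F5 encodings; None means "no transfer/action".
F1 = {"0000": None, "0001": "PCout", "0010": "MDRout", "0011": "Zout",
      "0100": "R0out", "0101": "R1out", "0110": "R2out", "0111": "R3out",
      "1010": "TEMPout", "1011": "Offsetout"}
F2 = {"000": None, "001": "PCin", "010": "IRin", "011": "Zin",
      "100": "R0in", "101": "R1in", "110": "R2in", "111": "R3in"}
F3 = {"000": None, "001": "MARin", "010": "MDRin", "011": "TEMPin",
      "100": "Yin"}
F4 = {"0000": "Add", "0001": "Sub", "1111": "XOR"}  # only the codes listed
F5 = {"00": None, "01": "Read", "10": "Write"}


def decode(word):
    """Decode 'F1 F2 F3 F4 F5' bit groups into (active signals, ALU function)."""
    f1, f2, f3, f4, f5 = word.split()
    signals = [F1[f1], F2[f2], F3[f3], F5[f5]]
    active = [s for s in signals if s is not None]
    return active, F4[f4]


# One output-gating signal, two input-gating signals, an ALU function, and a read.
print(decode("0001 011 001 0000 01"))
```

Because each field is decoded separately, at most one signal per field can be active in a microinstruction; signals that must be active together (here Zin and MARin) have to sit in different fields.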
[Figure: a bit-position scale (11, 10, 8, 7, 4, 3, 0) and a microroutine listing with the columns Address (octal) and Microinstruction.]
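[Figure: microinstruction sequencing. External inputs, condition codes, and IR feed the decoding circuits, which load µAR; µAR addresses the control store; each control-store word supplies the next address and a microinstruction held in µIR, which the microinstruction decoder turns into control signals.]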
Address (octal)  F0        F1   F2   F3   F4    F5  F6  F7  F8  F9  F10
000              00000001  001  011  001  0000  01  1   0   0   0   0
001              00000010  011  001  100  0000  00  0   1   0   0   0
002              00000011  010  010  000  0000  00  0   0   0   0   0
003              00000000  000  000  000  0000  00  0   0   1   1   0
121              01010010  100  011  001  0000  01  1   0   0   0   0
122              01111000  011  100  000  0000  00  0   1   0   0   1
170              01111001  010  000  001  0000  01  0   1   0   0   0
171              01111010  010  000  100  0000  00  0   0   0   0   0
172              01111011  101  011  000  0000  00  0   0   0   0   0
173              00000000  011  101  000  0000  00  0   0   0   0   0
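[Figure: control-signal generation using IR fields. The Rsrc and Rdst fields of IR feed two decoders that, together with the microinstruction decoder, produce Rsrcout, Rsrcin, Rdstout, and Rdstin; the decoding circuits take InstDecout, ORmode, ORindsrc, the external inputs, and the condition codes and load µAR, which addresses the control store.]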
Speedup = [1 / (1 + Pipeline stalls per instruction)] × Pipeline depth
Dealing With Structural Hazards
Speedup = [1 / (1 + 0.4 × 1)] × [1 / (1/1.05)] = 0.75
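The arithmetic above can be checked with a few lines of code. The sketch simply mirrors the slide's numbers; reading 0.4 × 1 as the stall cycles per instruction and 1/(1/1.05) as a clock-rate factor of 1.05 is an interpretation, and the function name `pipeline_speedup` is made up for illustration.

```python
def pipeline_speedup(stalls_per_instruction, pipeline_depth=1.0):
    """Speedup = pipeline depth / (1 + pipeline stalls per instruction)."""
    return pipeline_depth / (1.0 + stalls_per_instruction)


# Numbers from the slide: 0.4 * 1 stall cycles per instruction,
# scaled by the factor 1 / (1 / 1.05).
relative_speed = pipeline_speedup(0.4 * 1) * (1 / (1 / 1.05))
print(round(relative_speed, 2))  # 0.75
```

A result below 1 means the configuration with the stalls is slower overall, even after the 1.05 factor is applied.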
Dealing With Structural Hazards (continued)
branch    IF  ID  EX  MEM  WB
              IF  IF  ID  EX  MEM  WB
Performing IF Twice
if p2 {
    S2
}
DIV.D F0,F2,F4
ADD.D F10,F0,F8
SUB.D F12,F8,F14
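Why this sequence matters for dynamic scheduling can be seen by listing its read-after-write dependences. The sketch below is a simple scan, not a model of any particular scheduler; the tuple format `(opcode, destination, source1, source2)` and the function name `raw_dependences` are assumptions for illustration.

```python
def raw_dependences(program):
    """Return (producer index, consumer index, register) for each RAW dependence."""
    deps = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for i, (_op, dest, src1, src2) in enumerate(program):
        for src in (src1, src2):
            if src in last_writer:
                deps.append((last_writer[src], i, src))
        last_writer[dest] = i
    return deps


program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]
print(raw_dependences(program))  # [(0, 1, 'F0')]
```

Only ADD.D depends on the long-running DIV.D (through F0). SUB.D depends on neither earlier instruction, so a processor that issues strictly in order stalls it behind ADD.D for no data reason, while a dynamically scheduled one can let it proceed.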
Dynamic Scheduling (continued)
Definition and Characteristics
Fetching and dispatching two instructions per cycle
Uninterrupted stream of instructions
Register Renaming Example
A WAR dependency exists between the LD r7,(r3) and SUB r3,r12,r11 instructions.
With register renaming, the first write to r3 maps to hw3, while the second write maps to hw20. This converts the four-instruction dependency chain into two two-instruction chains, which can then be executed in parallel if the processor allows out-of-order execution.
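The renaming step can be sketched as a table lookup plus a free list. Only LD r7,(r3), SUB r3,r12,r11, and the names hw3 and hw20 come from the slide; the first and last instructions of the four-instruction sequence, the register hw7, the free-list order, and the function name `rename` are hypothetical and chosen only so that the two writes to r3 land on hw3 and hw20.

```python
def rename(program, free_list):
    """Give every register write a fresh hardware register.

    program: list of (opcode, destination or None, [sources]).
    Sources are renamed first, so each one reads the most recent mapping.
    """
    mapping = {}            # architectural register -> current hardware register
    free = list(free_list)
    renamed = []
    for op, dest, srcs in program:
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = free.pop(0)   # allocate a fresh hardware register
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# Hypothetical four-instruction chain; the middle two are from the slide.
program = [
    ("ADD", "r3", ["r3"]),          # assumed: first write to r3
    ("LD", "r7", ["r3"]),           # LD r7,(r3)
    ("SUB", "r3", ["r12", "r11"]),  # SUB r3,r12,r11: second write to r3
    ("ST", None, ["r3", "r13"]),    # assumed: a later reader of r3
]
renamed = rename(program, ["hw3", "hw7", "hw20"])
for instruction in renamed:
    print(instruction)
```

After renaming, LD reads hw3 while SUB writes hw20, so the WAR dependency on r3 is gone: the first two instructions form one chain and the last two form another.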
Hardware Organization of a superscalar processor
CONCLUSION