Computer Basics

07-CLements-Chap07.
qxd
22/12/2005
6:17 PM
Page 293
Structure of the CPU

CHAPTER MAP
5 The instruction set architecture
The computers instruction set architecture (ISA) describes the low-level programmers view of the computer and denes the type of operations a computer carries out. We are interested in three aspects of the ISA: the nature of the instructions, the resources such as registers and memory used by the instructions, and the ways in which the instructions access data (addressing modes).
7
7 Structure of the CPU
Chapters 5 and 6 describe what a computer does; in this chapter we show how a computer converts an instruction op-code into the actions that implement the op-code. We build on some of the material we covered in the section on digital logic. In this chapter we demonstrate how a computer is organized internally and how it reads instructions from memory, decodes them, and executes them.
6 Assembly language programming

Chapter 6 builds on the previous chapter by demonstrating how instructions are used to construct entire programs. We introduce the programming environment via a simulator that runs on a PC and demonstrate how to implement some basic algorithms.
8 Accelerating performance
The previous chapter described the structure of a simple computer. Here we describe how the performance of computers can be increased by overlapping the execution of instructions (pipelining). We also look at cache memory and introduce parallel processing.
INTRODUCTION
In Chapters 2 and 3 we introduced combinational and sequential logic elements and demonstrated how to build functional circuits. In Chapters 5 and 6 we introduced the instruction set architecture and low-level programming. This chapter bridges the gap between digital circuits and the computer by demonstrating how we can construct a computer from simple circuits; that is, we show how a computer instruction is interpreted (i.e. executed). We begin by describing the structure of a simple generic CPU. Once we see how a computer operates in principle, we can look at how it may be implemented. We describe the operation of a very simple one-and-a-half address machine whose instructions have two operands; one in memory and one a register. Instructions are written in the form ADD A, B that adds A to B and puts the result in B. Either A or B must be a register. Some readers will read this introduction to the CPU before the previous two chapters on assembly language programming. Consequently, some topics will be re-introduced. Instead of introducing the computer all at once, we will keep things simple and build up a CPU step by step. This approach helps demonstrate how an instruction is executed because the development of the computer broadly follows the sequence of events taking place during the execution of an instruction. In the next chapter we will nd that this computer is highly simplied; real computers dont execute an instruction from start to nish. Todays computers overlap the execution of instructions. As soon as one instruction is fetched from memory, the next instruction is fetched before the previous instruction has completed its execution. This mechanism is called pipelining and we examine it more closely in the next chapter.
07-CLements-Chap07.qxd
22/12/2005
6:17 PM
Page 294
294 Chapter 7 Structure of the CPU
7.1 The CPU

A von Neumann stored program digital computer operates by reading instructions from memory and executing them one by one in sequence. If you wish to evaluate X2 1 on a 68K processor where X is a memory location, you may write
by step, on a very primitive hypothetical computer. We begin with the address paths that are used to locate the next instruction to be executed.
7.1.1 The address path

Figure 7.1 provides the block diagram of part of a CPU. In this diagram only the address paths and the paths needed to read an instruction from memory are shown for clarity. We have omitted most of the data paths required to execute instructions to avoid clutter. Address paths are shown in blue and data paths are in light blue. The address paths are the highways along which addresses ow from one part of the CPU to another. An address represents the location of a data element within the memory. There are three types of information ow in a computer: address,
We now demonstrate how instructions like MOVE. W X,DO are read from memory and how the sequence of actions that implement this operation are carried out inside the CPU. Because the CPU is such a complex device, we have decided to walk through the execution of an instruction, step
A HYPOTHETICAL COMPUTER Anyone describing the internal operation of a computer must select an architecture and an organization for their target machine. We have two choices: register to memory or register to register. The register-to-memory model ts the architecture of processors like the Pentium and 68K, whereas the register-to-register model corresponds to processors like the ARM, which we introduce later. When describing the internal structure of a computer, we could
describe either a system that executes an instruction to completion before beginning the next instruction, or a computer that overlaps or pipelines the execution of instructions. I have decided to begin this chapter with the description of a register-to-memory, non-pipelined processor. A non-pipelined organization is easier to describe than one that overlaps the execution of instructions.
PC Program counter Instruction address Incrementer
MAR Memory address register
Address of the next instruction
Address
The first stage in the execution of any instruction is to fetch it from memory. The program counter contains the address of the next instruction to be executed
The contents of the PC are incremented each time it is used Instruction op-code
Memory
Data
The main memory or immediate access store contains both instructions and data MBR
IR
Op-code
Address
Memory buffer register When you read an instruction, it is moved to the instruction register where it is decoded
Control unit Clock
Data paths Address paths
When you read from memory, the contents of the memory location accessed are loaded into the MBR
Control signals
Figure 7.1 The CPUs address paths.
22/12/2005
6:17 PM
Page 295
7.1 The CPU
295
CONSTANTS, VARIABLES, POINTERSA REMINDER A constant or literal is a value that does not change during the execution of a program. Its value is set at compile time or assemble time. Literal addressing is used to handle constants. A variable is a value that changes during the execution of a program. A pointer is a variable that contains the address of a variable; that is, a pointer points to a variable in memory. An address register is a pointer register. The following code implements the expression X 5 Y Z3 and illustrates the nature of constants, variables, and pointers.
data, and control. Data comprises the instructions, constants, and variables that are stored in memory and registers. Control paths comprise the signals that trigger events, provide clocks, and control the ow of data and addresses throughout the computer.
a write cycle, or from which data is being read in a read cycle. At this stage, the MAR contains a copy of the (previous) contents of the PC. When a memory read cycle is performed, the contents of the memory location specied by the MAR are read from the memory and transferred to the memory buffer register (MBR). We can represent this read operation in RTL terms as
7.1.2 Reading the instruction

Before the CPU can execute an instruction, the instruction must rst be brought to the CPU from the computers memory. We begin our description of the way in which a program is executed with the CPUs program counter (also called instruction counter or location counter). The expression program counter is a misnomer. The program counter contains the address of the next instruction in memory to be executed and it doesnt count programs or anything else. The program counter points to the next instruction to be executed. If, for example, [PC] 1234 (i.e. the PC contains the number 1234), the next instruction to be executed is to be found in memory location 1234. Fetching an instruction begins with the contents of the program counter being moved to the memory address register (i.e. [MAR] [PC]). Once the contents of the program counter have been transferred to the memory address register, the contents of the program counter are incremented and moved back to the program counter; that is, We interpret the expression [[MAR]] as the contents of the memory whose address is given by the contents of the MAR. The memory buffer register is a temporary holding place for data received from memory in a read cycle, or for data to be transferred to memory in a write cycle. Some texts refer to the MBR as the memory data register (MDR). At this point in the execution of an instruction, the MBR contains the bit pattern of the instruction to be executed. The instruction is next moved from the MBR to the instruction register (IR) where it is divided into two elds. One eld in the IR contains the operation code (op-code) that tells the CPU what operation is to be carried out. The other eld, called the operand eld, contains the address of the data to be used by the instruction. The operand eld can also provide a constant to be employed by the operation code when immediate or literal addressing is used. Note that we are assuming a one-address instruction format. The control unit (CU) takes the op-code from the instruction register together with a stream of clock pulses and generates signals that control all parts of the CPU. The time between individual clock pulses is in the range 0.3 ns to 100 ns (i.e. 3 1010 to 107s). The control unit is responsible for moving the contents of the program counter into the MAR, executing a read cycle, and moving the contents of the MBR to the IR. Later we look at the control unit in more detail and demonstrate how it goes about interpreting an op-code.
1 Remember that the 68K has variable-length instruction and its PC is incremented by 2, 4, 6, 8, or 10; that is instruction fall on 16-bit boundaries For the purpose of this chapter, we assume that instructions are all 4 bytes long.
At the end of this operation, the program counter points to the next instruction while the current instruction is being executed. The program counter in incremented by 4 rather than 1 because many of todays high-performance computers have 32-bit instructions. Computers are byte-addressed but have 32-bit architectures; that is, they can address individual bytes in memory, although instructions and data normally fall on 32-bit (i.e. 4-byte) boundaries.1 The memory address register (MAR), holds the address of the location in the memory into which data is being written in
22/12/2005
6:17 PM
Page 296
THE INSTRUCTION REGISTER A modern RISC processor like the ARM, which we introduce in Chapter 9 has a 32-bit instruction register which is sufcient for an op-code and three register addresses; for example, ADD R1, R2, R3. Processors like the Pentium and 68K have very complex variable-length instruction formats and you cannot use a simple xed-length instruction register. The processor has to fetch the rst 16 bits of an instruction from memory, decode it, and execute successive fetches to assemble the rest of the instruction.
Table 7.1 The FETCH phase expressed in register transfer language.
All instructions are executed in a two-phase operation called a fetchexecute cycle. During the fetch phase, the instruction is read from memory and decoded by the control unit. The fetch phase is followed by an execute phase in which the control unit generates all the signals necessary to execute the instruction. Table 7.1 describes the sequence of operations taking place in a fetch phase. In Table 7.1 FETCH is a label that serves to indicate a particular line in the sequence of operations. The notation IR(op-code) means the operationcode eld of the instruction register.
7.1.3 The CPUs data paths

Now that weve sorted out the fetch phase, lets see what else we need to actually execute instructions. Figure. 7.2 adds new data paths to the simplied CPU of Fig. 7.1, together with an address path from the address eld of the instruction register to the memory address register. Other modications to Fig. 7.1 included in Fig. 7.2 are the addition of a data register, D0, and an arithmetic and logical unit, ALU. The data register called D0 holds temporary results during a calculation. You need a data register in a one-address machine because dyadic operations with two operands such as ADD X,D0 take place on one operand specied by the instruction and the contents of the data register. This instruction adds the contents of memory location X to the contents of the data register and deposits the result of the addition in the data register, destroying one of the original operands. The arrangement of Fig. 7.2 has only one general-purpose data register, which weve called D0 for compatibility with the 68K we used in the previous chapters. Some rst-generation 8-bit microprocessors had only one general-purpose data register, which was called the accumulator. We can represent an ADD X,D0 instruction2 by the RTL expression
The arithmetic and logic unit (ALU) is the workhorse of the CPU because it carries out all the calculations. Arithmetic and logical operations are applied either to the contents of the data register or MBR alone, or to the contents of the data register and the contents of the MBR. The output of the ALU is fed back to the data register or to the MBR. Two types of operation are carried out by the ALUarithmetic and logical. The fundamental difference between arithmetic and logical operations is that logical operations dont generate a carry when bit ai of word A and bit bi of B are operated upon. Table 7.2 provides examples of typical arithmetic and logical operations. A logical shift treats an operand as a string of bits that are moved left or right. An arithmetic shift treats a number as a signed twos complement value and propagates the sign bit during a right shift. Most of these operations are implemented by computers like the 68K, Pentium, and ARM. Having developed our computer a little further, we can now execute an elementary program. Consider the high-level language operation P Q R. Here the plus symbol means arithmetic addition. The assembly language program required to carry out this operation is given below. Remember that P, Q, and R are symbolic names that refer to the locations of the variables in memory.
Load data register D0 with the contents of memory location Q3 ADD R,D0 Add the contents of memory location R to data register D0 STORE D0,P Store the contents of data register D0 in memory location P LOAD Q,D0
2 A machine with a single accumulator does not need to specify it explicitly; for example, an 8-bit microprocessor may use ADD P to indicate add the contents of P to the accumulator. We write ADD P,DO and make D0 explicit to be consistent with the notation we used for 68K instructions. 3 We have dened explicit LOAD and STORE operations to be consistent with the CPU we construct later in this chapter. The 68K uses a single MOVE operation to indicate LOAD and STORE
22/12/2005
6:17 PM
Page 297
7.1 The CPU

PC Program counter MAR Memory address register Address of next location to be accessed
297
Incrementer Operand address to memory address register Instruction read from memory
Address
Memory
Data
IR
Op-code
Address
Memory buffer register MBR Data read from memory or to be written to memory
Control unit
Clock Data register D0
Control signals Output from ALU goes to D0 or MBR Address paths A Data is transmitted to the ALU where it is operated on
f(A,B)
ALU B
Data paths
Arithmetic and logic units performs data processing operations
Figure 7.2 The CPUs address and data paths.
Operation Addition Subtraction Negation Multiplication Division Divide by 2 Multiply by 2 AND OR NOT EOR Shift left Shift right
Class Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Logical Logical Logical Logical Logical Logical
Typical mnemonic ADD SUB NEG MULU DIVU ASR ASL AND OR NOT EOR LSL LSR
The one-address machine requires a rather cumbersome sequence of operations just to carry out the simple action of adding two numbers. If we had a computer with a three-address format, we could have written
ADD Q,R,P Add the contents of Q to the con-
tents of R and put the result in P Three-address machines are potentially4 faster than one-address machines, because they can do in one instruction things that take other machines three operations. However, the power of threeaddress machines can be achieved only by means of a more complex and expensive CPU and memory system. We can demonstrate how the CPU operates by examining the execution of ADD R, D0 in terms of register transfer language. Table 7.3
Table 7.2 Typical arithmetic and logical operations.
22/12/2005
6:17 PM
Page 298
Move the contents of the PC to the MAR Increment the contents of the PC Read the current instruction from the memory Move the contents of the MBR to the IR Move the op-code from the IR to the CU Move the operand address to the MAR Read the data from memory Perform the addition Move the output of ALU to the data register Operations sharing the same line are executed simultaneously. ALU [MBR] and ALU [D0] are executed simultaneously.
Table 7.3 Expressing the FETCH/EXECUTE cycle for an ADD instruction in RTL.
MICROPROGRAMMING The terms microprogram, microprogramming, microcode, microinstruction, and micro-operation have nothing to do with microprocessor or microcomputer. A micro-operation is the smallest event that can take place within a computer; typical micro-operations are the clocking of data into a register, a memory read operation, putting data on a bus, or adding two numbers together. A microprogram consist of a sequence of microinstructions that, when executed, implement a machine-level instruction. For example, the machine-level or macro-level operation ADD P,D0 can be implemented by executing a sequence of micro-operations. The instructions that comprise a microprogram are called microcode.
gives the sequence of operations carried out during the fetch and execute phases of an ADD R, DO instruction. These operations tell us what is actually going on inside the computer. During the fetch phase the op-code is fed to the control unit by CU [IR(op-code)] and used to generate all the internal signals required to perform the additionthis includes programming the ALU to do addition by adding together the data at its two input terminals to produce a sum at its output terminals. Operations of the form [PC] [MAR] or [D0] [D0] [MBR] are often referred to as microinstructions. Each assembly level instruction (e.g. LOAD, ADD) is executed as a series of microinstructions. Microinstructions and microprogramming are the province of the computer designer. In the 1970s some machines were user-microprogrammable; that is, you could dene your own instruction set. We take a further look at microinstructions later in this chapter.
sequential mode; that is, the computer can execute only a stream of instructions, one by one in strict order. We covered conditional behavior in the previous chapter and we require a means of implementing instructions such as BEQ Target (branch on zero ag set to Target). The computer in Fig. 7.2 lacks a mechanism for making choices or repeating a group of instructions. To do this, the CPU must be able to execute conditional branches or jumps such as BEQ XYZ. Weve already met the branch that forces the CPU to execute an instruction out of the normal sequence. The block diagram of Fig. 7.3 shows the new address and data paths required by the CPU to execute conditional branches. Three items have been added to our computer in Fig. 7.3:
G
a condition code register, CCR a path between the CCR and the control unit a path between the address eld of the instruction register and the program counter.
7.1.4 Executing conditional instructions

So far, weve considered the architecture of the single-instruction single-data CPU capable of executing programs in a purely
4 A three-address machine is faster than a machine with registerto-register instructions only if the access time of memory is comparable to the access time of register. Memory takes much longer to access than internal registers and, therefore, a three-address machine is not currently practical.
22/12/2005
6:17 PM
Page 299
7.1 The CPU

PC Program counter MAR Memory address register
299
Incrementer
Address The next instruction may be the next instruction in sequence or the instruction at the branch target address
Memory
The IR's operand field provides the branch target address IR Op-code Address
Data
Memory buffer register MBR
Clock
Control unit Data register D0
Control signals
The control units uses the CCR output to decide whether to execute the next instruction or to force a branch
f(A,B)
ALU B
The arthimetic and logic unit performs data processing operations
CCR Condition code register program flow control paths The word in the CCR indicates whether the last operation gave a zero or a negative result or whether a carry was generated
Detail of the ALU's result are sent to the CCR
Figure 7.3 Information paths in the CPU and conditional instructions.
The condition code register or processor status register takes a snapshot of the state of the ALU after each instruction has been executed and records the state of the carry, negative, zero, and overow ag bits. A conditional branch instruction interrogates the CCRs current state. The control unit then either forces the CPU to execute the next instruction in series or to branch another instruction somewhere in the program. Lets look at the details of the conditional branch. The CPU updates the bits of its condition code register after it carries out an arithmetic or a logical operation to reect the nature of the result. The following is a reminder of the operations of the condition code bits.
C carry Set if a carry was generated in the last operation. The C-bit is, of course, the same as the carry bit in the carry ip-op. Z zero Set if the last operation generated a zero result. N negative Set if the last result generated a negative result. V overow Set if the last operation resulted in an arithmetic overow; that is, an operation on one or two twos complement values gave a result that was outside its allowable range (an arithmetic overow occurs during addition if the sign bit of the result is different from the sign bit of both operands).
22/12/2005
6:17 PM
Page 300

The condition code register is connected to the control unit, enabling instructions to interrogate the CCR. For example, some instructions test whether the last operation performed by the central processor yielded a positive result, or whether the carry bit was set, or whether arithmetic overow occurred. Theres no point in carrying out an interrogation unless the results are acted upon. We need a mechanism that does one thing if the result of the test is true and does another thing if the result of the test is false. The nal modication included in Fig. 7.3 is the addition of a path between the operand eld (i.e. target address) of the instruction register and the program counter. Its this feature that enables the computer to respond to the result of its interrogation of the CCR. A conditional branch instruction tests a bit of the CCR and, if the bit tested is clear, the next instruction is obtained from memory in the normal way. But if the bit tested is set, the next instruction is obtained from the location whose target address is in the instruction register. In the above description we said that a branch is made if a certain bit of the CCR is set; equally a branch can be made if the bit is clear (branches can also be made on the state of several CCR bits). The precise way in which conditional branches are actually implemented inside the computer is discussed later when we deal with the design of the control unit. Branch operations can be expressed in register transfer language in the form memory. Sometimes we wish to use instructions such as ADD #12,D0, where the source operand supplies the actual value of the data being referred to by the op-code part of the instruction. Although the symbol # appears as part of the operand when this instruction is written in mnemonic form, the assembler uses a different op-code code for ADD #literal,D0 than it does for ADD address,D0. The instruction ADD.B #12,D0 is dened in RTL as
Figure 7.4 shows that an additional data path is required between the operand eld of the IR and the data register and ALU to deal with literal operands. In fact, the architecture of Fig. 7.4 can execute any computer program. Any further modications to this structure improve the CPUs performance without adding any fundamentally new feature. Figure 7.5 completes the design of the computer. We have added a second general-purpose data register D1 and a pointer register A0. In principle, there is nothing stopping us adding any number of registers. As you can see, three buses, A, B, and C are used to transfer data between the registers and ALU. The structure of Fig. 7.5 can implement instructions with more complex addressing modes than the simple direct (absolute) addressing we have used so far; for example MOVE (A0),D1 can be implemented by the following sequence of micro-operations.
Move source operand address to MAR Read the actual operand from memory Copy data to D1
Typical machine-level conditional operations expressed in RTL are 1. Branch on carry clear (jump to the target address if the carry bit in the CCR is 0)
BCC target: IF [C]= 0 THEN [PC][IR(address)]
2. Branch on equal (jump to the target address if the Z-bit in the CCR is 1)
BEQ target: IF [Z]= 1 THEN [PC][IR(address)]
An example of a conditional branch is as follows. Subtract X from contents of D0 If the result was zero then branch to Last, otherwise continue Target address of branch (if taken)
This sequence has been simplied because, you will see from Fig. 7.5, that there is no direct path between register A0 and the MBR. You would have to put the contents of A0 onto bus A, pass the contents of bus A through the ALU to bus C, and then copy bus C to the MAR. We will return to this theme when we look at the detailed design of computers. We have now demonstrated the ow of information that takes place during the execution of a single address computer instruction. In the next section we reinforce some of the things we have covered by showing how you can simulate a computer architecture in C.
7.2 Simulating a CPU

7.1.5 Dealing with literal operands
The CPU of Fig. 7.2 assumes all instructions (e.g. ADD and BEQ etc.) refer to an operand somewhere within the CPUs One way of learning how a processor operates is to build one. We present a program in C that simulates a very simple 8-bit CPU. In order to make the simulator as accessible to as many readers as possible, we have written the simulator in C but
22/12/2005
6:17 PM
Page 301

301
Incrementer
Address
Memory
Data
IR
Op-code
Operand
Memory buffer register MBR Literal to ALU
Clock
Control unit Data register D0
Control signals The operand field of the IR is fed to the data register or ALU to provide a literal operand The literal from the instruction register can by loaded into the data register Literal data paths CCR
A f(A,B)
ALU B The literal from the instruction register can be supplied to the ALU
Condition code register
Figure 7.4 Modifying the CPU to deal with literal operands.
have avoided all but Cs most basic elements. All data types are 8 bits and the only C constructs we use are the while, the if . . . then . . . else, and the switch constructs, which select one of several courses of action. We are going to construct two simulatorsthe rst is a very primitive CPU with an 8-bit instruction that simply demonstrates the fetch/execute cycle, and the second is not too dissimilar to typica rst-generation 8-bit microprocessors.
branch instructions. Only the store instruction performs a write to memory. Choosing an instruction set requires many compromises; for example, if the number of bits in an instruction is xed, increasing the number of different instructions reduces the number of bits left for other functions such as addressing modes or register selection. We can dene an instruction set for our primitive 8-bit machine as
7.2.1 CPU with an 8-bit instruction
Instruction
Mnemonic
LDA STA ADD BRA BEQ N N N N N
RTL definition
[D0] [N] [N] [D0] [D0] [D0] + [N] [PC] N IF [D0] = 0 THEN [PC] N
Load D0 from memory Store D0 in memory Our rst computer has a single data register (i.e. Add memory to D0 accumulator) called D0 and all instructions are Branch to location N memory to register apart from the store and the If [D0] = 0 then branch to N
22/12/2005
6:17 PM
Page 302

Incrementer
Address
Memory Path between operand field of IR and bus A to literals MBR Op-code Address Memory buffer register
Data
Data register D0 Clock Control unit
Data register D1
Address register A0 Control signals A Bus A The output from the ALU is fed via bus C to the registers f(A,B) Bus C B Bus B CCR Condition code register ALU
Figure 7.5 Processor with multiple registers.
We have provided only ve instructions because these are illustrative of all instructions. This computer has an 8-bit instruction format that includes both the op-code and the operand. If we chose a 3-bit op-code (eight instructions) and a 4-bit operand (a 16-bit memory), the remaining bit can be used to specify the addressing mode (absolute or literal). Real 8-bit microprocessors solve the problem of instruction set design by using 1 byte to provide an operation code and then 0, 1, or 2 succeeding bytes to provide an operand.
Figure 7.6 denes the structure of an 8-bit instruction for our simulated machine. The rst step in constructing a simulator is to describe the action of the computer in pseudocode.
22/12/2005
6:17 PM
Page 303

8-bit instruction format 7 6 5 4 3 2 1 0
303
Op-code 000 = LDA 001 = STA 010 = ADD 011 = BRA 100 = BEQ
Operand Addressing mode 0 = absolute 1 = literal (immediate)
Figure 7.6 Format of an 8-bit instruction.
C allows you to operate on individual bits of a byte; for example, the operator n performs a right shift by n bits. The opcode is obtained from the three most-signicant bits of the IR by shifting right ve times. A bitwise logical AND can be performed between a variable and a hexadecimal value; for example IR & 0x0F ANDs the IR with 000011112 to extract the operand bits in the four least-signicant bit positions. Once weve extracted the addressing mode (bit 4 of the instruction register) with amode (IR & 0x10) 4, we can calculate the source operand for the load and add instruction by
We can now write a program to implement this algorithm. The following fragment of code is largely self-explanatory. The instruction in the 8-bit instruction register (IR) is decoded by the three operations
The following listing provides the C code for this CPU simulator.
22/12/2005
6:17 PM
Page 304
Most of the work done in the simulator takes place in the switch construct at the end of this program where each instruction is interpreted.
Instruction 7 6 5 4 3 2 1 0 7 6
Operand 5 4 3 2 1 0
Op-code
7.2.2 CPU with a 16-bit instruction

We now describe a CPU that is much closer to the architecture of typical 8-bit microprocessors. The simulator uses an 8-bit memory with 256 locations. Each instruction occupies two consecutive memory locationsan 8-bit instruction followed by an 8-bit operand. This arrangement provides us with a much richer instruction set than the previous example. However, each fetch cycle requires two memory accesses. The rst access is to fetch the op-code and the seconds to fetch the operand; that is;
Bit 3 not used
Addressing mode 00 = absolute 01 = literal (immediate) 10 = indexed 11 = PC relative Direction 0 = register to memory 1 = memory to register
Figure 7.7 Format of the CPUs instruction.
provided program counter relative addressing (discussed in the next chapter) in which the operand is specied with respect to the current value of the program counter; for example, MOVE D0,12(PC) means store the conCopy contents of the PC to the MAR tents of data register D0 12 bytes on from the locaIncrement contents of the PC tion pointed at by the program counter. Read the instruction from memory The instruction itself is divided into four elds, Move the instruction to the IR as Fig. 7.7 demonstrates. A 4-bit op-code in bits 7, Save the op-code 6, 5, 4 provides up to 16 instructions. A 2-bit Copy contents of the PC to the MAR addressing mode in bits 1, 0 selects the way in Increment contents of the PC which the current operand is treated. When the Read the operand from memory addressing mode is 00, the operand provides the Move the operand to the IR address of the data to be used by the current Save the operand instruction. When the addressing mode is 01 the operand provides the actual (i.e. literal) operand. Modes 10 This multibyte instruction format is used by 8-bit and and 11 provide indexed and program counter relative 16-bit microprocessors. Indeed, the 68K has one 10-byte addressing respectively (i.e. the operand is added to the A0 register or the PC, respectively). instruction. Bit 2 of the instruction is a direction bit that determines The architecture of this computer is memory to register or register to memory; for example, it supports both ADD D0,M whether the source operand is in memory or is provided by and ADD M, D0 instructions. In addition to the direct and lit- the data register; for example, the difference between eral addressing modes, we have provided address register MOVE D0,123 and MOVE 123,D0 is determined by the value indirect addressing with a single A0 register. We have also of the direction bit.
22/12/2005
6:17 PM
Page 305
305
We can express the basic fetch cycle and decode instruction phase in C as
Each instruction is executed by means of the switch construct. Note that the CCR has only a zero ag (it would have been more complex to have provided a C- and V-bit). The following provides the complete code for the processor.
22/12/2005
6:17 PM
Page 306
22/12/2005
6:17 PM
Page 307
307
Now that we have examined the sequence of events that take place during the execution of an instruction, the next step is to demonstrate how the binary code of an instruction is translated into the actions that
implement the instruction. In the next two sections we describe two different types of control unit; the microprogrammed control unit and the so-called random logic control unit.
22/12/2005
6:17 PM
Page 308
7.3 The random logic control unit

Weve demonstrated how you can write a program using assembly language instructions. Weve shown how an assembly language instruction can be reduced to the ow of information between registers and functional units in a CPU. Weve shown how register, counters, and logical units can be created from primitive gates and ip-ops. Whats left? What we have not yet demonstrated is how a binary pattern such as, say, 11011011101010012 can be turned into the sequence of operations that causes a machine-level operation like ADD $A9,D0 to be executed. We now make good this omission and complete the nal link in the chain between gate and computer. There are two ways of transforming an instruction into the operations that interpret it. One is to create the logic that directly transforms instructions into control signals. The other is to design a special computer that takes a machinelevel instruction as a program and generates control signals as an output. We rst describe the random logic unit, which uses gates and ip-ops to generate control signals. When engineers design a random logic control unit (RALU) they ask What sequence of microinstructions is needed
to execute each machine code instruction and what logic elements do we need to implement them? In other words, designers resort to the Boolean techniques we described in Chapter 2. The word random in the expression random logic element is employed in a specic sense and implies that the arrangement of gates from which the control unit is constructed varies widely from computer to computer. The same microprogrammed control unit can readily be adapted to suit many different computers with relatively little modication, whereas the random logic control unit is dedicated to a specic CPU and cannot easily be modied.
7.3.1 Implementing a primitive CPU

Before designing a random logic control unit, lets look at the structure of a very simple CPU in order to determine what control signals we need to synthesize. Figure 7.8 illustrates a computer with a single bus that is connected to all registers and the memory. This arrangement reduces the amount of logic required to implement the processor, but it is inefcient because only one word at a time can be copied from a source to a destination. The ALU has two inputs P and Q. The P input comes only from data register D0 and the Q input
GMSR System bus
R W
Read Write Memory Address
Data out EMSR Data in Each tri-state buffer puts data on the bus when enabled
CMAR
MAR GMBR
CMBR
MBR
EMBR GIR
When clocked a register latches data from the bus
CIR
IR
EIR GPC
Bus
CPC
PC
EPC GDO The inputs to the ALU are fro register D0 and the system bus The output of the ALU is a function of inputs P and Q and is determined by control signals F0 and F1. The result is latched into the ALU register by CALU
CDO
D0
EDO P
F0 and F1 select the ALU F0 function and CALU clocks F the ALU output into the 1 C ALU register ALU
ALU register f(P, Q)
ALU Q GALU EALU
Figure 7.8 Structure of a single-bus CPU.
22/12/2005
6:17 PM
Page 309
309
Signal number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Signal R W CMAR CMBR CPC CIR CD0 CALU EMSR EMBR EIR EPC ED0 EALU F0 F1
Type Memory control Memory control Register clock Register clock Register clock Register clock Register clock Register clock Bus control Bus control Bus control Bus control Bus control Bus control ALU control ALU control
Operation Read from memory Write to memory Clock data into MAR Clock data into MBR Clock data into program counter Clock data into instruction register Clock data into data register Clock data into ALU register Gate data from the memory onto the bus Gate data from the MBR onto the bus Gate operand from IR onto the bus Gate program counter onto the bus Gate data register onto the bus Gate ALU funtion register onto the bus Select ALU function bit 0 Select ALU funtion bit 1
Table 7.4 The control signals in the CPU of Fig. 7.8.
comes only from the system bus. Note that this structure allows the memory to transfer data directly to or from any register; that is, all data does not have to pass through the MBR. The memory receives the address of the location to be accessed directly from the MAR, whose output is permanently connected to the memorys address input. A dedicated connection between the MAR and memory is possible because the memory never receives an address input from a source other than the memory address register. A permanent connection removes the need for bus control circuits. Two data paths link the memory to the system bus. In a read cycle when memory control input R is asserted, data is transferred from the memory to the system bus via tri-state gate GMSR. During a memory write cycle when memory control input W is asserted, data is transferred from the system bus directly to the memory. The MBR, data register, program counter, and instruction register are each connected to the system bus in the same way. When one of these registers wishes to place data on the bus, its tri-state gate is enabled. Conversely, data is copied into a register from the bus by clocking the register. The instruction register (IR) receives data from the memory directly, without the data having to pass through the MBR. The ALU receives data from two sources, the system bus and data register D0, and places its own output on the system bus. This arrangement begs the question, If the ALU gets data from the system bus how can it put data on the same bus at the same time it is receiving data from this bus? Figure 7.8 shows that the ALU contains an internal ALU register. When
F1 0 0 1 1
F0 0 1 0 1
function add P to Q subtract Q increment Q decrement Q
Table 7.5 Decoding the ALU control code, F0, F1.
this register is clocked by CALU, the output from the ALU is captured and can be put on the system bus by enabling gate GALU. Table 7.4 denes the 16 control signals in Fig. 7.8. Instruction decoding takes an instruction and uses it to create a sequence of 16-bit signals that control the system in Fig. 7.8. The ALU is controlled by a two-bit code, F1, F0, which determines its functions as dened in Table 7.5. These operations are representative of real instructions, although a practical ALU would implement, typically, 16 different functions. In order to keep the design of a random logic control unit as simple as possible, we will construct a 3-bit operation code giving a total of eight instructions. This instruction set dened in Table 7.6 presents a very primitive instruction set indeed, but it does include the types of instruction found in real rst-generation processors. We have dened explicit LOAD and STORE instructions rather than a single MOVE instruction which does the work of both LOAD and STORE. Having constructed an instruction set, we dene each of the instructions in terms of RTL and determine the
22/12/2005
6:17 PM
Page 310

sequence of operations necessary to carry them out on the computer in Fig. 7.8. Table 7.7 gives the micro-operations for each instruction including the fetch phase. The symbol Z is the zero-ag bit from the CCR, which is assumed to be part of the ALU.
Op-code
Mnemonic
Operation
Note that N is the operand eld used by the instruction.
Table 7.6 A primitive instruction set for the CPU of Fig. 7.8.
Instruction Op-code Operations (RTL) Fetch [MAR] [IR] [ALU] [PC] 000 [MAR] [D0] [PC] [[MAR]] [PC] [ALU]
Control actions EPC R EPC E ALU E IR R =1 = 1, EMSR = 1, = 1, F 1,F 0 = 1,0, =1 =1 = 1, E MSR = 1, C MAR C IR C ALU C PC C MAR C D0 C MAR W=1 C MAR C MBR C ALU C D0 C MAR C MBR C ALU C D0 C MAR C MBR C ALU W=1 C MAR C MBR C ALU W=1 C PC IF Z = 1 THEN CPC
LOAD
[IR] [[MAR]]
STORE
001
[MAR] [IR] [[MAR]] [D0] [MAR] [MBR] [ALU] [D0] [MAR] [MBR] [ALU] [D0] [MAR] [MBR] [ALU] [[MAR]] [MAR] [MBR] [ALU] [[MAR]] [PC] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU]
E IR = 1 E D0 = 1 E IR R E MBR E ALU E IR R E MBR E ALU E IR R E MBR E ALU E IR R E MBR E ALU E IR =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,0, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,1, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,1, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 1,1, =1 =1
ADD
010
SUB
011
INC
100
DEC
101
BRA BEQ
110 111
[IR]
IF Z = 1 THEN [PC] [IR]
E IR = 1
Table 7.7 Interpreting the instruction set of Table 7.6 in RTL and microinstructions.
22/12/2005
6:17 PM
Page 311
311
Consider the load D0 from memory operation; this requires the following two steps:
Then we have to put the memory in read mode, put the data from the memory onto the bus by enabling the GMSR gate, and nally capture the data in D0 by clocking register D0.
[MAR] [IR] EIR = 1 [D0] [[MAR]] R = 1,
CMAR EMSR = 1,
CD0
Copy operand address to MAR Read memory and copy to D0

Table 7.7 tells us what signals have to be asserted to execute the two operations required to interpret LOAD N. Table 7.8
We have to send the operand address in the instruction register to the memory address register by enabling the GIR gate and then clocking the data into the memory address register.
Instruction Operations (RTL) Control actions R Fetch [MAR] [IR] [ALU] [PC] LOAD [MAR] [D0] STORE [MAR] [PC] [PC] [ALU] [IR] 0 0 0 0 W 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 CMAR 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 CMBR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CPC 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
CIR 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CD0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
CALU 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
EMSR 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
EMBR 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
EIR 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
EPC 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ED0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EALU 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0
F1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
F0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
[[MAR]] 1
[[MAR]] 1 [IR] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[[MAR]] [D0] ADD [MAR] [MBR] [ALU] [D0] SUB [MAR] [MBR] [ALU] [D0] INC [MAR] [MBR] [ALU] [IR] [MBR] [ALU] [IR] [MBR] [ALU] [IR] [MBR]
[[MAR]] 1
[[MAR]] 1
[[MAR]] 1
[[MAR]] [ALU] DEC [MAR] [MBR] [ALU] [IR] [MBR]
[[MAR]] 1
[[MAR]] [ALU] BRA BEQ [PC] [IR]
IF Z = 1 THEN [PC] [IR] 0 0 0 0 Z 0 0 0 0 0 1 0 0 0 0 0
Table 7.8 Interpreting the micro-operations of Table 7.7 as microinstructions.
22/12/2005
6:17 PM
Page 312

gives all the signals in the form of a 16-component vector; that is, the two vectors are 0010000000100000 and 1000000010001000 Figure 7.9 shows the timing of the execution phase of this instruction. We have included only ve of the 16 possible control signals because all the other 12 signals remain inactive during these two micro-operations. we require a source of signals to trigger each of the microinstructions. A circuit that produces a stream of trigger signals is called a sequencer. Figure 7.11 provides the logic diagram of a simplied eight-step sequencer. The outputs of three JK ip-ops arranged as a 3-bit binary up-counter counting 000, 001, 010, . . . ,111, are connected to eight three-input AND gates to generate timing signals T0 to T7. Figure 7.12 illustrates the timing pulses created by this circuit. Note that the timing decoder is similar to the instruction decoder of Fig. 7.11. As not all macroinstructions require the same number of microinstructions to interpret them, the sequencer of Fig. 7.11 has a reset input that can be used to reset the sequencer by returning it to state T0. The sequencer of Fig. 7.11 is illustrative rather than practical, because, as it stands, the circuit may generate spurious timing pulses at the timing pulse outputs due to the use of an asynchronous counter. All outputs of an asynchronous counter dont change state at the same instant and therefore the bit pattern at its output may pass through several states (if only for a few nanoseconds) before it settles down to its nal value. Unfortunately, these transient states or glitches may last long enough to create spurious timing signals, which, in turn, may trigger undesired activity within the control unit. A solution to these problems is to disable the output of the timing pulse generator until the counter has settled down (or to use a synchronous counter). The next step in designing the control unit is to combine the signals from the instruction decoder with the timing signals from the sequencer to generate the actual control signals. Figure 7.13 shows one possible approach. There are nine vertical lines in the decoder Fetch phase of Fig. 7.13 (only three are shown). One vertical line corresponds to the fetch phase and each of the other eight lines is assigned to one of the eight instructions. At any instant one of the vertical lines from the instruction decoder (or fetch) is in a logical one state, enabling the column of two-input AND gates to which it is connected. The other inputs to the column of AND gates are the timing pulses from the sequencer. As the timing signals, T0 to T7, are generated, the outputs of the AND gates enabled by the current instruction synthesize the control signals required to implement the random logic control unit. The output of each AND gate corresponding to a particular microinstruction (e.g. CMAR) triggers the actual microinstruction (i.e. micro-operation). As we pointed out earlier, not all macroinstructions require eight clock cycles to execute them.
7.3.2 From op-code to operation

In order to execute an instruction we have to do two things. The rst is to convert the 3-bit op-code into one of eight possible sequences of action and the second is to cause these actions to take place. Figure 7.10 shows how the instructions are decoded and is similar in operation to the 3-line to 8-line decoder described in Chapter 2. For each of the eight possible three-bit op-codes, one and only one of the eight outputs is placed in an active-high condition. For example, if the op-code corresponding to ADD (i.e. 010) is loaded into the instruction register during a fetch phase, the ADD line 2 from the AND gate array is asserted high while all other AND gate outputs remain low. Its no good simply detecting and decoding a particular instruction. The control unit has to carry out the sequence of microinstructions that will execute the instruction. To do this
Fetch phase
Execute Micro-operation 1 Send IR to MAR Micro-operation 2 Read memory, latch into D0
Clock Put [IR] on bus EIR Enable IR to bus CMAR Clock data into MAR R Read memory EMSR Enable memory to bus CD0 Clock data into D0 Capture data in D0 Put operand on bus Read operand
Capture [IR] in MAR
Figure 7.9 Timing of the execute phase of a LOAD N instruction.
22/12/2005
6:17 PM
Page 313

3 op-code bits Bit 2 Bit 1 Bit 0 Operand address N
313
000 LOAD
N,D0
001 STORE D0,N 010 ADD
The 3-bit op-code is decoded into N,D0 eight operations
111 BEQ
Figure 7.10 The instruction decoder.
CLR 1 Clock 1 K J C Q 1 K Q 1 J
CLR Q C Q 1 K 1 J
CLR Q C Q
Clear counter to state T0
T0
T1 Eight timing pulses T2 (Only 4 outputs shown for simplicity)
T7
Figure 7.11 The timing pulse generator (sequencer).
One machine cycle (eight clock states) T0 T1 T2 T3 T4 T5 T6 T7
Figure 7.12 The outputs from the timing pulse generator.
22/12/2005
6:17 PM
Page 314

Op-code Op-code EXECUTE FETCH EPC=1, CMAR T0 R=1, EMSR =1, CIR T1 T2 EALU=1, CPC T3 T1 EPC=1, F1, F0=1, 0, CALU T0 EIR=1, CMAR T0 R=1, EMSR =1, CD0 T1 ED0=1, W=1 EIR=1, CMAR
Figure 7.13 Combining control signals.
MBR
CPC CD0 CALU F 0 F 1 R
IR
PC
E E
D0
ALU
MAR
MBR
W E
CIR
MSR
Figure 7.14 The OR gate array used to generate the actual microinstructions.
Each microinstruction is activated by one or more control signals from the nine columns of AND gates. Figure 7.14 shows the array of OR gates that combine the outputs from the AND gates to generate the control signals. The inputs from these OR gates come from the nine columns of AND gates.
always in one of two states (fetch or execute), an RS ip-op provides a convenient way of switching from one state to another. When Q 0 the current operation is a fetch phase, and when Q 1 an execute phase is being performed. Figure 7.15 is an extension of Figure 7.13 and demonstrates how the instruction decoder is enabled by the Q output of _ the fetchexecute ip-op, and the fetch decoder by the Q output. At the end of each fetch phase, a clock pulse from the timing generator sets the fetchexecute ip-op, permitting the current op-code to be decoded and executed. The timingpulse generator is reset at the end of each fetch. At the end of each execute phase, the fetchexecute ip-op is cleared and the sequencer reset, enabling the next fetch phase to begin. Table 7.9 shows how the machine-level instructions can be represented in terms of both timing signals and microinstructions. Note that weve included the micro-operation [MAR] [IR] in the fetch phase. The microinstructions are the enable signals to the bus drivers, the register clocks, the ALU function select bits, the memory controls (R and W), and the reset and set inputs of the fetchexecute ip-op. For each of the microinstructions we can write down a Boolean expression in terms of the machine-level instruction and the sequence of timing pulses. For example, consider expressions for EMBR, EIR, and CMAR. EMBR ADD T1 SUB T1 INC T1 DEC T1 EIR Fetch T4 BRA T0 BEQ T0 CMAR Fetch T0 Fetch T4
The fetchexecute ip-op

So far we have devised a mechanism to interpret each macroinstruction but have not looked at how we implement the two-phase fetchexecute cycle. As the control unit is We should note, of course, that this CPU and its microprogram are very highly simplied and illustrate the nature of the random logic CU rather than its exact design.
22/12/2005
6:17 PM
Page 315
7.4 Microprogrammed control units

Other reset inputs
315
Q Fetchexecute flip-flop
EXECUTE
S Select execute phase
FETCH Op-code (from IR) T0 T0 T0
T1
T1 Select fetch phase
T1 Select fetch phase
T2 At the end of each operation the last T timing pulse is used 3 to reset the fethexecute flip-flop T4
Control signals to the OR gate array
Figure 7.15 The fetchexecute ip-op.
In the next section we look at how the design of a control unit can be simplied by putting the sequence of microoperations in a table and then reading them from a table, rather than by synthesizing them in hard logic.

Before we describe the microprogrammed control unit, lets remind ourselves of the macro-level instruction, micro-level instruction, and interpretation. The natural or native language of a computer is its machine code whose mnemonic representation is called assembly language. Machine-level instructions are also called macroinstructions. Each macroinstruction is executed by means of a number of primitive actions called microinstructions. The process whereby a macroinstruction is executed by carrying out a series of microinstructions is called interpretation. Lets begin with another simple computer. Consider Fig. 7.16. The internal structure of this primitive CPU differs slightly from that of Fig. 7.8 because theres more than one bus. The CPU in Fig. 7.16 includes the mechanisms by which information is moved within the CPU. Each of the registers (program counter, MAR, data register, etc.) is made up of D ip-ops. When the clock input to a register is pulsed, the
data at the registers D input terminals is transferred to its output terminals and held constant until the register is clocked again. The connections between the registers are by means of m-bit wide data highways, which are drawn as a single bold line. The output from each register can be gated onto the bus by enabling the appropriate tri-state buffer. We have used a multiplexer, labeled MPLX, to select the program counters input from either the incrementer or the operand eld of the instruction register. The multiplexer is controlled by the 1-bit signal Mux, where Mux 0 selects the incrementer path, and Mux 1 selects the branch target address from the address/operand eld of the instruction register, IRaddress. Suppose our computer performs a fetchexecute cycle in which the op-code is ADD N,D0. This instruction adds the contents of the memory location specied by the operand eld N to the contents of the data register (i.e. D0) and deposits the result in D0. We can write down the sequence of operations that take place during the execution of ADD not only in terms of register transfer language, but also in terms of the enabling of gates and the clocking of ip-ops. Table 7.10 illustrates the sequence of microinstructions executed during the fetchexecute cycle of an ADD instruction. It should be emphasized that the fetch phase of all instructions is identical and it is only the execute phase that varies according to the nature of the op-code read during the fetch phase.
22/12/2005
6:17 PM
Page 316

Instruction Time Memory R Fetch T0 T1 T2 T3 T4 LOAD STORE ADD T0 T0 T0 T1 T2 SUB T0 T1 T2 INC T0 T1 T2 DEC T0 T1 T2 BRA BEQ T0 T0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 W 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 MBR 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 IR 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 Gate enables PC 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 MSR 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 ALU 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 MAR 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Register clocks MBR IR 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PC 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Z D0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 ALU 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 ALU Fetchexecute F1 F0 X X X X 1 0 X X X X X X X X X X 0 0 X X X X 0 1 X X X X 1 0 X X X X 1 1 X X X X X X R 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 1 1 S 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Table 7.9 The interpretation of machine code instructions.
7.4.1 The microprogram

Imagine that the output of the control unit in Fig. 7.16 consists of 10 signals that enable gates G1 to G10, the PC input multiplexer, two signals that control the memory, and ve clock signals that pulse the clock inputs of the PC, MAR, MBR, IR, and D0 registers. Table 7.11 presents the 17 outputs of the control unit as a sequence of binary values that are generated during the fetch and execute phases of an ADD instruction. We have not included the ALU function signals in this table. When the memory is accessed by E 1, a memory read or write cycle may take place. The R/W (i.e. read/write) signal determines the nature of the memory access when E 1. When R/W 0 the cycle is a write cycle, and when R/W 1 the cycle is a read cycle.
If, for each of the seven steps in Table 7.11, the 17 signals are fed to the various parts of the CPU in Fig. 7.16, then the fetchexecute cycle will be carried out. Real microprogrammed computers might use 64 to 200 control signals rather than the 17 in this example. One of the most signicant differences between a microinstruction and a macroinstruction is that the former contains many elds and may provide several operands, while the macroinstruction frequently species only an op-code and one or two operands. The seven steps in Table 7.11 represent a microprogram that interprets a fetch phase followed by an ADD instruction. We have demonstrated that a macroinstruction is interpreted by executing a microprogram, which comprises a sequence of microinstructions. Each of the CPUs instructions has its own microprogram. We now look at the
22/12/2005
6:17 PM
Page 317

Clock PC
317
PC
Q D G1
Clock MAR
MAR Address Memory Data in out
D Q
MPLX E Mux Incrementer Branch path Clock PC Q D G2 R/W
G3 G4 G5 Bus A
Q IR
G10 Literal path
Clock MBR
MBR
Q D
G6 G7
Bus B
Control units
Clock D0 Control signals C G8 Bus C
D0
Q D A B
ALU
F1 F2 F3 Copy bus A to bus C G9
Copy bus A to IR
Figure 7.16 Controlling the ow of information in a computer.
Step 1 1a 2 3 4 4a 5 6 7 7a 7b
Register transfer language

[MAR] INC [PC] [MBR] [IR] CU [MAR] [MBR] ALU ALU [D0] [PC] [PC] INC [[MAR]] [MBR] [IR(op-code)] [IR(address)] [[MAR]] [MBR] [D0] ALU
Operations required enable G1, clock MAR

Mux = 0, clock PC enable memory, R/W=1, enable G3 enable G9, clock MBR enable G4, clock IR enable G2, clock MAR enable memory, R/W=1, enable G3, enable G9, clock MBR enable G4, set ALU function to add enable G7 enable G8, clock data register D0
Note 1 Where there is no entry in the column labeled Operations required, that operation happens automatically. For example, the output of the program counter is always connected to the input of the incrementer and therefore no explicit operation is needed to move the contents of the PC to the incrementer. Note 2 Any three-state gate not explicitly mentioned is not enabled. Note 3 Steps 1, 1a are carried out simultaneously, as are 4, 4a and 7, 7a, 7b.
Table 7.10 Interpreting a fetchexecute cycle for an ADD N,D0 instruction in terms of RTL.
22/12/2005
6:17 PM
Page 318

Step Gate control signals and MPLX control G1 1 2 3 4 5 6 7 1 0 0 0 0 0 0 G2 0 0 0 0 1 0 0 G3 0 0 1 0 0 1 0 G4 0 0 0 1 0 0 1 G5 0 0 0 0 0 0 0 G6 0 0 0 0 0 0 0 G7 0 0 0 0 0 0 1 G8 0 0 0 0 0 0 1 G9 0 0 1 0 0 1 0 G10 0 0 0 0 0 0 0 Mux 0 0 0 0 0 0 0 Memory E 0 0 1 0 0 1 0 R/W 0 0 1 0 0 1 0 Register clocks PC 0 1 0 0 0 0 0 MAR 1 0 0 0 1 0 0 MBR 0 0 1 0 0 1 0 D0 0 0 0 0 0 0 1 IR 0 0 0 1 0 0 0
Table 7.11 Control signals generated during the fetch and execution phases of an ADD instruction.
This is part of the CPU Operation code Address Instruction register
Address mapper
Incrementer
Microprogram counter
Clock
Address Microprogram memory Data Microinstruction register Control signals to other parts of the CPU
Next microinstruction address
Load Condition control select
CPU control field
Branch condition
Multiplexer
Branch on zero Branch on not zero Branch never (logical zero) Branch always (logical one)
These signals are from the ALU in the CPU
Figure 7.17 The microprogrammed control unit.
microprogram itself and consider the hardware required to execute it. The microprogram is executed by the same type of mechanism used to execute the macroprogram (i.e., machine code) itself. This is a good example of the common expression wheels within wheels. Figure 7.17 describes the basic structure of a microprogrammed control unit that has a microprogram counter, a microprogram memory, and a microinstruction register (this structure is typical of the 1980s). The microinstruction
address from the microprogram counter is applied to the address input of the microprogram memory and the data output of the memory fed to the microinstruction register. As weve said, the structure of the control unit that executes the macroinstruction is very much like the structure of the CPU itself. However, there is one very big difference between the macroinstruction world and the microinstruction worldthe microinstruction register is very much longer than the macroinstruction register and the microinstructions
22/12/2005
6:17 PM
Page 319
319
structure is much more complex that of the macroinstruction. Information in the microinstruction register is divided into four elds: next microinstruction address eld, microprogram counter load control eld, condition select eld, and CPU control eld. Most of the bits in the microinstruction register belong to the CPU control eld, which controls the ow of information within the CPU by enabling tri-state gates and clocking registers as weve described; for example, all the control signals in Table 7.11 belong to this eld. Our next task is to describe one of the principal differences between the micro- and macroinstruction. Each microinstruction is also a conditional branch instruction that determines the location of the next microinstruction to be executed. We will now explain how microinstructions are sequenced.
7.4.2 Microinstruction sequence control

If the microprogram counter were to step through the microprogram memory in the natural sequence, 0, 1, 2, 3, . . . etc., a stream of consecutive microinstructions would appear in the microinstruction register, causing the CPU to behave in the way described by Table 7.11. The CPU control bits of each microinstruction determine the ow of information within the CPU. However, just as in the case of the macroprogram control unit, it is often necessary to modify the sequence in which microinstructions are executed. For example, we might wish to repeat a group of microinstructions n times, or we may wish to jump from a fetch phase to an execute phase, or we may wish to call a (microinstruction) procedure. Microinstruction sequence control is determined by the three left-hand elds of the microinstruction register in Fig. 7.17, enabling the microprogram counter to implement both conditional and unconditional branches to locations within the microprogram memory. We shall soon see that this activity is necessary to execute macroinstructions such as BRA, BCC, BCS, BEQ, etc. In normal operation, the microprogram counter steps through microinstructions sequentially and the next microprogram address is the current address plus one. By loading
the contents of the next microinstruction address eld of the current microinstruction eld into the microprogram counter, a branch can be made to any point in the microprogram memory. In other words each microinstruction determines whether the next microinstruction is taken in sequence or whether it is taken from the next address eld of the current microinstruction. The obvious question to ask is, What determines whether the microprogram counter continues in sequence or is loaded from the next microinstruction address eld of the current microinstruction? The microprogram load control eld in the microinstruction register tells the microprogram counter how to get the next microinstruction address. This next address can come from the incrementer and cause the microprogram to continue in sequence. The next address can also be obtained from the address mapper (see below) or from the address in the next microinstruction address eld of the microinstruction register. The condition select eld in the microinstruction register implements conditional branches at the macroinstruction level by executing a conditional branch at the microinstruction level. In the simplied arrangement of Fig. 7.17, the condition select eld directly controls a 4-to-1 multiplexer that selects one of four ag bits representing the state of the CPU. These ag bits are obtained from the ALU and are usually the ag bits in the condition code register (e.g. Z, N, C, V). The condition select eld selects one of these ag bits for testing (in this example only the Z-bit is used). If the output of the multiplexer is true, a microprogram jump is made to the address specied by the contents of the next microinstruction address eld, otherwise the microprogram continues sequentially. In Fig. 7.17 two of the conditions are obtained from the CCR and two bits are permanently true and false. A false condition implies branch never (i.e. continue) and a true condition implies branch always (i.e. goto). To emphasize what weve just said, consider the hypothetical microinstruction of Fig. 7.18. This microinstruction is interpreted as:
IF Z = 1 THEN [PC] ADD3 ELSE [PC] [PC] + 1
where PC indicates the microprogram counter. A conditional branch at the macroinstruction level (e.g. BEQ) is interpreted by microinstructions in the following
Location of the next microinstruction to execute if a branch is taken Next address field ADD3
This field selects the This field determines where condition to be tested when making a the next microinstruction conditional branch address comes from
This field provide the CPU control signals that select source and destination registers, control buses, and determine the ALU function
Load control Conditional
Conditional select Branch on zero
CPU control fields
Figure 7.18 Structure of a microinstruction.
22/12/2005
6:17 PM
Page 320

way. The condition select eld of the microinstruction selects the appropriate status bit of the CCR to be tested. For example, if the macroinstruction is BEQ the Z-bit is selected. The microprogram counter load control eld contains the operation branch to the address in the microinstruction register on selected condition true. Thus, if the selected condition is true (i.e. Z 1), a jump is made to a point in the microprogram that implements the corresponding jump in the macroprogram. If the selected condition is false (i.e. Z 0), the current sequence of microinstructions is terminated by the start of a new fetchexecute cycle. some additional logic and a microprogram in ROM, a CPU with a user-dened instruction set and wordlength may be created. Of course, the designer doesnt have to construct a new CPU out of bit-slice components. You can emulate an existing microprocessor or even add machine-level instructions to enhance it. Figure 7.19 describes a typical bit-slice arithmetic logic unit that can generate one of eight functions of two inputs R and S. These functions vary from R plus S to the exclusive NOR of R and S. The values of R and S may be selected from a register le of 16 general-purpose data registers, an external input, a Q register, or zero. The bit-slice ALU is controlled (i.e. programmed) by a 9-bit input, which selects the source of the data taking part in an arithmetic or logical operation, determines the particular operation to be executed, and controls the destination (together with any shifting) of the result. Typical ALU operations are
[R7] [R6] [R9] [R7] [R7] + [R1] [R6] - [R5] [R9][R2] [R7] + 1
Implementing the fetchexecute cycle

The rst part of each microprogram executed by the control unit corresponds to a macroinstruction fetch phase that ends with the macroinstruction op-code being deposited in the instruction register. The op-code from the instruction register is rst fed to the address mapper, which is a look-up table containing the starting address of the microprogram for each of the possible op-codes. That is, the address mapper translates the arbitrary bit pattern of the op-code into the location of the corresponding microprogram that will execute the op-code. After this microprogram has been executed, an unconditional jump is made to the start of the microprogram that interprets the macroinstruction execute phase, and the process continues.
7.4.3 User-microprogrammed processors

Before the advent of todays powerful microprocessors, engineers in the 1980s requiring high performance sometimes constructed their own microprogrammed computers; that is, the engineer designed a CPU to their own specications. This was fun because you could create your own architecture and instruction set. On the other hand, you ended up with a computer without an off-the-shelf operating system, compilers, or any of the other tools you take for granted when you use a mainstream CPU. At the heart of many of these systems was the bit-slice component, which provided a middle path between microcomputer and mainframe. Bit-slice components, as their name suggests, are really subsections of a microprocessor that can be put together to create a custom CPU. For example, a 64-bit computer is made by putting together eight 8-bit bit-slice chips. Bit-slice components are divided into two types corresponding to the functional division within the microprocessor (i.e. the microprogram control and ALU). By using several ALU and microprogram controller bit-slices plus
An arithmetic unit of any length (as long as it is a multiple of 4) is constructed by connecting together bit-slice ALUs. Designers can use the ALUs internal registers in any way they desire. For example, they may choose to implement eight addressable data registers, two stack pointers (described later), two index registers, a program counter, and three scratchpad registers. Flexibility is the most powerful feature of bit-slice microprocessors. This description of the microprogrammed control unit is highly simplied. In practice the microprogram might include facilities for dealing with interrupts, the memory system, input/output, and so on. One of the advantages of a microprogrammed control unit is that it is possible to alter the content of the microprogram memory (sometimes called the control store) and hence design your own machine-level instructions. In fact it is perfectly possible to choose a set of microprograms that will execute the machine code of an entirely different computer. In this case the computer is said to emulate another computer. Such a facility is useful if you are changing your old computer to a new one whose own machine code is incompatible with your old programs. Emulation applies to programs that exist in binary (object) form on tape or disk. By writing microprograms (on the new machine) to interpret the machine code of the old machine, you can use the old software and still get the advantages of the new machine. One of the greatest problems in the design of a bit-slice computer lies in the construction and testing of the
22/12/2005
6:17 PM
Page 321

Literal data from control unit or main store
321
RAM0
RAM shifter RAM 3 Q0 Q shifter Qn
4-bit register selects inputs from control unit
Data input RAM 16 addressable registers A data B data output output
Q Q register
Logical zero 0 A B D ALU data source selector Q
R Carry_in 8-function ALU
Carry Zero Sign Overflow
C Z N V
Output data selector Y
Bit slice control functions: ALU control, source and destination control
Data output to memory address register or memory data register
Figure 7.19 The microprogrammed ALU.
microprogram. You can, or course, write a program to emulate the bit-slice processor on another computer. A popular method of developing a microprogram is to replace the microprogram ROM with read/write memory and to access this memory with a conventional microprocessor. That is, the microprogram memory is common to both the bit-slice system and the microprocessor. In this way, the microprocessor can input a microprogram in mnemonic form, edit it, assemble it, and then pass control to the bit-slice system. The microprocessor may even monitor the operation of the bit-slice system. Such a microprogram memory is called a writable control store and once a writable control store was regarded as a big selling point of microprogrammed minicomputers and mainframes. However, we have already pointed out that a microprogrammable control store is of very little practical use due to the lack of applications
software. Even if a computer user has the expertise to design new microprogrammed macroinstructions, it is unlikely that the system software and compilers will be able to make use of these new instructions. Finally, RISC technology (as we shall see) does not use microprogramming and interest in microprogramming is much less than it once was. In the next chapter we look at how the performance of computers can be enhanced by three very different techniques. We begin with a brief introduction to the RISC revolution of the 1970s and 1980s and show how processors with regular instruction sets lend themselves to pipelining (the overlapping of instruction execution). We also look at cache memory and explain how a small quantity of very-high-speed random access memory can radically improve a computers performance. Finally, we describe the multiprocessora system that uses more than one processing unit to accelerate performance.
22/12/2005
6:17 PM
Page 322

I SUMMARY
We have taken a step back from the complex CPU architecture we described in the previous chapters and have looked at how a simple processor can read an instruction from memory, decode it, and execute it. We did this by considering the sequence of events that takes place when an instruction is executed and the ow of information within the computer. In principle, the computer is a remarkably simple device. The program counter contains the address of the next instruction to be executed. The computer reads the instruction from memory and decodes it. We have demonstrated that a typical instruction requires a second access to memory to fetch the data used by the instruction. We have demonstrated how a simple computer that can execute only instructions that load and store data or perform arithmetic operations can implement the conditional behavior required for loop and if . . . then . . . else constructs. The second part of this chapter looked at two ways of implementing a computers control unit. We started with a simple computer structure and demonstrated the control signals required to implement several machine-level instructions. Then we showed how you can use relatively simple logic and a timing sequencer to generate the signals required to interpret an instruction. Random logic control units are faster than their microprogrammed counterparts. This must always be so because the random logic control unit is optimized for its particular application. Moreover, a microprogrammed control unit is slowed by the need to read a microinstruction from the microprogram memory. Memory accesses are generally slower than basic Boolean operations. Microprogramming offers a exible design. As the microprogram lives in read-only memory, it can easily be modied at either the design or the production stage. A random logic control unit is strictly special purpose and cannot readily be modied to incorporate new features in the processor (e.g. additional machine-level instructions), and sometimes it is difcult to remove design errors without considerable modication of the hardware. The highpoint of microprogramming was the early 1970s when main memory had an access time of 12 s and the control store used to hold microprograms had an access time of 50100 ns. It was then sensible to design complex machine level instructions that were executed very rapidly as microcode. Today, things have changed and memories with access times of below 50 ns are the norm rather than the exception. Faster memory makes microprogramming less attractive because hard-wired random logic control units execute instructions much more rapidly than microcoded control units. Todays generation of RISC (reduced instruction set computers) and post-RISC architectures are not microprogrammed.
I PROBLEMS
7.1 Within a CPU, what is the difference between an address path and a data path? 7.2 In the context of a machine-level instruction, what is an operand? 7.3 What is a literal operand? 7.4 How does a computer know whether an operand in its instruction register is a literal or a reference to memory (i.e. an address)? 7.5 Why is the program counter a pointer and not a counter? 7.6 Explain the function of the following registers in a CPU: (a) (b) (c) (d) PC MAR MBR IR
7.7 What is the CCR? 7.8 Does a computer need data registers? 7.9 Some microprocessors have one general-purpose data register, some two, some eight, and so on. What do you think determines the number of such general-purpose data registers in any given computer? 7.10 What is the signicance of the fetchexecute cycle? 7.11 What is the so-called von Neumann bottleneck? 7.12 Design a computer (at the register and bus level) to implement a zero address instruction set architecture. 7.13 In the context of CPU design, what is a random logic control unit? What is the meaning of the word random in this expression? 7.14 What is a microprogrammed control unit? 7.15 Microprogramming has now fallen into disfavor. Why do you think this is so? 7.16 For the computer structure of Fig. 7.20, state the sequence of micro-operations necessary to carry out the following instruction. Assume that the current instruction is in the IR.
ADDsquare D0, D1
This instruction reads the contents of register D0, squares that value, and then adds it to the contents of register D1. The result
22/12/2005
6:17 PM
Page 323

Abus Read Write Address CMAR Data out Main store Data in EMSR GMSR B bus
323
MAR GMAR MBR EMBR GIR IR EIR GPC PC EPC GD0
CMBR
This computer has two internal buses A and B. All registers capture data from a bus when they are clocked. All tri-state gates can put data on a bus when they are enabled. The function of the ALU is controlled by inputs F2, F1, F0. This memory is controlled by a read signal and a write signal. The ALU performs an operation on one or both its P and Q inputs depending on the state of its control inputs. Data iputs for the ALU come from two registers, latch1 and latch 2.
CIR
CPC
CD0
D0
ED0 GD1
CD1 D1 CB_to_A EB_to_A ALU CALU f(P, Q) EALU Function select Q Latch 2 P CL1 Latch 1
ED1
F2,
F1,
F0
CL2
Figure 7.20 A microprogrammed CPU.
is put in register D1. The function codes F2,F1, and F0 are given below.
Assume that only one operand, A, is required by the instruction and that operands B and C are in the next consecutive two memory locations, respectively. 7.18 For the architecture of the hypothetical two-bus computer of Fig. 7.20, derive a microprogram to carry out the operation
Move D0,[D1]
F2 0 0 0 0 1 1
F1 0 0 1 1 0 0
F0 0 1 0 1 0 1
Operation Copy P to F Add P to Q Subtract Q from P Add 1 to P Add 1 to Q Multiply P by Q F=P F=P+Q F=PQ F=P+1 F=Q+1 F=Qx1
7.17 For the structure of Fig. 7.20 write a microprogram to implement the operation
D1=[A]+[B]+[C]+1
This operation copies the contents of register D0 into the memory location whose address is given by the contents of register D1. You should describe the actions that occur in plain English (e.g. Put data from this register on the B bus) and as a sequence of events (e.g. Read 1, EMSR). The table in Question 16 denes the effect of the ALUs function code. All data has to pass through the ALU to get from bus B to bus A. Note that the ALU has two input latches. Data has to be loaded into these latches before an ALU operation takes place.
22/12/2005
6:17 PM
Page 324

7.19 For the computer of Fig. 7.20, what is the effect of the following sequence of micro-operations?
ED0 ED0 F2,F1,F0 EMBR ED0 F2,F1,F0 = = = = = = 1, 1, 0,0,1, 1, 1, 0,0,1, CL1 CL2 CMBR CL1 CL2 CD1
Your answer should explain what each of the microoperations does individually. You should also state what these actions achieve collectively; that is, what is the effect of the equivalent assembly language operation?
EALU,
EALU,

Computer Basics

Uploaded by

Copyright:

Available Formats

Computer Basics

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Computer Basics

Uploaded by

Copyright:

Available Formats

07-CLements-Chap07.

Structure of the CPU

6 Assembly language programming

294 Chapter 7 Structure of the CPU

7.1 The CPU

7.1.1 The address path

PC Program counter Instruction address Incrementer

MAR Memory address register

Address of the next instruction

Control unit Clock

Data paths Address paths

Figure 7.1 The CPUs address paths.

7.1 The CPU

7.1.2 Reading the instruction

296 Chapter 7 Structure of the CPU

Table 7.1 The FETCH phase expressed in register transfer language.

7.1.3 The CPUs data paths

7.1 The CPU

Clock Data register D0

Arithmetic and logic units performs data processing operations

Figure 7.2 The CPUs address and data paths.

Table 7.2 Typical arithmetic and logical operations.

298 Chapter 7 Structure of the CPU

7.1.4 Executing conditional instructions

7.1 The CPU

Memory buffer register MBR

Control unit Data register D0

The arthimetic and logic unit performs data processing operations

Detail of the ALU's result are sent to the CCR

Figure 7.3 Information paths in the CPU and conditional instructions.

300 Chapter 7 Structure of the CPU

7.2 Simulating a CPU

7.2 Simulating a CPU

Memory buffer register MBR Literal to ALU

Control unit Data register D0

Condition code register

Figure 7.4 Modifying the CPU to deal with literal operands.

7.2.1 CPU with an 8-bit instruction

302 Chapter 7 Structure of the CPU

Data register D0 Clock Control unit

Figure 7.5 Processor with multiple registers.

7.2 Simulating a CPU

Operand Addressing mode 0 = absolute 1 = literal (immediate)

Figure 7.6 Format of an 8-bit instruction.

304 Chapter 7 Structure of the CPU

7.2.2 CPU with a 16-bit instruction

Figure 7.7 Format of the CPUs instruction.

7.2 Simulating a CPU

306 Chapter 7 Structure of the CPU

7.2 Simulating a CPU

308 Chapter 7 Structure of the CPU

7.3 The random logic control unit

7.3.1 Implementing a primitive CPU

Read Write Memory Address

When clocked a register latches data from the bus

ALU register f(P, Q)

ALU Q GALU EALU

Figure 7.8 Structure of a single-bus CPU.