Nothing Special   »   [go: up one dir, main page]

Computer Basics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 32

07-CLements-Chap07.

qxd

22/12/2005

6:17 PM

Page 293

Structure of the CPU


CHAPTER MAP
5 The instruction set architecture
The computers instruction set architecture (ISA) describes the low-level programmers view of the computer and denes the type of operations a computer carries out. We are interested in three aspects of the ISA: the nature of the instructions, the resources such as registers and memory used by the instructions, and the ways in which the instructions access data (addressing modes).

7
7 Structure of the CPU
Chapters 5 and 6 describe what a computer does; in this chapter we show how a computer converts an instruction op-code into the actions that implement the op-code. We build on some of the material we covered in the section on digital logic. In this chapter we demonstrate how a computer is organized internally and how it reads instructions from memory, decodes them, and executes them.

6 Assembly language programming


Chapter 6 builds on the previous chapter by demonstrating how instructions are used to construct entire programs. We introduce the programming environment via a simulator that runs on a PC and demonstrate how to implement some basic algorithms.

8 Accelerating performance
The previous chapter described the structure of a simple computer. Here we describe how the performance of computers can be increased by overlapping the execution of instructions (pipelining). We also look at cache memory and introduce parallel processing.

INTRODUCTION
In Chapters 2 and 3 we introduced combinational and sequential logic elements and demonstrated how to build functional circuits. In Chapters 5 and 6 we introduced the instruction set architecture and low-level programming. This chapter bridges the gap between digital circuits and the computer by demonstrating how we can construct a computer from simple circuits; that is, we show how a computer instruction is interpreted (i.e. executed). We begin by describing the structure of a simple generic CPU. Once we see how a computer operates in principle, we can look at how it may be implemented. We describe the operation of a very simple one-and-a-half address machine whose instructions have two operands; one in memory and one a register. Instructions are written in the form ADD A, B that adds A to B and puts the result in B. Either A or B must be a register. Some readers will read this introduction to the CPU before the previous two chapters on assembly language programming. Consequently, some topics will be re-introduced. Instead of introducing the computer all at once, we will keep things simple and build up a CPU step by step. This approach helps demonstrate how an instruction is executed because the development of the computer broadly follows the sequence of events taking place during the execution of an instruction. In the next chapter we will nd that this computer is highly simplied; real computers dont execute an instruction from start to nish. Todays computers overlap the execution of instructions. As soon as one instruction is fetched from memory, the next instruction is fetched before the previous instruction has completed its execution. This mechanism is called pipelining and we examine it more closely in the next chapter.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 294

294 Chapter 7 Structure of the CPU

7.1 The CPU


A von Neumann stored program digital computer operates by reading instructions from memory and executing them one by one in sequence. If you wish to evaluate X2 1 on a 68K processor where X is a memory location, you may write

by step, on a very primitive hypothetical computer. We begin with the address paths that are used to locate the next instruction to be executed.

7.1.1 The address path


Figure 7.1 provides the block diagram of part of a CPU. In this diagram only the address paths and the paths needed to read an instruction from memory are shown for clarity. We have omitted most of the data paths required to execute instructions to avoid clutter. Address paths are shown in blue and data paths are in light blue. The address paths are the highways along which addresses ow from one part of the CPU to another. An address represents the location of a data element within the memory. There are three types of information ow in a computer: address,

We now demonstrate how instructions like MOVE. W X,DO are read from memory and how the sequence of actions that implement this operation are carried out inside the CPU. Because the CPU is such a complex device, we have decided to walk through the execution of an instruction, step
A HYPOTHETICAL COMPUTER Anyone describing the internal operation of a computer must select an architecture and an organization for their target machine. We have two choices: register to memory or register to register. The register-to-memory model ts the architecture of processors like the Pentium and 68K, whereas the register-to-register model corresponds to processors like the ARM, which we introduce later. When describing the internal structure of a computer, we could

describe either a system that executes an instruction to completion before beginning the next instruction, or a computer that overlaps or pipelines the execution of instructions. I have decided to begin this chapter with the description of a register-to-memory, non-pipelined processor. A non-pipelined organization is easier to describe than one that overlaps the execution of instructions.

PC Program counter Instruction address Incrementer

MAR Memory address register

Address of the next instruction

Address

The first stage in the execution of any instruction is to fetch it from memory. The program counter contains the address of the next instruction to be executed

The contents of the PC are incremented each time it is used Instruction op-code

Memory

Data

The main memory or immediate access store contains both instructions and data MBR

IR

Op-code

Address

Memory buffer register When you read an instruction, it is moved to the instruction register where it is decoded

Control unit Clock

Data paths Address paths

When you read from memory, the contents of the memory location accessed are loaded into the MBR

Control signals

Figure 7.1 The CPUs address paths.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 295

7.1 The CPU

295

CONSTANTS, VARIABLES, POINTERSA REMINDER A constant or literal is a value that does not change during the execution of a program. Its value is set at compile time or assemble time. Literal addressing is used to handle constants. A variable is a value that changes during the execution of a program. A pointer is a variable that contains the address of a variable; that is, a pointer points to a variable in memory. An address register is a pointer register. The following code implements the expression X 5 Y Z3 and illustrates the nature of constants, variables, and pointers.

data, and control. Data comprises the instructions, constants, and variables that are stored in memory and registers. Control paths comprise the signals that trigger events, provide clocks, and control the ow of data and addresses throughout the computer.

a write cycle, or from which data is being read in a read cycle. At this stage, the MAR contains a copy of the (previous) contents of the PC. When a memory read cycle is performed, the contents of the memory location specied by the MAR are read from the memory and transferred to the memory buffer register (MBR). We can represent this read operation in RTL terms as

7.1.2 Reading the instruction


Before the CPU can execute an instruction, the instruction must rst be brought to the CPU from the computers memory. We begin our description of the way in which a program is executed with the CPUs program counter (also called instruction counter or location counter). The expression program counter is a misnomer. The program counter contains the address of the next instruction in memory to be executed and it doesnt count programs or anything else. The program counter points to the next instruction to be executed. If, for example, [PC] 1234 (i.e. the PC contains the number 1234), the next instruction to be executed is to be found in memory location 1234. Fetching an instruction begins with the contents of the program counter being moved to the memory address register (i.e. [MAR] [PC]). Once the contents of the program counter have been transferred to the memory address register, the contents of the program counter are incremented and moved back to the program counter; that is, We interpret the expression [[MAR]] as the contents of the memory whose address is given by the contents of the MAR. The memory buffer register is a temporary holding place for data received from memory in a read cycle, or for data to be transferred to memory in a write cycle. Some texts refer to the MBR as the memory data register (MDR). At this point in the execution of an instruction, the MBR contains the bit pattern of the instruction to be executed. The instruction is next moved from the MBR to the instruction register (IR) where it is divided into two elds. One eld in the IR contains the operation code (op-code) that tells the CPU what operation is to be carried out. The other eld, called the operand eld, contains the address of the data to be used by the instruction. The operand eld can also provide a constant to be employed by the operation code when immediate or literal addressing is used. Note that we are assuming a one-address instruction format. The control unit (CU) takes the op-code from the instruction register together with a stream of clock pulses and generates signals that control all parts of the CPU. The time between individual clock pulses is in the range 0.3 ns to 100 ns (i.e. 3 1010 to 107s). The control unit is responsible for moving the contents of the program counter into the MAR, executing a read cycle, and moving the contents of the MBR to the IR. Later we look at the control unit in more detail and demonstrate how it goes about interpreting an op-code.
1 Remember that the 68K has variable-length instruction and its PC is incremented by 2, 4, 6, 8, or 10; that is instruction fall on 16-bit boundaries For the purpose of this chapter, we assume that instructions are all 4 bytes long.

At the end of this operation, the program counter points to the next instruction while the current instruction is being executed. The program counter in incremented by 4 rather than 1 because many of todays high-performance computers have 32-bit instructions. Computers are byte-addressed but have 32-bit architectures; that is, they can address individual bytes in memory, although instructions and data normally fall on 32-bit (i.e. 4-byte) boundaries.1 The memory address register (MAR), holds the address of the location in the memory into which data is being written in

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 296

296 Chapter 7 Structure of the CPU

THE INSTRUCTION REGISTER A modern RISC processor like the ARM, which we introduce in Chapter 9 has a 32-bit instruction register which is sufcient for an op-code and three register addresses; for example, ADD R1, R2, R3. Processors like the Pentium and 68K have very complex variable-length instruction formats and you cannot use a simple xed-length instruction register. The processor has to fetch the rst 16 bits of an instruction from memory, decode it, and execute successive fetches to assemble the rest of the instruction.

Table 7.1 The FETCH phase expressed in register transfer language.

All instructions are executed in a two-phase operation called a fetchexecute cycle. During the fetch phase, the instruction is read from memory and decoded by the control unit. The fetch phase is followed by an execute phase in which the control unit generates all the signals necessary to execute the instruction. Table 7.1 describes the sequence of operations taking place in a fetch phase. In Table 7.1 FETCH is a label that serves to indicate a particular line in the sequence of operations. The notation IR(op-code) means the operationcode eld of the instruction register.

7.1.3 The CPUs data paths


Now that weve sorted out the fetch phase, lets see what else we need to actually execute instructions. Figure. 7.2 adds new data paths to the simplied CPU of Fig. 7.1, together with an address path from the address eld of the instruction register to the memory address register. Other modications to Fig. 7.1 included in Fig. 7.2 are the addition of a data register, D0, and an arithmetic and logical unit, ALU. The data register called D0 holds temporary results during a calculation. You need a data register in a one-address machine because dyadic operations with two operands such as ADD X,D0 take place on one operand specied by the instruction and the contents of the data register. This instruction adds the contents of memory location X to the contents of the data register and deposits the result of the addition in the data register, destroying one of the original operands. The arrangement of Fig. 7.2 has only one general-purpose data register, which weve called D0 for compatibility with the 68K we used in the previous chapters. Some rst-generation 8-bit microprocessors had only one general-purpose data register, which was called the accumulator. We can represent an ADD X,D0 instruction2 by the RTL expression

The arithmetic and logic unit (ALU) is the workhorse of the CPU because it carries out all the calculations. Arithmetic and logical operations are applied either to the contents of the data register or MBR alone, or to the contents of the data register and the contents of the MBR. The output of the ALU is fed back to the data register or to the MBR. Two types of operation are carried out by the ALUarithmetic and logical. The fundamental difference between arithmetic and logical operations is that logical operations dont generate a carry when bit ai of word A and bit bi of B are operated upon. Table 7.2 provides examples of typical arithmetic and logical operations. A logical shift treats an operand as a string of bits that are moved left or right. An arithmetic shift treats a number as a signed twos complement value and propagates the sign bit during a right shift. Most of these operations are implemented by computers like the 68K, Pentium, and ARM. Having developed our computer a little further, we can now execute an elementary program. Consider the high-level language operation P Q R. Here the plus symbol means arithmetic addition. The assembly language program required to carry out this operation is given below. Remember that P, Q, and R are symbolic names that refer to the locations of the variables in memory.
Load data register D0 with the contents of memory location Q3 ADD R,D0 Add the contents of memory location R to data register D0 STORE D0,P Store the contents of data register D0 in memory location P LOAD Q,D0
2 A machine with a single accumulator does not need to specify it explicitly; for example, an 8-bit microprocessor may use ADD P to indicate add the contents of P to the accumulator. We write ADD P,DO and make D0 explicit to be consistent with the notation we used for 68K instructions. 3 We have dened explicit LOAD and STORE operations to be consistent with the CPU we construct later in this chapter. The 68K uses a single MOVE operation to indicate LOAD and STORE

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 297

7.1 The CPU


PC Program counter MAR Memory address register Address of next location to be accessed

297

Incrementer Operand address to memory address register Instruction read from memory

Address

Memory

Data

IR

Op-code

Address

Memory buffer register MBR Data read from memory or to be written to memory

Control unit

Clock Data register D0

Control signals Output from ALU goes to D0 or MBR Address paths A Data is transmitted to the ALU where it is operated on

f(A,B)

ALU B

Data paths

Arithmetic and logic units performs data processing operations

Figure 7.2 The CPUs address and data paths.

Operation Addition Subtraction Negation Multiplication Division Divide by 2 Multiply by 2 AND OR NOT EOR Shift left Shift right

Class Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Arithmetic Logical Logical Logical Logical Logical Logical

Typical mnemonic ADD SUB NEG MULU DIVU ASR ASL AND OR NOT EOR LSL LSR

The one-address machine requires a rather cumbersome sequence of operations just to carry out the simple action of adding two numbers. If we had a computer with a three-address format, we could have written
ADD Q,R,P Add the contents of Q to the con-

tents of R and put the result in P Three-address machines are potentially4 faster than one-address machines, because they can do in one instruction things that take other machines three operations. However, the power of threeaddress machines can be achieved only by means of a more complex and expensive CPU and memory system. We can demonstrate how the CPU operates by examining the execution of ADD R, D0 in terms of register transfer language. Table 7.3

Table 7.2 Typical arithmetic and logical operations.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 298

298 Chapter 7 Structure of the CPU

Move the contents of the PC to the MAR Increment the contents of the PC Read the current instruction from the memory Move the contents of the MBR to the IR Move the op-code from the IR to the CU Move the operand address to the MAR Read the data from memory Perform the addition Move the output of ALU to the data register Operations sharing the same line are executed simultaneously. ALU [MBR] and ALU [D0] are executed simultaneously.

Table 7.3 Expressing the FETCH/EXECUTE cycle for an ADD instruction in RTL.

MICROPROGRAMMING The terms microprogram, microprogramming, microcode, microinstruction, and micro-operation have nothing to do with microprocessor or microcomputer. A micro-operation is the smallest event that can take place within a computer; typical micro-operations are the clocking of data into a register, a memory read operation, putting data on a bus, or adding two numbers together. A microprogram consist of a sequence of microinstructions that, when executed, implement a machine-level instruction. For example, the machine-level or macro-level operation ADD P,D0 can be implemented by executing a sequence of micro-operations. The instructions that comprise a microprogram are called microcode.

gives the sequence of operations carried out during the fetch and execute phases of an ADD R, DO instruction. These operations tell us what is actually going on inside the computer. During the fetch phase the op-code is fed to the control unit by CU [IR(op-code)] and used to generate all the internal signals required to perform the additionthis includes programming the ALU to do addition by adding together the data at its two input terminals to produce a sum at its output terminals. Operations of the form [PC] [MAR] or [D0] [D0] [MBR] are often referred to as microinstructions. Each assembly level instruction (e.g. LOAD, ADD) is executed as a series of microinstructions. Microinstructions and microprogramming are the province of the computer designer. In the 1970s some machines were user-microprogrammable; that is, you could dene your own instruction set. We take a further look at microinstructions later in this chapter.

sequential mode; that is, the computer can execute only a stream of instructions, one by one in strict order. We covered conditional behavior in the previous chapter and we require a means of implementing instructions such as BEQ Target (branch on zero ag set to Target). The computer in Fig. 7.2 lacks a mechanism for making choices or repeating a group of instructions. To do this, the CPU must be able to execute conditional branches or jumps such as BEQ XYZ. Weve already met the branch that forces the CPU to execute an instruction out of the normal sequence. The block diagram of Fig. 7.3 shows the new address and data paths required by the CPU to execute conditional branches. Three items have been added to our computer in Fig. 7.3:
G

a condition code register, CCR a path between the CCR and the control unit a path between the address eld of the instruction register and the program counter.

7.1.4 Executing conditional instructions


So far, weve considered the architecture of the single-instruction single-data CPU capable of executing programs in a purely

4 A three-address machine is faster than a machine with registerto-register instructions only if the access time of memory is comparable to the access time of register. Memory takes much longer to access than internal registers and, therefore, a three-address machine is not currently practical.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 299

7.1 The CPU


PC Program counter MAR Memory address register

299

Incrementer

Address The next instruction may be the next instruction in sequence or the instruction at the branch target address

Memory

The IR's operand field provides the branch target address IR Op-code Address

Data

Memory buffer register MBR

Clock

Control unit Data register D0

Control signals

The control units uses the CCR output to decide whether to execute the next instruction or to force a branch

f(A,B)

ALU B

The arthimetic and logic unit performs data processing operations

CCR Condition code register program flow control paths The word in the CCR indicates whether the last operation gave a zero or a negative result or whether a carry was generated

Detail of the ALU's result are sent to the CCR

Figure 7.3 Information paths in the CPU and conditional instructions.

The condition code register or processor status register takes a snapshot of the state of the ALU after each instruction has been executed and records the state of the carry, negative, zero, and overow ag bits. A conditional branch instruction interrogates the CCRs current state. The control unit then either forces the CPU to execute the next instruction in series or to branch another instruction somewhere in the program. Lets look at the details of the conditional branch. The CPU updates the bits of its condition code register after it carries out an arithmetic or a logical operation to reect the nature of the result. The following is a reminder of the operations of the condition code bits.

C carry Set if a carry was generated in the last operation. The C-bit is, of course, the same as the carry bit in the carry ip-op. Z zero Set if the last operation generated a zero result. N negative Set if the last result generated a negative result. V overow Set if the last operation resulted in an arithmetic overow; that is, an operation on one or two twos complement values gave a result that was outside its allowable range (an arithmetic overow occurs during addition if the sign bit of the result is different from the sign bit of both operands).

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 300

300 Chapter 7 Structure of the CPU


The condition code register is connected to the control unit, enabling instructions to interrogate the CCR. For example, some instructions test whether the last operation performed by the central processor yielded a positive result, or whether the carry bit was set, or whether arithmetic overow occurred. Theres no point in carrying out an interrogation unless the results are acted upon. We need a mechanism that does one thing if the result of the test is true and does another thing if the result of the test is false. The nal modication included in Fig. 7.3 is the addition of a path between the operand eld (i.e. target address) of the instruction register and the program counter. Its this feature that enables the computer to respond to the result of its interrogation of the CCR. A conditional branch instruction tests a bit of the CCR and, if the bit tested is clear, the next instruction is obtained from memory in the normal way. But if the bit tested is set, the next instruction is obtained from the location whose target address is in the instruction register. In the above description we said that a branch is made if a certain bit of the CCR is set; equally a branch can be made if the bit is clear (branches can also be made on the state of several CCR bits). The precise way in which conditional branches are actually implemented inside the computer is discussed later when we deal with the design of the control unit. Branch operations can be expressed in register transfer language in the form memory. Sometimes we wish to use instructions such as ADD #12,D0, where the source operand supplies the actual value of the data being referred to by the op-code part of the instruction. Although the symbol # appears as part of the operand when this instruction is written in mnemonic form, the assembler uses a different op-code code for ADD #literal,D0 than it does for ADD address,D0. The instruction ADD.B #12,D0 is dened in RTL as

Figure 7.4 shows that an additional data path is required between the operand eld of the IR and the data register and ALU to deal with literal operands. In fact, the architecture of Fig. 7.4 can execute any computer program. Any further modications to this structure improve the CPUs performance without adding any fundamentally new feature. Figure 7.5 completes the design of the computer. We have added a second general-purpose data register D1 and a pointer register A0. In principle, there is nothing stopping us adding any number of registers. As you can see, three buses, A, B, and C are used to transfer data between the registers and ALU. The structure of Fig. 7.5 can implement instructions with more complex addressing modes than the simple direct (absolute) addressing we have used so far; for example MOVE (A0),D1 can be implemented by the following sequence of micro-operations.
Move source operand address to MAR Read the actual operand from memory Copy data to D1

Typical machine-level conditional operations expressed in RTL are 1. Branch on carry clear (jump to the target address if the carry bit in the CCR is 0)
BCC target: IF [C]= 0 THEN [PC][IR(address)]

2. Branch on equal (jump to the target address if the Z-bit in the CCR is 1)
BEQ target: IF [Z]= 1 THEN [PC][IR(address)]

An example of a conditional branch is as follows. Subtract X from contents of D0 If the result was zero then branch to Last, otherwise continue Target address of branch (if taken)

This sequence has been simplied because, you will see from Fig. 7.5, that there is no direct path between register A0 and the MBR. You would have to put the contents of A0 onto bus A, pass the contents of bus A through the ALU to bus C, and then copy bus C to the MAR. We will return to this theme when we look at the detailed design of computers. We have now demonstrated the ow of information that takes place during the execution of a single address computer instruction. In the next section we reinforce some of the things we have covered by showing how you can simulate a computer architecture in C.

7.2 Simulating a CPU


7.1.5 Dealing with literal operands
The CPU of Fig. 7.2 assumes all instructions (e.g. ADD and BEQ etc.) refer to an operand somewhere within the CPUs One way of learning how a processor operates is to build one. We present a program in C that simulates a very simple 8-bit CPU. In order to make the simulator as accessible to as many readers as possible, we have written the simulator in C but

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 301

7.2 Simulating a CPU


PC Program counter MAR Memory address register

301

Incrementer

Address

Memory

Data

IR

Op-code

Operand

Memory buffer register MBR Literal to ALU

Clock

Control unit Data register D0

Control signals The operand field of the IR is fed to the data register or ALU to provide a literal operand The literal from the instruction register can by loaded into the data register Literal data paths CCR

A f(A,B)

ALU B The literal from the instruction register can be supplied to the ALU

Condition code register

Figure 7.4 Modifying the CPU to deal with literal operands.

have avoided all but Cs most basic elements. All data types are 8 bits and the only C constructs we use are the while, the if . . . then . . . else, and the switch constructs, which select one of several courses of action. We are going to construct two simulatorsthe rst is a very primitive CPU with an 8-bit instruction that simply demonstrates the fetch/execute cycle, and the second is not too dissimilar to typica rst-generation 8-bit microprocessors.

branch instructions. Only the store instruction performs a write to memory. Choosing an instruction set requires many compromises; for example, if the number of bits in an instruction is xed, increasing the number of different instructions reduces the number of bits left for other functions such as addressing modes or register selection. We can dene an instruction set for our primitive 8-bit machine as

7.2.1 CPU with an 8-bit instruction

Instruction

Mnemonic
LDA STA ADD BRA BEQ N N N N N

RTL definition
[D0] [N] [N] [D0] [D0] [D0] + [N] [PC] N IF [D0] = 0 THEN [PC] N

Load D0 from memory Store D0 in memory Our rst computer has a single data register (i.e. Add memory to D0 accumulator) called D0 and all instructions are Branch to location N memory to register apart from the store and the If [D0] = 0 then branch to N

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 302

302 Chapter 7 Structure of the CPU


PC Program counter MAR Memory address register

Incrementer

Address

Memory Path between operand field of IR and bus A to literals MBR Op-code Address Memory buffer register

Data

Data register D0 Clock Control unit

Data register D1

Address register A0 Control signals A Bus A The output from the ALU is fed via bus C to the registers f(A,B) Bus C B Bus B CCR Condition code register ALU

Figure 7.5 Processor with multiple registers.

We have provided only ve instructions because these are illustrative of all instructions. This computer has an 8-bit instruction format that includes both the op-code and the operand. If we chose a 3-bit op-code (eight instructions) and a 4-bit operand (a 16-bit memory), the remaining bit can be used to specify the addressing mode (absolute or literal). Real 8-bit microprocessors solve the problem of instruction set design by using 1 byte to provide an operation code and then 0, 1, or 2 succeeding bytes to provide an operand.

Figure 7.6 denes the structure of an 8-bit instruction for our simulated machine. The rst step in constructing a simulator is to describe the action of the computer in pseudocode.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 303

7.2 Simulating a CPU


8-bit instruction format 7 6 5 4 3 2 1 0

303

Op-code 000 = LDA 001 = STA 010 = ADD 011 = BRA 100 = BEQ

Operand Addressing mode 0 = absolute 1 = literal (immediate)

Figure 7.6 Format of an 8-bit instruction.

C allows you to operate on individual bits of a byte; for example, the operator n performs a right shift by n bits. The opcode is obtained from the three most-signicant bits of the IR by shifting right ve times. A bitwise logical AND can be performed between a variable and a hexadecimal value; for example IR & 0x0F ANDs the IR with 000011112 to extract the operand bits in the four least-signicant bit positions. Once weve extracted the addressing mode (bit 4 of the instruction register) with amode (IR & 0x10) 4, we can calculate the source operand for the load and add instruction by

We can now write a program to implement this algorithm. The following fragment of code is largely self-explanatory. The instruction in the 8-bit instruction register (IR) is decoded by the three operations

The following listing provides the C code for this CPU simulator.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 304

304 Chapter 7 Structure of the CPU

Most of the work done in the simulator takes place in the switch construct at the end of this program where each instruction is interpreted.

Instruction 7 6 5 4 3 2 1 0 7 6

Operand 5 4 3 2 1 0

Op-code

7.2.2 CPU with a 16-bit instruction


We now describe a CPU that is much closer to the architecture of typical 8-bit microprocessors. The simulator uses an 8-bit memory with 256 locations. Each instruction occupies two consecutive memory locationsan 8-bit instruction followed by an 8-bit operand. This arrangement provides us with a much richer instruction set than the previous example. However, each fetch cycle requires two memory accesses. The rst access is to fetch the op-code and the seconds to fetch the operand; that is;
Bit 3 not used

Addressing mode 00 = absolute 01 = literal (immediate) 10 = indexed 11 = PC relative Direction 0 = register to memory 1 = memory to register

Figure 7.7 Format of the CPUs instruction.

provided program counter relative addressing (discussed in the next chapter) in which the operand is specied with respect to the current value of the program counter; for example, MOVE D0,12(PC) means store the conCopy contents of the PC to the MAR tents of data register D0 12 bytes on from the locaIncrement contents of the PC tion pointed at by the program counter. Read the instruction from memory The instruction itself is divided into four elds, Move the instruction to the IR as Fig. 7.7 demonstrates. A 4-bit op-code in bits 7, Save the op-code 6, 5, 4 provides up to 16 instructions. A 2-bit Copy contents of the PC to the MAR addressing mode in bits 1, 0 selects the way in Increment contents of the PC which the current operand is treated. When the Read the operand from memory addressing mode is 00, the operand provides the Move the operand to the IR address of the data to be used by the current Save the operand instruction. When the addressing mode is 01 the operand provides the actual (i.e. literal) operand. Modes 10 This multibyte instruction format is used by 8-bit and and 11 provide indexed and program counter relative 16-bit microprocessors. Indeed, the 68K has one 10-byte addressing respectively (i.e. the operand is added to the A0 register or the PC, respectively). instruction. Bit 2 of the instruction is a direction bit that determines The architecture of this computer is memory to register or register to memory; for example, it supports both ADD D0,M whether the source operand is in memory or is provided by and ADD M, D0 instructions. In addition to the direct and lit- the data register; for example, the difference between eral addressing modes, we have provided address register MOVE D0,123 and MOVE 123,D0 is determined by the value indirect addressing with a single A0 register. We have also of the direction bit.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 305

7.2 Simulating a CPU

305

We can express the basic fetch cycle and decode instruction phase in C as

Each instruction is executed by means of the switch construct. Note that the CCR has only a zero ag (it would have been more complex to have provided a C- and V-bit). The following provides the complete code for the processor.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 306

306 Chapter 7 Structure of the CPU

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 307

7.2 Simulating a CPU

307

Now that we have examined the sequence of events that take place during the execution of an instruction, the next step is to demonstrate how the binary code of an instruction is translated into the actions that

implement the instruction. In the next two sections we describe two different types of control unit; the microprogrammed control unit and the so-called random logic control unit.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 308

308 Chapter 7 Structure of the CPU

7.3 The random logic control unit


Weve demonstrated how you can write a program using assembly language instructions. Weve shown how an assembly language instruction can be reduced to the ow of information between registers and functional units in a CPU. Weve shown how register, counters, and logical units can be created from primitive gates and ip-ops. Whats left? What we have not yet demonstrated is how a binary pattern such as, say, 11011011101010012 can be turned into the sequence of operations that causes a machine-level operation like ADD $A9,D0 to be executed. We now make good this omission and complete the nal link in the chain between gate and computer. There are two ways of transforming an instruction into the operations that interpret it. One is to create the logic that directly transforms instructions into control signals. The other is to design a special computer that takes a machinelevel instruction as a program and generates control signals as an output. We rst describe the random logic unit, which uses gates and ip-ops to generate control signals. When engineers design a random logic control unit (RALU) they ask What sequence of microinstructions is needed

to execute each machine code instruction and what logic elements do we need to implement them? In other words, designers resort to the Boolean techniques we described in Chapter 2. The word random in the expression random logic element is employed in a specic sense and implies that the arrangement of gates from which the control unit is constructed varies widely from computer to computer. The same microprogrammed control unit can readily be adapted to suit many different computers with relatively little modication, whereas the random logic control unit is dedicated to a specic CPU and cannot easily be modied.

7.3.1 Implementing a primitive CPU


Before designing a random logic control unit, lets look at the structure of a very simple CPU in order to determine what control signals we need to synthesize. Figure 7.8 illustrates a computer with a single bus that is connected to all registers and the memory. This arrangement reduces the amount of logic required to implement the processor, but it is inefcient because only one word at a time can be copied from a source to a destination. The ALU has two inputs P and Q. The P input comes only from data register D0 and the Q input
GMSR System bus

R W

Read Write Memory Address

Data out EMSR Data in Each tri-state buffer puts data on the bus when enabled

CMAR

MAR GMBR

CMBR

MBR

EMBR GIR

When clocked a register latches data from the bus

CIR

IR

EIR GPC

Bus

CPC

PC

EPC GDO The inputs to the ALU are fro register D0 and the system bus The output of the ALU is a function of inputs P and Q and is determined by control signals F0 and F1. The result is latched into the ALU register by CALU

CDO

D0

EDO P

F0 and F1 select the ALU F0 function and CALU clocks F the ALU output into the 1 C ALU register ALU

ALU register f(P, Q)

ALU Q GALU EALU

Figure 7.8 Structure of a single-bus CPU.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 309

7.3 The random logic control unit

309

Signal number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Signal R W CMAR CMBR CPC CIR CD0 CALU EMSR EMBR EIR EPC ED0 EALU F0 F1

Type Memory control Memory control Register clock Register clock Register clock Register clock Register clock Register clock Bus control Bus control Bus control Bus control Bus control Bus control ALU control ALU control

Operation Read from memory Write to memory Clock data into MAR Clock data into MBR Clock data into program counter Clock data into instruction register Clock data into data register Clock data into ALU register Gate data from the memory onto the bus Gate data from the MBR onto the bus Gate operand from IR onto the bus Gate program counter onto the bus Gate data register onto the bus Gate ALU funtion register onto the bus Select ALU function bit 0 Select ALU funtion bit 1

Table 7.4 The control signals in the CPU of Fig. 7.8.

comes only from the system bus. Note that this structure allows the memory to transfer data directly to or from any register; that is, all data does not have to pass through the MBR. The memory receives the address of the location to be accessed directly from the MAR, whose output is permanently connected to the memorys address input. A dedicated connection between the MAR and memory is possible because the memory never receives an address input from a source other than the memory address register. A permanent connection removes the need for bus control circuits. Two data paths link the memory to the system bus. In a read cycle when memory control input R is asserted, data is transferred from the memory to the system bus via tri-state gate GMSR. During a memory write cycle when memory control input W is asserted, data is transferred from the system bus directly to the memory. The MBR, data register, program counter, and instruction register are each connected to the system bus in the same way. When one of these registers wishes to place data on the bus, its tri-state gate is enabled. Conversely, data is copied into a register from the bus by clocking the register. The instruction register (IR) receives data from the memory directly, without the data having to pass through the MBR. The ALU receives data from two sources, the system bus and data register D0, and places its own output on the system bus. This arrangement begs the question, If the ALU gets data from the system bus how can it put data on the same bus at the same time it is receiving data from this bus? Figure 7.8 shows that the ALU contains an internal ALU register. When

F1 0 0 1 1

F0 0 1 0 1

function add P to Q subtract Q increment Q decrement Q

Table 7.5 Decoding the ALU control code, F0, F1.

this register is clocked by CALU, the output from the ALU is captured and can be put on the system bus by enabling gate GALU. Table 7.4 denes the 16 control signals in Fig. 7.8. Instruction decoding takes an instruction and uses it to create a sequence of 16-bit signals that control the system in Fig. 7.8. The ALU is controlled by a two-bit code, F1, F0, which determines its functions as dened in Table 7.5. These operations are representative of real instructions, although a practical ALU would implement, typically, 16 different functions. In order to keep the design of a random logic control unit as simple as possible, we will construct a 3-bit operation code giving a total of eight instructions. This instruction set dened in Table 7.6 presents a very primitive instruction set indeed, but it does include the types of instruction found in real rst-generation processors. We have dened explicit LOAD and STORE instructions rather than a single MOVE instruction which does the work of both LOAD and STORE. Having constructed an instruction set, we dene each of the instructions in terms of RTL and determine the

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 310

310 Chapter 7 Structure of the CPU


sequence of operations necessary to carry them out on the computer in Fig. 7.8. Table 7.7 gives the micro-operations for each instruction including the fetch phase. The symbol Z is the zero-ag bit from the CCR, which is assumed to be part of the ALU.

Op-code

Mnemonic

Operation

Note that N is the operand eld used by the instruction.

Table 7.6 A primitive instruction set for the CPU of Fig. 7.8.

Instruction Op-code Operations (RTL) Fetch [MAR] [IR] [ALU] [PC] 000 [MAR] [D0] [PC] [[MAR]] [PC] [ALU]

Control actions EPC R EPC E ALU E IR R =1 = 1, EMSR = 1, = 1, F 1,F 0 = 1,0, =1 =1 = 1, E MSR = 1, C MAR C IR C ALU C PC C MAR C D0 C MAR W=1 C MAR C MBR C ALU C D0 C MAR C MBR C ALU C D0 C MAR C MBR C ALU W=1 C MAR C MBR C ALU W=1 C PC IF Z = 1 THEN CPC

LOAD

[IR] [[MAR]]

STORE

001

[MAR] [IR] [[MAR]] [D0] [MAR] [MBR] [ALU] [D0] [MAR] [MBR] [ALU] [D0] [MAR] [MBR] [ALU] [[MAR]] [MAR] [MBR] [ALU] [[MAR]] [PC] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU] [IR] [[MAR]] [MBR] [ALU]

E IR = 1 E D0 = 1 E IR R E MBR E ALU E IR R E MBR E ALU E IR R E MBR E ALU E IR R E MBR E ALU E IR =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,0, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,1, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 0,1, =1 =1 = 1, E MSR = 1, = 1, F 1,F 0 = 1,1, =1 =1

ADD

010

SUB

011

INC

100

DEC

101

BRA BEQ

110 111

[IR]

IF Z = 1 THEN [PC] [IR]

E IR = 1

Table 7.7 Interpreting the instruction set of Table 7.6 in RTL and microinstructions.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 311

7.3 The random logic control unit

311

Consider the load D0 from memory operation; this requires the following two steps:

Then we have to put the memory in read mode, put the data from the memory onto the bus by enabling the GMSR gate, and nally capture the data in D0 by clocking register D0.

[MAR] [IR] EIR = 1 [D0] [[MAR]] R = 1,

CMAR EMSR = 1,

CD0

Copy operand address to MAR Read memory and copy to D0


Table 7.7 tells us what signals have to be asserted to execute the two operations required to interpret LOAD N. Table 7.8

We have to send the operand address in the instruction register to the memory address register by enabling the GIR gate and then clocking the data into the memory address register.
Instruction Operations (RTL) Control actions R Fetch [MAR] [IR] [ALU] [PC] LOAD [MAR] [D0] STORE [MAR] [PC] [PC] [ALU] [IR] 0 0 0 0 W 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 CMAR 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 CMBR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 CPC 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

CIR 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

CD0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0

CALU 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0

EMSR 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0

EMBR 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0

EIR 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1

EPC 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

ED0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

EALU 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0

F1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0

F0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0

[[MAR]] 1

[[MAR]] 1 [IR] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

[[MAR]] [D0] ADD [MAR] [MBR] [ALU] [D0] SUB [MAR] [MBR] [ALU] [D0] INC [MAR] [MBR] [ALU] [IR] [MBR] [ALU] [IR] [MBR] [ALU] [IR] [MBR]

[[MAR]] 1

[[MAR]] 1

[[MAR]] 1

[[MAR]] [ALU] DEC [MAR] [MBR] [ALU] [IR] [MBR]

[[MAR]] 1

[[MAR]] [ALU] BRA BEQ [PC] [IR]

IF Z = 1 THEN [PC] [IR] 0 0 0 0 Z 0 0 0 0 0 1 0 0 0 0 0

Table 7.8 Interpreting the micro-operations of Table 7.7 as microinstructions.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 312

312 Chapter 7 Structure of the CPU


gives all the signals in the form of a 16-component vector; that is, the two vectors are 0010000000100000 and 1000000010001000 Figure 7.9 shows the timing of the execution phase of this instruction. We have included only ve of the 16 possible control signals because all the other 12 signals remain inactive during these two micro-operations. we require a source of signals to trigger each of the microinstructions. A circuit that produces a stream of trigger signals is called a sequencer. Figure 7.11 provides the logic diagram of a simplied eight-step sequencer. The outputs of three JK ip-ops arranged as a 3-bit binary up-counter counting 000, 001, 010, . . . ,111, are connected to eight three-input AND gates to generate timing signals T0 to T7. Figure 7.12 illustrates the timing pulses created by this circuit. Note that the timing decoder is similar to the instruction decoder of Fig. 7.11. As not all macroinstructions require the same number of microinstructions to interpret them, the sequencer of Fig. 7.11 has a reset input that can be used to reset the sequencer by returning it to state T0. The sequencer of Fig. 7.11 is illustrative rather than practical, because, as it stands, the circuit may generate spurious timing pulses at the timing pulse outputs due to the use of an asynchronous counter. All outputs of an asynchronous counter dont change state at the same instant and therefore the bit pattern at its output may pass through several states (if only for a few nanoseconds) before it settles down to its nal value. Unfortunately, these transient states or glitches may last long enough to create spurious timing signals, which, in turn, may trigger undesired activity within the control unit. A solution to these problems is to disable the output of the timing pulse generator until the counter has settled down (or to use a synchronous counter). The next step in designing the control unit is to combine the signals from the instruction decoder with the timing signals from the sequencer to generate the actual control signals. Figure 7.13 shows one possible approach. There are nine vertical lines in the decoder Fetch phase of Fig. 7.13 (only three are shown). One vertical line corresponds to the fetch phase and each of the other eight lines is assigned to one of the eight instructions. At any instant one of the vertical lines from the instruction decoder (or fetch) is in a logical one state, enabling the column of two-input AND gates to which it is connected. The other inputs to the column of AND gates are the timing pulses from the sequencer. As the timing signals, T0 to T7, are generated, the outputs of the AND gates enabled by the current instruction synthesize the control signals required to implement the random logic control unit. The output of each AND gate corresponding to a particular microinstruction (e.g. CMAR) triggers the actual microinstruction (i.e. micro-operation). As we pointed out earlier, not all macroinstructions require eight clock cycles to execute them.

7.3.2 From op-code to operation


In order to execute an instruction we have to do two things. The rst is to convert the 3-bit op-code into one of eight possible sequences of action and the second is to cause these actions to take place. Figure 7.10 shows how the instructions are decoded and is similar in operation to the 3-line to 8-line decoder described in Chapter 2. For each of the eight possible three-bit op-codes, one and only one of the eight outputs is placed in an active-high condition. For example, if the op-code corresponding to ADD (i.e. 010) is loaded into the instruction register during a fetch phase, the ADD line 2 from the AND gate array is asserted high while all other AND gate outputs remain low. Its no good simply detecting and decoding a particular instruction. The control unit has to carry out the sequence of microinstructions that will execute the instruction. To do this

Fetch phase

Execute Micro-operation 1 Send IR to MAR Micro-operation 2 Read memory, latch into D0

Clock Put [IR] on bus EIR Enable IR to bus CMAR Clock data into MAR R Read memory EMSR Enable memory to bus CD0 Clock data into D0 Capture data in D0 Put operand on bus Read operand

Capture [IR] in MAR

Figure 7.9 Timing of the execute phase of a LOAD N instruction.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 313

7.3 The random logic control unit


3 op-code bits Bit 2 Bit 1 Bit 0 Operand address N

313

000 LOAD

N,D0

001 STORE D0,N 010 ADD

The 3-bit op-code is decoded into N,D0 eight operations

111 BEQ

Figure 7.10 The instruction decoder.

CLR 1 Clock 1 K J C Q 1 K Q 1 J

CLR Q C Q 1 K 1 J

CLR Q C Q

Clear counter to state T0

T0

T1 Eight timing pulses T2 (Only 4 outputs shown for simplicity)

T7

Figure 7.11 The timing pulse generator (sequencer).

One machine cycle (eight clock states) T0 T1 T2 T3 T4 T5 T6 T7

Figure 7.12 The outputs from the timing pulse generator.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 314

314 Chapter 7 Structure of the CPU


Op-code Op-code EXECUTE FETCH EPC=1, CMAR T0 R=1, EMSR =1, CIR T1 T2 EALU=1, CPC T3 T1 EPC=1, F1, F0=1, 0, CALU T0 EIR=1, CMAR T0 R=1, EMSR =1, CD0 T1 ED0=1, W=1 EIR=1, CMAR

Figure 7.13 Combining control signals.

MBR

CPC CD0 CALU F 0 F 1 R

IR

PC

E E

D0

ALU

MAR

MBR

W E

CIR

MSR

Figure 7.14 The OR gate array used to generate the actual microinstructions.

Each microinstruction is activated by one or more control signals from the nine columns of AND gates. Figure 7.14 shows the array of OR gates that combine the outputs from the AND gates to generate the control signals. The inputs from these OR gates come from the nine columns of AND gates.

always in one of two states (fetch or execute), an RS ip-op provides a convenient way of switching from one state to another. When Q 0 the current operation is a fetch phase, and when Q 1 an execute phase is being performed. Figure 7.15 is an extension of Figure 7.13 and demonstrates how the instruction decoder is enabled by the Q output of _ the fetchexecute ip-op, and the fetch decoder by the Q output. At the end of each fetch phase, a clock pulse from the timing generator sets the fetchexecute ip-op, permitting the current op-code to be decoded and executed. The timingpulse generator is reset at the end of each fetch. At the end of each execute phase, the fetchexecute ip-op is cleared and the sequencer reset, enabling the next fetch phase to begin. Table 7.9 shows how the machine-level instructions can be represented in terms of both timing signals and microinstructions. Note that weve included the micro-operation [MAR] [IR] in the fetch phase. The microinstructions are the enable signals to the bus drivers, the register clocks, the ALU function select bits, the memory controls (R and W), and the reset and set inputs of the fetchexecute ip-op. For each of the microinstructions we can write down a Boolean expression in terms of the machine-level instruction and the sequence of timing pulses. For example, consider expressions for EMBR, EIR, and CMAR. EMBR ADD T1 SUB T1 INC T1 DEC T1 EIR Fetch T4 BRA T0 BEQ T0 CMAR Fetch T0 Fetch T4

The fetchexecute ip-op


So far we have devised a mechanism to interpret each macroinstruction but have not looked at how we implement the two-phase fetchexecute cycle. As the control unit is We should note, of course, that this CPU and its microprogram are very highly simplied and illustrate the nature of the random logic CU rather than its exact design.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 315

7.4 Microprogrammed control units


Other reset inputs

315

Q Fetchexecute flip-flop

EXECUTE

S Select execute phase

FETCH Op-code (from IR) T0 T0 T0

T1

T1 Select fetch phase

T1 Select fetch phase

T2 At the end of each operation the last T timing pulse is used 3 to reset the fethexecute flip-flop T4

Control signals to the OR gate array

Figure 7.15 The fetchexecute ip-op.

In the next section we look at how the design of a control unit can be simplied by putting the sequence of microoperations in a table and then reading them from a table, rather than by synthesizing them in hard logic.

7.4 Microprogrammed control units


Before we describe the microprogrammed control unit, lets remind ourselves of the macro-level instruction, micro-level instruction, and interpretation. The natural or native language of a computer is its machine code whose mnemonic representation is called assembly language. Machine-level instructions are also called macroinstructions. Each macroinstruction is executed by means of a number of primitive actions called microinstructions. The process whereby a macroinstruction is executed by carrying out a series of microinstructions is called interpretation. Lets begin with another simple computer. Consider Fig. 7.16. The internal structure of this primitive CPU differs slightly from that of Fig. 7.8 because theres more than one bus. The CPU in Fig. 7.16 includes the mechanisms by which information is moved within the CPU. Each of the registers (program counter, MAR, data register, etc.) is made up of D ip-ops. When the clock input to a register is pulsed, the

data at the registers D input terminals is transferred to its output terminals and held constant until the register is clocked again. The connections between the registers are by means of m-bit wide data highways, which are drawn as a single bold line. The output from each register can be gated onto the bus by enabling the appropriate tri-state buffer. We have used a multiplexer, labeled MPLX, to select the program counters input from either the incrementer or the operand eld of the instruction register. The multiplexer is controlled by the 1-bit signal Mux, where Mux 0 selects the incrementer path, and Mux 1 selects the branch target address from the address/operand eld of the instruction register, IRaddress. Suppose our computer performs a fetchexecute cycle in which the op-code is ADD N,D0. This instruction adds the contents of the memory location specied by the operand eld N to the contents of the data register (i.e. D0) and deposits the result in D0. We can write down the sequence of operations that take place during the execution of ADD not only in terms of register transfer language, but also in terms of the enabling of gates and the clocking of ip-ops. Table 7.10 illustrates the sequence of microinstructions executed during the fetchexecute cycle of an ADD instruction. It should be emphasized that the fetch phase of all instructions is identical and it is only the execute phase that varies according to the nature of the op-code read during the fetch phase.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 316

316 Chapter 7 Structure of the CPU


Instruction Time Memory R Fetch T0 T1 T2 T3 T4 LOAD STORE ADD T0 T0 T0 T1 T2 SUB T0 T1 T2 INC T0 T1 T2 DEC T0 T1 T2 BRA BEQ T0 T0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 W 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 MBR 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 IR 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 Gate enables PC 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 D0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 MSR 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 ALU 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 MAR 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Register clocks MBR IR 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PC 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Z D0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 ALU 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 ALU Fetchexecute F1 F0 X X X X 1 0 X X X X X X X X X X 0 0 X X X X 0 1 X X X X 1 0 X X X X 1 1 X X X X X X R 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 1 1 S 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Table 7.9 The interpretation of machine code instructions.

7.4.1 The microprogram


Imagine that the output of the control unit in Fig. 7.16 consists of 10 signals that enable gates G1 to G10, the PC input multiplexer, two signals that control the memory, and ve clock signals that pulse the clock inputs of the PC, MAR, MBR, IR, and D0 registers. Table 7.11 presents the 17 outputs of the control unit as a sequence of binary values that are generated during the fetch and execute phases of an ADD instruction. We have not included the ALU function signals in this table. When the memory is accessed by E 1, a memory read or write cycle may take place. The R/W (i.e. read/write) signal determines the nature of the memory access when E 1. When R/W 0 the cycle is a write cycle, and when R/W 1 the cycle is a read cycle.

If, for each of the seven steps in Table 7.11, the 17 signals are fed to the various parts of the CPU in Fig. 7.16, then the fetchexecute cycle will be carried out. Real microprogrammed computers might use 64 to 200 control signals rather than the 17 in this example. One of the most signicant differences between a microinstruction and a macroinstruction is that the former contains many elds and may provide several operands, while the macroinstruction frequently species only an op-code and one or two operands. The seven steps in Table 7.11 represent a microprogram that interprets a fetch phase followed by an ADD instruction. We have demonstrated that a macroinstruction is interpreted by executing a microprogram, which comprises a sequence of microinstructions. Each of the CPUs instructions has its own microprogram. We now look at the

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 317

7.4 Microprogrammed control units


Clock PC

317

PC

Q D G1

Clock MAR

MAR Address Memory Data in out

D Q

MPLX E Mux Incrementer Branch path Clock PC Q D G2 R/W

G3 G4 G5 Bus A

Q IR

G10 Literal path

Clock MBR

MBR

Q D

G6 G7

Bus B

Control units

Clock D0 Control signals C G8 Bus C

D0

Q D A B

ALU

F1 F2 F3 Copy bus A to bus C G9

Copy bus A to IR

Figure 7.16 Controlling the ow of information in a computer.

Step 1 1a 2 3 4 4a 5 6 7 7a 7b

Register transfer language


[MAR] INC [PC] [MBR] [IR] CU [MAR] [MBR] ALU ALU [D0] [PC] [PC] INC [[MAR]] [MBR] [IR(op-code)] [IR(address)] [[MAR]] [MBR] [D0] ALU

Operations required enable G1, clock MAR


Mux = 0, clock PC enable memory, R/W=1, enable G3 enable G9, clock MBR enable G4, clock IR enable G2, clock MAR enable memory, R/W=1, enable G3, enable G9, clock MBR enable G4, set ALU function to add enable G7 enable G8, clock data register D0

Note 1 Where there is no entry in the column labeled Operations required, that operation happens automatically. For example, the output of the program counter is always connected to the input of the incrementer and therefore no explicit operation is needed to move the contents of the PC to the incrementer. Note 2 Any three-state gate not explicitly mentioned is not enabled. Note 3 Steps 1, 1a are carried out simultaneously, as are 4, 4a and 7, 7a, 7b.

Table 7.10 Interpreting a fetchexecute cycle for an ADD N,D0 instruction in terms of RTL.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 318

318 Chapter 7 Structure of the CPU


Step Gate control signals and MPLX control G1 1 2 3 4 5 6 7 1 0 0 0 0 0 0 G2 0 0 0 0 1 0 0 G3 0 0 1 0 0 1 0 G4 0 0 0 1 0 0 1 G5 0 0 0 0 0 0 0 G6 0 0 0 0 0 0 0 G7 0 0 0 0 0 0 1 G8 0 0 0 0 0 0 1 G9 0 0 1 0 0 1 0 G10 0 0 0 0 0 0 0 Mux 0 0 0 0 0 0 0 Memory E 0 0 1 0 0 1 0 R/W 0 0 1 0 0 1 0 Register clocks PC 0 1 0 0 0 0 0 MAR 1 0 0 0 1 0 0 MBR 0 0 1 0 0 1 0 D0 0 0 0 0 0 0 1 IR 0 0 0 1 0 0 0

Table 7.11 Control signals generated during the fetch and execution phases of an ADD instruction.

This is part of the CPU Operation code Address Instruction register

Address mapper

Incrementer

Microprogram counter

Clock

Address Microprogram memory Data Microinstruction register Control signals to other parts of the CPU

Next microinstruction address

Load Condition control select

CPU control field

Branch condition

Multiplexer

Branch on zero Branch on not zero Branch never (logical zero) Branch always (logical one)

These signals are from the ALU in the CPU

Figure 7.17 The microprogrammed control unit.

microprogram itself and consider the hardware required to execute it. The microprogram is executed by the same type of mechanism used to execute the macroprogram (i.e., machine code) itself. This is a good example of the common expression wheels within wheels. Figure 7.17 describes the basic structure of a microprogrammed control unit that has a microprogram counter, a microprogram memory, and a microinstruction register (this structure is typical of the 1980s). The microinstruction

address from the microprogram counter is applied to the address input of the microprogram memory and the data output of the memory fed to the microinstruction register. As weve said, the structure of the control unit that executes the macroinstruction is very much like the structure of the CPU itself. However, there is one very big difference between the macroinstruction world and the microinstruction worldthe microinstruction register is very much longer than the macroinstruction register and the microinstructions

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 319

7.4 Microprogrammed control units

319

structure is much more complex that of the macroinstruction. Information in the microinstruction register is divided into four elds: next microinstruction address eld, microprogram counter load control eld, condition select eld, and CPU control eld. Most of the bits in the microinstruction register belong to the CPU control eld, which controls the ow of information within the CPU by enabling tri-state gates and clocking registers as weve described; for example, all the control signals in Table 7.11 belong to this eld. Our next task is to describe one of the principal differences between the micro- and macroinstruction. Each microinstruction is also a conditional branch instruction that determines the location of the next microinstruction to be executed. We will now explain how microinstructions are sequenced.

7.4.2 Microinstruction sequence control


If the microprogram counter were to step through the microprogram memory in the natural sequence, 0, 1, 2, 3, . . . etc., a stream of consecutive microinstructions would appear in the microinstruction register, causing the CPU to behave in the way described by Table 7.11. The CPU control bits of each microinstruction determine the ow of information within the CPU. However, just as in the case of the macroprogram control unit, it is often necessary to modify the sequence in which microinstructions are executed. For example, we might wish to repeat a group of microinstructions n times, or we may wish to jump from a fetch phase to an execute phase, or we may wish to call a (microinstruction) procedure. Microinstruction sequence control is determined by the three left-hand elds of the microinstruction register in Fig. 7.17, enabling the microprogram counter to implement both conditional and unconditional branches to locations within the microprogram memory. We shall soon see that this activity is necessary to execute macroinstructions such as BRA, BCC, BCS, BEQ, etc. In normal operation, the microprogram counter steps through microinstructions sequentially and the next microprogram address is the current address plus one. By loading

the contents of the next microinstruction address eld of the current microinstruction eld into the microprogram counter, a branch can be made to any point in the microprogram memory. In other words each microinstruction determines whether the next microinstruction is taken in sequence or whether it is taken from the next address eld of the current microinstruction. The obvious question to ask is, What determines whether the microprogram counter continues in sequence or is loaded from the next microinstruction address eld of the current microinstruction? The microprogram load control eld in the microinstruction register tells the microprogram counter how to get the next microinstruction address. This next address can come from the incrementer and cause the microprogram to continue in sequence. The next address can also be obtained from the address mapper (see below) or from the address in the next microinstruction address eld of the microinstruction register. The condition select eld in the microinstruction register implements conditional branches at the macroinstruction level by executing a conditional branch at the microinstruction level. In the simplied arrangement of Fig. 7.17, the condition select eld directly controls a 4-to-1 multiplexer that selects one of four ag bits representing the state of the CPU. These ag bits are obtained from the ALU and are usually the ag bits in the condition code register (e.g. Z, N, C, V). The condition select eld selects one of these ag bits for testing (in this example only the Z-bit is used). If the output of the multiplexer is true, a microprogram jump is made to the address specied by the contents of the next microinstruction address eld, otherwise the microprogram continues sequentially. In Fig. 7.17 two of the conditions are obtained from the CCR and two bits are permanently true and false. A false condition implies branch never (i.e. continue) and a true condition implies branch always (i.e. goto). To emphasize what weve just said, consider the hypothetical microinstruction of Fig. 7.18. This microinstruction is interpreted as:
IF Z = 1 THEN [PC] ADD3 ELSE [PC] [PC] + 1

where PC indicates the microprogram counter. A conditional branch at the macroinstruction level (e.g. BEQ) is interpreted by microinstructions in the following

Location of the next microinstruction to execute if a branch is taken Next address field ADD3

This field selects the This field determines where condition to be tested when making a the next microinstruction conditional branch address comes from

This field provide the CPU control signals that select source and destination registers, control buses, and determine the ALU function

Load control Conditional

Conditional select Branch on zero

CPU control fields

Figure 7.18 Structure of a microinstruction.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 320

320 Chapter 7 Structure of the CPU


way. The condition select eld of the microinstruction selects the appropriate status bit of the CCR to be tested. For example, if the macroinstruction is BEQ the Z-bit is selected. The microprogram counter load control eld contains the operation branch to the address in the microinstruction register on selected condition true. Thus, if the selected condition is true (i.e. Z 1), a jump is made to a point in the microprogram that implements the corresponding jump in the macroprogram. If the selected condition is false (i.e. Z 0), the current sequence of microinstructions is terminated by the start of a new fetchexecute cycle. some additional logic and a microprogram in ROM, a CPU with a user-dened instruction set and wordlength may be created. Of course, the designer doesnt have to construct a new CPU out of bit-slice components. You can emulate an existing microprocessor or even add machine-level instructions to enhance it. Figure 7.19 describes a typical bit-slice arithmetic logic unit that can generate one of eight functions of two inputs R and S. These functions vary from R plus S to the exclusive NOR of R and S. The values of R and S may be selected from a register le of 16 general-purpose data registers, an external input, a Q register, or zero. The bit-slice ALU is controlled (i.e. programmed) by a 9-bit input, which selects the source of the data taking part in an arithmetic or logical operation, determines the particular operation to be executed, and controls the destination (together with any shifting) of the result. Typical ALU operations are
[R7] [R6] [R9] [R7] [R7] + [R1] [R6] - [R5] [R9][R2] [R7] + 1

Implementing the fetchexecute cycle


The rst part of each microprogram executed by the control unit corresponds to a macroinstruction fetch phase that ends with the macroinstruction op-code being deposited in the instruction register. The op-code from the instruction register is rst fed to the address mapper, which is a look-up table containing the starting address of the microprogram for each of the possible op-codes. That is, the address mapper translates the arbitrary bit pattern of the op-code into the location of the corresponding microprogram that will execute the op-code. After this microprogram has been executed, an unconditional jump is made to the start of the microprogram that interprets the macroinstruction execute phase, and the process continues.

7.4.3 User-microprogrammed processors


Before the advent of todays powerful microprocessors, engineers in the 1980s requiring high performance sometimes constructed their own microprogrammed computers; that is, the engineer designed a CPU to their own specications. This was fun because you could create your own architecture and instruction set. On the other hand, you ended up with a computer without an off-the-shelf operating system, compilers, or any of the other tools you take for granted when you use a mainstream CPU. At the heart of many of these systems was the bit-slice component, which provided a middle path between microcomputer and mainframe. Bit-slice components, as their name suggests, are really subsections of a microprocessor that can be put together to create a custom CPU. For example, a 64-bit computer is made by putting together eight 8-bit bit-slice chips. Bit-slice components are divided into two types corresponding to the functional division within the microprocessor (i.e. the microprogram control and ALU). By using several ALU and microprogram controller bit-slices plus

An arithmetic unit of any length (as long as it is a multiple of 4) is constructed by connecting together bit-slice ALUs. Designers can use the ALUs internal registers in any way they desire. For example, they may choose to implement eight addressable data registers, two stack pointers (described later), two index registers, a program counter, and three scratchpad registers. Flexibility is the most powerful feature of bit-slice microprocessors. This description of the microprogrammed control unit is highly simplied. In practice the microprogram might include facilities for dealing with interrupts, the memory system, input/output, and so on. One of the advantages of a microprogrammed control unit is that it is possible to alter the content of the microprogram memory (sometimes called the control store) and hence design your own machine-level instructions. In fact it is perfectly possible to choose a set of microprograms that will execute the machine code of an entirely different computer. In this case the computer is said to emulate another computer. Such a facility is useful if you are changing your old computer to a new one whose own machine code is incompatible with your old programs. Emulation applies to programs that exist in binary (object) form on tape or disk. By writing microprograms (on the new machine) to interpret the machine code of the old machine, you can use the old software and still get the advantages of the new machine. One of the greatest problems in the design of a bit-slice computer lies in the construction and testing of the

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 321

7.4 Microprogrammed control units


Literal data from control unit or main store

321

RAM0

RAM shifter RAM 3 Q0 Q shifter Qn

4-bit register selects inputs from control unit

Data input RAM 16 addressable registers A data B data output output

Q Q register

Logical zero 0 A B D ALU data source selector Q

R Carry_in 8-function ALU

Carry Zero Sign Overflow

C Z N V

Output data selector Y

Bit slice control functions: ALU control, source and destination control

Data output to memory address register or memory data register

Figure 7.19 The microprogrammed ALU.

microprogram. You can, or course, write a program to emulate the bit-slice processor on another computer. A popular method of developing a microprogram is to replace the microprogram ROM with read/write memory and to access this memory with a conventional microprocessor. That is, the microprogram memory is common to both the bit-slice system and the microprocessor. In this way, the microprocessor can input a microprogram in mnemonic form, edit it, assemble it, and then pass control to the bit-slice system. The microprocessor may even monitor the operation of the bit-slice system. Such a microprogram memory is called a writable control store and once a writable control store was regarded as a big selling point of microprogrammed minicomputers and mainframes. However, we have already pointed out that a microprogrammable control store is of very little practical use due to the lack of applications

software. Even if a computer user has the expertise to design new microprogrammed macroinstructions, it is unlikely that the system software and compilers will be able to make use of these new instructions. Finally, RISC technology (as we shall see) does not use microprogramming and interest in microprogramming is much less than it once was. In the next chapter we look at how the performance of computers can be enhanced by three very different techniques. We begin with a brief introduction to the RISC revolution of the 1970s and 1980s and show how processors with regular instruction sets lend themselves to pipelining (the overlapping of instruction execution). We also look at cache memory and explain how a small quantity of very-high-speed random access memory can radically improve a computers performance. Finally, we describe the multiprocessora system that uses more than one processing unit to accelerate performance.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 322

322 Chapter 7 Structure of the CPU


I SUMMARY
We have taken a step back from the complex CPU architecture we described in the previous chapters and have looked at how a simple processor can read an instruction from memory, decode it, and execute it. We did this by considering the sequence of events that takes place when an instruction is executed and the ow of information within the computer. In principle, the computer is a remarkably simple device. The program counter contains the address of the next instruction to be executed. The computer reads the instruction from memory and decodes it. We have demonstrated that a typical instruction requires a second access to memory to fetch the data used by the instruction. We have demonstrated how a simple computer that can execute only instructions that load and store data or perform arithmetic operations can implement the conditional behavior required for loop and if . . . then . . . else constructs. The second part of this chapter looked at two ways of implementing a computers control unit. We started with a simple computer structure and demonstrated the control signals required to implement several machine-level instructions. Then we showed how you can use relatively simple logic and a timing sequencer to generate the signals required to interpret an instruction. Random logic control units are faster than their microprogrammed counterparts. This must always be so because the random logic control unit is optimized for its particular application. Moreover, a microprogrammed control unit is slowed by the need to read a microinstruction from the microprogram memory. Memory accesses are generally slower than basic Boolean operations. Microprogramming offers a exible design. As the microprogram lives in read-only memory, it can easily be modied at either the design or the production stage. A random logic control unit is strictly special purpose and cannot readily be modied to incorporate new features in the processor (e.g. additional machine-level instructions), and sometimes it is difcult to remove design errors without considerable modication of the hardware. The highpoint of microprogramming was the early 1970s when main memory had an access time of 12 s and the control store used to hold microprograms had an access time of 50100 ns. It was then sensible to design complex machine level instructions that were executed very rapidly as microcode. Today, things have changed and memories with access times of below 50 ns are the norm rather than the exception. Faster memory makes microprogramming less attractive because hard-wired random logic control units execute instructions much more rapidly than microcoded control units. Todays generation of RISC (reduced instruction set computers) and post-RISC architectures are not microprogrammed.

I PROBLEMS
7.1 Within a CPU, what is the difference between an address path and a data path? 7.2 In the context of a machine-level instruction, what is an operand? 7.3 What is a literal operand? 7.4 How does a computer know whether an operand in its instruction register is a literal or a reference to memory (i.e. an address)? 7.5 Why is the program counter a pointer and not a counter? 7.6 Explain the function of the following registers in a CPU: (a) (b) (c) (d) PC MAR MBR IR

7.7 What is the CCR? 7.8 Does a computer need data registers? 7.9 Some microprocessors have one general-purpose data register, some two, some eight, and so on. What do you think determines the number of such general-purpose data registers in any given computer? 7.10 What is the signicance of the fetchexecute cycle? 7.11 What is the so-called von Neumann bottleneck? 7.12 Design a computer (at the register and bus level) to implement a zero address instruction set architecture. 7.13 In the context of CPU design, what is a random logic control unit? What is the meaning of the word random in this expression? 7.14 What is a microprogrammed control unit? 7.15 Microprogramming has now fallen into disfavor. Why do you think this is so? 7.16 For the computer structure of Fig. 7.20, state the sequence of micro-operations necessary to carry out the following instruction. Assume that the current instruction is in the IR.
ADDsquare D0, D1

This instruction reads the contents of register D0, squares that value, and then adds it to the contents of register D1. The result

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 323

7.4 Microprogrammed control units


Abus Read Write Address CMAR Data out Main store Data in EMSR GMSR B bus

323

MAR GMAR MBR EMBR GIR IR EIR GPC PC EPC GD0

CMBR

This computer has two internal buses A and B. All registers capture data from a bus when they are clocked. All tri-state gates can put data on a bus when they are enabled. The function of the ALU is controlled by inputs F2, F1, F0. This memory is controlled by a read signal and a write signal. The ALU performs an operation on one or both its P and Q inputs depending on the state of its control inputs. Data iputs for the ALU come from two registers, latch1 and latch 2.

CIR

CPC

CD0

D0

ED0 GD1

CD1 D1 CB_to_A EB_to_A ALU CALU f(P, Q) EALU Function select Q Latch 2 P CL1 Latch 1

ED1

F2,

F1,

F0

CL2

Figure 7.20 A microprogrammed CPU.

is put in register D1. The function codes F2,F1, and F0 are given below.

Assume that only one operand, A, is required by the instruction and that operands B and C are in the next consecutive two memory locations, respectively. 7.18 For the architecture of the hypothetical two-bus computer of Fig. 7.20, derive a microprogram to carry out the operation
Move D0,[D1]

F2 0 0 0 0 1 1

F1 0 0 1 1 0 0

F0 0 1 0 1 0 1

Operation Copy P to F Add P to Q Subtract Q from P Add 1 to P Add 1 to Q Multiply P by Q F=P F=P+Q F=PQ F=P+1 F=Q+1 F=Qx1

7.17 For the structure of Fig. 7.20 write a microprogram to implement the operation
D1=[A]+[B]+[C]+1

This operation copies the contents of register D0 into the memory location whose address is given by the contents of register D1. You should describe the actions that occur in plain English (e.g. Put data from this register on the B bus) and as a sequence of events (e.g. Read 1, EMSR). The table in Question 16 denes the effect of the ALUs function code. All data has to pass through the ALU to get from bus B to bus A. Note that the ALU has two input latches. Data has to be loaded into these latches before an ALU operation takes place.

07-CLements-Chap07.qxd

22/12/2005

6:17 PM

Page 324

324 Chapter 7 Structure of the CPU


7.19 For the computer of Fig. 7.20, what is the effect of the following sequence of micro-operations?
ED0 ED0 F2,F1,F0 EMBR ED0 F2,F1,F0 = = = = = = 1, 1, 0,0,1, 1, 1, 0,0,1, CL1 CL2 CMBR CL1 CL2 CD1

Your answer should explain what each of the microoperations does individually. You should also state what these actions achieve collectively; that is, what is the effect of the equivalent assembly language operation?

EALU,

EALU,

You might also like