Differentiate Tightly Coupled and Loosely Coupled System
• Any other processor wishing to initiate a transfer must first determine the availability status
of the bus, and only after the bus becomes available can the processor address the destination
unit to initiate the transfer.
• A command is issued to inform the destination unit what operation is to be performed.
• The receiving unit recognizes its address in the bus and responds to the control signals from
the sender, after which the transfer is initiated.
• A single common-bus system is restricted to one transfer at a time.
• This means that when one processor is communicating with the memory, all other
processors are either busy with internal operations or must be idle waiting for the bus.
• As a consequence, the total overall transfer rate within the system is limited by the speed of
the single path.
• A more economical implementation of a dual bus structure is depicted in figure 10.2.
MULTIPROCESSORS
• Here we have a number of local buses each connected to its own local memory and to one
or more processors.
• Each local bus may be connected to a CPU, an IOP, or any combination of processors.
• A system bus controller links each local bus to a common system bus.
• The I/O devices connected to the local IOP, as well as the local memory, are available to the
local processor.
• The memory connected to the common system bus is shared by all processors.
• If an IOP is connected directly to the system bus, the I/O devices attached to it may be made
available to all processors.
• Only one processor can communicate with the shared memory and other common resources
through the system bus at any given time.
• The other processors are kept busy communicating with their local memory and I/O devices.
• The module must have internal control logic to determine which port will have access to
memory at any given time.
• Memory access conflicts are resolved by assigning fixed priorities to each memory port.
• Priority levels are established by the arbitration logic to select one CPU when two or more
CPUs attempt to access the same memory.
• The multiplexers are controlled by the binary code that is generated by a priority encoder
within the arbitration logic.
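The fixed-priority selection above can be sketched in a few lines of Python. This is a minimal model, not hardware: the function plays the role of the priority encoder, and its output stands in for the binary code that would drive the port multiplexers (port numbering is an assumption; port 0 is taken as the highest priority).

```python
def priority_encode(requests):
    """requests: one bit per memory port, index 0 = highest priority.
    Returns the binary select code that would drive the address/data
    multiplexers, or None when no port is requesting."""
    for port, requesting in enumerate(requests):
        if requesting:
            return format(port, "02b")   # 2-bit code for a 4-port module
    return None

# Ports 2 and 3 both request; the fixed priority resolves the conflict.
print(priority_encode([False, False, True, True]))  # -> '10': port 2 wins
```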
• A crossbar switch organization supports simultaneous transfers from memory modules
because there is a separate path associated with each module.
• However, the hardware required to implement the switch can become quite large and
complex.
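A toy Python sketch of the crossbar's scheduling property: every request proceeds in the same cycle unless two processors address the same memory module, in which case one must be blocked (the fixed first-come priority and the processor names are assumptions for illustration).

```python
def crossbar_schedule(requests):
    """requests: dict mapping processor -> memory module number.
    Returns (granted, blocked). Each module has its own path, so only
    requests that collide on the same module are blocked."""
    granted, blocked = {}, {}
    taken = set()
    for proc, module in requests.items():
        if module in taken:
            blocked[proc] = module       # module's port already in use
        else:
            granted[proc] = module       # separate path: proceeds now
            taken.add(module)
    return granted, blocked

g, b = crossbar_schedule({"P1": 0, "P2": 3, "P3": 0})
print(g, b)  # P1->0 and P2->3 transfer simultaneously; P3 waits on module 0
```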
• The two processors P1 and P2 are connected through switches to eight memory modules
marked in binary from 000 through 111.
• The path from source to a destination is determined from the binary bits of the destination
number.
• The first bit of the destination number determines the switch output in the first level.
• The second bit specifies the output of the switch in the second level, and the third bit
specifies the output of the switch in the third level.
• For example, to connect P1 to memory 101, it is necessary to form a path from P1 to output 1 in
the first-level switch, output 0 in the second-level switch, and output 1 in the third-level switch.
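The routing rule above is simply "bit i of the destination selects the switch output at level i." A minimal Python sketch of that rule:

```python
def route(destination, levels=3):
    """destination: memory module number (e.g. 0b101).
    Returns the switch output (0 or 1) selected at each level,
    first level first."""
    bits = format(destination, f"0{levels}b")   # MSB routes the first level
    return [int(b) for b in bits]

# Connecting to memory 101: output 1, then 0, then 1, as in the example.
print(route(0b101))  # -> [1, 0, 1]
```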
• A routing procedure can be developed by computing the exclusive-OR of the source node
address with the destination node address.
• For example, in a three-cube structure, a message at 010 going to 001 produces an exclusive-
OR of the two addresses equal to 011.
• The message can be sent along the second axis to 000 and then through the third axis to 001.
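The exclusive-OR rule can be sketched directly: each 1 bit in source XOR destination names an axis the message must traverse, in any order. The version below flips the differing bits from the high axis down, which reproduces the 010 → 000 → 001 path from the example (the node numbering is the three-cube addressing from the text).

```python
def hypercube_path(src, dst, dims=3):
    """Route a message in a dims-cube: flip, one at a time, every bit
    where src and dst differ (src XOR dst). Returns the visited nodes."""
    path = [src]
    diff = src ^ dst                     # 1 bits = axes still to traverse
    for axis in reversed(range(dims)):   # high axis first (order is free)
        if diff & (1 << axis):
            path.append(path[-1] ^ (1 << axis))
    return path

# 010 -> 001: XOR is 011, so the message crosses two axes.
print([format(n, "03b") for n in hypercube_path(0b010, 0b001)])
# -> ['010', '000', '001']
```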
• The processor takes control of the bus if its acknowledge input line is enabled.
• The bus busy line provides an orderly transfer of control, as in the daisy-chaining case.
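The serial (daisy-chain) scheme implied here can be modelled as a grant signal rippling down the chain: each non-requesting device passes the grant to its neighbour, and the first requesting device absorbs it. A minimal sketch, with device ordering assumed highest priority first:

```python
def daisy_chain_grant(requests):
    """requests: booleans ordered highest to lowest priority (the order
    in which the grant line is chained). Returns the index of the device
    that absorbs the grant, or None if no device is requesting."""
    for device, requesting in enumerate(requests):
        if requesting:
            return device        # grant absorbed; device raises bus busy
        # otherwise this device passes the grant to the next in the chain
    return None

# Devices 1 and 2 both request; device 1 is nearer the controller and wins.
print(daisy_chain_grant([False, True, True]))  # -> 1
```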
• Figure 10.10 shows the request lines from four arbiters going into a 4 X 2 priority encoder.
• The output of the encoder generates a 2-bit code which represents the highest-priority unit
among those requesting the bus.
• The bus priority-in (BPRN) and bus priority-out (BPRO) are used for a daisy-chain connection
of bus arbitration circuits.
• The bus busy signal BUSY is an open-collector output used to instruct all arbiters when the
bus is busy conducting a transfer.
• The common bus request (CBRQ) is also an open-collector output that serves to inform the
arbiter whether any other arbiters of lower priority are requesting use of the system bus.
• The signals used to construct a parallel arbitration procedure are bus request (BREQ) and
priority-in (BPRN), corresponding to the request and acknowledgement signals in figure
10.10.
• The bus clock (BCLK) is used to synchronize all bus transactions.
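The parallel arbitration of figure 10.10 can be sketched as a 4 × 2 priority encoder feeding a decoder: the encoder turns the four request lines into a 2-bit code naming the highest-priority requester, and the decoder enables only that arbiter's priority-in line. The signal names (BREQ, BPRN) follow the text; the Python model itself is illustrative.

```python
def arbitrate(breq):
    """breq: the four bus-request (BREQ) bits, index 0 = highest priority.
    Returns (code, bprn): the encoder's 2-bit output and the four
    priority-in (BPRN) lines, only one of which is enabled."""
    bprn = [False] * 4
    for i, req in enumerate(breq):
        if req:
            bprn[i] = True                     # decoder enables the winner
            return (i >> 1, i & 1), bprn       # 2-bit code, MSB first
    return None, bprn

code, bprn = arbitrate([False, True, False, True])
print(code, bprn)  # arbiter 1 wins: code (0, 1), BPRN enabled only for it
```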
Polling
• In a bus system that uses polling, the bus grant signal is replaced by a set of lines called poll
lines which are connected to all units.
• These lines are used by the bus controller to define an address for each device connected to
the bus.
• The bus controller sequences through the addresses in a prescribed manner.
• When a processor that requires access recognizes its address, it activates the bus busy line
and then accesses the bus.
• After a number of bus cycles, the polling process continues by choosing a different processor.
• The polling sequence is normally programmable, and as a result, the selection priority can
be altered under program control.
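A minimal Python sketch of the polling loop above: the controller steps through a programmable sequence of addresses, and the first polled processor with a pending request takes the bus. Reordering the sequence under program control is exactly what changes the selection priority (the addresses and pending set are illustrative).

```python
def poll(sequence, pending):
    """sequence: processor addresses in the programmed poll order.
    pending: set of processors currently requesting the bus.
    Returns the processor granted the bus, or None."""
    for address in sequence:
        if address in pending:
            return address      # processor recognizes its address,
    return None                 # raises bus busy, and accesses the bus

# Processors 0 and 3 request; 0 is polled first in this sequence and wins.
print(poll([2, 0, 3, 1], {0, 3}))  # -> 0
```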
LRU
• The least recently used (LRU) algorithm gives the highest priority to the requesting device
that has not used the bus for the longest interval.
• The priorities are adjusted after a number of bus cycles according to the LRU algorithm.
• With this procedure, no processor is favored over any other since the priorities are
dynamically changed to give every device an opportunity to access the bus.
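The LRU adjustment can be sketched as a recency-ordered list: the requester that has been idle the longest wins, and the winner is moved to the most-recently-used end so its priority drops for next time. Device numbering is illustrative.

```python
def lru_grant(order, requests):
    """order: device ids from least to most recently used.
    requests: set of devices requesting the bus.
    Returns (winner, new order); the winner becomes most recently used."""
    for device in order:                 # least recently used wins first
        if device in requests:
            new_order = [d for d in order if d != device] + [device]
            return device, new_order
    return None, order

order = [2, 0, 1, 3]                     # device 2 idle the longest
winner, order = lru_grant(order, {0, 2})
print(winner, order)                     # -> 2 [0, 1, 3, 2]
```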
FIFO
• In the first-come first-serve scheme, requests are served in the order received.
• To implement this algorithm the bus controller establishes a queue arranged according to
the time that the bus requests arrive.
• Each processor must wait for its turn to use the bus on a first-in first-out (FIFO) basis.
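The controller's queue maps directly onto a double-ended queue: requests enter at the tail in arrival order and the bus is granted from the head. A minimal sketch (processor names assumed):

```python
from collections import deque

queue = deque()
for requester in ["P3", "P1", "P2"]:    # arrival order of the requests
    queue.append(requester)             # request recorded on arrival

served = [queue.popleft() for _ in range(len(queue))]
print(served)  # -> ['P3', 'P1', 'P2']: strictly first-in first-out
```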
Rotating daisy-chain
• The rotating daisy-chain procedure is a dynamic extension of the daisy chain algorithm.
• Highest priority is given to the unit that is nearest to the unit that has most recently
accessed the bus (that unit becomes the bus controller).
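A small sketch of the rotation: after each grant, the chain logically starts just past the last user, so that neighbour holds the highest priority on the next arbitration (device ids and count are illustrative).

```python
def rotating_grant(last_user, requests, n):
    """last_user: device that most recently had the bus.
    requests: set of requesting device ids 0..n-1.
    Returns the next winner, scanning the chain starting just past
    the last user, or None if no device is requesting."""
    for offset in range(1, n + 1):
        device = (last_user + offset) % n
        if device in requests:
            return device
    return None

# Device 1 just used the bus; of the requesters {0, 3}, device 3 is
# nearer to it in the rotated chain and wins.
print(rotating_grant(1, {0, 3}, 4))  # -> 3
```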
Write-through policy
• As shown in figure 10.12, a store of the value 120 to X in the cache of processor P1 also
updates main memory to the new value under a write-through policy.
• A write-through policy maintains consistency between memory and the originating cache,
but the other two caches are inconsistent since they still hold the old value.
Write-back policy
• In a write-back policy, main memory is not updated at the time of the store.
• The copies in the other two caches and main memory are inconsistent.
• Memory is updated eventually when the modified data in the cache are copied back into
memory.
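A toy Python model of both inconsistency scenarios (the initial value 52 and the third processor are assumptions for illustration; the text only fixes the stored value 120):

```python
memory = {"X": 52}                                   # initial value assumed
caches = {"P1": {"X": 52}, "P2": {"X": 52}, "P3": {"X": 52}}

def store(cpu, addr, value, policy):
    caches[cpu][addr] = value                # originating cache updated
    if policy == "write-through":
        memory[addr] = value                 # memory kept consistent
    # under "write-back", memory is updated only when the modified
    # block is eventually copied back, so it stays stale until then

store("P1", "X", 120, "write-through")
# Memory and P1 agree, but P2 and P3 still hold the stale value.
print(memory["X"], caches["P1"]["X"], caches["P2"]["X"])  # -> 120 120 52
```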
• Another configuration that may cause consistency problems is a direct memory access (DMA)
activity in conjunction with an IOP connected to the system bus.
• In the case of input, the DMA may modify locations in main memory that also reside in cache
without updating the cache.
• During a DMA output, memory locations may be read before they are updated from the
cache when using a write-back policy.
Software Approaches
Read-Only Data are Cacheable
• This scheme allows only nonshared and read-only data to be stored in caches; such items
are called cacheable.
• Shared writable data are noncacheable.
• The compiler must tag data as either cacheable or noncacheable, and the system hardware
makes sure that only cacheable data are stored in caches.
• The noncacheable data remain in main memory.
• This method restricts the type of data stored in caches and introduces an extra software
overhead that may degrade performance.
Centralized Global Table
• A scheme that allows writable data to exist in at least one cache is a method that employs a
centralized global table in its compiler.
• The status of memory blocks is stored in the central global table.
• Each block is identified as read-only (RO) or read and write (RW).
• All caches can have copies of blocks identified as RO.
• Only one cache can have a copy of an RW block.
• Thus if the data are updated in the cache with an RW block, the other caches are not affected
because they do not have a copy of this block.
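The RO/RW rule can be sketched as a table that gates caching decisions: any cache may hold an RO block, but an RW block may live in at most one cache, so a write there can never leave a stale copy elsewhere. Block and processor names are illustrative.

```python
table = {"B0": "RO", "B1": "RW"}        # status of each memory block
holders = {"B0": set(), "B1": set()}    # which caches hold each block

def cache_block(cpu, block):
    """Returns True if cpu is permitted to cache the block."""
    if table[block] == "RO" or not holders[block]:
        holders[block].add(cpu)
        return True
    return False             # RW block already held by another cache

print(cache_block("P1", "B0"), cache_block("P2", "B0"))  # RO: both allowed
print(cache_block("P1", "B1"), cache_block("P2", "B1"))  # RW: only first
```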
Hardware Approaches
Hardware-only solutions are handled by the hardware automatically and have the advantage of
higher speed and program transparency.
Snoopy Cache Controller
• In the hardware solution, the cache controller is specially designed to allow it to monitor all
bus requests from CPUs and IOPs.
• All caches attached to the bus constantly monitor the network for possible write operations.
• Depending on the method used, they must then either update or invalidate their own cache
copies when a match is detected.
• The bus controller that monitors this action is referred to as a snoopy cache controller.
• This is basically a hardware unit designed to maintain a bus-watching mechanism over all the
caches attached to the bus.
• All the snoopy controllers watch the bus for memory store operations.
• When a word in a cache is updated by writing into it, the corresponding location in main
memory is also updated.
• The local snoopy controllers in all other caches check their memory to determine if they have
a copy of the word that has been overwritten.
• If a copy exists in a remote cache, that location is marked invalid.
• Because all caches snoop on all bus writes, whenever a word is written, the net effect is to
update it in the original cache and main memory and remove it from all other caches.
• If at some future time a processor accesses the invalid item from its cache, the response is
equivalent to a cache miss, and the updated item is transferred from main memory. In this
way, inconsistent versions are prevented.
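The snoopy write-invalidate behaviour described above can be condensed into a toy Python model: every write goes through to memory, every other controller snoops the bus and invalidates its copy, and a later read of an invalidated word misses and refetches the fresh value (initial value 52 and processor names assumed for illustration).

```python
memory = {"X": 52}                          # initial value assumed
caches = {"P1": {"X": 52}, "P2": {"X": 52}}

def bus_write(cpu, addr, value):
    caches[cpu][addr] = value
    memory[addr] = value                    # write-through to main memory
    for other, cache in caches.items():     # all controllers snoop the bus
        if other != cpu and addr in cache:
            del cache[addr]                 # remote copy marked invalid

def read(cpu, addr):
    if addr not in caches[cpu]:             # invalid entry -> cache miss
        caches[cpu][addr] = memory[addr]    # refill from main memory
    return caches[cpu][addr]

bus_write("P1", "X", 120)
print(read("P2", "X"))  # -> 120: P2's stale copy was invalidated, then refetched
```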