IE930465A1 - Cache memory zoning system - Google Patents
- Publication number
- IE930465A1
- Authority
- IE
- Ireland
- Prior art keywords
- cache
- memory
- tag
- bus
- address
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A computer system has a system bus 12 with a main memory 11, DMA units 14, 15, and a processor subsystem, which in turn has a processor 10, a cache memory 16, and a bus interface 13 all coupled together by a data bus 20, a tag/high-memory-address bus 21, and a short/low-memory-address bus 22. Cache reading is by addressing the cache with the short/low-memory-address part of the full desired word address, and comparing the tag part of the desired address with the tag read from the cache. A small zone 31 (Fig. 3) for transient data is defined in the cache by forcing (at 48) the upper part (on bus 46) of the short/low-memory-address bus to all 0s, and forcing (at 44) a pseudo tag PS (for a non-existent memory area) into all locations in the transient zone. Subsequently, either the tag portion of the cache can be disabled for transient zone operations or the same pseudo tag can be used. The transient zone can be updated by flushing (reading that zone with misses forced).
Description
The present invention relates to cache memory systems, which are used widely in computers.
Caches - introduction
An exceedingly common computer configuration comprises a processor of some kind coupled to a memory of some kind. (The processor is often called a CPU, or central processing unit, although this term is somewhat inappropriate if the configuration is a peripheral part of some larger computer system. The memory is often called the main memory.)
It is obviously desirable to match the speeds of the processor and the memory, so that the access time of the memory is roughly equal to the rate at which the processor reads and writes instructions and/or data. The processor speed will often be determined by the particular application which the system is designed for, and the required memory speed will follow from that.
In addition to its speed, the size of the memory will also have to be determined. The size of the memory must be large enough to accommodate all the information required by the processor for considerable periods of time. Transfers to and from other devices (eg hard discs) may of course be necessary in some systems, but it is desirable for the memory size to be large enough for such transfers to occupy a relatively small proportion of the total memory operating time. In other systems, eg routers in communication systems, the memory size must be large enough to store the messages passing through the system for sufficient time for them to be recei
It may be noted that if the main memory is of a suitable type, it may be capable of block transfers which are considerably faster than random accesses. If it is a DRAM, for example, a memory access involves selection of a row followed by the selection of a column. Once that has been done, a block or burst transfer can be achieved by retaining the row selection and simply advancing the column selection word by word. However, this fast block transfer is obviously only possible if the words to be transferred are in a block of sequential addresses. With typical data processing, the sequence of addresses is normally not sequential, and reads and writes normally occur in an apparently random sequence, so fast block transfer cannot normally be used for such processing.
A memory satisfying these two requirements of sufficient capacity and speed is likely to be expensive, and may be difficult to implement. The use of a cache memory has therefore become common. A cache memory has a relatively small size and relatively high speed, matched to the speed of the processor, and is used in conjunction with the main memory. This allows the speed of the main memory to be considerably less than that of the cache with only a minor adverse impact on the system performance (speed).
A cache memory is effectively an associative memory, which stores the addresses of the words in it along with the words (data) themselves. (The words and their addresses may both include parity or other error-checking bits if desired; for present purposes, these will have no effect on the system.) A cache memory system is organised so that when the processor reads or writes a word, the address of the word is passed to the cache. If the operation is a write, then the word is written into the cache (along with its address). If the access is a read and the address is in the cache, then the word is read from the cache. If the access is a read and the address is not in cache, then the word is read from main memory and written to cache at the same time.
The efficacy of the cache depends on the fact that in most programs, many words are accessed repeatedly. Once a word has entered the cache, subsequent operations on that word are achieved by accessing the cache. Since the cache speed is matched to the processor speed, the processor runs continuously or nearly so, with few waits for memory accesses to the main memory.
The simplest cache system has just a single cache, and both data words and instructions will be stored in it. In some circumstances it may be more convenient to have two caches, one for data words and the other for instructions. For present purposes, it will be assumed that there are two caches, one for data and the other for instructions, and our primary concern will be with the data cache (though many of the considerations applying to that will apply to the instruction cache as well).
Cache structure
A true associative memory would be complex and expensive. A cache memory is therefore normally constructed to store, with each word, only a part (termed the tag) of the address of that word. The cache is addressable in the conventional manner by the remaining part of the word address. When a cache location is addressed, the tag stored in that location is compared with the tag part of the full word address. If there is a match ('hit'), the desired word is contained in the cache. The cache may contain tag comparison circuitry for comparing the tag in the desired address with the tag retrieved from the cache location, or the comparison may be performed by the processor.
Conventionally, the full address is split into a high order portion and a low order portion, with the low order portion (which we will call the short/low address) being used to address the cache memory and the high order portion being used as the tag. Programs frequently involve operations on a group of words which occupy addresses close to each other. Such a group of words maps into a corresponding group of addresses (locations) close to each other in the cache, so that all the words which the processor will then operate on will be in the cache. (These words in the cache will in fact all have the same tags, but the fact that their tags are all the same has no relevance to the operation of the system.)
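The address split and tag comparison described above can be sketched as follows. This is only an illustrative model: the 18-bit tag and 14-bit short/low widths are borrowed from the specific example given later in this description, and the function names `split_address` and `is_hit` are hypothetical.

```python
# Assumed widths, borrowed from the specific example later in the text:
# an 18-bit tag plus a 14-bit short/low address form a 32-bit word address.
TAG_BITS = 18
INDEX_BITS = 14


def split_address(addr: int) -> tuple[int, int]:
    """Return (tag, short_low) for a full word address."""
    short_low = addr & ((1 << INDEX_BITS) - 1)           # low-order part addresses the cache
    tag = (addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)   # high-order part is stored as the tag
    return tag, short_low


def is_hit(stored_tag: int, addr: int) -> bool:
    """A hit occurs when the tag read from the cache equals the tag of the address."""
    tag, _ = split_address(addr)
    return stored_tag == tag
```

Words at neighbouring addresses share a tag and differ only in the short/low part, which is why a group of nearby words maps to a group of nearby cache locations.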
A more elaborate cache organisation is possible, in which each 'location' in the cache contains say four data words and their associated tags. When a location in the cache is addressed (by a short/low address), the tag of the full address is compared with all four of the tags stored therein. If there is a hit on one of the stored tags, the desired data word is that associated with that tag. For present purposes, however, this possible elaboration will be ignored, and it will be assumed that the cache is of the simple type in which each location (addressable by a short/low address) stores a single data word and associated tag.
It is often desirable to be able to inhibit the cacheing of some words. A simple and effective way of achieving this is to use some of the high order address bits. A particular set of high order address bits (eg the top 3 bits) is chosen, and a logic circuit is provided which detects a particular value of this set of bits and thereupon inhibits the cacheing of the associated word. This makes an area of the memory space uncacheable, and it is generally feasible to arrange for words to be stored in the uncacheable or cacheable part of the memory space as desired.
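A minimal sketch of such a detection circuit, expressed as a function. The 32-bit address width and the particular bit pattern `0b101` are assumptions chosen for illustration; the text specifies only that some value of the top bits marks the uncacheable area.

```python
ADDRESS_BITS = 32             # assumed address width
SELECT_BITS = 3               # "the top 3 bits", as in the text's example
UNCACHEABLE_PATTERN = 0b101   # hypothetical value; the text does not specify one


def is_cacheable(addr: int) -> bool:
    """Return False when the top address bits select the uncacheable area."""
    top = (addr >> (ADDRESS_BITS - SELECT_BITS)) & ((1 << SELECT_BITS) - 1)
    return top != UNCACHEABLE_PATTERN
```

With these assumed values, one eighth of the address space (addresses 0xA0000000 to 0xBFFFFFFF) is uncacheable.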
Cache operation
The cache organisation, as described so far, allows the processor to read words from and write words to the cache provided that their addresses are already in the cache. Obviously, however, there will be times when a required address is not in the cache. There must therefore be a mechanism for entering fresh addresses into the cache. This will involve displacing words already in the cache, so the mechanism must also ensure that such displaced words are not lost but transferred into the main memory.
When a word is to be written, it is convenient to write it into cache automatically, without first checking the cache to see whether its address is already in cache. (What is actually written into the cache is an extended word, formed by concatenating the data word with the tag part of its address.) This ensures that the word is cached if its address should be accessed again.
When a word is to be read, its address is passed to the cache. If that address is not in the cache, then the address has to be passed to the main memory, so that the word is read from there. (It is usual for the address to be passed to the cache and the main memory in parallel, so that the main memory access is not delayed if there is a cache miss; if there is a cache hit, the main memory access is aborted.) As with writing, this ensures that the word is cached if its address should be accessed again. It is convenient for the word being read from the main memory to be copied immediately into cache; this writing occurs in parallel with the processor receiving the word and carrying out whatever operation is required on it.
Both reading and writing can thus result in the writing of a word with a fresh address into cache, which results in the displacement of a word already in cache, ie the over-writing of the word (and its tag) in the cache location into which the new word is being written. To avoid losing this displaced word, the system must ensure that it is copied into the main memory before it is displaced. This can conveniently be achieved by making every write (ie writing of a word by the processor) a write into main memory as well as into cache. A write thus consists of writing into cache and main memory simultaneously. A write buffer can be interposed between the processor and the main memory, so that the operation of the system is not delayed by the long write time of the main memory if several words have to be written in quick succession.
This solves the displacement problem, because any word displaced from the cache will either be an unchanged copy of a word which has been obtained from and is still in main memory, or will have previously been copied into the main memory.
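The write and read behaviour just described can be modelled in a few lines. This is a simplified software sketch, not the hardware itself: the class name `WriteThroughCache`, the dictionary standing in for main memory, and the default value 0 for never-written words are all assumptions, and the write buffer is omitted.

```python
class WriteThroughCache:
    """Toy model: one word per location, every write goes to cache and main memory."""

    def __init__(self, index_bits: int, main_memory: dict[int, int]):
        self.index_bits = index_bits
        self.lines: dict[int, tuple[int, int]] = {}  # short/low address -> (tag, word)
        self.main = main_memory
        self.main_reads = 0                          # counts reads served by main memory

    def _split(self, addr: int) -> tuple[int, int]:
        return addr >> self.index_bits, addr & ((1 << self.index_bits) - 1)

    def write(self, addr: int, word: int) -> None:
        tag, index = self._split(addr)
        self.lines[index] = (tag, word)  # written without first checking for a hit
        self.main[addr] = word           # and copied to main memory as well

    def read(self, addr: int) -> int:
        tag, index = self._split(addr)
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1]              # hit: served from the cache
        self.main_reads += 1
        word = self.main.get(addr, 0)    # miss: fetch from main memory (0 if never written)
        self.lines[index] = (tag, word)  # the fetched word is copied into the cache
        return word
```

Because main memory always holds a copy of every written word, a displaced word can simply be over-written in the cache and fetched again later.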
Variations on this mechanism for avoiding inconsistencies between the main and cache memories may be possible. For present purposes, however, it will be assumed that the cache is operated in the manner just described.
Interaction with external systems
The system described so far has been assumed to be largely self-contained: a processor, a main memory, and a cache. In practice, however, this system will usually be only a subordinate part, a subsystem, of a larger system. In particular, the main memory of such a larger system will be accessible by other parts of the system. We can take the system as including a system bus to which the main memory is coupled, with the cache and the processor being coupled together and coupled to the system bus via an interface unit (which will contain the write buffer). The system bus will have various other devices coupled to it, which we can term DMA (direct memory access) units. Depending on the system, the DMA units may be, for example, communications units or peripheral units.
The DMA units are so called because they can access the main memory directly, over the system bus, without involving the processor. This results in a further inconsistency problem for the cache; since the contents of the main memory can be changed without the knowledge of the processor, the contents of the cache and the main memory can be inconsistent.
This is not a matter of concern as far as the DMA devices are concerned, because any changes made to the cache are copied directly into the main memory. (There may in fact be a slight delay in this, because of the buffering of writes from the processor to the main memory, but this will generally not be significant.) But it is potentially serious as far as the processor is concerned. If part of the contents of the main memory have been copied into cache and are then changed by a DMA device, the processor will continue to operate with the contents of the cache.
One possible solution to this problem is for the DMA devices to write only into non-cacheable areas of the memory. When the processor wants to access a word from such an area, it will be forced to access it from the main memory. However, this technique has two disadvantages. First, it may be inconvenient to restrict the main memory areas into which the DMA devices write to non-cacheable areas; and second, it negates the purpose of the cache.
A mechanism is therefore required for updating the contents of the cache when the contents of the main memory are changed by a DMA device. This is commonly called 'flushing' the cache.
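The inconsistency and its remedy can be illustrated with a toy model. Everything here is an assumption made for illustration: tags are omitted, the cache and main memory are plain dictionaries, and `flush` simply reloads one word as if a miss had been forced.

```python
main_memory = {0x40: 111}
cache: dict[int, int] = {}  # address -> word (tags omitted for brevity)


def processor_read(addr: int) -> int:
    """Read through the cache, filling it from main memory on a miss."""
    if addr not in cache:
        cache[addr] = main_memory[addr]
    return cache[addr]


def dma_write(addr: int, word: int) -> None:
    """A DMA unit writes to main memory directly; the cache is not informed."""
    main_memory[addr] = word


def flush(addr: int) -> None:
    """Force a miss: reload the cached copy from main memory."""
    cache[addr] = main_memory[addr]
```

After `dma_write`, `processor_read` keeps returning the old cached word until `flush` is applied to that address.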
There is also another problem which, although it does not involve interaction with external devices, is related to the flushing problem; this problem is that of initialisation. When the system (the subsystem of processor, cache, and main memory) is started up, the cache will initially be empty. But 'empty' means only that the cache does not contain any specific contents, so that whatever its contents are will be inconsistent with the contents of main memory.
This initialisation problem can be dealt with in various ways; one way is to treat it as if the whole of the main memory has been updated by a DMA device so that the whole of the cache memory is inconsistent with the main memory.
A variety of techniques have been proposed for flushing the cache. Reference is made to our co-pending application, entitled 'Cache Memory Flushing System' and filed simultaneously herewith, for a discussion of a number of such techniques. For convenience, it will be assumed here that the present system uses the cache flushing technique which is the subject of that application.
Cache zoning
In some systems, the bulk of the processing utilises (a) tables or similar data structures which are looked up (read) frequently but rarely changed, and (b) information structures which enter the system fairly frequently from DMA or other external devices and are processed (changed) using the information in the tables. For convenience we shall call the tables of type (a) permanent, although it must be realised that their permanence is only a matter of degree; they will be changed on occasion, but such changes are relatively rare.
A typical example of such a system is a router in a communication system. The DMA devices are here devices which receive and transmit messages. Each message includes a header which carries information about its destination, and is written into main memory when it is received. The header is then processed, to determine the manner and route by which the message is to be forwarded on from the system. The header is also updated, and the message (with its updated header) is then read from the main memory by one of the DMA devices.
In this system, the header processing involves reference to permanent tables (eg system topology tables). It is obviously desirable for both these tables and the headers themselves to be placed in cache. The headers, which undergo updating (ie writing to), are relatively transient. It is obviously essential for the headers in cache to be updated so that header processing is carried out on current headers rather than old information.
Such a system can operate using a conventional simple cache. A simple cache system does not distinguish between permanent and temporary data. Any data which is required gets copied into the cache. This displaces old data from the cache, but any data so displaced is automatically copied into the main memory, and can be retrieved again later if necessary.
In general, however, this will result in parts of the permanent data being displaced fairly frequently. Since those parts of the permanent data are required repeatedly, the speed of the system will be reduced by the repeated need to copy displaced parts of the permanent data back into the cache. It is therefore desirable to segregate the two types of data, permanent and temporary, into separate areas or zones of the cache, so that the permanent data stays permanently in the cache.
One way of achieving this is to locate the various blocks of the two sorts of data appropriately in memory space. It is convenient here to use the term 'page' (or, more fully, 'cache page') for the memory space in the main memory defined by a tag; the size of each page is thus the same as the size of the cache, and the memory space is divided into sequential pages.
Locating the various data blocks appropriately involves satisfying the following conditions. The page format must consist of two zones, one for permanent data and the other for temporary data. (In practice, the zone for permanent data will usually be much larger than the zone for temporary data; it is also convenient for the division into the two zones to be achieved by a simple division of the page into a large upper zone for permanent data and a small lower zone for temporary data.) Further, if all cache pages are superimposed on each other, the various permanent data blocks must remain distinct; in other words, two permanent data blocks in different pages must not occupy the same (or overlapping) positions in those pages.
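These placement conditions can be checked mechanically. The sketch below assumes that blocks are given as (start address, length) pairs in words and that the temporary zone occupies the lowest offsets of each page; the function names and this representation are hypothetical.

```python
def page_offsets(start: int, length: int, page_size: int) -> set[int]:
    """Offsets within the page format occupied by a block."""
    return {(start + i) % page_size for i in range(length)}


def placement_ok(permanent_blocks: list[tuple[int, int]],
                 page_size: int,
                 transient_zone_size: int) -> bool:
    """Check the superimposed-page conditions for permanent blocks."""
    used: set[int] = set()
    for start, length in permanent_blocks:
        offsets = page_offsets(start, length, page_size)
        if any(offset < transient_zone_size for offset in offsets):
            return False  # block intrudes into the lower (temporary) zone
        if used & offsets:
            return False  # two blocks overlap once pages are superimposed
        used |= offsets
    return True
```

For example, with a 16384-word page and a 64-word temporary zone, blocks at offsets 100 to 149 of one page and 200 to 249 of another are acceptable, whereas blocks at offsets 100 to 149 and 120 to 169 of different pages are not.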
In many cases, these conditions will not be too onerous as far as the permanent data blocks are concerned. Since these are relatively permanent, it will often be quite feasible to organise their positions in the memory space so that those in different pages do not overlap within the page format. (Those in the same page obviously cannot overlap.) However, these conditions may well be difficult or impossible to satisfy as far as the transient data blocks are concerned, particularly if these data blocks enter the system at irregular intervals and through a variety of different devices and form parts of messages (data blocks) of varying lengths. Even if the conditions can be satisfied for such data blocks, a very inefficient use of memory space may result, and satisfying them may impose a heavy management load on the system, so reducing the system capacity.
It may be possible to overcome these problems (concerning the transient data blocks) by providing cache updating hardware to copy the transient data blocks into the transient zone of the cache. The cache updating hardware will in effect be a DMA device coupled between the system bus and the cache.
A major drawback with this is that it is complicated. The cache updating hardware will have to have enough complexity to monitor the passage of transient data blocks on the bus (from other DMA devices to the main memory), and identify the blocks and copy them from the main memory into the cache. In practice, this may involve address and byte-count counters, special DMA start and end flags, interrupt logic, error handling, and so on. Further, the cache will also have to be able to be driven from two separate sources, the processor and the cache updating hardware, so the cache will also have to have increased complexity.
The invention
The object of the present invention is to provide a zoned cache system which alleviates or overcomes these problems.
According to the present invention there is provided a computer system having a system bus with a main memory, at least one direct memory access (DMA) unit, and a processor subsystem coupled thereto, the processor subsystem comprising a processor and a cache memory coupled together by a data bus, a tag/high-memory-address bus, and a short/low-memory-address bus, characterized by a cache zoning mechanism comprising pseudo tag generating means for forcing a pseudo tag into a zone within the cache, pseudo tag preserving means for preventing that pseudo tag from being changed during normal operations, and flushing means for forcing misses on reading within the transient zone.
Specific Example
A cache memory system with a zoning system embodying the present invention will now be described, by way of example, with reference to the drawings, in which:
Fig. 1 is a block diagram of the system;
Figs. 2 and 2A are memory maps showing the cache and main memory spaces; and
Fig. 3 is a more detailed diagram of the zoning circuitry of the system.
General system organisation
Referring to Fig. 1, the system comprises a processor PROC 10 and a main memory M MEM 11 coupled together via a system bus SYS BUS 12 and an interface unit I/F 13. The system bus 12 is a 32-bit bus which is used for both data and addresses, and also has coupled to it further devices, such as communications devices DMA1 14 and DMA2 15. The processor 10 also has coupled to it two cache units, a data cache unit D-CACHE 16 and an instruction cache unit I-CACHE 17 (which can be ignored for present purposes). More specifically, the processor is coupled to the cache units (and to the interface 13) by a 32-bit data bus DATA 20, an 18-bit tag bus TAG 21, and a 14-bit short/low address bus SH/LOW 22.
The tag bus 21 and the short/low address bus 22 are effectively combined at the processor 10 and the interface 13 to form a 32-bit address bus, corresponding to the 32-bit system bus 12 when that is carrying addresses. The data bus 20 similarly acts as a 32-bit word (data) bus between the processor 10 and the system bus 12, and corresponds to the 32-bit system bus 12 when that is carrying data. Reads from the main memory result in data words passing from the main memory 11 over the system bus, through the interface 13, and over the data bus 20. Writes to the main memory pass through a write buffer 18 in the interface 13, which passes first the address (from the combined tag and short/low address buses 21 and 22) and then the data word (from the data bus 20) to the main memory 11 over the system bus 12.
The data bus 20 and the tag bus 21 are effectively combined at the cache unit 16 as an extended word bus, and the short/low address bus 22 acts as the effective address bus for this cache unit. Thus the short/low address on bus 22 selects a location in the cache unit, which stores the extended word consisting of the 32-bit data plus the 18-bit tag on the combined data and tag buses 20 and 21.
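The extended word can be pictured as a 50-bit value. In the sketch below, placing the tag in the upper bits is an assumption made for illustration; the text states only that the 32-bit data and 18-bit tag are stored together at the addressed location.

```python
DATA_BITS = 32
TAG_BITS = 18


def pack_extended(tag: int, data: int) -> int:
    """Combine an 18-bit tag and a 32-bit data word into one 50-bit extended word."""
    return (tag << DATA_BITS) | data


def unpack_extended(extended: int) -> tuple[int, int]:
    """Recover (tag, data) from an extended word."""
    return extended >> DATA_BITS, extended & ((1 << DATA_BITS) - 1)
```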
General system operation
Consider first what happens when the processor wants to write a word. It issues the data word to be written on the data bus 20, and the address of the word on the combined tag and short/low address buses 21 and 22. The address and accompanying data word are passed to the write buffer 18, which passes the address and the data word in sequence over the system bus 12 to the main memory 11, so that the word is written into the main memory. In addition, the data word on the data bus 20 and the tag on the tag bus 21 are combined to form an extended word which is passed to the cache unit 16, and the short/low address on the short/low address bus 22 is passed to the cache unit 16, which writes the extended word at that short/low address.
Consider now what happens when the processor wants to read a word. It issues the address of the word on the combined tag and short/low address buses 21 and 22, and this full address is passed to the interface 13. The short/low address is passed to the cache unit 16, which reads the extended word at that short/low address out onto the data bus 20 and the tag bus 21.
The processor 10 then compares the tag which returns on the tag bus 21 with the tag of the original address (having retained this original tag internally). If the two tags (that of the desired address and that returned from the cache unit 16) match - that is, if there is a hit - then the processor accepts the data on the data bus 20 as the desired word, and aborts the main memory read which was being initiated. If the two tags do not match - that is, if there is a miss - then the processor does not accept the data returned from the cache unit 16 on the data bus 20, but waits for the main memory access to be completed, when the data word read from the main memory will pass through the interface 13 and appear on the data bus 20. In the case of a read following a miss, the data word being read from main memory by the processor automatically gets written into cache.
Real memory mapping
Fig. 2 shows the organisation of the memory spaces of the main memory and the cache. The main memory 11 is notionally divided into pages, of which four (P1-P4) are shown. The size of each page is the same as the size of the cache 16, and inside each page, all addresses have the same tag (and, of course, different short/low addresses). The cache is divided (in a manner discussed later) into two zones, a large zone 30 for permanent blocks and a small zone 31 for transient blocks, the size of the large zone being an exact multiple of the size of the small zone.
The cache is shown as containing three permanent blocks, 32C-34C. These correspond to three permanent blocks 32M-34M in the main memory. The blocks 32C-34C in the cache occupy different, non-overlapping, areas of the permanent cache zone 30; each of the permanent blocks 32M-34M occupies a position in its page exactly matching the position of the corresponding permanent block in the cache. However, there is no restriction (for present purposes) on which pages in the main memory the permanent blocks in that memory can occupy.
The main memory is also shown as containing three transient blocks 35M-37M, and the cache as containing a transient block 38C in its transient zone 31. As will be seen, the present system can copy any of the transient blocks 35M-37M into the transient zone 31 of the cache as block 38C.
Each page in the main memory 11 can be regarded as divided into sequential sub-pages, each of which corresponds to the transient zone 31 in the cache. Although, as just described, a transient block can be copied from anywhere in the main memory into the cache's transient zone, the position of the transient block in the sub-page cannot be altered during this copying. Indeed, if the transient block should overlap two adjacent sub-pages, it will be 'wrapped around' during copying into the transient zone 31. This is illustrated by Fig. 2A, which shows how a transient block 39M which falls partly in a page P10 and partly in a page P11 of the main memory is split into two portions during copying into the transient zone of the cache.
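The wrap-around follows from the fact that a word's position in the transient zone is fixed by the low-order bits of its main memory address. A minimal sketch, assuming the 64-word zone size of the specific example given later and a block no longer than the zone; the function names are hypothetical:

```python
TRANSIENT_ZONE_WORDS = 64  # 2**6, the zone size used in the specific example


def zone_location(main_addr: int) -> int:
    """Position in the transient zone: the address's offset within its sub-page."""
    return main_addr % TRANSIENT_ZONE_WORDS


def copy_block_to_zone(main_memory: dict[int, int], start: int, length: int,
                       zone: list) -> None:
    """Copy a transient block (at most one zone long) into the zone, wrapping as needed."""
    for addr in range(start, start + length):
        zone[zone_location(addr)] = main_memory[addr]
```

A block of 8 words starting at address 60 therefore lands in zone locations 60 to 63 followed by 0 to 3, in two portions.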
Abstract memory space
The memory space discussed with reference to Figs. 2 and 2A is memory space in real memory. However, the system has an abstract memory space which is defined by the addresses used by the system - or, more specifically, by the number of bits in the addresses used by the system, ie by the width of the address bus. (A common physical bus may be used in parts of the system, with addresses and data being multiplexed onto it.) The physical size of the main memory will normally be considerably smaller than this address space.
For a 32-bit bus width, the address space is 2^32, ie 4G, words, which is far larger than any practicable real main memory. A typical size for the (real) main memory is 8M words; a typical size for the cache is 64k words. This means that there are large areas of memory space which are non-existent as real memory; such areas are termed NXM.
(With a 32-bit bus width, the word length is 32 bits. In some systems, the bottom 2 bits of the address may be used for the selection of a byte within a 32-bit word; in others, the bottom 4 bits of the address may be used as decoded byte selection bits to allow the selection of any desired bytes within a 32-bit word. These elaborations of addressing are not relevant for present purposes, and will not be further discussed.
Even with 4 byte selection bits, the word address space is still 2^28, ie 256M, words, which is still far larger than any practicable real main memory size.)
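The sizes quoted here can be checked with a few lines of arithmetic. The constant and variable names are illustrative only; the 8M-word main memory is the typical figure given in the text.

```python
K, M, G = 2**10, 2**20, 2**30

full_space = 2**32           # word addresses with a 32-bit address
reduced_space = 2**(32 - 4)  # word addresses when 4 bits are used for byte selection
main_memory_words = 8 * M    # the "typical" real main memory size from the text

# Fraction of the full address space that is non-existent memory (NXM)
nxm_fraction = 1 - main_memory_words / full_space
```

Here `full_space` equals 4G, `reduced_space` equals 256M, and more than 99% of the full address space is NXM.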
The permanent zone 30 in the cache is treated as a normal area of cache; in particular, words stored in it are accessed by their normal main memory addresses. The transient zone 31, however, is treated differently. The tags in this zone are set to a 'pseudo' value, and the program segments which are associated with the two-zone cache mode access words in the transient blocks by using the pseudo tag instead of the tag parts of the real addresses (ie the addresses in main memory) of those words.
The transient zone in the cache thus occupies a particular sub-page in abstract memory space. The value for the pseudo tags is chosen to locate this sub-page in non-existent memory, ie in a region of memory space which is not mapped into the real main memory. This has two advantages. One is that the contents of the cache's transient zone can never be confused with main memory information having the same address. The other is that if there is a fault and there is a miss on an attempted read of a pseudo address, the system will attempt to read that address from main memory. This will result in an NXM error being returned, so the occurrence of a fault is thus detected. (Among possible faults are a cache fault, or some of the permanent data spilling over from the permanent zone into the transient zone.)
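The fault-detection property can be sketched as follows. Several details are assumptions for illustration: real memory is taken to occupy the lowest 8M word addresses, the pseudo tag value `0x3FFFF` is hypothetical (the text does not give the value of PS), and `NXMError` is an invented exception standing in for the bus error signal.

```python
class NXMError(Exception):
    """Stands in for the error returned on access to non-existent memory."""


MAIN_MEMORY_WORDS = 8 * 2**20  # assumed: real memory occupies the lowest 8M addresses
INDEX_BITS = 14                # width of the short/low address
PSEUDO_TAG = 0x3FFFF           # hypothetical PS value, chosen to lie in NXM


def pseudo_address(offset: int) -> int:
    """Full address of a transient-zone word as seen by the program."""
    return (PSEUDO_TAG << INDEX_BITS) | offset


def main_memory_read(memory: dict[int, int], addr: int) -> int:
    """Read real memory; any address beyond it raises an NXM error."""
    if addr >= MAIN_MEMORY_WORDS:
        raise NXMError(f"no memory at address {addr:#x}")
    return memory.get(addr, 0)
```

A read of a pseudo address that misses in the cache falls through to `main_memory_read`, which raises `NXMError` rather than silently returning unrelated data.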
Cache zone management mechanism
Fig. 3 shows the connection of the tag bus 21 and the short/low memory bus 22 with the data cache 16 in more detail.
The tag bus is split into two branches, an input branch 40 and an output branch 41, each 18 bits wide. Input branch 40 is buffered into the cache by buffers 42, and the cache buffers tags onto the output branch 41 via tri-state buffers 43. Input branch 40 passes through a multiplexer 44 which is also fed with a hard-wired pseudo tag value PS. Multiplexer 44 is controlled by a zone mode control flip-flop ZONE 45.
The short/low address bus 22 is divided into two sub-buses, an upper 8-bit sub-bus 46 and a lower 6-bit sub-bus 47. (These values determine the size of the cache and the two zones in it. The cache size is determined by the total width of the short/low address bus, 14 bits, as 2^14, ie 16k, words; the transient zone size is determined by the width of the lower sub-bus, 6 bits, as 2^6, ie 64, words; and the permanent zone size is the remainder of the cache.) The upper sub-bus 46 passes through a multiplexer 48 which is also fed with a hard-wired all 0s value. Multiplexer 48 is controlled by the zone mode control flip-flop 45.
For normal operation, the zone mode control flip-flop is set to pass the tag on the tag bus 21 and the full short/low address on bus 22 to the cache. The cache operates in the conventional manner.
This normal mode is used for normal processing even when the contents of the cache have been set to transient data in the transient zone and permanent data in the permanent zone. In this state, the only distinction between the permanent and transient data is that the two types of data (a) happen to be in different areas of the cache, and (b) happen to have tags which map to different areas of memory. (The permanent data tags map to the permanent zones of pages in real memory; the transient data tags map to the pseudo page.)
However, writes to the cache are normally accompanied by writes to main memory. So for writes to the transient zone of the cache, either the accompanying writes to main memory must be disabled or the returned NXM Error signals must be ignored.
To transfer a transient block to the cache, the FLUSH mode must be entered (eg by setting the flush control flip-flop and utilising the flush control circuitry shown in the accompanying application noted above). In addition, the zone mode control flip-flop is set to select the pseudo address value PS as the tag on tag branch 40 to the cache and to select the all 0s value for the top 8 bits of the short/low address on bus 22 to the cache.
This has two effects. First, the tags of the block being written in by the flush are all forced to the pseudo value. Second, since the short/low address on the short/low address bus has its upper 8 bits forced to 0, only the bottom 64 words, ie the transient zone in the cache, can be addressed. Hence the transient block is copied into the transient zone.
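The two effects can be illustrated with a small software model of a flush. This is a simplified sketch under stated assumptions: the cache is modelled as a list of (tag, word) entries, main memory as a dictionary, and `flush_into_cache` and the pseudo tag value are hypothetical names and values chosen for illustration, not taken from the flush control circuitry itself.

```python
INDEX_BITS = 14                    # full short/low address width
LOWER_BITS = 6                     # lower sub-bus 47
CACHE_WORDS = 1 << INDEX_BITS
TRANSIENT_WORDS = 1 << LOWER_BITS
PSEUDO_TAG = 0x3FFFF               # assumed placeholder for PS


def flush_into_cache(cache, main_memory, start, count, zone_mode):
    """Copy count words from main memory into the cache model."""
    for addr in range(start, start + count):
        tag = addr >> INDEX_BITS
        index = addr & (CACHE_WORDS - 1)
        if zone_mode:
            tag = PSEUDO_TAG               # effect 1: tag forced to PS
            index &= TRANSIENT_WORDS - 1   # effect 2: upper 8 bits forced to 0
        cache[index] = (tag, main_memory[addr])


# A 64-word transient block held somewhere in real main memory.
start = (5 << INDEX_BITS) | (100 << LOWER_BITS)
main_memory = {start + i: 1000 + i for i in range(TRANSIENT_WORDS)}

cache = [None] * CACHE_WORDS
flush_into_cache(cache, main_memory, start, TRANSIENT_WORDS, zone_mode=True)
```

After the call, only the bottom 64 entries of `cache` are filled, each carrying the pseudo tag. Calling the same function with `zone_mode=False` would instead place the block at index `100 << LOWER_BITS` with its real tag.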
If the zone mode control flip-flop is not set, then flushing will flush the whole cache (or any desired area of it) in the normal way, copying data from the main memory into the cache along with its addresses without modification. This would be used to initialise or update permanent blocks.
When a transient block has been processed, it will of course normally have to be written back to the main memory. To achieve this, the processor reads the block word by word from the cache (using the pseudo address tag), and writes it to the desired addresses; as with all writes, this writes it into main memory. The transient block must be prevented from being re-written into the cache with its real (main memory) addresses; this can be achieved by the processor setting the cache into a read-only state for this operation, or by writing the processed transient block to a non-cacheable address area.
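The write-back step can be sketched as follows. The model assumes the cache has been put into a read-only state (or the destination is non-cacheable), so that writes reach main memory only; `write_back_transient`, the list-and-dictionary representation, and the pseudo tag value are all hypothetical illustrations.

```python
TRANSIENT_WORDS = 64
PSEUDO_TAG = 0x3FFFF   # assumed placeholder for PS


def write_back_transient(cache, main_memory, dest, count=TRANSIENT_WORDS):
    """Read the transient block word by word and write it to main memory.

    The cache model is never written here, which stands in for the
    read-only cache state or non-cacheable destination area.
    """
    for offset in range(count):
        tag, word = cache[offset]
        if tag != PSEUDO_TAG:
            raise LookupError(f"word {offset} has lost its pseudo tag")
        main_memory[dest + offset] = word


# Transient zone holding a processed block; destination chosen arbitrarily.
cache = [(PSEUDO_TAG, 2 * i) for i in range(TRANSIENT_WORDS)]
main_memory = {}
write_back_transient(cache, main_memory, dest=0x9000)
```

The destination address `0x9000` is arbitrary; the point is that the block leaves the cache under its pseudo tag and arrives in main memory under its real addresses, with the transient zone left undisturbed.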
Minor modifications can obviously be made to this system. For example, the transient zone of the cache can be initialised by entering a privileged mode, in which the processor can write directly to the cache without also writing to main memory. For this, the processor would write through the whole of the transient zone of the cache, to enter the pseudo tag in all extended words of that zone. (This will normally be accompanied by writing dummy data words, which will be displaced as the required transient blocks are later copied into the cache by flushing.) Also, the cache may be designed so that the data word and tag parts can be separately enabled. The tag part would then be disabled for flushing transient blocks, so that once the pseudo tags have been written through the transient block on initialisation, they remain undisturbed. These features would make the multiplexer 44 superfluous.
It will be realised that to preserve the integrity of the data in the transient zone, normal entry of data into that zone must not occur. This means that the transient zones in all memory pages must be treated as reserved memory space, and must not be occupied by any data which might be required during normal processing operations. This reduces the effective memory size, but only by a very slight amount (one sub-page per page). However, it does require the memory areas occupied by data which might be required for processing to be arranged in main memory in such a way that none of it occupies any of the reserved sub-pages.
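The reservation rule and its cost can be checked with a short calculation. This sketch assumes that a page is the 2^14-word span addressed by the full short/low address and that the reserved sub-page is the lowest 2^6 words of each page (the addresses whose upper 8 short/low bits are 0); the function name is hypothetical.

```python
from fractions import Fraction

UPPER_BITS = 8
LOWER_BITS = 6
PAGE_WORDS = 1 << (UPPER_BITS + LOWER_BITS)   # 16384 words per page
SUB_PAGE_WORDS = 1 << LOWER_BITS              # 64 words per sub-page


def in_reserved_sub_page(addr):
    """True if a main memory address would map into the transient zone."""
    return (addr % PAGE_WORDS) < SUB_PAGE_WORDS


# Fraction of main memory that must be left unused in two-zone mode.
RESERVED_FRACTION = Fraction(SUB_PAGE_WORDS, PAGE_WORDS)
```

Under these assumptions the reserved fraction is 64/16384, ie 1/256 of the memory, which is the "very slight amount" referred to above.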
This condition only applies, of course, while the cache is operating in the two-zone mode. In this mode, the permanent zone is largely occupied by permanent tables. As discussed above, it is generally fairly easy to arrange for these tables to occupy only the permanent zones of the pages in main memory. If, for some part of the processing, it is difficult to satisfy this condition, then the processor may be allowed to use the whole of the cache freely; the two-zone mode of cache operation may be resumed later, provided that the cache is re-initialised for that mode.
As noted above, if any permanent data should spill over into the transient zone, this will result in a miss if an attempt is made to read a transient word which has been so over-written. This will result in passing the read to the main memory. Since the transient zone pseudo tags correspond to non-existent (real) memory, this will result in an NXM fault condition. Thus if this condition is violated, a fault will result rather than the system continuing to operate on incorrect data.
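The fault-detection behaviour can be modelled in a few lines. This is a hedged sketch only: `NxmError` and `read_transient_word` are hypothetical names, the pseudo tag value is an assumed placeholder, and the main memory access is reduced to raising an exception because the pseudo address lies in non-existent memory.

```python
TRANSIENT_WORDS = 64
PSEUDO_TAG = 0x3FFFF   # assumed placeholder for PS


class NxmError(Exception):
    """Stands in for the non-existent-memory error returned by the system."""


def read_transient_word(cache, offset):
    """Read one word of the transient zone using the pseudo tag."""
    stored_tag, word = cache[offset]
    if stored_tag == PSEUDO_TAG:
        return word                      # hit: transient data intact
    # Miss: the read is passed to main memory at the pseudo address,
    # which is not mapped to real memory.
    raise NxmError(f"transient word {offset} was over-written")


cache = [(PSEUDO_TAG, i) for i in range(TRANSIENT_WORDS)]
cache[7] = (0x00012, 999)   # permanent data spilling into the transient zone
```

Reading word 3 returns its data as usual, while reading word 7 raises `NxmError` instead of silently returning the spilled permanent word.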
The overheads required by the present technique are minimal, in both hardware and software. The hardware required is the logic circuitry of Fig. 3, and a single bit (the zone mode control flip-flop) in a control register; the software required is the program for initialising the cache, for setting and clearing the zone mode control bit in the control register, and for transferring the processed transient blocks back to main memory.
Claims (6)
1. A computer system having a system bus (12) with a main memory (11), at least one direct memory access (DMA) unit (14, 15), and a processor subsystem (10, 13, 16, 17) coupled thereto, the processor subsystem comprising a processor (10) and a cache memory (16) coupled together by a data bus (20), a tag/high-memory-address bus (21), and a short/low-memory-address bus (22), characterized by a cache zoning mechanism (Fig. 3) comprising pseudo tag generating means (PS, 44) for forcing a pseudo tag (PS) into a zone (31, Fig. 2) within the cache, pseudo tag preserving means for preventing that pseudo tag from being changed during normal operations, and flushing means for forcing misses on reading within the transient zone.
2. A computer system according to claim 1, characterized by means (48) for forcing the upper part of the short/low-memory-address bus to all 0s to thereby define the size of the transient zone.
3. A computer system according to either previous claim, characterized in that the pseudo tag generating means (PS, 44) comprise a multiplexer (44) in the tag bus (21) to the cache and fed with a fixed pseudo tag (PS).
4. A computer system according to claim 3, characterized in that the pseudo tag generating means (PS, 44) also act as the pseudo tag preserving means.
5. A computer system according to any one of claims 1 to 3, characterized in that the pseudo tag preserving means comprise means for disabling the tag portion of the cache.
6. A computer system substantially as hereinbefore described with reference to the accompanying drawings.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IE930465A IE68440B1 (en) | 1993-06-21 | 1993-06-21 | Cache memory zoning system |
Publications (2)
Publication Number | Publication Date |
---|---|
IE930465A1 | 1994-12-28 |
IE68440B1 | 1996-06-12 |
Family
ID=11039995
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2337348A (en) * | 1998-03-02 | 1999-11-17 | Hewlett Packard Co | Improving memory performance by using an overlay memory |
GB2337348B (en) * | 1998-03-02 | 2003-01-08 | Hewlett Packard Co | Integrated hierarchical memory overlay method for improved processor performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Patent lapsed |