WO1992004674A1 - Computer memory array control - Google Patents
- Publication number
- WO1992004674A1 (PCT/GB1991/001557; GB9101557W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- buffer
- host computer
- memory units
- bits
- Prior art date
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C29/00—Checking stores for correct operation ; Subsequent repair; Testing stores during standby or offline operation
- G11C29/70—Masking faults in memories by using spares or by reconfiguring
- G11C29/88—Masking faults in memories by using spares or by reconfiguring with partially good memories
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1012—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using codes or arrangements adapted for a specific type of error
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
Definitions
- This invention relates to computer memories, and in particular to a controller for controlling and a method of controlling an array of memory units in a computer.
- an idealistic computer memory would be a memory having no requirement to "seek" the data. Such a memory would have instantaneous access to all data areas. Such a memory could be provided by a RAM disk. This would provide for access to data regardless of whether it was sequential or random in its distribution in the memory.
- RAM is disadvantageous compared to the use of conventional magnetic disk drive storage media in view of the high cost of RAM and especially due to the additional high cost of providing "redundancy" to compensate for failure of memory units.
- non-volatile computer memories are magnetic disk drives.
- these disk drives suffer from the disadvantage that they require a period of time to position the head or heads with the correct part of the disk corresponding to the location of the data. This is termed the seek and rotation delay. This delay becomes a significant portion of the data access time when only a small amount of data is to be read or written to or from the disk.
- This document describes two types of arrangements.
- the first of these arrangements is particularly adapted for large scale data transfers and is termed "RAID-3".
- At least three disk drives are provided in which sequential bytes of information are stored in the same logical block positions on the drives, one drive having a check byte created by a controller written thereto, which enables any one of the other bytes on the disk drives to be determined from the check byte and the other bytes.
- RAID-3 as used hereinafter is as defined by the foregoing passage.
- In the RAID-3 arrangement there are preferably at least five disk drives, with four bytes being written to the first four drives and the check byte being written to the fifth drive, in the same logical block position as the data bytes on the other drives.
- each byte stored on it can be reconstructed by reading the other drives.
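The byte striping and check byte described above can be illustrated with a minimal Python sketch. It assumes a five-drive group (four data drives, one check drive) and assumes the check byte is the exclusive OR of the four data bytes, which matches the parity logic described later in this document but is not stated in the RAID-3 definition itself. The function names `stripe`, `reconstruct` and `unstripe` are hypothetical.

```python
from functools import reduce
from typing import List, Optional

DATA_DRIVES = 4  # assumption: five-drive group, four data drives plus one check drive


def _xor_columns(drives: List[bytes]) -> bytes:
    """XOR the bytes held at the same position on every drive given."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*drives))


def stripe(data: bytes) -> List[bytes]:
    """Spread sequential bytes over four data drives and append a check drive."""
    if len(data) % DATA_DRIVES:
        raise ValueError("length must be a multiple of the number of data drives")
    drives = [data[i::DATA_DRIVES] for i in range(DATA_DRIVES)]
    return drives + [_xor_columns(drives)]  # assumed check function: XOR


def reconstruct(drives: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing drive (marked None) from the four survivors."""
    missing = [i for i, d in enumerate(drives) if d is None]
    if len(missing) != 1:
        raise ValueError("exactly one drive must be missing")
    survivors = [d for d in drives if d is not None]
    rebuilt = list(drives)
    rebuilt[missing[0]] = _xor_columns(survivors)
    return rebuilt


def unstripe(drives: List[bytes]) -> bytes:
    """Interleave the four data drives back into the original byte order."""
    return bytes(b for column in zip(*drives[:DATA_DRIVES]) for b in column)
```

Because XOR is its own inverse, the same column-wise operation both generates the check byte and regenerates a lost byte, whichever drive of the five has failed.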
- the computer be arranged to continue to operate despite failure of a disk drive, but also the failed disk drive can be replaced and rebuilt without the need to restore its contents from probably out-of-date backup copies.
- a disk drive storage system having the RAID-3 arrangement is described in EP-A-0320107, the contents of which are incorporated herein by reference.
- The second type of storage system, which is particularly adapted for multi-user applications, is termed "RAID-5".
- In the RAID-5 arrangement there are preferably at least five disk drives in which four sectors of each disk drive are arranged to store data and one sector stores check information.
- the check information is derived not from the data in the four sectors on the disk, but from designated sectors on each of the other four disks. Consequently each disk can be rebuilt from the data and check information on the remaining disks.
- RAID-5 is seen to be advantageous, at least in theory, because it allows multi-user access, albeit with equivalent transfer performance of a single disk drive.
- a write of one sector of information involves writing to two disks, that is to say writing the information to one sector on one disk drive and writing check information to a check sector on a second disk drive.
- writing the check sector is a read modify write operation, that is, a read of the existing data and check sectors first, because the old contents of those sectors must be known before the correct check information, based on the new data to be written, can be generated and written to disk.
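The read modify write sequence can be sketched as follows. This is only an illustration: it assumes the check sector is the exclusive OR of the designated data sectors (the source does not name the check function for RAID-5), and the helper names `xor_all` and `updated_check_sector` are hypothetical.

```python
from functools import reduce
from typing import List


def xor_all(sectors: List[bytes]) -> bytes:
    """Check sector computed from scratch (assumed to be byte-wise XOR)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def updated_check_sector(old_data: bytes, old_check: bytes, new_data: bytes) -> bytes:
    """Read modify write: derive the new check sector from the old contents.

    Only the old data sector and the old check sector have to be read;
    the other data sectors in the group are not touched.
    """
    return bytes(c ^ o ^ n for c, o, n in zip(old_check, old_data, new_data))


# One sector write therefore costs two reads and two writes on two drives.
sectors = [bytes([value] * 4) for value in (1, 2, 3, 4)]
check = xor_all(sectors)
new_sector = bytes([9] * 4)
new_check = updated_check_sector(sectors[1], check, new_sector)
sectors[1] = new_sector
```

Under the XOR assumption, `new_check` equals the check sector that would be obtained by recomputing it over all four data sectors, which is why the old contents must be known before the write.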
- RAID-5 does allow simultaneous reads by multiple users from all disks in the system which RAID-3 cannot support.
- RAID-5 cannot match the rate of data transfer achievable with RAID-3, because with RAID-3, both read and write operations involve a transfer to each of the five disks (in five disk systems) of only a quarter of the total amount of information transferred. Since each transfer can be accomplished simultaneously, the process is much faster than reading or writing to a single disk, particularly where large scale transfers are involved. This is because most of the time taken to effect a read or write in respect of a given disk drive is the time taken for the read/write heads to be positioned with respect to the disk, and for the disk to rotate to the correct angular position. Clearly, this is as long for one disk as it is for all four. But once in the correct position, transfers of large amounts of sequential information can be effected relatively quickly.
- RAID-5 only offers multiple user access in theory, rather than in practice, because requests for sequential information by the same user usually involves reading several disks in turn, thereby occupying those disks so that they are not available to other users.
- Disk drives are presently made to read or write minimum amounts of information on each given occasion. This is the formatted sector size of the disk drive, usually a minimum of 256 Bytes. In RAID-3 format this means that the minimum block length on any read or write is 1,024 Bytes. With growing disk drive capacities the tendency is towards even larger minimum block sizes such as 512 Bytes, so that RAID-3 effectively quadruples that minimum to 2,048 Bytes.
- RAID-5 on the other hand does not increase the minimum data block size.
- It is the multi-user capability of RAID-5 which makes it theoretically more advantageous than RAID-3; but, in fact, it is the data transfer rate and continued performance in the event of drive failure in RAID-3 format which give the latter much greater potential.
- the present invention provides a computer memory controller for interfacing to a host computer comprising a buffer means for interfacing to at least one memory unit and for holding data read thereto or therefrom; said buffer means controlled to form a plurality of buffer segments for addressably storing data requested by said host computer and further data which is logically sequential thereto; and control means operative to control the transfer of data to said host computer in response to requests therefrom by first addressing said buffer segments to establish whether the requested data is contained therein and if so supplying said data to said host computer, and if the requested data is not contained in the buffer segments reading said data from the or each memory unit, supplying said data to said host computer, reading from the or each memory unit further data which is logically sequential to the data requested by said host computer and storing said further data in a buffer segment; said control means controlling the buffer means to control the number and size of said buffer segments.
- the present invention provides a method of controlling an array of memory units for use with a host computer comprising the steps of receiving from said host computer a read request for data stored on the memory units, checking a plurality of buffer segments to establish whether the requested data is in said buffer segments, either complying with said request by transferring the data in said buffer segments to said host computer, or first reading said data from said memory units into one buffer segment and then complying with said request, reading from the memory units further data logically sequential to the data requested and storing said data in said buffer segment.
- the present invention provides a computer memory controller for a host computer comprising buffer means for interfacing to at least three memory units arranged in parallel and for holding information read from said memory units; a logic circuit connected to said buffer means to recombine bytes or groups of bits successively read from successive ones of a group of said memory units; parity means operative to use a check byte or group of bits read from one of said memory units to regenerate information read from said groups of memory units if one of said group of memory units fails; said buffer means being controlled to form a number of buffer segments each storing data requested by an application run on said host computer and further data which is logically sequential thereto, and a controller for controlling the transfer of data to said host computer in response to requests from said host computer by checking said buffer segments to establish whether the requested data is in said buffer segment and supplying said data to said host computer, or reading said data from said memory units, supplying said data to said host computer, reading from said memory units further data which is logically sequential to the data requested by said host computer and storing said further
- the present invention provides a computer memory controller for a host computer comprising buffer means for interfacing at least three memory units arranged in parallel, a logic circuit connected to said buffer means to split data input from said host computer such that successive bytes or groups of bits from said host computer are temporarily stored in said buffer means before being successively applied to successive ones of a group of said memory units, said logic circuit being further operative to recombine bytes or groups of bits successively read from successive ones of said group of said memory units into said buffer means, said logic circuit including parity means operative to generate a check byte or group of bits from said data for temporary storage in said buffer means before being stored in at least one said memory unit, and operative to use said check byte to regenerate said data read from said group of memory units if one of said group of memory units fails, said buffer means being divided into a number of channels corresponding to the number of memory units, each said channel being divided into associated portions of buffer segments, buffer segments containing successive bytes or groups of bits corresponding to data for an application being run by said host computer
- Since computers tend to request sequential data, particularly those running UNIX 5.4 Operating Systems and many modern Fileservers and Operating Systems, the chances are that at a subsequent request the requested data will actually be in the buffer, and so another read of the disk drive can be dispensed with. Indeed, it is a requirement of the computer operating system and/or the application programs being run by the various users, that in order to benefit from the present invention, the system or programs must make a habit of making at least one subsequent request for sequential data. Otherwise the present invention cannot realise the object of RAID-35 type operations.
- each buffer segment is capable of holding at least 128 Kilo-Bytes.
- write data is initially stored in some of said buffer segments, which are especially assigned for this purpose, so that actual writing to disk can be achieved in background during quiet times for the disk system.
- the present invention gives all the theoretical advantages of RAID-5 operation and operates faster, with multiple simultaneous reads and writes, but at the same time, the simultaneous data transfer rates, and the better performance on any one disk drive failure achievable by RAID-3 format.
- the present invention is not limited to the use of such disk drives.
- the present invention is equally applicable to the use of any memory device which has a long seek time for data compared to the data transfer rate once the data is located.
- Such media could, for instance, be an optical compact disk.
- a computer storage system comprises an array of magnetic disk drives organised in RAID-3 format having at least three channels, said array comprising a plurality of disk drives connected to each said channel, each of said plurality of disk drives connected to a channel being connected through a single bus by means of which each disk drive is independently accessible.
- the computer storage system incorporates the use of a segmented buffer as hereinbefore described together with the array of magnetic disk drives organised in RAID-3 format.
- the multiple accessibility of the data stored in the memory is enhanced to its greatest potential.
- disk drives are employed on each of five channels, and thus the overall data storage capacity of the system is expanded by sevenfold.
- Such an array provides large scale storage of information together with the faster data transfer rates and better performance with regard to multi-user applications, and security in the event of any one drive failure (per group).
- the mean time between failures (MTBF) of such an array (when meaning the mean time between two simultaneous drive failures (per group), and which is required in order to result in information being lost beyond recall) is measured in many thousands of years with presently available disk drives each having individual MTBFs of many thousands of hours.
- Figure 1 is a block diagram of the controller architecture of a disk array system according to one embodiment of the present invention.
- Figure 2 illustrates the operation of the data splitting hardware.
- Figure 3 illustrates the read/write data cell matrix.
- Figure 4 illustrates a write data cell.
- Figure 5 illustrates a read data cell.
- Figure 6 is a flow diagram illustrating the software steps in write operations.
- Figure 7 is a flow diagram illustrating the software steps in read operations.
- Figures 8 and 9 are flow diagrams illustrating the software steps for read ahead and write behind.
- Figure 10 is a flow diagram illustrating the software steps involved to restart suspended transfers.
- Figure 11 is a flow diagram illustrating the software steps involved in cleaning up segments.
- Figures 12 and 13 are flow diagrams illustrating the steps involved for input/output control.
- Figure 1 illustrates the architecture of the RAID-35 disk array controller.
- the internal interface of the computer memory controller 10 is termed the ESP data bus interface and the interface to the host computer is termed the SCSI interface. These are provided in interface 12.
- the SCSI bus interface communicates with the host computer (not shown) and the ESP interface communicates with a high performance direct memory access (DMA) unit 14 in a host interface section 11 of the computer memory controller 10.
- DMA direct memory access
- the ESP interface is 16 bits (one word) wide.
- the host interface section communicates with a central buffer management (CBM) section 20 which comprises a central controller 22, in the form of a suitable microprocessor such as the Intel 80376 Microprocessor, and data splitting and parity control (DSPC) logic circuit 24.
- CBM central buffer management
- DSPC data splitting and parity control
- the DSPC 24 also combines the information on the first four channels and, after checking against the parity channel, transmits the combined information to the host computer. Furthermore, the DSPC 24 is able to reconstruct the information from any one channel, should that be necessary, on the basis of the information from the other four channels.
- the DSPC 24 is connected to a central buffer 26 which is divided into five channels A to E, each of which is divisible into buffer segments 28.
- Each central buffer channel 26,A through 26,E has the capacity to store up to half a megabyte of data, for example, depending on the application required.
- Each segment may be as small as 128 kilobytes for example so that up to 16 segments can be formed in the buffer.
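The figure of 16 segments follows from the example capacities if the 128 kilobyte segment size is counted as host data spread across the four data channels. That reading is an assumption made for this sketch, not something the text states explicitly; the constant and function names are hypothetical.

```python
KIB = 1024

CHANNEL_CAPACITY = 512 * KIB  # half a megabyte per central buffer channel (example value)
DATA_CHANNELS = 4             # channels A to D carry host data, channel E carries parity
SEGMENT_SIZE = 128 * KIB      # assumed to be measured in host data bytes


def max_segments(channel_capacity: int = CHANNEL_CAPACITY,
                 segment_size: int = SEGMENT_SIZE,
                 data_channels: int = DATA_CHANNELS) -> int:
    """Number of minimum-size segments that fit in the host-data part of the buffer."""
    return (channel_capacity * data_channels) // segment_size


def per_channel_portion(segment_size: int = SEGMENT_SIZE,
                        data_channels: int = DATA_CHANNELS) -> int:
    """Bytes of each channel occupied by one segment, parity channel included."""
    return segment_size // data_channels
```

With the example values, `max_segments()` gives 16 and each segment occupies a 32 kilobyte portion of every channel.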
- the central buffer 26 communicates with five slave bus controls 32 under the direction of a slave bus controller 34 in a slave bus interface (SBI) section 30 of the memory controller 10.
- the slave bus controller 34 operates under the direction of the central controller 22.
- Each slave bus control 32,A through 32,E communicates with up to seven disk drives 42,0 to 42,6 along SCSI buses 44,A through 44,E so that the drives 42,0,A through 42,0,E form a bank 0 of five disk drives and so also do drives 42,1,A through 42,1,E etc. to 42,6,A through 42,6,E.
- the seven banks of five drives effectively each constitute a single disk drive, each individually and independently accessible. This is made possible by the use of SCSI buses, which allow for eight device addresses. One address is taken up by the slave bus control 32 whilst the seven remaining addresses are available for seven disk drives. The storage capacity of each channel can therefore be increased sevenfold and the slave bus control 32 is able to access any one of the disk drives 42 in the channel independently.
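The bank and channel numbering can be enumerated in a few lines of Python. The labels follow the reference numerals used above; the choice of which SCSI address the slave bus control occupies is an assumption made for illustration, as the text only says that one of the eight addresses is taken.

```python
from typing import List

CHANNELS = "ABCDE"              # five SCSI buses 44,A through 44,E
SCSI_ADDRESSES = 8              # device addresses available on each bus
CONTROL_ADDRESS = 7             # assumption: address used by the slave bus control
BANKS = SCSI_ADDRESSES - 1      # seven addresses remain for disk drives


def bank_members(bank: int) -> List[str]:
    """Reference numerals of the five drives that make up one bank."""
    if not 0 <= bank < BANKS:
        raise ValueError("bank must be in the range 0 to 6")
    return [f"42,{bank},{channel}" for channel in CHANNELS]


def all_drives() -> List[str]:
    """Every drive in a fully populated array, bank by bank."""
    return [drive for bank in range(BANKS) for drive in bank_members(bank)]
```

A fully populated array in this sketch holds 7 x 5 = 35 drives, each bank behaving as one logical drive.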
- This arrangement of banks of disk drives is not only applicable to the arrangement shown in Figure 1, but is also applicable to the RAID-3 arrangement.
- Information stored in the disk drives of one bank can be accessed virtually simultaneously with information being accessed from the disk drives of another bank.
- This arrangement therefore gives an enhancement in access speed to data stored in an array of disk drives. No enhancement of speed would of course occur where information requested from two applications is stored in the same bank of disks. However, in theory at least the chance of two simultaneous requests for information being found in the same bank is 1/n where n is the number of banks employed. This is taken care of by the I/O software.
- its memory 10 consists of a number of sectors each identified by a unique address number. Where or how these sectors are stored on the various disk drives of the memory 40 is a matter of no concern to the host computer; it must merely remember the address of the data sectors it requires. Of course, addresses themselves may form part of the data stored in the memory.
- one of the functions of the central controller 22 is to store data on the various disk drives efficiently. Moreover each sector in so far as the host is concerned, is split between four disk drives in the known RAID-3 format.
- the central controller 22 arranges to store sectors of information passed to it by the host computer, in an ordered fashion so that a sector on any given disk drive is likely to contain information which logically follows from a previous adjacent sector.
- the read request is received by the central controller 22 which passes the request to the slave bus interface (SBI) controller 34.
- the SBI controller 34 instructs the slave bus control 32 to read the disk banks 40 and select the appropriate data from the appropriate banks of disks.
- the DSPC circuit 24 receives the requested data and checks it is accurate against the check data in channel E.
- the faulty drive is isolated and the system arranged to continue working employing the four good channels, in the same way and with no loss of performance, until the faulty drive is replaced and rebuilt with the appropriate information.
- the central controller 22 first responds to the data read request by transferring the information to the SCSI interface 12. However, it also instructs further information logically sequential to the requested information to be read. This is termed "read ahead information". Read ahead information up to the capacity presently allocated by the central controller 22 to any one of the data buffer segments 28 is then stored in one buffer segment 28.
- When the host computer makes a further request for information, it is likely that the information requested will follow on from the information previously requested. Consequently, when the central controller 22 receives a read request, it first interrogates those buffer segments 28 to determine if the required information is already in the buffer. If the information is there, then the central controller 22 can respond to the user request immediately, without having to read the disk drives. This is obviously a much faster procedure and avoids the seek delay. On those occasions when the required information is not already in the buffer, then a new read of the disk drives is required. Again, the requested information is passed on and sequential read ahead information is fed to another buffer segment. This process continues until all the buffer segments are filled and the system is maintained with its segments permanently filled.
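The read path just described (check the segments first, otherwise read the disk and read ahead into a segment) can be sketched in Python. This is a simplified model under stated assumptions: the disk array is a plain list of sectors, a segment holds a fixed number of sectors, and when every segment is in use the oldest one is replaced. The replacement rule and all names (`ReadAheadBuffer`, `disk_reads`) are hypothetical, and the background refilling of partly emptied segments is not modelled.

```python
from typing import List, Tuple


class ReadAheadBuffer:
    """Toy model of a segmented read-ahead buffer in front of a slow store."""

    def __init__(self, disk: List[int], segment_count: int = 16,
                 segment_sectors: int = 8) -> None:
        self.disk = disk                        # stand-in for the disk array
        self.segment_count = segment_count
        self.segment_sectors = segment_sectors
        self.segments: List[Tuple[int, List[int]]] = []  # (start address, sectors)
        self.disk_reads = 0                     # counts accesses that need a seek

    def read(self, address: int) -> int:
        # First interrogate the buffer segments.
        for start, sectors in self.segments:
            offset = address - start
            if 0 <= offset < len(sectors):
                return sectors[offset]          # hit: no disk access needed
        # Miss: read the requested sector, then the logically sequential ones.
        self.disk_reads += 1
        requested = self.disk[address]
        ahead_start = address + 1
        ahead = self.disk[ahead_start:ahead_start + self.segment_sectors]
        if len(self.segments) == self.segment_count:
            self.segments.pop(0)                # assumption: replace the oldest segment
        self.segments.append((ahead_start, ahead))
        return requested
```

In this model a host that reads sectors 10, 11, 12, ... in order pays for one disk access and is then served from the segment until the read-ahead data runs out.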
- the central controller 22 will have allocated at least as many buffer segments 28 as there are application programs, up to the maximum number of segments available. Each buffer segment will be kept full by the central controller 22 ordering the disk drive seek commands in the most efficient manner, only over-riding that ordering when a buffer segment has been, say 50% emptied by host requests or when a host request cannot be satisfied from existing buffer segments 28. Thus all buffer segments are kept as full as possible with read ahead data.
- a hardware switch can be provided to ensure that all write instructions are effected immediately, with write information only being stored in the buffer segments transiently before being written to disk. This removes the fear that a power loss might result in data being lost which was thought to have been written to disk although not actually effected by the memory system. There is still however, the unlikely exception that information may be lost when a power loss occurs very shortly after a user has sent a write command, but in that event, the user is likely to be conscious of the problem. If this alternative is utilised, it does of course affect the performance of the computer.
- the controller's internal interface to the host system hardware interface is 16 bits (one word) wide. This is the ESP data bus. For every four words of sequential host data, one 64 bit wide slice of internal buffer data is formed. At the same time, an additional word or 16 bits of parity data is formed by the controller; one parity bit for four host data bits. Thus the internal width of the controller's central data bus is 80 bits. This is made up of 64 bits of host data and 16 bits of parity data.
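The formation of one 80 bit slice can be sketched as below. The parity word is taken to be the bit-wise exclusive OR of the four host words, in line with the data cell description in this document; the assignment of host words 0 to 3 to channels A to D in that order is an assumption, and `make_slice` is a hypothetical name.

```python
from typing import List, Sequence

WORD_BITS = 16
WORDS_PER_SLICE = 4
SLICE_BITS = WORD_BITS * (WORDS_PER_SLICE + 1)  # 64 data bits + 16 parity bits


def make_slice(words: Sequence[int]) -> List[int]:
    """Return five 16 bit words: four host data words followed by the parity word."""
    if len(words) != WORDS_PER_SLICE:
        raise ValueError("a slice is built from exactly four host words")
    if any(not 0 <= word <= 0xFFFF for word in words):
        raise ValueError("each host word must fit in 16 bits")
    parity = 0
    for word in words:
        parity ^= word  # bit i of the parity word covers bit i of all four words
    return list(words) + [parity]
```

A useful property of this layout is that the exclusive OR of all five words of a valid slice is zero, which is what the read-side check relies on.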
- the data splitting and parity logic 24 is split up into 16 identical read/write data cells within the customised ASICS (application specific integrated circuits) design of the controller.
- the matrix of these data cells is shown in Figure 3.
- Each of these data cells handles the same data bit from the ESP bus for the complete sequence of four ESP 16 bit data words. That is, with reference to Figure 2, each data cell handles the same bit from each ESP bus word 0,1,2 and 3. At the same time, each data cell generates/reads the associated parity bit for these four 16 bit ESP bus data words.
- Data bits DB1 through DB15 will be identical in operation and description.
- each of these four bits is temporarily stored/latched in devices G38 through G41. As each bit appears on the ESP bus, it is steered through the multiplexor under the control of the two select lines to the relevant D-type latches G33 through G36, commencing with G33. At the end of this initial operation, the four host 16 bit words (64 data bits) will have been stored in the relevant gates G38 through G41 within all 16 data cells.
- the four DB0 data bits are now called DB0-A through DB0-D.
- the RMW (buffer read modify write) control signal is set to select input "A" from all devices G38 through G42. Under these situations, the rebuild line is not used (don't care).
- the corresponding parity data bit is generated via G31, G32, and G37.
- the resultant parity bit will have been generated and stored on device G42. This is accomplished as follows. As the first bit-0 (DB0-A) appears on the signal DB0, the INIT line is driven high/true and the output from the gate G31 is driven low/off. Whatever value is present on DB0 will appear on the output of gate G32, and at the correct time will be clocked into the D-type G37. The value of DB0 will now appear on the Q output of G37.
- the INIT signal will now be driven low/off, and will now aid the flow of data through G31 for the next incoming three data bits on DB0.
- Whatever value was stored as DB0-A on the output of gate G37 will now appear on the output of gate G31, and as the second DB0 bit (DB0-B) appears on the signal DB0, an Exclusive OR value of these two bits will appear on the output of gate G32.
- this new value will be clocked into the device G37.
- the resultant Q output of G37 will now be the Exclusive OR function of DB0-A and DB0-B. This value will now be stored on device G42.
- the accumulative Exclusive OR (XOR) value of DB0-A through DB0-D is generated in this manner so as to preserve buffer timing and synchronisation procedures.
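The accumulation of the parity bit through G31, G32 and G37 can be modelled in software. The sketch below is a behavioural approximation only: G31 is modelled as passing the latched value unless INIT is high, G32 as an exclusive OR gate and G37 as the latch. The function name and the `stale_q` parameter (whatever value the latch held before the sequence started) are hypothetical.

```python
from typing import Sequence


def parity_accumulate(bits: Sequence[int], stale_q: int = 0) -> int:
    """Return the parity bit produced from four successive bits on one bus line."""
    q = stale_q                         # Q output of latch G37 before the sequence
    for index, bit in enumerate(bits):
        init = index == 0               # INIT is high only for the first bit
        feedback = 0 if init else q     # G31: output forced low while INIT is high
        q = feedback ^ bit              # G32 output, clocked into G37
    return q
```

Driving INIT high for the first bit makes the result independent of the previous contents of the latch, so no separate clear cycle is needed between slices.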
- the five outputs DB0-A through DB0-E are present for all data bits 0 through 15 of the four host data words.
- the total of 80 bits are now stored in the central buffer memory (DRAM).
- the whole procedure is repeated for each sequence of four host data words (8 host data bytes).
- As each "sector" of slave disk drive data is assembled in the central buffer, it is written to all slave disk drives (to channel A through channel E) within the same bank of disk drives.
- the parity data bit is regenerated by the Exclusive OR gate G4 and compared at gate G2 with the parity data read from the slave disk drives at device G14. If a difference is detected, an NMI ("non-maskable interrupt") is generated to the master processor device via gate G3. All read operations will terminate immediately.
- Gate G5 suppresses the effect of the parity bit DB0-E from the generation of the new parity bit.
- Gate G1 will suppress NMI operations if any slave disk drive has failed and the resultant mask bit has been set high/true. Also, gate G1, in conjunction with gate G5, will allow the read parity bit DB0-E to be utilised in the regeneration process at gate G4, should any channel have failed.
- the single failed disk drive/channel will have its mask bit set high/true under the direction of the controller software.
- the relevant gates within G6 through G9 and G10 through G14 for the failed channel/drives will have their outputs determined by their "B" inputs, not their "A" inputs.
- G1 will suppress all NMI generation, and together with gate G5, will allow parity bit DB0-E to be utilised at gate G4.
- the four valid bits from gates G10 through G14 will "regenerate" the "missing" data at gate G4, and the output of gate G4 will be fed to the correct ESP bus data bit DB0 via a "B" input at the relevant gate G6 through G9.
- gate G12 will be driven low and will not contribute to the output of gate G4.
- the output of gate G1 will be driven low/false and will both suppress NMIs and allow signal DB0-E to be fed by gate G5 to gate G4.
- Gate G4 will have all correct inputs from which to regenerate the missing data and feed the data to the output of device G8 via its "B" input. At the correct time, this bit will be fed through the multiplexor to DB0.
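The behaviour of one read data cell, in both the normal and the degraded case, can be approximated in Python. This is a behavioural sketch rather than a gate-level model: a Python exception stands in for the NMI, the masked channel is forced low as described for gate G12, and the names `read_cell` and `ParityError` are hypothetical.

```python
from typing import Dict, List, Optional

DATA_CHANNELS = "ABCD"
ALL_CHANNELS = "ABCDE"


class ParityError(Exception):
    """Stand-in for the non-maskable interrupt raised on a parity mismatch."""


def read_cell(bits: Dict[str, int], failed: Optional[str] = None) -> List[int]:
    """Return the four host data bits for one bit position.

    bits maps each channel A to E to the bit read from it; failed names the
    single masked channel, if any.
    """
    if failed is None:
        # Normal read: regenerate parity from A to D and compare with channel E.
        regenerated = 0
        for channel in DATA_CHANNELS:
            regenerated ^= bits[channel]
        if regenerated != bits["E"]:
            raise ParityError("regenerated parity differs from parity read from disk")
        return [bits[channel] for channel in DATA_CHANNELS]
    # Degraded read: the masked channel contributes nothing and the parity bit
    # is admitted to the XOR, so the XOR output is the missing bit.
    valid = {ch: (0 if ch == failed else bits[ch]) for ch in ALL_CHANNELS}
    regenerated = 0
    for channel in ALL_CHANNELS:
        regenerated ^= valid[channel]
    return [regenerated if ch == failed else valid[ch] for ch in DATA_CHANNELS]
```

In the degraded case no parity comparison is possible, which corresponds to the suppression of NMIs once a mask bit is set; if the parity channel itself is the failed one, the data bits pass through unchanged.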
- The memory controller must first read the data from the functioning four disk drives, regenerate the missing drive's data, and finally write the data to the failed disk drive after it has been replaced with a new disk drive.
- All channels of the central buffer memory 26 will have their data set to the regenerated data, but only the single replaced channel data will be written to the new disk drive under software control.
- the master 80376 processor detects an 80186 channel (array controller electronics) failure due to an "interprocessor" command protocol failure.
- An 80186 processor detects a disk drive problem, i.e. a SCSI bus protocol violation.
- An 80186 processor detects a SCSI bus hardware error. This is a complete channel failure situation, not just a single disk drive on that SCSI bus.
- The channel/drive "masking" function is performed by the master 80376 microprocessor. Under fault conditions, the masked out channel/drive is not written to or read from by the associated 80186 channel processor.
- Figures 6 through 13 are diagrams illustrating the operation of the software run by the central controller 22.
- Figure 6 illustrates the steps undertaken during the writing of data to the banks of disk drives. Initially the software is operating in "background" mode and is awaiting instructions. Once an instruction from the host is received indicating that data is to be sent, it is determined whether this is sequential within an existing segment. If the data is sequential then it is stored in the segment to form sequential data. If no sequential data exists in a buffer segment then either a new segment is opened (the write behind procedure illustrated in Figure 8) and data is accepted from the host, or the data is accepted into a transit buffer and queued ready to write into a segment. If there is no room for a new segment then the segment which has been idle for the longest time is found. If there are no such segments then the host write request is entered into a suspended request list.
- If a segment is available, it is determined whether this is a read or write segment. If it is a write segment then if it is empty it is de-allocated. If it is not empty then the segment is removed from consideration for de-allocation. If the segment is a read segment then the segment is de-allocated and opened ready to accept the host data.
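The write-path decisions of Figure 6 can be summarised in a short sketch. It is a simplified reading of the text, not the controller firmware: the transit buffer is left out, a logical clock stands in for idle time, and the names `Segment`, `WriteBuffer`, `host_write` and `_evict` are hypothetical.

```python
class Segment:
    def __init__(self, kind, start, data, last_access):
        self.kind = kind                # "read" or "write"
        self.start = start              # first sector held by the segment
        self.data = data                # list of buffered sectors
        self.last_access = last_access  # logical time of the last access

    def next_sector(self):
        return self.start + len(self.data)


class WriteBuffer:
    def __init__(self, max_segments):
        self.max_segments = max_segments
        self.segments = []
        self.suspended = []             # suspended host write requests
        self.clock = 0

    def host_write(self, sector, payload):
        self.clock += 1
        # Sequential with an existing write segment: extend that segment.
        for seg in self.segments:
            if seg.kind == "write" and seg.next_sector() == sector:
                seg.data.append(payload)
                seg.last_access = self.clock
                return "appended"
        # No room and nothing can be de-allocated: suspend the request.
        if len(self.segments) >= self.max_segments and not self._evict():
            self.suspended.append((sector, payload))
            return "suspended"
        self.segments.append(Segment("write", sector, [payload], self.clock))
        return "opened"

    def _evict(self):
        # Read segments and empty write segments may be de-allocated;
        # write segments still holding data are not considered.
        candidates = [s for s in self.segments
                      if s.kind == "read" or not s.data]
        if not candidates:
            return False
        victim = min(candidates, key=lambda s: s.last_access)
        self.segments.remove(victim)
        return True
```

For example, with room for two segments, writes to sectors 100 and 101 share one segment, a write to sector 500 opens a second, and a further non-sequential write is suspended because both segments still hold unwritten data.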
- Figure 7 illustrates the steps undertaken during read operations.
- the controller is in a "background" mode.
- When a request for data is received from the host computer, if the start of the data requested is already in a read segment then data can be transferred from the central buffer 26 to the host computer. If the data is not already in the central buffer 26, then it is ascertained whether it is acceptable to read ahead information. If it is not acceptable then a read request is queued. If data is to be read ahead then it is determined whether there is room for a new segment. If there is then a new segment is opened and data is read from the drives to the buffer segment and is then transferred to the host computer. If there is no room for a new segment then the segment for which the longest time has elapsed since it was last accessed is found, and this segment is de-allocated and opened to accept the data read from the disk drives.
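The read path of Figure 7 behaves like a small segment cache with least-recently-used replacement. The sketch below illustrates that idea only: it assumes read ahead is always acceptable (so the queued-request branch is omitted), models the drives as a Python sequence indexed by sector, and uses hypothetical names (`ReadCache`, `read_ahead`, `backing`).

```python
class ReadCache:
    def __init__(self, backing, max_segments, read_ahead):
        self.backing = backing            # stands in for the disk drives
        self.max_segments = max_segments
        self.read_ahead = read_ahead      # sectors fetched per new segment
        self.segments = {}                # start sector -> segment record
        self.clock = 0

    def read(self, sector):
        self.clock += 1
        # Requested data already held in a read segment: serve from buffer.
        for start, seg in self.segments.items():
            if start <= sector < start + len(seg["data"]):
                seg["last_access"] = self.clock
                return seg["data"][sector - start], "hit"
        # No room: de-allocate the segment that was accessed longest ago.
        if len(self.segments) >= self.max_segments:
            oldest = min(self.segments,
                         key=lambda s: self.segments[s]["last_access"])
            del self.segments[oldest]
        # Open a new segment and read ahead from the drives into it.
        data = list(self.backing[sector:sector + self.read_ahead])
        self.segments[sector] = {"data": data, "last_access": self.clock}
        return data[0], "miss"
```

A request that falls inside an existing segment is served without touching the drives and refreshes that segment's access time, so a recently used segment survives when a later miss forces a de-allocation.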
- The read ahead procedure illustrated in Figure 9 is performed. It is determined whether there are any read segments open which require a data refresh. If there is such a segment then a read request for the I/O handler for the segment is queued.
- Figure 10 illustrates the software steps undertaken to restart suspended transfers. It is first determined whether there are suspended host write requests in the list. If there are, it is determined whether there is room for allocation of a segment for suspended host write requests. A new segment for the host transfer is opened, the host request which has been suspended longest is determined, and data is accepted from the host computer into the buffer segment.
- Figure 11 illustrates a form of "housekeeping" undertaken by the software in order to clean up the segments in the central buffer 26. It is determined at a point that it is time to clean up the buffer segments. All the read segments which have times since the last access time larger than a predetermined limit termed the "geriatric limit" are found and reallocated. Also it is determined whether there are any such write segments and if so write operations are tidied up.
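The "geriatric limit" test of Figure 11 amounts to comparing each segment's idle time with a threshold. The following sketch makes two assumptions that the text does not state: idle read segments are simply released for reuse, and "tidying up" an idle write segment means handing it over to be written out to the drives. The function name `clean_up` and the dictionary keys are hypothetical.

```python
def clean_up(segments, now, geriatric_limit):
    """Split segments into those kept and the write segments to tidy up.

    Each segment is a dict with "kind" ("read" or "write") and
    "last_access" (time of the last access, same units as `now`).
    """
    kept, to_flush = [], []
    for seg in segments:
        idle = now - seg["last_access"]
        if idle <= geriatric_limit:
            kept.append(seg)          # still recent enough to keep
        elif seg["kind"] == "write":
            to_flush.append(seg)      # assumed: pending data is written out
        # Idle read segments are dropped, freeing their buffer space.
    return kept, to_flush
```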
- Figure 12 illustrates the operation of the input/output handler
- Figure 13 illustrates the operation of the input/output sub system
- When writing data, for individual writes of a single sector, or fewer than four correctly grouped sectors, the controller has first to read the required overall sector, then modify the data for the actual part of the sector that is necessary, and then finally write the overall slave disk sector back to the disk drive. This is a form of read modify write operation and can slow down the transfer of data to the disk drives considerably.
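The read modify write sequence can be illustrated with a toy model in which a drive is a dictionary mapping slave sector numbers to byte strings. The helper name `read_modify_write` and its parameters are hypothetical, and the parity update that would accompany a real write is omitted.

```python
def read_modify_write(drive, slave_sector, offset, new_bytes):
    """Update part of a slave sector: one drive read plus one drive write."""
    sector = bytearray(drive[slave_sector])          # 1. read whole sector
    if offset < 0 or offset + len(new_bytes) > len(sector):
        raise ValueError("update does not fit inside the slave sector")
    sector[offset:offset + len(new_bytes)] = new_bytes  # 2. modify the part
    drive[slave_sector] = bytes(sector)              # 3. write sector back
    return drive[slave_sector]
```

A write that covers a complete, correctly grouped set of sectors needs only step 3, which is why partial writes are the slower case.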
- the RAID-3 controller is inferior to the RAID-5 controller.
- The controller of the present invention provides for large scale sequential data transfers from memory units for multiple users of a host computer.
- The present invention is applicable to any standard host interface or slave interface and is not limited to the use of an SCSI bus as shown in Figure 1.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9019891.2 | 1990-09-12 | ||
GB909019891A GB9019891D0 (en) | 1990-09-12 | 1990-09-12 | Computer memory array control |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1992004674A1 (en) | 1992-03-19 |
Family
ID=10682053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB1991/001557 WO1992004674A1 (en) | Computer memory array control | 1990-09-12 | 1991-09-12 |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0548153A1 (en) |
AU (1) | AU8508191A (en) |
GB (1) | GB9019891D0 (en) |
WO (1) | WO1992004674A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994007196A1 (en) * | 1992-09-22 | 1994-03-31 | Unisys Corporation | Device interface module for disk drive complex |
EP0606743A2 (en) * | 1992-12-16 | 1994-07-20 | Quantel Limited | A data storage apparatus |
EP0650616A1 (en) * | 1992-06-04 | 1995-05-03 | Emc Corporation | System and method for dynamically controlling cache management |
EP0701198A1 (en) * | 1994-05-19 | 1996-03-13 | Starlight Networks, Inc. | Method for operating an array of storage units |
US5721950A (en) * | 1992-11-17 | 1998-02-24 | Starlight Networks | Method for scheduling I/O transactions for video data storage unit to maintain continuity of number of video streams which is limited by number of I/O transactions |
US5802394A (en) * | 1994-06-06 | 1998-09-01 | Starlight Networks, Inc. | Method for accessing one or more streams in a video storage system using multiple queues and maintaining continuity thereof |
CN107728943A (en) * | 2017-10-09 | 2018-02-23 | 华中科技大学 | It is a kind of to postpone to produce the method for verification CD and its corresponding data reconstruction method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0278425A2 (en) * | 1987-02-13 | 1988-08-17 | International Business Machines Corporation | Data processing system and method with management of a mass storage buffer |
WO1989009468A1 (en) * | 1988-04-01 | 1989-10-05 | Unisys Corporation | High capacity multiple-disk storage method and apparatus |
WO1989010594A1 (en) * | 1988-04-22 | 1989-11-02 | Amdahl Corporation | A file system for a plurality of storage classes |
EP0369707A2 (en) * | 1988-11-14 | 1990-05-23 | Emc Corporation | Arrayed disk drive system and method |
-
1990
- 1990-09-12 GB GB909019891A patent/GB9019891D0/en active Pending
-
1991
- 1991-09-12 AU AU85081/91A patent/AU8508191A/en not_active Abandoned
- 1991-09-12 EP EP91916077A patent/EP0548153A1/en not_active Withdrawn
- 1991-09-12 WO PCT/GB1991/001557 patent/WO1992004674A1/en not_active Application Discontinuation
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1199638A2 (en) * | 1992-06-04 | 2002-04-24 | Emc Corporation | System and Method for dynamically controlling cache management |
EP0650616A1 (en) * | 1992-06-04 | 1995-05-03 | Emc Corporation | System and method for dynamically controlling cache management |
EP1199638A3 (en) * | 1992-06-04 | 2008-04-02 | Emc Corporation | System and Method for dynamically controlling cache management |
EP0650616A4 (en) * | 1992-06-04 | 1997-01-29 | Emc Corp | System and method for dynamically controlling cache management. |
US5471586A (en) * | 1992-09-22 | 1995-11-28 | Unisys Corporation | Interface system having plurality of channels and associated independent controllers for transferring data between shared buffer and peripheral devices independently |
WO1994007196A1 (en) * | 1992-09-22 | 1994-03-31 | Unisys Corporation | Device interface module for disk drive complex |
US5721950A (en) * | 1992-11-17 | 1998-02-24 | Starlight Networks | Method for scheduling I/O transactions for video data storage unit to maintain continuity of number of video streams which is limited by number of I/O transactions |
US5734925A (en) * | 1992-11-17 | 1998-03-31 | Starlight Networks | Method for scheduling I/O transactions in a data storage system to maintain the continuity of a plurality of video streams |
US5754882A (en) * | 1992-11-17 | 1998-05-19 | Starlight Networks | Method for scheduling I/O transactions for a data storage system to maintain continuity of a plurality of full motion video streams |
EP0606743A2 (en) * | 1992-12-16 | 1994-07-20 | Quantel Limited | A data storage apparatus |
EP0606743A3 (en) * | 1992-12-16 | 1994-08-31 | Quantel Ltd | |
US5765186A (en) * | 1992-12-16 | 1998-06-09 | Quantel Limited | Data storage apparatus including parallel concurrent data transfer |
US5732239A (en) * | 1994-05-19 | 1998-03-24 | Starlight Networks | Method for operating a disk storage system which stores video data so as to maintain the continuity of a plurality of video streams |
EP0701198A1 (en) * | 1994-05-19 | 1996-03-13 | Starlight Networks, Inc. | Method for operating an array of storage units |
US5802394A (en) * | 1994-06-06 | 1998-09-01 | Starlight Networks, Inc. | Method for accessing one or more streams in a video storage system using multiple queues and maintaining continuity thereof |
CN107728943A (en) * | 2017-10-09 | 2018-02-23 | 华中科技大学 | It is a kind of to postpone to produce the method for verification CD and its corresponding data reconstruction method |
CN107728943B (en) * | 2017-10-09 | 2020-09-18 | 华中科技大学 | Method for delaying generation of check optical disc and corresponding data recovery method |
Also Published As
Publication number | Publication date |
---|---|
AU8508191A (en) | 1992-03-30 |
EP0548153A1 (en) | 1993-06-30 |
GB9019891D0 (en) | 1990-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5526507A (en) | Computer memory array control for accessing different memory banks simullaneously | |
US6058489A (en) | On-line disk array reconfiguration | |
US6009481A (en) | Mass storage system using internal system-level mirroring | |
US5893919A (en) | Apparatus and method for storing data with selectable data protection using mirroring and selectable parity inhibition | |
US5875456A (en) | Storage device array and methods for striping and unstriping data and for adding and removing disks online to/from a raid storage array | |
EP0572564B1 (en) | Parity calculation in an efficient array of mass storage devices | |
US7730257B2 (en) | Method and computer program product to increase I/O write performance in a redundant array | |
EP0369707B1 (en) | Arrayed disk drive system and method | |
US5608891A (en) | Recording system having a redundant array of storage devices and having read and write circuits with memory buffers | |
US7228381B2 (en) | Storage system using fast storage device for storing redundant data | |
EP1376329A2 (en) | Method of utilizing storage disks of differing capacity in a single storage volume in a hierarchic disk array | |
EP0850448A1 (en) | Method and apparatus for improving performance in a redundant array of independent disks | |
WO1997044733A1 (en) | Data storage system with parity reads and writes only on operations requiring parity information | |
WO1992004674A1 (en) | Computer memory array control | |
WO1993013475A1 (en) | Method for performing disk array operations using a nonuniform stripe size mapping scheme | |
US6934803B2 (en) | Methods and structure for multi-drive mirroring in a resource constrained raid controller | |
AU662376B2 (en) | Computer memory array control | |
US6898666B1 (en) | Multiple memory system support through segment assignment | |
CA2229648C (en) | Method and apparatus for striping data and for adding/removing disks in a raid storage system | |
CA2585216C (en) | Method and apparatus for striping data and for adding/removing disks in a raid storage system | |
GB2298306A (en) | A disk array and tasking means |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU CA GB JP US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1991916077 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1991916077 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: CA |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1991916077 Country of ref document: EP |