
CN1422048A - Solution to local failure of memory - Google Patents


Info

Publication number
CN1422048A
Authority
CN
China
Prior art keywords
buffer
memory
board
self-test
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN01135088A
Other languages
Chinese (zh)
Other versions
CN1288882C (en)
Inventor
涂君
雷春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CNB011350881A
Publication of CN1422048A
Application granted
Publication of CN1288882C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

This method handles local memory failures by having the logic circuit of the logic IC or ASIC chip itself self-test the memory one buffer at a time. If every memory cell in a buffer is good, the buffer's start address is written to the free buffer queue for later use; if a buffer contains a failed memory cell, its start address is not written to the queue, so the buffer is never accessed again. During the self-test, the failed buffers are counted for subsequent handling.

Description

A method for handling partial memory failure
Technical field
The present invention relates to a method that handles partial memory failure and thereby improves the operating reliability and fault tolerance of the whole system. The method is most useful where memory is used in blocks, for example in logic circuit designs for message store-and-forward or ATM cell segmentation and reassembly. The invention belongs to the field of logic IC and ASIC chip circuit design.
Background art
Logic IC or ASIC chip designs for applications such as message store-and-forward or ATM cell segmentation and reassembly often need a large-capacity memory to hold messages temporarily. The memory is generally divided into a number of buffers, each of which can hold one message.
Figure 1 shows a block diagram of the logic circuit currently used in such applications. Its basic operating principle is as follows:
(1) After system reset, the memory self-test runs first. The self-test can be performed either by the logic circuit of the logic IC or ASIC chip itself, or by a CPU attached to the chip, through the memory access channel that the chip provides. Because the memory is large, a CPU-driven self-test is too slow, so the self-test is normally performed by the chip's own logic circuit. The usual method is to write data to a memory cell and then compare that data with the data read back from the same cell; if the two match, the cell is considered good. If every cell in the memory passes, the memory as a whole is judged to have passed. When the self-test finishes, a status flag indicating whether it found an error must be output so that the CPU can evaluate it and act accordingly. For example, if the self-test finds a partial failure, the CPU must raise an alarm to notify maintenance staff to take action such as replacing the board. In the figure, heavy solid lines show the path of message data and thin solid lines show the path of buffer start addresses.
(2) If the memory self-test passes, the free buffer queue is initialized: the start address of every buffer is written to the free buffer queue. The free buffer queue and the send buffer queue are both first-in, first-out memories (FIFOs). The free buffer queue holds the start addresses of free buffers; the send buffer queue holds the start addresses of buffers that contain messages waiting to be sent.
(3) When the receive state machine receives a message, it reads the start address of a free buffer from the free buffer queue and stores the message in the corresponding buffer of the large-capacity memory. Once the whole message has been received, it writes the buffer's start address to the send buffer queue.
(4) When the send state machine detects data in the send buffer queue, it first reads from that queue the start address of the buffer holding the message, then reads the message from the large-capacity memory at that address, processes it as required, and sends it. After the message has been sent, the send state machine writes the buffer's start address back to the free buffer queue, releasing the buffer. This workflow completes the store-and-forward handling of a message.
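The four steps above can be sketched as a small software model. This is an illustrative Python simulation, not the circuit of Figure 1: the buffer size, the buffer count, and the names `StoreAndForward`, `receive`, and `send` are assumptions made for the example, and the send queue here also carries the message length, which the source does not describe.

```python
from collections import deque

BUFFER_SIZE = 64   # assumed bytes per buffer (not given in the source)
NUM_BUFFERS = 4    # assumed number of buffers (not given in the source)


class StoreAndForward:
    """Toy model of the receive and send state machines sharing one memory."""

    def __init__(self):
        self.memory = bytearray(BUFFER_SIZE * NUM_BUFFERS)
        # Step (2): after a passing self-test, every start address is free.
        self.free_queue = deque(i * BUFFER_SIZE for i in range(NUM_BUFFERS))
        # Holds (start address, message length) pairs; length is an assumption.
        self.send_queue = deque()

    def receive(self, message: bytes) -> bool:
        """Step (3): take a free buffer, store the message, queue it to send."""
        if not self.free_queue or len(message) > BUFFER_SIZE:
            return False  # no free buffer, or message too long for one buffer
        start = self.free_queue.popleft()
        self.memory[start:start + len(message)] = message
        self.send_queue.append((start, len(message)))
        return True

    def send(self):
        """Step (4): read the oldest queued message, then release its buffer."""
        if not self.send_queue:
            return None
        start, length = self.send_queue.popleft()
        message = bytes(self.memory[start:start + length])
        self.free_queue.append(start)  # the buffer returns to the free queue
        return message
```

Both queues are FIFOs, so messages leave in the order they arrived and a released buffer goes to the back of the free queue.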
At present, with the rapid development of microelectronics, memory chip capacity keeps growing. A single chip can now integrate several hundred million transistors, and memory chips continue to grow rapidly in scale. At the same time, the fabrication processes used for integrated circuits are increasingly advanced and line widths keep shrinking, which inevitably makes the failure of individual memory cells much more likely.
If a memory cell in one of the buffers fails, a message held in that buffer is likely to be corrupted when it is sent. In that case the board is generally considered faulty: the whole memory chip must be replaced and the board returned to the manufacturer for repair. The total repair cost is far higher than the cost of the memory chip itself, even though only some of the chip's memory cells have failed. Worse, the failure can give users the impression that the product's quality cannot be guaranteed, seriously damaging the product's image.
Summary of the invention
The object of the present invention is to provide a method that handles partial memory failure and thereby improves the operating reliability and fault tolerance of the whole system. The method addresses two problems caused by partial memory failure: message transmission errors and a high board failure rate. From the system's point of view the memory chip behaves as if no partial failure had occurred; its capacity is merely slightly smaller. This greatly improves system reliability and fault tolerance and reduces the board failure rate and repair rate.
This object is achieved as follows. In a method for handling partial memory failure, the logic circuit of the logic IC or ASIC chip itself self-tests the memory one buffer at a time. The self-test writes data to each memory cell in sequence and then compares that data with the data read back from the buffer; if the two match, the tested memory cell of the buffer is considered good. If every memory cell in a buffer passes, the buffer's start address is written to the free buffer queue when the buffer's self-test ends, and the buffer can then be used during normal operation of the logic IC or ASIC chip. If one or more memory cells in a buffer have failed, the buffer's start address is not written to the free buffer queue, so that buffer is never accessed during normal operation of the chip. In addition, the memory self-test circuit maintains a counter of failed buffers during the self-test. After the self-test ends, the CPU reads the counter and acts on it: if few buffers have failed and the effect on the board's function and performance is small, the board can be considered good and allowed to work as usual; if many buffers have failed and the board's function and performance may be significantly affected, an alarm should be raised to request repair or replacement of the board.
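A minimal software sketch of this per-buffer self-test follows. It is a Python simulation under stated assumptions, not the chip's logic circuit: `FaultyMemory`, `self_test_by_buffer`, the stuck-value fault model, and the two test patterns `0x55` and `0xAA` are all hypothetical choices for illustration.

```python
from collections import deque


class FaultyMemory:
    """Simulated word memory; cells listed in stuck_at always read a fixed value."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})  # address -> value always read back

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def self_test_by_buffer(mem, buffer_size, num_buffers, patterns=(0x55, 0xAA)):
    """Test each buffer on its own; return (free buffer queue, failed-buffer count)."""
    free_queue = deque()
    failed_count = 0
    for index in range(num_buffers):
        start = index * buffer_size
        buffer_ok = True
        for pattern in patterns:
            # Write the pattern to every cell of this buffer in sequence...
            for addr in range(start, start + buffer_size):
                mem.write(addr, pattern)
            # ...then read each cell back and compare with what was written.
            for addr in range(start, start + buffer_size):
                if mem.read(addr) != pattern:
                    buffer_ok = False
        if buffer_ok:
            free_queue.append(start)   # good buffer: start address becomes usable
        else:
            failed_count += 1          # bad buffer: never enqueued, only counted
    return free_queue, failed_count
```

Because a failed buffer's start address never enters the free queue, no later receive or send operation can reach it; the only trace it leaves is the counter value.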
With this method, when a memory cell failure is detected, the buffer containing the failed cell is no longer used, while the buffers without failures remain in normal use and the whole memory chip need not be replaced. From the system's point of view the chip behaves as if no partial failure had occurred; its capacity is merely slightly smaller, which is entirely acceptable in the vast majority of systems. The invention can therefore greatly improve the reliability and fault tolerance of the whole system and reduce the board failure rate and repair rate, which is especially significant in applications that demand high reliability and continuous operation.
Description of the drawings
Fig. 1 is a block diagram of the hardware logic circuit currently used in applications such as message store-and-forward or ATM cell segmentation and reassembly.
Embodiment
The present invention is a method that handles partial memory failure and thereby improves the operating reliability and fault tolerance of the whole system. Specifically, the logic circuit of the logic IC or ASIC chip itself self-tests the memory one buffer at a time. The self-test writes data in sequence to each memory cell of each buffer and then compares that data with the data read back from the buffer; if the two match, the tested memory cell of the buffer is considered good. If every memory cell in a buffer passes, the buffer's start address is written to the free buffer queue when the buffer's self-test ends, and the buffer can then be used during normal operation of the logic IC or ASIC chip. If one or more memory cells in the buffer have failed, the buffer's start address is not written to the free buffer queue, so that buffer is permanently excluded from access during normal operation of the chip. In addition, the memory self-test circuit maintains a counter of failed buffers during the self-test. After the self-test ends, the CPU reads the counter and acts on it: if few buffers have failed and the effect on the board's function and performance is small, the board can be considered good and allowed to work as usual; if many buffers have failed and the board's function and performance may be significantly affected, an alarm should be raised to request repair or to prompt the user to replace the data storage board in time.
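The CPU-side handling of the counter can be sketched as a simple threshold check. The source gives no numeric limit for "few" or "many" failed buffers, so the 5% default below, the function name `board_status`, and the returned strings are assumptions for illustration only.

```python
def board_status(failed_buffers, total_buffers, max_failed_fraction=0.05):
    """Classify the board from the failed-buffer count read after self-test.

    max_failed_fraction is a hypothetical policy value; a real system would
    choose it from its own capacity and performance requirements.
    """
    if total_buffers <= 0:
        raise ValueError("total_buffers must be positive")
    if not 0 <= failed_buffers <= total_buffers:
        raise ValueError("failed_buffers must be between 0 and total_buffers")
    if failed_buffers <= max_failed_fraction * total_buffers:
        return "normal"   # small capacity loss: keep the board in service
    return "alarm"        # large loss: request repair or replacement
```

With these assumed values, 2 failed buffers out of 1024 stay below the limit of 51.2 and leave the board in service, while 200 failed buffers exceed it and raise the alarm.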
When each buffer of the memory is self-tested, a memory test method can be chosen according to the implementation complexity and the system's requirement for the probability of detecting memory errors. Examples include the scan pattern method, the checkerboard pattern method, the MATS algorithm, the stepping pattern method, the nine-step algorithm, the extended nine-step algorithm, the 13-step algorithm, and the March C algorithm. Details of these algorithms can be found in the relevant literature and are not repeated here.
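As one concrete instance of such a test, the sketch below runs MATS+, a widely documented variant of the MATS march test, over one buffer of a simulated memory. The source names only "MATS" and does not say which variant or word width is used, so MATS+, the 8-bit width, and the helper `make_memory` with its stuck-bit fault model are assumptions for illustration.

```python
def make_memory(size, stuck_high=None, stuck_low=None):
    """Return (read, write) for a simulated memory with stuck bits.

    stuck_high / stuck_low map an address to a bit mask of bits that always
    read as 1 / as 0. Both are hypothetical fault-injection parameters.
    """
    cells = [0] * size
    stuck_high = stuck_high or {}
    stuck_low = stuck_low or {}

    def write(addr, value):
        cells[addr] = value

    def read(addr):
        value = cells[addr]
        value |= stuck_high.get(addr, 0)    # force stuck-at-1 bits high
        value &= ~stuck_low.get(addr, 0)    # force stuck-at-0 bits low
        return value

    return read, write


def mats_plus(read, write, start, length, width=8):
    """Run MATS+ on addresses [start, start + length); return failing addresses."""
    zeros = 0
    ones = (1 << width) - 1
    failures = set()
    addrs = range(start, start + length)
    for a in addrs:                 # M0: write 0 to every cell (any order)
        write(a, zeros)
    for a in addrs:                 # M1: ascending, read 0 then write 1
        if read(a) != zeros:
            failures.add(a)
        write(a, ones)
    for a in reversed(addrs):       # M2: descending, read 1 then write 0
        if read(a) != ones:
            failures.add(a)
        write(a, zeros)
    return sorted(failures)
```

A buffer would be treated as good only when the returned list is empty. Longer march tests such as March C add further read and write passes to cover more fault types, at the cost of a longer self-test.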
The applicant has simulated the method on computers and on some equipment and systems and has tested it in real projects. These tests show that the object of the invention is achieved and that the method is simple to implement, operates reliably, and has good application prospects.

Claims (1)

1. A method for handling partial memory failure, characterized in that: the logic circuit of the logic IC or ASIC chip itself self-tests the memory one buffer at a time; if every memory cell in a buffer passes the self-test, the buffer's start address is written to the free buffer queue when the buffer's self-test ends, and the buffer can then be used during normal operation of the logic IC or ASIC chip; if one or more memory cells in a buffer are found to have failed, the buffer's start address is not written to the free buffer queue, so that buffer is never accessed during normal operation of the logic IC or ASIC chip; in addition, the memory self-test circuit maintains a counter of failed buffers during the self-test, and after the self-test ends the CPU reads the counter and acts on it: if few buffers have failed and the effect on the board's function and performance is small, the board can be considered good and allowed to work as usual; if many buffers have failed and the board's function and performance may be significantly affected, an alarm should be raised to request repair or replacement of the board.
CNB011350881A 2001-11-27 2001-11-27 Solution to local failure of memory Expired - Fee Related CN1288882C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB011350881A CN1288882C (en) 2001-11-27 2001-11-27 Solution to local failure of memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB011350881A CN1288882C (en) 2001-11-27 2001-11-27 Solution to local failure of memory

Publications (2)

Publication Number Publication Date
CN1422048A (en) 2003-06-04
CN1288882C (en) 2006-12-06

Family

ID=4672943

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB011350881A Expired - Fee Related CN1288882C (en) 2001-11-27 2001-11-27 Solution to local failure of memory

Country Status (1)

Country Link
CN (1) CN1288882C (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742038B (en) * 2008-11-14 2012-08-22 夏普株式会社 Image processing apparatus
CN114315541A (en) * 2022-01-17 2022-04-12 万华化学(四川)有限公司 Cyclohexanone composition and application thereof
CN115292114A (en) * 2022-10-09 2022-11-04 中科声龙科技发展(北京)有限公司 Data storage method, device, equipment and storage medium based on ETHASH algorithm

Also Published As

Publication number Publication date
CN1288882C (en) 2006-12-06

Similar Documents

Publication Publication Date Title
CN102084430B (en) Method and apparatus for repairing high capacity/high bandwidth memory devices
CN101589370B (en) A parallel computer system and fault recovery method therefor
CN102541756A (en) Cache memory system
US20080071499A1 (en) Run-time performance verification system
JP2001350651A (en) Method for isolating failure state
US9141463B2 (en) Error location specification method, error location specification apparatus and computer-readable recording medium in which error location specification program is recorded
CN102932444A (en) Load balancing module in financial real-time trading system
US20040216003A1 (en) Mechanism for FRU fault isolation in distributed nodal environment
US6950978B2 (en) Method and apparatus for parity error recovery
CN101150458A (en) Method and device for single board detection
CN105959235A (en) Distributed data processing system and method
JPH07183898A (en) Method for recovering predetermined order for cell style of asymmetric order in atm exchange technology
CN101299685B (en) Method and system for testing switching network as well as test initiation module
CN107203335A (en) Storage system and its operating method
CN1288882C (en) Solution to local failure of memory
CN108228669A (en) A kind of method for caching and processing and device
CN104780123B (en) A kind of network pack receiving and transmitting processing unit and its design method
CN101458305A (en) Embedded module test and maintenance bus system
CN101634939B (en) Fast addressing device and method thereof
CN102135941B (en) Method and device for writing data from cache to memory
JP3401160B2 (en) Distributed shared memory network device
US6928588B2 (en) System and method of improving memory yield in frame buffer memory using failing memory location
CN112613254B (en) System and method for verifying fault injection of mirror image control module in processor
RU2383067C2 (en) Method of storing data packets using pointer technique
US7788546B2 (en) Method and system for identifying communication errors resulting from reset skew

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20061206

Termination date: 20161127