CN101656624B - Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method
- Publication number: CN101656624B
- Application number: CN2008102108091A
- Authority: CN (China)
- Legal status: Active
Abstract
The invention discloses a multi-node application-level disaster recovery system and a multi-node application-level disaster recovery method. The system comprises a production center node and two or more backup center nodes connected to it by a local area network and/or a wide area network. The backup center nodes are ordered by priority from high to low, and the production center node has a higher priority than every backup center node. While the production center node operates normally, it sends data and/or operations to all backup center nodes; when it fails, the services running on it are switched to the backup center node with the highest priority. Each backup center node saves the data and/or operations sent by the production center node, sends data and/or operations to all remaining backup center nodes whenever it runs services in place of the production center node or another backup center node, and, if it fails while running those services, switches them to the backup center node with the next-lower priority.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a multi-node application-level disaster recovery system and disaster recovery method.
Background technology
Disaster tolerance refers to the system planning and construction carried out to ensure that, after experiencing a disaster of any kind, key services and applications can still provide normal service to the greatest possible extent. Typical disaster events are natural disasters such as fire, flood, earthquake, cyclone and typhoon, together with interruptions of the services the operation depends on, such as equipment faults, software errors, communication network outages and power failures. Human factors can also breed disasters, for example operator error, sabotage, implanted malicious code and terrorist attacks.
The essence of disaster tolerance is to guarantee uninterrupted service operation, and its ultimate goal is business continuity. The origin and development of the disaster tolerance industry are an inevitable outcome of the development of computer technology, and also reflect the importance of information systems and data to individuals, enterprises and countries.
By protection level, disaster tolerance can be broadly divided into data-level and application-level disaster recovery. Data-level disaster recovery focuses on the data itself: it must guarantee that the original data are not lost or destroyed after a disaster occurs. Application-level disaster recovery builds, on the basis of data-level disaster recovery, an identical application system at the backup site, so that while data-level protection is achieved, external services also remain continuously available: when a disaster occurs, services can be switched between the production center and the remote disaster recovery center, and all applications affected by the disaster are seamlessly taken over by the backup system, guaranteeing continued availability of the services.
At present, two broad classes of disaster recovery systems are popular in the industry.
The first class is storage-layer disaster recovery, which dominates in high-end storage. It is based mainly on disk array replication technology: firmware built into the storage system, or the operating system, replicates or mirrors data between physical storage devices over dedicated links such as Fibre Channel. It supports both asynchronous and synchronous replication, is independent of the operating system platform and the application, and guarantees the consistency of the data at the two ends.
The shortcomings of this class of system are, however, equally clear: it provides only data-level, not application-level, disaster recovery; the hardware investment is very expensive, since the user must deploy two identical storage systems, one at the local site and one at the disaster recovery site, which not only raises the purchase cost but also ties the user to a single equipment vendor, so future expandability inevitably lacks flexibility; the synchronous mode has a considerable impact on the transaction throughput of the home site; the distance between the active and standby machines cannot be large, generally within 200 km; and the requirement on network bandwidth is comparatively high, generally exceeding gigabit bandwidth.
The second class is server-layer disaster recovery, which dominates among low-end and mid-range storage devices. Data replication software installed on the server, or replication facilities provided by the application (such as the related tools of a database), copies data over a TCP/IP network to a remote disaster recovery center, achieving off-site data replication. The transmission distance to the remote disaster recovery center is unlimited and independent of the storage hardware; servers and storage devices of different brands can be combined, and the user's investment cost is relatively low.
However, the main products of this class currently on the market have the following drawbacks. Because the investment cost would otherwise be excessive and actual demand has so far been weak, they provide application-level disaster recovery for only two nodes; beyond two nodes, only data-level disaster recovery is available. They offer only a single synchronous or asynchronous replication mode and cannot flexibly and automatically apply the replication strategy suited to actual conditions. Moreover, data replication operates at the file-system level: after a file change is detected, the delta transfer is carried out at the file or block level, so transmission efficiency is low.
Summary of the invention
The technical problem to be solved by the present invention is to provide a multi-node application-level disaster recovery system and disaster recovery method that achieve application-level disaster recovery across multiple nodes.
To solve the above technical problem, the invention provides a multi-node application-level disaster recovery system comprising a production center node and two or more backup center nodes, the two or more backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes, wherein:
the production center node provides the service running capability, sends data and/or operations to all backup center nodes during normal operation, and, when it fails, switches the services running on it to the backup center node with the highest priority;
each backup center node saves the data and/or operations sent by the production center node, sends data and/or operations to all remaining backup center nodes when it runs services in place of the production center node or another backup center node, and, if it fails while running those services, switches them to the backup center node with the next-lower priority so as to keep the services running normally.
Further, the production center node and the backup center nodes also perform priority ordering; when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority. A backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node, and any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
Further, when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
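This selection rule reduces to a small dispatch function. The following is a minimal sketch, not part of the patent text: the function names are invented, and a shared /24 prefix stands in for the patent's abstract notion of a "same address range".

```python
from ipaddress import ip_network

def same_address_range(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """Illustrative stand-in for the patent's abstract 'same address range':
    here, two nodes count as local to each other when their IPs share a /24."""
    return (ip_network(f"{ip_a}/{prefix}", strict=False)
            == ip_network(f"{ip_b}/{prefix}", strict=False))

def replication_mode(sender_ip: str, receiver_ip: str) -> str:
    # Same address range -> synchronous replication; otherwise asynchronous.
    return "synchronous" if same_address_range(sender_ip, receiver_ip) else "asynchronous"

# Example: a local backup replicates synchronously, a remote one asynchronously.
assert replication_mode("192.168.1.10", "192.168.1.20") == "synchronous"
assert replication_mode("192.168.1.10", "10.8.0.5") == "asynchronous"
```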
Further, the production center node and each backup center node comprise an application program module, an IO management module, an IO filter driver module, an operation module, a local storage module, a data replication module, a local agent service module and an administration module, wherein:
the application program module is the application that the system protects and produces IO requests; the application program module of the production center node is running, while the application program module of a backup center node runs only when that node runs services in place of the production center node or another backup center node;
the IO management module is connected to the application program module and the data replication module and receives the IO requests sent by the application program module and the IO requests received by the data replication module;
the IO filter driver module is connected to the IO management module and the data replication module; it receives the IO requests sent by those two modules and forwards them to the operation module, and from the received IO requests it filters out those that may cause the operation module to perform modifying operations and sends them through the data replication module to the other backup center nodes;
the operation module is connected to the IO filter driver module and executes the operations requested by the received IO requests;
the local storage module is connected to the operation module and provides storage space;
the data replication module is connected to the IO management module and the IO filter driver module and receives IO requests and service data from other nodes and/or sends IO requests and service data to other nodes;
the local agent service module is connected to the administration module; it sends alarm information to the administration module when it detects a failure of the system and exchanges heartbeat messages with the other nodes;
the administration module is connected to the data replication module and the local agent service module; it receives, via the heartbeat messages, the address information of the other nodes and the priority information preconfigured by the user, orders the other nodes by priority, and, after receiving alarm information from the local agent service module, triggers the local agent service module to send heartbeat messages to the other nodes, notifying the other backup nodes, in priority order from high to low, to take over all the services carried by the failed node.
Further, the operation module also scans the sector data on the local storage module and, when it detects that the data of a sector has changed, triggers the data replication module, through the IO filter driver module, to copy the data of the changed sector to the other nodes.
To solve the above problem, the invention also provides a multi-node application-level disaster recovery method. The system to which the method applies comprises a production center node providing the service running capability and two or more backup center nodes, the backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes. The method comprises:
during normal operation, the production center node sends data and/or operations to all backup center nodes, and the backup center nodes save the data and/or operations sent by the production center node;
when the production center node fails, the services running on it are switched to the backup center node with the highest priority, and when a backup center node runs services in place of the production center node, it sends data and/or operations to all remaining backup center nodes;
when a backup center node running services fails, the services are switched to the backup center node with the next-lower priority so as to keep the services running normally.
Further, the production center node and each backup center node each maintain a priority ordering of the other nodes; when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority. A backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node, and any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
Further, when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
Further, the center node operating normally and running the services periodically scans the sector data and, when it detects that the data of a sector has changed, copies the data of the changed sector to the other nodes.
Further, the priority sorting method comprises: this node sends a priority query request packet to the other nodes and, after receiving the returned priority feedback packets, builds a priority list by performing the following operations on each priority feedback packet, each packet containing the address information of the node that sent it and the priority information set by the user (a minimal code sketch follows step (c)):
(a) judge whether the node that sent the priority feedback packet is the production center node; if so, set that node's priority to the highest; if not, execute step (b);
(b) query the address information of the node and judge whether it is in the same address range as the production center node; if so, set the node's priority just below the production center node but above the other backup center nodes, and if several nodes are in the same address range as the production center node, order them among themselves by the priority information set by the user; if not, execute step (c);
(c) query whether the node's address information is in the same address range as this node; if so, set the node's priority below this node's priority or order by the priority information set by the user; otherwise, set the node's priority below this node's priority.
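Steps (a)-(c) amount to building a sort key for each feedback packet. Below is a minimal sketch under stated assumptions: the Feedback structure, the integer rank encoding and the /24 test for "same address range" are illustrative inventions, not the patent's data formats.

```python
from dataclasses import dataclass
from ipaddress import ip_network

def same_range(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    # Illustrative stand-in for the patent's abstract "same address range".
    return (ip_network(f"{ip_a}/{prefix}", strict=False)
            == ip_network(f"{ip_b}/{prefix}", strict=False))

@dataclass
class Feedback:
    """One priority feedback packet: the sender's address plus the user-set
    priority value, the two fields named in the text above."""
    address: str
    user_priority: int
    is_production: bool = False

def build_priority_list(feedbacks, production_ip, my_ip):
    """Order peers per steps (a)-(c); lower rank tuples sort first."""
    def rank(f):
        if f.is_production:                       # (a) production node first
            return (0, f.user_priority)
        if same_range(f.address, production_ip):  # (b) local to production
            return (1, f.user_priority)
        if same_range(f.address, my_ip):          # (c) local to this node
            return (2, f.user_priority)
        return (3, f.user_priority)               # (c) everything else, below this node
    return sorted(feedbacks, key=rank)

peers = [Feedback("10.8.0.5", 2), Feedback("192.168.1.1", 0, is_production=True),
         Feedback("192.168.1.20", 1)]
ordered = build_priority_list(peers, production_ip="192.168.1.1", my_ip="192.168.1.10")
# -> the production node, then its local backup, then the remote node.
```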
The invention achieves application-level disaster recovery across multiple nodes. Transmission among the nodes can flexibly and automatically select synchronous or asynchronous replication strategies according to actual conditions, and the sector-level delta transmission strategy applied after a data change is detected reduces the total amount of data transmitted, and thus the bandwidth pressure, more effectively than file-level or block-level transmission strategies.
Description of drawings
Fig. 1 is a structural diagram of the multi-node application-level disaster recovery system according to an embodiment of the invention;
Fig. 2 is a structural diagram of the HA service module in the multi-node application-level disaster recovery system according to an embodiment of the invention;
Fig. 3 is a structural diagram of the data replication module in the multi-node application-level disaster recovery system according to an embodiment of the invention;
Fig. 4 is a flow diagram of the multi-node application-level disaster recovery system handling a file IO request, in the method according to an embodiment of the invention;
Fig. 5 is a flow diagram of the multi-node application-level disaster recovery system handling data replication, in the method according to an embodiment of the invention;
Fig. 6 is a flow diagram of the HA service module in the multi-node application-level disaster recovery system sending a node priority query request packet, in the method according to an embodiment of the invention;
Fig. 7 is a flow diagram of the HA service module in the multi-node application-level disaster recovery system receiving a node priority query request packet, in the method according to an embodiment of the invention.
Embodiment
As shown in Fig. 1, the disaster recovery system proposed by the invention mainly comprises a production center node and N backup center nodes connected by an external local area network (LAN) and/or wide area network (WAN). The production center node is the node responsible for running the service applications; the nodes that keep the service applications running when the production center node fails are called backup center nodes. By purpose, the LAN and/or WAN is divided into a public network and a private network: the public network provides the channel through which clients access the services, and the private network carries the data replication information and heartbeat messages between the nodes (the production center node and the backup center nodes). A backup center node may be located at the same site as the production center node or at any remote location; "same site" may mean that the two nodes are within a certain distance (within 100 km) or that they are in the same IP segment.
Fig. 1 shows the structure of the multi-node application-level disaster recovery system of this embodiment, comprising a production center node 100, a local backup center node 101 and a remote backup center node 102, where the local backup center node 101 and the remote backup center node 102 serve as application-level disaster recovery backup center nodes for the production center node 100, and the three nodes are connected by LAN and/or WAN.
The production center node 100, the local backup center node 101 and the remote backup center node 102 have identical internal structures; the internal structure of each node is illustrated below taking the production center node 100 as an example. As shown in Fig. 1, the production center node 100 mainly comprises an application program module 1001, an IO management module 1002, an IO filter driver module 1003, an operation module 1004, a local storage module 1005, a data replication module 1006, an administration module 1007 and a local agent (HA) service module 1008, wherein:
the application program module 1001 is the application that needs the protection of the disaster recovery system (depending on the concrete application there may be several applications, or possibly just one); when the application program module 1001 needs to perform an operation such as creating, deleting or modifying a file, it produces the corresponding IO request, which is sent through the IO management module 1002 and the IO filter driver module 1003 to the operation module 1004 for execution; the application program module of the production center node is in normal running state, while the application program module of a backup center node does not run, and loads the stored data and/or operations under the scheduling of the administration module only when it takes over the applications of the production center node;
the IO management module 1002 is connected to the application program module 1001, the IO filter driver module 1003 and the data replication module 1006; it is responsible for receiving all IO requests, including file read/write IO requests, network read/write IO requests and the IO requests received from the data replication module 1006, and then uniformly forwards the IO requests to the IO filter driver module 1003;
the IO filter driver module 1003 is connected to the IO management module 1002, the operation module 1004 and the data replication module 1006; it is responsible for detecting the IO requests that cause changes in the operation module 1004 (such as add, delete and modify operations), filtering out at least the IO requests that perform write operations on the targets mirrored between the production center node and the backup center nodes; when a copy operation to the next node is needed, it passes these IO requests to the data replication module 1006 to be copied to the other backup center nodes; these IO requests include requests to create or delete files and to modify the size, attributes or security descriptors of files;
the operation module 1004 is connected to the IO filter driver module 1003 and the local storage module 1005; it converts IO requests into actual data operations, performing concrete operations such as creating, deleting or modifying files;
the local storage module 1005 is connected to the operation module 1004 and provides local storage space;
the data replication module 1006 is connected to the IO management module 1002, the IO filter driver module 1003 and the administration module 1007; internally it comprises a data receiving submodule 1061 and a data sending submodule 1062, where the data receiving submodule 1061 is connected to the IO management module 1002 and receives data from other nodes, and the data sending submodule 1062 is connected to the IO filter driver module 1003 and sends data to other nodes;
the administration module 1007 is connected to the data replication module 1006 and the HA service module 1008. For system monitoring and management: after the HA service module 1008 detects a system failure, the administration module 1007 receives the alarm information sent by the HA service module 1008 and triggers its internal alarm-handling flow. For data replication: through heartbeat messages with the other nodes it obtains the address information of the other nodes and the priority information preconfigured by the user, orders the other nodes by priority, and, when sending data or switching services to backup center nodes of lower priority than this node, gives preference to the node with the highest priority, so as to preserve the consistency of the replicated data as far as possible; it may also receive service data switched to this node from other nodes. In this embodiment, node priority is related to node location: the closer a node is to the production center node, the higher its priority, and a backup center node in the same address range as the production center node has the highest priority. Between the production center and that highest-priority backup center, synchronous replication is preferred, which guarantees the real-time consistency of the data on the two nodes; a backup center node far from the production center node uses asynchronous replication instead; and synchronous replication is likewise preferred between two or more backup center nodes at the same site. For service switching: the administration module judges, from the alarm information provided by the HA service module 1008, whether a switch-over of service resources is needed to keep the services running; when a switch-over is needed, it selects from the priority ordering the node whose priority is just below this node's and sends it a switch request; the administration module of that backup center node handles the switch-over after receiving the request, loading the backed-up data or operations into the application of that backup center node. In addition, compression options can be set on the administration module 1007 to compress and decompress data, and data can be encrypted and decrypted according to the encryption policy set by the user.
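The switch-over decision described above can be sketched as follows. The PeerNode structure and its healthy flag (assumed to be refreshed from heartbeat messages) are illustrative assumptions; the patent specifies only that the node ranked just below the failed node takes over.

```python
from dataclasses import dataclass

@dataclass
class PeerNode:
    name: str
    healthy: bool = True   # assumed flag, refreshed from heartbeat messages

def switch_target(priority_list, failed_name):
    """Pick the take-over node: the first healthy node ranked immediately
    below the failed node. `priority_list` is ordered highest priority first."""
    names = [n.name for n in priority_list]
    for candidate in priority_list[names.index(failed_name) + 1:]:
        if candidate.healthy:
            return candidate
    return None  # no backup center node left to take over

nodes = [PeerNode("production-100"), PeerNode("local-backup-101"),
         PeerNode("remote-backup-102")]
assert switch_target(nodes, "production-100").name == "local-backup-101"
nodes[1].healthy = False
assert switch_target(nodes, "production-100").name == "remote-backup-102"
```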
Fig. 2 shows the internal structure of the HA service module 1008, which mainly comprises a system monitoring module 2801, a resource object detection module 2802 and a resource object module 2803, wherein:
the resource object detection module 2802 is connected to the system monitoring module 2801 and the resource object module 2803; acting as a monitor, its main task is to watch the resource object module 2803, that is, the availability of important hardware and software resources such as database services or other application service processes, and to send the detected resource state information to the system monitoring module 2801, reporting the state of the services. If the system monitoring module 2801 receives the information sent by the resource object detection module 2802 within a predetermined time, the service is considered normal; if no such information is received within the predetermined time (the information disappears) or error information is received, the service is considered abnormal. From the detection information received, the system monitoring module 2801 determines whether the resource services monitored by the resource object detection module 2802 are normal, and then takes the corresponding processing action (a small watchdog sketch follows the module list);
the resource object module 2803 covers the hardware and software resources monitored by the resource object detection module 2802, including but not limited to: the hardware state of the server itself, such as hard disk, memory and network card; network resources, such as floating IP addresses; shared storage resources, such as disk arrays; database systems, such as Oracle, Sybase, SQL and Informix; and important application services, such as WWW and FTP services.
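The timeout rule used by the system monitoring module can be sketched as a small watchdog. Every name below is an illustrative assumption; only the rule itself (a report received within a predetermined time means normal, silence or an error means abnormal) comes from the text above.

```python
import time

class ResourceWatchdog:
    """A resource counts as normal only while its detector has reported OK
    within `timeout_s` seconds; silence or an explicit error makes it abnormal."""

    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s
        self.last_ok: dict = {}

    def report(self, resource: str, ok: bool) -> None:
        # Called by the resource object detection module on every probe.
        if ok:
            self.last_ok[resource] = time.monotonic()
        else:
            self.last_ok.pop(resource, None)  # error report: mark abnormal

    def is_normal(self, resource: str) -> bool:
        ts = self.last_ok.get(resource)
        return ts is not None and (time.monotonic() - ts) < self.timeout_s

wd = ResourceWatchdog(timeout_s=5.0)
wd.report("oracle-service", ok=True)
assert wd.is_normal("oracle-service")
wd.report("oracle-service", ok=False)   # error information received
assert not wd.is_normal("oracle-service")
```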
Fig. 3 shows the internal structure of the data replication module 1006, which mainly comprises the data receiving submodule 1061 and the data sending submodule 1062. The data receiving submodule 1061 mainly comprises a receiving port negotiation unit 1611, a receive detection unit 1612, a receive FIFO queue 1613, a decryption unit 1614 and a decompression unit 1615, wherein:
the receiving port negotiation unit 1611 is connected to the receive detection unit 1612 and determines the location of a packet's target node and the concrete transmission mode of the packet;
the receive detection unit 1612 is connected to the receiving port negotiation unit 1611 and the receive FIFO queue 1613; it detects whether a packet has been received and whether the receive FIFO queue 1613 is full;
the receive FIFO queue 1613 is connected to the receive detection unit 1612 and the decryption unit 1614; it buffers the IO request packets destined for this node and forwards them in first-in first-out order: IO request packets that need neither decryption nor decompression are forwarded directly to this node, while IO request packets that need decryption or decompression are forwarded to the decryption unit 1614;
the decryption unit 1614 is connected to the receive FIFO queue 1613 and the decompression unit 1615; according to the encryption policy for transmitted packets set at the user's request, it decrypts the encrypted packets received;
the decompression unit 1615 is connected to the decryption unit 1614; according to the compression ratio for transmitted packets set at the user's request, it decompresses the compressed packets received.
The data sending submodule 1062 mainly comprises a send detection unit 1621, a send FIFO queue 1622, a compression unit 1623, an encryption unit 1624 and a transmit port negotiation unit 1625 (a pipeline sketch follows the unit list), wherein:
the send detection unit 1621 is connected to the send FIFO queue 1622; it detects whether a packet has been received and whether the send FIFO queue is full;
the send FIFO queue 1622 is connected to the send detection unit 1621 and the compression unit 1623; it buffers the IO request packets to be sent to the next node and forwards them in first-in first-out order: IO request packets that need neither compression nor encryption are sent directly to the next node, while IO request packets that need compression or encryption are forwarded to the compression unit 1623;
the compression unit 1623 is connected to the send FIFO queue 1622 and the encryption unit 1624; according to the compression ratio for transmitted packets set at the user's request, it compresses the packets;
the encryption unit 1624 is connected to the compression unit 1623 and the transmit port negotiation unit 1625; according to the encryption policy for transmitted packets set at the user's request, it encrypts the packets;
the transmit port negotiation unit 1625 is connected to the encryption unit 1624 and determines the location of a packet's target node and the concrete transmission mode of the packet.
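Taken together, the send-side units form a FIFO-then-compress-then-encrypt pipeline whose inverse runs on the receive side. The sketch below illustrates that ordering; zlib and the XOR cipher are stand-ins, since the patent names only user-set compression and encryption options, not concrete algorithms.

```python
import queue
import zlib

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Placeholder cipher: the patent names no algorithm, only a user-set policy.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def send_pipeline(packets, compress=False, encrypt=False, key=b"demo-key"):
    """FIFO -> optional compression -> optional encryption, mirroring the
    data sending submodule; the receive side applies the inverse steps."""
    fifo = queue.Queue()
    for p in packets:            # send FIFO queue: strict first-in first-out
        fifo.put(p)
    out = []
    while not fifo.empty():
        data = fifo.get()
        if compress:             # compression unit, per the user-set option
            data = zlib.compress(data)
        if encrypt:              # encryption unit, per the user-set policy
            data = xor_cipher(data, key)
        out.append(data)         # handed to the transmit port negotiation unit
    return out

wire = send_pipeline([b"sector-7-data"], compress=True, encrypt=True)
restored = zlib.decompress(xor_cipher(wire[0], b"demo-key"))  # receive side
assert restored == b"sector-7-data"
```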
The invention also provides a multi-node application-level disaster recovery method. Among several nodes connected by a LAN and/or WAN, the node responsible for running the service applications is called the production center node and the other nodes are called backup center nodes; the LAN and/or WAN is divided by purpose into a public network, which provides the channel through which clients access the services, and a private network, which carries inter-node data replication and heartbeat messages. While the production center node operates normally, the method copies the data and/or operations to be protected over the LAN and/or WAN to every backup center node. When the production center node fails, the service applications of the production center node are first switched to the backup center node with the highest priority, to keep the services running normally; if that highest-priority backup center node itself fails while standing in for the production center node, the services are switched to the backup center node with the second-highest priority; and so on: whenever the backup center node of a given priority fails while running the services in place of the production center node, the services are switched to the backup center node with the next-lower priority.
Only one of the nodes runs a given application at any time; the other nodes act as redundant backups. A typical configuration, shown in Fig. 1, comprises three nodes: two locally deployed nodes form a dual-machine, dual-cabinet high-availability system, and one remotely deployed node provides application-level disaster recovery.
The targets of inter-node data replication and application switching are decided by priority. Application switching is performed only after the production center node fails; the switch target is first the node whose priority is just below the production center node's, determined by the priority ordering held on the failed node. Data replication is carried out hop by hop from higher-priority nodes to lower-priority nodes: at any given node, the replication target is the node whose priority is just below its own, so replication cascades level by level.
The node priorities are decided as follows: the production center node always ranks above the backup center nodes; among all backup center nodes, those in the same address range as the production center node rank highest, and if several backup center nodes are in the same address range as the production center node, they can be ordered by the priorities configured by the user or by their distance from the production center node. Several backup center nodes within one address range are ordered by the priorities preconfigured by the user in the administration module, or in such a way that the local node ranks high and the other nodes in that address range rank lower. Both data replication and service switching prefer local nodes. Each node replicates its data only to the highest-priority node among all nodes whose priority is lower than its own; through this cascade, multi-copy replication of data and application-level disaster recovery across multiple nodes are achieved.
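The cascade rule can be sketched in a few lines; the node names are illustrative, and the priority list is assumed to be ordered highest first as described above.

```python
def replication_target(priority_list, self_name):
    """Cascade rule: each node copies only to the highest-priority node
    ranked below itself, which then repeats the process downstream."""
    idx = priority_list.index(self_name)
    return priority_list[idx + 1] if idx + 1 < len(priority_list) else None

chain = ["production-100", "local-backup-101", "remote-backup-102"]
assert replication_target(chain, "production-100") == "local-backup-101"
assert replication_target(chain, "local-backup-101") == "remote-backup-102"
assert replication_target(chain, "remote-backup-102") is None  # end of cascade
```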
As for the inter-node transmission mode, the source replica node can flexibly select different replication strategies according to actual conditions: by checking whether the destination replica node is at the same site, it chooses synchronous or asynchronous replication. When the two nodes are at the same site, synchronous replication is used, which guarantees the real-time consistency of the data; when the two nodes are at different sites, that is, between two remote centers, asynchronous replication is used, which places no limit on the transmission distance, at the cost of some delay in the data.
After a node detects a file change, the data transmission strategy used in the real-time synchronization between the source replica node and the destination replica node is a sector-level strategy. With reference to Fig. 1: the IO filter driver module first classifies the IO request as a read or a write; if a write is detected, then, based on the result of the IO write operation executed by the operation module on the local storage module, the operation module periodically scans the sector-level data on the local storage module and compares sector-level differences; when it detects that the data of some sector has changed, it notifies the IO filter driver module, which triggers the data replication module to copy the data of the changed sector to the other nodes. Compared with a file-level or block-level strategy, the sector-level strategy effectively reduces the total amount of data transmitted: only the sectors whose data changed are sent, and the whole changed file or data block need not be copied to the backup center.
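A minimal sketch of the sector-level comparison follows; the 512-byte sector size and the in-memory images are illustrative assumptions, the point being that only changed sectors are yielded for replication.

```python
SECTOR = 512  # a common sector size; the patent does not fix one

def changed_sectors(old: bytes, new: bytes):
    """Compare two images of a storage region sector by sector and yield
    only the sectors whose content differs, so that replication ships
    changed sectors rather than whole files or blocks."""
    for off in range(0, max(len(old), len(new)), SECTOR):
        before, after = old[off:off + SECTOR], new[off:off + SECTOR]
        if before != after:
            yield off // SECTOR, after  # (sector index, new sector data)

old_img = bytes(4 * SECTOR)                                    # 4 empty sectors
new_img = old_img[:SECTOR] + b"X" * SECTOR + old_img[2 * SECTOR:]
assert [idx for idx, _ in changed_sectors(old_img, new_img)] == [1]
```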
The multi-node application-level disaster recovery performed with the system of this embodiment is described in detail below with reference to Fig. 4, Fig. 5 and Fig. 6.
The method of this embodiment mainly comprises: the application produces file read/write IO requests that need handling in response to IO operation requests (for example, creating, modifying or deleting a file); the IO filter driver module filters out the IO write requests; when a data copy operation to the next node is needed, the IO filter driver module forwards the corresponding IO write request to the data replication module; otherwise it simply forwards the IO request to the operation module, which performs the disk read and/or write.
The flow of the multi-node application-level disaster recovery system handling a file IO request, shown in Fig. 4, mainly comprises the following steps:
Step 420: forward these file read/write IO requests to the IO management module;
Step 455: for an intermediate node, a disk write operation is always required; the IO filter driver module converts the IO request into a packet and sends it to the data replication module, then execute step 460;
The method of this embodiment also includes options for setting the compression ratio and the encryption/decryption of the IO request packets transmitted between nodes during data replication. Fig. 5 shows the flow of the multi-node application-level disaster recovery system handling data replication, which may comprise the following steps:
Step 510: the send detection unit in the data sending submodule detects whether a packet has been received; if a packet has been received, execute step 520; if not, continue detecting;
Step 520: the send detection unit detects whether the send FIFO queue is full; if it is full, execute step 525; if not, execute step 530;
Step 525: the send detection unit stops sending data to the send FIFO queue, then returns to step 520 and continues;
Step 540: the send FIFO queue feeds the packets step by step into the compression module according to the network bandwidth;
Step 550: detect whether the compression option is set on the administration module; if it is set, execute step 555; if not, execute step 560;
Step 555: compress the packet according to the configured compression ratio, then execute step 560;
Step 560: forward the packet to the encryption module and detect whether the encryption option is set on the administration module; if it is set, execute step 565; if not, execute step 570;
Step 565: encrypt the data stream according to the configured encryption policy, then execute step 570;
Step 570: forward the packet to the transmit port negotiation module, which triggers the administration module to analyze the priorities of the remaining backup center nodes and determine the packet's destination node;
Step 585: synchronous replication is used between this node and the destination node, then execute step 590;
Step 586: asynchronous replication is used between this node and the destination node, then execute step 590;
As shown in Fig. 6, in the method of this embodiment, the flow by which the HA service module in a node of the multi-node application-level disaster recovery system sends priority query request packets to the other nodes mainly comprises the following steps:
The priority feedback packet contains the address information of the node and the priority information set by the user.
If there is no production center node address information at this moment, the processing differs from that based on the production center node address information, that is, execute step 660.
Step 650: the administration module queries whether the priority list contains another node with the same address information as this node; if so, execute step 652; if not, execute step 654;
Step 654: place this node's priority at the front of the priority list, then execute step 690;
Step 666: the administration module queries whether the list contains a node with the same address as the production center node; if so, execute step 670; if not, execute step 680;
Step 690: after the administration module finishes ordering according to the priority information, it updates the priority list, and the flow ends.
In the sorting method of this example, the production center node is not in the priority list; in other embodiments the production center node may be placed in the list when ordering.
The administration module of each node can perform the priority ordering. The priority orderings on the backup center nodes are not necessarily identical, but each must guarantee that the production center node has the highest priority. The other nodes in the same address range as a given node may be set to a priority lower than that node's, while nodes at a different site from that node may be ordered according to the priority ordering configured by the user.
As shown in Fig. 7, in the method of this embodiment, after the HA service module receives a priority query request packet sent by another node, it encapsulates its own information into a feedback packet and sends it back; the flow specifically comprises the following steps:
Step 740: the HA service module sends the priority query feedback packet over the private network to the node that sent the priority query request packet, and the flow ends.
The method of multi-node application-level disaster recovery using the system has been illustrated above taking only the production center node 100 as an example. In fact, as those of ordinary skill in the art will appreciate, within the whole multi-node application-level disaster recovery system the servers of the backup center nodes 101 and 102 carry out roughly the same operations as the production center node 100 at the same time, which ensures that the system can switch over automatically to a backup center node when a failure occurs. For the detailed processing of the backup center nodes 101 and 102, refer to the description of the processing of the production center node 100, which is not repeated here.
The multi-node application-level disaster recovery system and method provided by the invention solve the problem in the existing prior art that multi-node application-level disaster recovery cannot be provided, and can effectively improve the overall reliability and availability of application systems. In data transmission, the invention can flexibly adopt different replication strategies according to actual conditions, combining the advantages of synchronous and asynchronous replication and effectively avoiding the shortcomings of each mode used alone: for example, synchronous replication is adopted between the two cabinets of a local active/standby pair, guaranteeing the real-time consistency of the data, while asynchronous replication is adopted between two remote centers, so that the transmission distance is unrestricted. As for the delta transmission strategy used in the real-time synchronization between the two ends after a file change is detected, the invention replaces the existing file-level or block-level strategy with a sector-level strategy that transmits only the sectors whose data changed, without copying the whole changed file or data block to the backup center, effectively reducing the total amount of data transmitted.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any variation or replacement that can readily be conceived by those familiar with the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.
Claims (10)
1. A multi-node application-level disaster recovery system, characterized by comprising a production center node and two or more backup center nodes, the two or more backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes, wherein:
the production center node provides the service running capability, sends data and/or operations to all backup center nodes during normal operation, and, when it fails, switches the services running on it to the backup center node with the highest priority;
each backup center node saves the data and/or operations sent by the production center node, sends data and/or operations to all remaining backup center nodes when it runs services in place of the production center node or another backup center node, and, if it fails while running those services, switches them to the backup center node with the next-lower priority so as to keep the services running normally.
2. The system as claimed in claim 1, characterized in that
the production center node and the backup center nodes also perform priority ordering, and when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority;
a backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node;
any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
3. The system as claimed in claim 2, characterized in that
when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
4. The system as claimed in claim 1, 2 or 3, characterized in that the production center node and each backup center node comprise an application program module, an IO management module, an IO filter driver module, an operation module, a local storage module, a data replication module, a local agent service module and an administration module, wherein:
the application program module is the application that the system protects and produces IO requests; the application program module of the production center node is running, while the application program module of a backup center node runs only when that node runs services in place of the production center node or another backup center node;
the IO management module is connected to the application program module and the data replication module and receives the IO requests sent by the application program module and the IO requests received by the data replication module;
the IO filter driver module is connected to the IO management module and the data replication module; it receives the IO requests sent by those two modules and forwards them to the operation module, and from the received IO requests it filters out those that may cause the operation module to perform modifying operations and sends them through the data replication module to the other backup center nodes;
the operation module is connected to the IO filter driver module and executes the operations requested by the received IO requests;
the local storage module is connected to the operation module and provides storage space;
the data replication module is connected to the IO management module and the IO filter driver module and receives IO requests and service data from other nodes and/or sends IO requests and service data to other nodes;
the local agent service module is connected to the administration module; it sends alarm information to the administration module when it detects a failure of the system and exchanges heartbeat messages with the other nodes;
the administration module is connected to the data replication module and the local agent service module; it receives, via the heartbeat messages, the address information of the other nodes and the priority information preconfigured by the user, orders the other nodes by priority, and, after receiving alarm information from the local agent service module, triggers the local agent service module to send heartbeat messages to the other nodes, notifying the other backup center nodes, in priority order from high to low, to take over all the services carried by the failed node.
5. The system as claimed in claim 4, characterized in that
the operation module also scans the sector data on the local storage module and, when it detects that the data of a sector has changed, triggers the data replication module, through the IO filter driver module, to copy the data of the changed sector to the other nodes.
6. A multi-node application-level disaster recovery method, characterized in that the system to which the method applies comprises a production center node providing the service running capability and two or more backup center nodes, the backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes, the method comprising:
during normal operation, the production center node sends data and/or operations to all backup center nodes, and the backup center nodes save the data and/or operations sent by the production center node;
when the production center node fails, the services running on it are switched to the backup center node with the highest priority, and when a backup center node runs services in place of the production center node, it sends data and/or operations to all remaining backup center nodes;
when a backup center node running services fails, the services are switched to the backup center node with the next-lower priority so as to keep the services running normally.
7. The method as claimed in claim 6, characterized in that
the production center node and each backup center node each maintain a priority ordering of the other nodes, and when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority;
a backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node;
any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
8. The method as claimed in claim 7, characterized in that
when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
9. The method as claimed in claim 6, characterized in that
the center node operating normally and running the services periodically scans the sector data and, when it detects that the data of a sector has changed, copies the data of the changed sector to the other nodes.
10. The method as claimed in any one of claims 6-9, characterized in that the priority sorting method comprises: this node sends a priority query request packet to the other nodes and, after receiving the returned priority feedback packets, builds a priority list by performing the following operations on each priority feedback packet, each packet containing the address information of the node that sent it and the priority information set by the user:
(a) judge whether the node that sent the priority feedback packet is the production center node; if so, set that node's priority to the highest; if not, execute step (b);
(b) query the address information of the node and judge whether it is in the same address range as the production center node; if so, set the node's priority just below the production center node but above the other backup center nodes, and if several nodes are in the same address range as the production center node, order them among themselves by the priority information set by the user; if not, execute step (c);
(c) query whether the node's address information is in the same address range as this node; if so, set the node's priority below this node's priority or order by the priority information set by the user; otherwise, set the node's priority below this node's priority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102108091A CN101656624B (en) | 2008-08-18 | 2008-08-18 | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102108091A CN101656624B (en) | 2008-08-18 | 2008-08-18 | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method |
Publications (2)
Publication Number | Publication Date |
---|---
CN101656624A (en) | 2010-02-24
CN101656624B (en) | 2011-12-07
Family
ID=41710730
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN2008102108091A Active CN101656624B (en) | 2008-08-18 | 2008-08-18 | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101656624B (en) |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101621409B (en) * | 2009-07-22 | 2012-07-18 | 中兴通讯股份有限公司 | Service control method, service control device and broadband access servers |
CN101888405B (en) * | 2010-06-07 | 2013-03-06 | 北京高森明晨信息科技有限公司 | Cloud computing file system and data processing method |
CN102142008B (en) * | 2010-12-02 | 2013-04-17 | 华为技术有限公司 | Method and system for implementing distributed memory database, token controller and memory database |
CN102681911B (en) * | 2011-03-09 | 2016-03-02 | 腾讯科技(深圳)有限公司 | A kind of disaster tolerance system of configuration center and method |
CN102497615A (en) * | 2011-11-30 | 2012-06-13 | 清华大学 | Location-information-based clustering method for node mobile network |
CN103312753A (en) * | 2012-03-14 | 2013-09-18 | 中国移动通信集团公司 | Communication method and device of Internet of things |
CN102752093B (en) * | 2012-06-29 | 2016-02-10 | 中国联合网络通信集团有限公司 | Based on the data processing method of distributed file system, equipment and system |
CN103179204A (en) * | 2013-03-13 | 2013-06-26 | 广东新支点技术服务有限公司 | Double-proxy-based WAN (wide area network) disk image optimization method and device |
CN104080132B (en) * | 2013-03-25 | 2018-06-19 | 中国移动通信集团公司 | A kind of data processing method and equipment |
CN103324715B (en) * | 2013-06-20 | 2017-04-12 | 交通银行股份有限公司 | Disaster recovery backup system availability detection method and device |
CN103617096A (en) * | 2013-11-04 | 2014-03-05 | 华为技术有限公司 | Storage data copying method, equipment and system |
CN104636218B (en) * | 2013-11-15 | 2019-04-16 | 腾讯科技(深圳)有限公司 | Data reconstruction method and device |
EP3358466B1 (en) | 2013-12-12 | 2019-11-13 | Huawei Technologies Co., Ltd. | Data replication method and storage system |
CN103684720B (en) * | 2014-01-06 | 2017-12-19 | 迈普通信技术股份有限公司 | A kind of system of selection of active and standby service unit and device |
RU2641477C1 (en) | 2014-04-14 | 2018-01-17 | Хуавэй Текнолоджиз Ко., Лтд. | Method and device for configuration of providing solution in cloud protection computing architecture |
CN104182300B (en) * | 2014-08-19 | 2017-04-12 | 北京京东尚科信息技术有限公司 | Backup method and system of virtual machines in cluster |
CN104850628B (en) * | 2015-05-21 | 2018-06-15 | 中国工商银行股份有限公司 | The synchronous method and device of a kind of database data |
CN105141445A (en) * | 2015-07-24 | 2015-12-09 | 广州尚融网络科技有限公司 | Method and device for realizing multiple backups of multiple flow groups in high-availability cluster system |
CN105406980B (en) * | 2015-10-19 | 2018-06-05 | 浪潮(北京)电子信息产业有限公司 | A kind of multinode backup method and device |
CN106817239B (en) * | 2015-11-30 | 2020-01-31 | 华为软件技术有限公司 | site switching method, related device and system |
CN105786405B (en) | 2016-02-25 | 2018-11-13 | 华为技术有限公司 | A kind of online upgrading method, apparatus and system |
CN105827435A (en) * | 2016-03-09 | 2016-08-03 | 中国工商银行股份有限公司 | System for maintaining continuous business operation based on double center systems and method thereof |
CN105763386A (en) * | 2016-05-13 | 2016-07-13 | 中国工商银行股份有限公司 | Service processing system and method |
CN106101208A (en) * | 2016-06-10 | 2016-11-09 | 北京银信长远科技股份有限公司 | The method building cross-platform high-availability system based on Ethernet |
CN107688584A (en) * | 2016-08-05 | 2018-02-13 | 华为技术有限公司 | A kind of method, node and the system of disaster tolerance switching |
CN106649065B (en) * | 2016-12-09 | 2019-07-23 | 华北理工大学 | A kind of computer system and the faulty computer replacement method applied to the system |
CN108270814A (en) * | 2016-12-30 | 2018-07-10 | 北京优朋普乐科技有限公司 | A kind of method of data synchronization and device |
CN107196799B (en) * | 2017-05-26 | 2020-10-16 | 河南职业技术学院 | Data processing platform redundant server backup and switching operation control method |
CN109151613B (en) * | 2017-06-16 | 2022-12-02 | 中兴通讯股份有限公司 | Content distribution system and method |
CN107465537A (en) * | 2017-07-13 | 2017-12-12 | 深圳市盛路物联通讯技术有限公司 | The backup method and system of Internet of Things repeater |
CN107483228B (en) * | 2017-07-14 | 2020-02-18 | 深圳市盛路物联通讯技术有限公司 | Method and system for split backup of Internet of things repeater |
CN109274986B (en) * | 2017-07-17 | 2021-02-12 | 中兴通讯股份有限公司 | Multi-center disaster recovery method, system, storage medium and computer equipment |
CN107483542B (en) * | 2017-07-18 | 2020-09-04 | 深圳市盛路物联通讯技术有限公司 | Exception handling method and device for wireless sensor network |
CN107483330B (en) * | 2017-07-20 | 2020-07-03 | 深圳市盛路物联通讯技术有限公司 | Relay bridging method and gateway |
CN107404401B (en) * | 2017-07-21 | 2021-03-19 | 深圳市盛路物联通讯技术有限公司 | Wireless sensor network repeater exception handling method and device |
CN107529187A (en) * | 2017-07-27 | 2017-12-29 | 深圳市盛路物联通讯技术有限公司 | A kind of data back up method and device based on Internet of Things |
CN107465609B (en) * | 2017-07-31 | 2020-05-19 | 深圳市盛路物联通讯技术有限公司 | Terminal routing method based on Internet of things and Internet of things terminal |
CN107483236B (en) * | 2017-08-01 | 2021-03-19 | 深圳市盛路物联通讯技术有限公司 | Method and device for backing up access point of Internet of things |
CN107483234B (en) * | 2017-08-01 | 2021-06-22 | 深圳市盛路物联通讯技术有限公司 | Method and device for split backup of access point of Internet of things |
CN107612718A (en) * | 2017-08-29 | 2018-01-19 | 深圳市盛路物联通讯技术有限公司 | Forwarding unit switching method and device |
CN107708085B (en) * | 2017-08-29 | 2020-11-13 | 深圳市盛路物联通讯技术有限公司 | Repeater guaranteeing method and access point |
CN110635927B (en) * | 2018-06-21 | 2022-08-19 | 中兴通讯股份有限公司 | Node switching method, network node and network system |
CN109361769A (en) * | 2018-12-10 | 2019-02-19 | 浪潮(北京)电子信息产业有限公司 | A kind of disaster tolerance system and a kind of disaster recovery method |
CN109669816A (en) * | 2018-12-15 | 2019-04-23 | 无锡北方数据计算股份有限公司 | A kind of disaster tolerant backup system based on distributed structure/architecture |
CN110149366B (en) * | 2019-04-16 | 2022-03-18 | 平安科技(深圳)有限公司 | Method and device for improving availability of cluster system and computer equipment |
CN110932861A (en) * | 2019-10-17 | 2020-03-27 | 杭州安存网络科技有限公司 | Digital certificate management method, device, equipment and storage medium based on multiple CA |
CN111093249B (en) * | 2019-12-05 | 2022-06-21 | 合肥中感微电子有限公司 | Wireless local area network communication method, system and wireless transceiving equipment |
CN111130979B (en) * | 2019-12-09 | 2022-02-22 | 苏州浪潮智能科技有限公司 | Method and equipment for connecting branch node with central node in SDWAN (software development wide area network) scene |
CN111147567A (en) * | 2019-12-23 | 2020-05-12 | 中国银联股份有限公司 | Service calling method, device, equipment and medium |
CN111666179B (en) * | 2020-06-12 | 2023-03-28 | 重庆云海时代信息技术有限公司 | Intelligent replication system and server for multi-point data disaster tolerance |
CN112084200B (en) | 2020-08-24 | 2024-08-20 | 中国银联股份有限公司 | Data read-write processing method, data center, disaster recovery system and storage medium |
CN112416655A (en) * | 2020-11-26 | 2021-02-26 | 深圳市中博科创信息技术有限公司 | Storage disaster recovery system based on enterprise service portal and data copying method |
CN113595805B (en) * | 2021-08-23 | 2024-01-30 | 海南房小云科技有限公司 | Personal computer data sharing method for local area network |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1505315A (en) * | 2002-12-05 | 2004-06-16 | 华为技术有限公司 | A data disaster recovery solution method producing no interlinked data reproduction |
Also Published As
Publication number | Publication date |
---|---|
CN101656624A (en) | 2010-02-24 |
Similar Documents
Publication | Title
---|---
CN101656624B (en) | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method
CN101741536B (en) | Data level disaster-tolerant method and system and production center node
US9916113B2 | System and method for mirroring data
US9794232B2 | Method for data privacy in a fixed content distributed data storage
EP3522494B1 | Cloud storage based data processing method and device
US7254740B2 | System and method for state preservation in a stretch cluster
US7383463B2 | Internet protocol based disaster recovery of a server
US8103630B2 | Data processing system and storage subsystem provided in data processing system
JP2004532442A | Failover processing in a storage system
JP2010509686A | Primary cluster fast recovery
CN1838055A | Storage replication system with data tracking
US20100031081A1 | Data Storage System and Control Method Thereof
US20110228084A1 | Content management in a video surveillance system
US8527454B2 | Data replication using a shared resource
CN101552799A | Media node fault-tolerance method and device
CN111522499A | Operation and maintenance data reading device and reading method thereof
KR101466007B1 | A multiple duplexed network video recorder and the recording method thereof
JP2001045023A | Video server system and video data distribution method
CN1889418B | Network storing method and network storing system
CN116226093B | Real-time database system based on dual-activity high-availability architecture
JP2004094608A | Data backup method and data backup device
CN104281591B | The remote disaster tolerance technology integrated based on data particle
CN117201281A | Network equipment fault processing method and device, electronic equipment and storage medium
JP2009265811A | Subscriber information backup system and authentication server
Legal Events
Code | Title
---|---
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant