CN101656624B - Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method
- Publication number: CN101656624B
- Application number: CN2008102108091A
- Authority: CN (China)
- Legal status: Active
Abstract
The invention discloses a multi-node application-level disaster recovery system and a multi-node application-level disaster recovery method. The system comprises a production center node and two or more backup center nodes connected to it by a local area network and/or a wide area network. The backup center nodes are ordered by priority from high to low, and the production center node has a higher priority than every backup center node. While the production center node operates normally, it sends data and/or operations to all backup center nodes; when it fails, the services running on it are switched to the backup center node with the highest priority. Each backup center node saves the data and/or operations sent by the production center node, sends data and/or operations to all remaining backup center nodes whenever it runs services in place of the production center node or another backup center node, and, if it fails while running those services, switches them to the backup center node with the next-lower priority.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a multi-node application-level disaster recovery system and disaster recovery method.
Background technology
Disaster tolerance refers to the system planning and construction carried out to ensure that, after experiencing a disaster of any kind, key services and applications can still provide normal service to the greatest possible extent. Typical disaster events are natural disasters such as fire, flood, earthquake, cyclone and typhoon, together with interruptions of the services the operation depends on, such as equipment faults, software errors, communication network outages and power failures. Human factors can also breed disasters, for example operator error, sabotage, implanted malicious code and terrorist attacks.
The essence of disaster tolerance is to guarantee uninterrupted service operation, and its ultimate goal is business continuity. The origin and development of the disaster tolerance industry are an inevitable outcome of the development of computer technology, and also reflect the importance of information systems and data to individuals, enterprises and countries.
By protection level, disaster tolerance can be broadly divided into data-level and application-level disaster recovery. Data-level disaster recovery focuses on the data itself: it must guarantee that the original data are not lost or destroyed after a disaster occurs. Application-level disaster recovery builds, on the basis of data-level disaster recovery, an identical application system at the backup site, so that while data-level protection is achieved, external services also remain continuously available: when a disaster occurs, services can be switched between the production center and the remote disaster recovery center, and all applications affected by the disaster are seamlessly taken over by the backup system, guaranteeing continued availability of the services.
At present, two broad classes of disaster recovery systems are popular in the industry.
The first class is storage-layer disaster recovery, which dominates in high-end storage. It is based mainly on disk array replication technology: firmware built into the storage system, or the operating system, replicates or mirrors data between physical storage devices over dedicated links such as Fibre Channel. It supports both asynchronous and synchronous replication, is independent of the operating system platform and the application, and guarantees the consistency of the data at the two ends.
The shortcomings of this class of system are, however, equally clear: it provides only data-level, not application-level, disaster recovery; the hardware investment is very expensive, since the user must deploy two identical storage systems, one at the local site and one at the disaster recovery site, which not only raises the purchase cost but also ties the user to a single equipment vendor, so future expandability inevitably lacks flexibility; the synchronous mode has a considerable impact on the transaction throughput of the home site; the distance between the active and standby machines cannot be large, generally within 200 km; and the requirement on network bandwidth is comparatively high, generally exceeding gigabit bandwidth.
The second class is server-layer disaster recovery, which dominates among low-end and mid-range storage devices. Data replication software installed on the server, or replication facilities provided by the application (such as the related tools of a database), copies data over a TCP/IP network to a remote disaster recovery center, achieving off-site data replication. The transmission distance to the remote disaster recovery center is unlimited and independent of the storage hardware; servers and storage devices of different brands can be combined, and the user's investment cost is relatively low.
However, the main products of this class currently on the market have the following drawbacks. Because the investment cost would otherwise be excessive and actual demand has so far been weak, they provide application-level disaster recovery for only two nodes; beyond two nodes, only data-level disaster recovery is available. They offer only a single synchronous or asynchronous replication mode and cannot flexibly and automatically apply the replication strategy suited to actual conditions. Moreover, data replication operates at the file-system level: after a file change is detected, the delta transfer is carried out at the file or block level, so transmission efficiency is low.
Summary of the invention
The technical problem to be solved by the present invention is to provide a multi-node application-level disaster recovery system and disaster recovery method that achieve application-level disaster recovery across multiple nodes.
To solve the above technical problem, the invention provides a multi-node application-level disaster recovery system comprising a production center node and two or more backup center nodes, the two or more backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes, wherein:
the production center node provides the service running capability, sends data and/or operations to all backup center nodes during normal operation, and, when it fails, switches the services running on it to the backup center node with the highest priority;
each backup center node saves the data and/or operations sent by the production center node, sends data and/or operations to all remaining backup center nodes when it runs services in place of the production center node or another backup center node, and, if it fails while running those services, switches them to the backup center node with the next-lower priority so as to keep the services running normally.
Further, the production center node and the backup center nodes also perform priority ordering; when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority. A backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node, and any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
Further, when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
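This selection rule reduces to a small dispatch function. The following is a minimal sketch, not part of the patent text: the function names are invented, and a shared /24 prefix stands in for the patent's abstract notion of a "same address range".

```python
from ipaddress import ip_network

def same_address_range(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """Illustrative stand-in for the patent's abstract 'same address range':
    here, two nodes count as local to each other when their IPs share a /24."""
    return (ip_network(f"{ip_a}/{prefix}", strict=False)
            == ip_network(f"{ip_b}/{prefix}", strict=False))

def replication_mode(sender_ip: str, receiver_ip: str) -> str:
    # Same address range -> synchronous replication; otherwise asynchronous.
    return "synchronous" if same_address_range(sender_ip, receiver_ip) else "asynchronous"

# Example: a local backup replicates synchronously, a remote one asynchronously.
assert replication_mode("192.168.1.10", "192.168.1.20") == "synchronous"
assert replication_mode("192.168.1.10", "10.8.0.5") == "asynchronous"
```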
Further, the production center node and each backup center node comprise an application program module, an IO management module, an IO filter driver module, an operation module, a local storage module, a data replication module, a local agent service module and an administration module, wherein:
the application program module is the application that the system protects and produces IO requests; the application program module of the production center node is running, while the application program module of a backup center node runs only when that node runs services in place of the production center node or another backup center node;
the IO management module is connected to the application program module and the data replication module and receives the IO requests sent by the application program module and the IO requests received by the data replication module;
the IO filter driver module is connected to the IO management module and the data replication module; it receives the IO requests sent by those two modules and forwards them to the operation module, and from the received IO requests it filters out those that may cause the operation module to perform modifying operations and sends them through the data replication module to the other backup center nodes;
the operation module is connected to the IO filter driver module and executes the operations requested by the received IO requests;
the local storage module is connected to the operation module and provides storage space;
the data replication module is connected to the IO management module and the IO filter driver module and receives IO requests and service data from other nodes and/or sends IO requests and service data to other nodes;
the local agent service module is connected to the administration module; it sends alarm information to the administration module when it detects a failure of the system and exchanges heartbeat messages with the other nodes;
the administration module is connected to the data replication module and the local agent service module; it receives, via the heartbeat messages, the address information of the other nodes and the priority information preconfigured by the user, orders the other nodes by priority, and, after receiving alarm information from the local agent service module, triggers the local agent service module to send heartbeat messages to the other nodes, notifying the other backup nodes, in priority order from high to low, to take over all the services carried by the failed node.
Further, the operation module also scans the sector data on the local storage module and, when it detects that the data of a sector has changed, triggers the data replication module, through the IO filter driver module, to copy the data of the changed sector to the other nodes.
To solve the above problem, the invention also provides a multi-node application-level disaster recovery method. The system to which the method applies comprises a production center node providing the service running capability and two or more backup center nodes, the backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes. The method comprises:
during normal operation, the production center node sends data and/or operations to all backup center nodes, and the backup center nodes save the data and/or operations sent by the production center node;
when the production center node fails, the services running on it are switched to the backup center node with the highest priority, and when a backup center node runs services in place of the production center node, it sends data and/or operations to all remaining backup center nodes;
when a backup center node running services fails, the services are switched to the backup center node with the next-lower priority so as to keep the services running normally.
Further, the production center node and each backup center node each maintain a priority ordering of the other nodes; when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority. A backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node, and any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
Further, when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
Further, the center node operating normally and running the services periodically scans the sector data and, when it detects that the data of a sector has changed, copies the data of the changed sector to the other nodes.
Further, the priority sorting method comprises: this node sends a priority query request packet to the other nodes and, after receiving the returned priority feedback packets, builds a priority list by performing the following operations on each priority feedback packet, each packet containing the address information of the node that sent it and the priority information set by the user (a minimal code sketch follows step (c)):
(a) judge whether the node that sent the priority feedback packet is the production center node; if so, set that node's priority to the highest; if not, execute step (b);
(b) query the address information of the node and judge whether it is in the same address range as the production center node; if so, set the node's priority just below the production center node but above the other backup center nodes, and if several nodes are in the same address range as the production center node, order them among themselves by the priority information set by the user; if not, execute step (c);
(c) query whether the node's address information is in the same address range as this node; if so, set the node's priority below this node's priority or order by the priority information set by the user; otherwise, set the node's priority below this node's priority.
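Steps (a)-(c) amount to building a sort key for each feedback packet. Below is a minimal sketch under stated assumptions: the Feedback structure, the integer rank encoding and the /24 test for "same address range" are illustrative inventions, not the patent's data formats.

```python
from dataclasses import dataclass
from ipaddress import ip_network

def same_range(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    # Illustrative stand-in for the patent's abstract "same address range".
    return (ip_network(f"{ip_a}/{prefix}", strict=False)
            == ip_network(f"{ip_b}/{prefix}", strict=False))

@dataclass
class Feedback:
    """One priority feedback packet: the sender's address plus the user-set
    priority value, the two fields named in the text above."""
    address: str
    user_priority: int
    is_production: bool = False

def build_priority_list(feedbacks, production_ip, my_ip):
    """Order peers per steps (a)-(c); lower rank tuples sort first."""
    def rank(f):
        if f.is_production:                       # (a) production node first
            return (0, f.user_priority)
        if same_range(f.address, production_ip):  # (b) local to production
            return (1, f.user_priority)
        if same_range(f.address, my_ip):          # (c) local to this node
            return (2, f.user_priority)
        return (3, f.user_priority)               # (c) everything else, below this node
    return sorted(feedbacks, key=rank)

peers = [Feedback("10.8.0.5", 2), Feedback("192.168.1.1", 0, is_production=True),
         Feedback("192.168.1.20", 1)]
ordered = build_priority_list(peers, production_ip="192.168.1.1", my_ip="192.168.1.10")
# -> the production node, then its local backup, then the remote node.
```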
The invention achieves application-level disaster recovery across multiple nodes. Transmission among the nodes can flexibly and automatically select synchronous or asynchronous replication strategies according to actual conditions, and the sector-level delta transmission strategy applied after a data change is detected reduces the total amount of data transmitted, and thus the bandwidth pressure, more effectively than file-level or block-level transmission strategies.
Description of drawings
Fig. 1 is a structural diagram of the multi-node application-level disaster recovery system according to an embodiment of the invention;
Fig. 2 is a structural diagram of the HA service module in the multi-node application-level disaster recovery system according to an embodiment of the invention;
Fig. 3 is a structural diagram of the data replication module in the multi-node application-level disaster recovery system according to an embodiment of the invention;
Fig. 4 is a flow diagram of the multi-node application-level disaster recovery system handling a file IO request, in the method according to an embodiment of the invention;
Fig. 5 is a flow diagram of the multi-node application-level disaster recovery system handling data replication, in the method according to an embodiment of the invention;
Fig. 6 is a flow diagram of the HA service module in the multi-node application-level disaster recovery system sending a node priority query request packet, in the method according to an embodiment of the invention;
Fig. 7 is a flow diagram of the HA service module in the multi-node application-level disaster recovery system receiving a node priority query request packet, in the method according to an embodiment of the invention.
Embodiment
As shown in Fig. 1, the disaster recovery system proposed by the invention mainly comprises a production center node and N backup center nodes connected by an external local area network (LAN) and/or wide area network (WAN). The production center node is the node responsible for running the service applications; the nodes that keep the service applications running when the production center node fails are called backup center nodes. By purpose, the LAN and/or WAN is divided into a public network and a private network: the public network provides the channel through which clients access the services, and the private network carries the data replication information and heartbeat messages between the nodes (the production center node and the backup center nodes). A backup center node may be located at the same site as the production center node or at any remote location; "same site" may mean that the two nodes are within a certain distance (within 100 km) or that they are in the same IP segment.
Fig. 1 shows the structure of the multi-node application-level disaster recovery system of this embodiment, comprising a production center node 100, a local backup center node 101 and a remote backup center node 102, where the local backup center node 101 and the remote backup center node 102 serve as application-level disaster recovery backup center nodes for the production center node 100, and the three nodes are connected by LAN and/or WAN.
The production center node 100, the local backup center node 101 and the remote backup center node 102 have identical internal structures; the internal structure of each node is illustrated below taking the production center node 100 as an example. As shown in Fig. 1, the production center node 100 mainly comprises an application program module 1001, an IO management module 1002, an IO filter driver module 1003, an operation module 1004, a local storage module 1005, a data replication module 1006, an administration module 1007 and a local agent (HA) service module 1008, wherein:
the application program module 1001 is the application that needs the protection of the disaster recovery system (depending on the concrete application there may be several applications, or possibly just one); when the application program module 1001 needs to perform an operation such as creating, deleting or modifying a file, it produces the corresponding IO request, which is sent through the IO management module 1002 and the IO filter driver module 1003 to the operation module 1004 for execution; the application program module of the production center node is in normal running state, while the application program module of a backup center node does not run, and loads the stored data and/or operations under the scheduling of the administration module only when it takes over the applications of the production center node;
the IO management module 1002 is connected to the application program module 1001, the IO filter driver module 1003 and the data replication module 1006; it is responsible for receiving all IO requests, including file read/write IO requests, network read/write IO requests and the IO requests received from the data replication module 1006, and then uniformly forwards the IO requests to the IO filter driver module 1003;
the IO filter driver module 1003 is connected to the IO management module 1002, the operation module 1004 and the data replication module 1006; it is responsible for detecting the IO requests that cause changes in the operation module 1004 (such as add, delete and modify operations), filtering out at least the IO requests that perform write operations on the targets mirrored between the production center node and the backup center nodes; when a copy operation to the next node is needed, it passes these IO requests to the data replication module 1006 to be copied to the other backup center nodes; these IO requests include requests to create or delete files and to modify the size, attributes or security descriptors of files;
the operation module 1004 is connected to the IO filter driver module 1003 and the local storage module 1005; it converts IO requests into actual data operations, performing concrete operations such as creating, deleting or modifying files;
the local storage module 1005 is connected to the operation module 1004 and provides local storage space;
the data replication module 1006 is connected to the IO management module 1002, the IO filter driver module 1003 and the administration module 1007; internally it comprises a data receiving submodule 1061 and a data sending submodule 1062, where the data receiving submodule 1061 is connected to the IO management module 1002 and receives data from other nodes, and the data sending submodule 1062 is connected to the IO filter driver module 1003 and sends data to other nodes;
the administration module 1007 is connected to the data replication module 1006 and the HA service module 1008. For system monitoring and management: after the HA service module 1008 detects a system failure, the administration module 1007 receives the alarm information sent by the HA service module 1008 and triggers its internal alarm-handling flow. For data replication: through heartbeat messages with the other nodes it obtains the address information of the other nodes and the priority information preconfigured by the user, orders the other nodes by priority, and, when sending data or switching services to backup center nodes of lower priority than this node, gives preference to the node with the highest priority, so as to preserve the consistency of the replicated data as far as possible; it may also receive service data switched to this node from other nodes. In this embodiment, node priority is related to node location: the closer a node is to the production center node, the higher its priority, and a backup center node in the same address range as the production center node has the highest priority. Between the production center and that highest-priority backup center, synchronous replication is preferred, which guarantees the real-time consistency of the data on the two nodes; a backup center node far from the production center node uses asynchronous replication instead; and synchronous replication is likewise preferred between two or more backup center nodes at the same site. For service switching: the administration module judges, from the alarm information provided by the HA service module 1008, whether a switch-over of service resources is needed to keep the services running; when a switch-over is needed, it selects from the priority ordering the node whose priority is just below this node's and sends it a switch request; the administration module of that backup center node handles the switch-over after receiving the request, loading the backed-up data or operations into the application of that backup center node. In addition, compression options can be set on the administration module 1007 to compress and decompress data, and data can be encrypted and decrypted according to the encryption policy set by the user.
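The switch-over decision described above can be sketched as follows. The PeerNode structure and its healthy flag (assumed to be refreshed from heartbeat messages) are illustrative assumptions; the patent specifies only that the node ranked just below the failed node takes over.

```python
from dataclasses import dataclass

@dataclass
class PeerNode:
    name: str
    healthy: bool = True   # assumed flag, refreshed from heartbeat messages

def switch_target(priority_list, failed_name):
    """Pick the take-over node: the first healthy node ranked immediately
    below the failed node. `priority_list` is ordered highest priority first."""
    names = [n.name for n in priority_list]
    for candidate in priority_list[names.index(failed_name) + 1:]:
        if candidate.healthy:
            return candidate
    return None  # no backup center node left to take over

nodes = [PeerNode("production-100"), PeerNode("local-backup-101"),
         PeerNode("remote-backup-102")]
assert switch_target(nodes, "production-100").name == "local-backup-101"
nodes[1].healthy = False
assert switch_target(nodes, "production-100").name == "remote-backup-102"
```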
Fig. 2 shows the internal structure of the HA service module 1008, which mainly comprises a system monitoring module 2801, a resource object detection module 2802 and a resource object module 2803, wherein:
the resource object detection module 2802 is connected to the system monitoring module 2801 and the resource object module 2803; acting as a monitor, its main task is to watch the resource object module 2803, that is, the availability of important hardware and software resources such as database services or other application service processes, and to send the detected resource state information to the system monitoring module 2801, reporting the state of the services. If the system monitoring module 2801 receives the information sent by the resource object detection module 2802 within a predetermined time, the service is considered normal; if no such information is received within the predetermined time (the information disappears) or error information is received, the service is considered abnormal. From the detection information received, the system monitoring module 2801 determines whether the resource services monitored by the resource object detection module 2802 are normal, and then takes the corresponding processing action (a small watchdog sketch follows the module list);
the resource object module 2803 covers the hardware and software resources monitored by the resource object detection module 2802, including but not limited to: the hardware state of the server itself, such as hard disk, memory and network card; network resources, such as floating IP addresses; shared storage resources, such as disk arrays; database systems, such as Oracle, Sybase, SQL and Informix; and important application services, such as WWW and FTP services.
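The timeout rule used by the system monitoring module can be sketched as a small watchdog. Every name below is an illustrative assumption; only the rule itself (a report received within a predetermined time means normal, silence or an error means abnormal) comes from the text above.

```python
import time

class ResourceWatchdog:
    """A resource counts as normal only while its detector has reported OK
    within `timeout_s` seconds; silence or an explicit error makes it abnormal."""

    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s
        self.last_ok: dict = {}

    def report(self, resource: str, ok: bool) -> None:
        # Called by the resource object detection module on every probe.
        if ok:
            self.last_ok[resource] = time.monotonic()
        else:
            self.last_ok.pop(resource, None)  # error report: mark abnormal

    def is_normal(self, resource: str) -> bool:
        ts = self.last_ok.get(resource)
        return ts is not None and (time.monotonic() - ts) < self.timeout_s

wd = ResourceWatchdog(timeout_s=5.0)
wd.report("oracle-service", ok=True)
assert wd.is_normal("oracle-service")
wd.report("oracle-service", ok=False)   # error information received
assert not wd.is_normal("oracle-service")
```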
Fig. 3 shows the internal structure of the data replication module 1006, which mainly comprises the data receiving submodule 1061 and the data sending submodule 1062. The data receiving submodule 1061 mainly comprises a receiving port negotiation unit 1611, a receive detection unit 1612, a receive FIFO queue 1613, a decryption unit 1614 and a decompression unit 1615, wherein:
the receiving port negotiation unit 1611 is connected to the receive detection unit 1612 and determines the location of a packet's target node and the concrete transmission mode of the packet;
the receive detection unit 1612 is connected to the receiving port negotiation unit 1611 and the receive FIFO queue 1613; it detects whether a packet has been received and whether the receive FIFO queue 1613 is full;
the receive FIFO queue 1613 is connected to the receive detection unit 1612 and the decryption unit 1614; it buffers the IO request packets destined for this node and forwards them in first-in first-out order: IO request packets that need neither decryption nor decompression are forwarded directly to this node, while IO request packets that need decryption or decompression are forwarded to the decryption unit 1614;
the decryption unit 1614 is connected to the receive FIFO queue 1613 and the decompression unit 1615; according to the encryption policy for transmitted packets set at the user's request, it decrypts the encrypted packets received;
the decompression unit 1615 is connected to the decryption unit 1614; according to the compression ratio for transmitted packets set at the user's request, it decompresses the compressed packets received.
The data sending submodule 1062 mainly comprises a send detection unit 1621, a send FIFO queue 1622, a compression unit 1623, an encryption unit 1624 and a transmit port negotiation unit 1625 (a pipeline sketch follows the unit list), wherein:
the send detection unit 1621 is connected to the send FIFO queue 1622; it detects whether a packet has been received and whether the send FIFO queue is full;
the send FIFO queue 1622 is connected to the send detection unit 1621 and the compression unit 1623; it buffers the IO request packets to be sent to the next node and forwards them in first-in first-out order: IO request packets that need neither compression nor encryption are sent directly to the next node, while IO request packets that need compression or encryption are forwarded to the compression unit 1623;
the compression unit 1623 is connected to the send FIFO queue 1622 and the encryption unit 1624; according to the compression ratio for transmitted packets set at the user's request, it compresses the packets;
the encryption unit 1624 is connected to the compression unit 1623 and the transmit port negotiation unit 1625; according to the encryption policy for transmitted packets set at the user's request, it encrypts the packets;
the transmit port negotiation unit 1625 is connected to the encryption unit 1624 and determines the location of a packet's target node and the concrete transmission mode of the packet.
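Taken together, the send-side units form a FIFO-then-compress-then-encrypt pipeline whose inverse runs on the receive side. The sketch below illustrates that ordering; zlib and the XOR cipher are stand-ins, since the patent names only user-set compression and encryption options, not concrete algorithms.

```python
import queue
import zlib

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Placeholder cipher: the patent names no algorithm, only a user-set policy.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def send_pipeline(packets, compress=False, encrypt=False, key=b"demo-key"):
    """FIFO -> optional compression -> optional encryption, mirroring the
    data sending submodule; the receive side applies the inverse steps."""
    fifo = queue.Queue()
    for p in packets:            # send FIFO queue: strict first-in first-out
        fifo.put(p)
    out = []
    while not fifo.empty():
        data = fifo.get()
        if compress:             # compression unit, per the user-set option
            data = zlib.compress(data)
        if encrypt:              # encryption unit, per the user-set policy
            data = xor_cipher(data, key)
        out.append(data)         # handed to the transmit port negotiation unit
    return out

wire = send_pipeline([b"sector-7-data"], compress=True, encrypt=True)
restored = zlib.decompress(xor_cipher(wire[0], b"demo-key"))  # receive side
assert restored == b"sector-7-data"
```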
The invention also provides a multi-node application-level disaster recovery method. Among several nodes connected by a LAN and/or WAN, the node responsible for running the service applications is called the production center node and the other nodes are called backup center nodes; the LAN and/or WAN is divided by purpose into a public network, which provides the channel through which clients access the services, and a private network, which carries inter-node data replication and heartbeat messages. While the production center node operates normally, the method copies the data and/or operations to be protected over the LAN and/or WAN to every backup center node. When the production center node fails, the service applications of the production center node are first switched to the backup center node with the highest priority, to keep the services running normally; if that highest-priority backup center node itself fails while standing in for the production center node, the services are switched to the backup center node with the second-highest priority; and so on: whenever the backup center node of a given priority fails while running the services in place of the production center node, the services are switched to the backup center node with the next-lower priority.
Only one of the nodes runs a given application at any time; the other nodes act as redundant backups. A typical configuration, shown in Fig. 1, comprises three nodes: two locally deployed nodes form a dual-machine, dual-cabinet high-availability system, and one remotely deployed node provides application-level disaster recovery.
The targets of inter-node data replication and application switching are decided by priority. Application switching is performed only after the production center node fails; the switch target is first the node whose priority is just below the production center node's, determined by the priority ordering held on the failed node. Data replication is carried out hop by hop from higher-priority nodes to lower-priority nodes: at any given node, the replication target is the node whose priority is just below its own, so replication cascades level by level.
The node priorities are decided as follows: the production center node always ranks above the backup center nodes; among all backup center nodes, those in the same address range as the production center node rank highest, and if several backup center nodes are in the same address range as the production center node, they can be ordered by the priorities configured by the user or by their distance from the production center node. Several backup center nodes within one address range are ordered by the priorities preconfigured by the user in the administration module, or in such a way that the local node ranks high and the other nodes in that address range rank lower. Both data replication and service switching prefer local nodes. Each node replicates its data only to the highest-priority node among all nodes whose priority is lower than its own; through this cascade, multi-copy replication of data and application-level disaster recovery across multiple nodes are achieved.
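The cascade rule can be sketched in a few lines; the node names are illustrative, and the priority list is assumed to be ordered highest first as described above.

```python
def replication_target(priority_list, self_name):
    """Cascade rule: each node copies only to the highest-priority node
    ranked below itself, which then repeats the process downstream."""
    idx = priority_list.index(self_name)
    return priority_list[idx + 1] if idx + 1 < len(priority_list) else None

chain = ["production-100", "local-backup-101", "remote-backup-102"]
assert replication_target(chain, "production-100") == "local-backup-101"
assert replication_target(chain, "local-backup-101") == "remote-backup-102"
assert replication_target(chain, "remote-backup-102") is None  # end of cascade
```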
As for the inter-node transmission mode, the source replica node can flexibly select different replication strategies according to actual conditions: by checking whether the destination replica node is at the same site, it chooses synchronous or asynchronous replication. When the two nodes are at the same site, synchronous replication is used, which guarantees the real-time consistency of the data; when the two nodes are at different sites, that is, between two remote centers, asynchronous replication is used, which places no limit on the transmission distance, at the cost of some delay in the data.
After a node detects a file change, the data transmission strategy used in the real-time synchronization between the source replica node and the destination replica node is a sector-level strategy. With reference to Fig. 1: the IO filter driver module first classifies the IO request as a read or a write; if a write is detected, then, based on the result of the IO write operation executed by the operation module on the local storage module, the operation module periodically scans the sector-level data on the local storage module and compares sector-level differences; when it detects that the data of some sector has changed, it notifies the IO filter driver module, which triggers the data replication module to copy the data of the changed sector to the other nodes. Compared with a file-level or block-level strategy, the sector-level strategy effectively reduces the total amount of data transmitted: only the sectors whose data changed are sent, and the whole changed file or data block need not be copied to the backup center.
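A minimal sketch of the sector-level comparison follows; the 512-byte sector size and the in-memory images are illustrative assumptions, the point being that only changed sectors are yielded for replication.

```python
SECTOR = 512  # a common sector size; the patent does not fix one

def changed_sectors(old: bytes, new: bytes):
    """Compare two images of a storage region sector by sector and yield
    only the sectors whose content differs, so that replication ships
    changed sectors rather than whole files or blocks."""
    for off in range(0, max(len(old), len(new)), SECTOR):
        before, after = old[off:off + SECTOR], new[off:off + SECTOR]
        if before != after:
            yield off // SECTOR, after  # (sector index, new sector data)

old_img = bytes(4 * SECTOR)                                    # 4 empty sectors
new_img = old_img[:SECTOR] + b"X" * SECTOR + old_img[2 * SECTOR:]
assert [idx for idx, _ in changed_sectors(old_img, new_img)] == [1]
```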
The multi-node application-level disaster recovery performed with the system of this embodiment is described in detail below with reference to Fig. 4, Fig. 5 and Fig. 6.
The method of this embodiment mainly comprises: the application produces file read/write IO requests that need handling in response to IO operation requests (for example, creating, modifying or deleting a file); the IO filter driver module filters out the IO write requests; when a data copy operation to the next node is needed, the IO filter driver module forwards the corresponding IO write request to the data replication module; otherwise it simply forwards the IO request to the operation module, which performs the disk read and/or write.
The flow of the multi-node application-level disaster recovery system handling a file IO request, shown in Fig. 4, mainly comprises the following steps:
Step 420: forward these file read/write IO requests to the IO management module;
Step 455: for an intermediate node, a disk write operation is always required; the IO filter driver module converts the IO request into a packet and sends it to the data replication module, then execute step 460;
The method of this embodiment also includes options for setting the compression ratio and the encryption/decryption of the IO request packets transmitted between nodes during data replication. Fig. 5 shows the flow of the multi-node application-level disaster recovery system handling data replication, which may comprise the following steps:
Step 510: the send detection unit in the data sending submodule detects whether a packet has been received; if a packet has been received, execute step 520; if not, continue detecting;
Step 520: the send detection unit detects whether the send FIFO queue is full; if it is full, execute step 525; if not, execute step 530;
Step 525: the send detection unit stops sending data to the send FIFO queue, then returns to step 520 and continues;
Step 540: the send FIFO queue feeds the packets step by step into the compression module according to the network bandwidth;
Step 550: detect whether the compression option is set on the administration module; if it is set, execute step 555; if not, execute step 560;
Step 555: compress the packet according to the configured compression ratio, then execute step 560;
Step 560: forward the packet to the encryption module and detect whether the encryption option is set on the administration module; if it is set, execute step 565; if not, execute step 570;
Step 565: encrypt the data stream according to the configured encryption policy, then execute step 570;
Step 570: forward the packet to the transmit port negotiation module, which triggers the administration module to analyze the priorities of the remaining backup center nodes and determine the packet's destination node;
Step 585: synchronous replication is used between this node and the destination node, then execute step 590;
Step 586: asynchronous replication is used between this node and the destination node, then execute step 590;
As shown in Fig. 6, in the method of this embodiment, the flow by which the HA service module in a node of the multi-node application-level disaster recovery system sends priority query request packets to the other nodes mainly comprises the following steps:
The priority feedback packet contains the address information of the node and the priority information set by the user.
If there is no production center node address information at this moment, the processing differs from that based on the production center node address information, that is, execute step 660.
Step 650: the administration module queries whether the priority list contains another node with the same address information as this node; if so, execute step 652; if not, execute step 654;
Step 654: place this node's priority at the front of the priority list, then execute step 690;
Step 666: the administration module queries whether the list contains a node with the same address as the production center node; if so, execute step 670; if not, execute step 680;
Step 690: after the administration module finishes ordering according to the priority information, it updates the priority list, and the flow ends.
In the sorting method of this example, the production center node is not in the priority list; in other embodiments the production center node may be placed in the list when ordering.
The administration module of each node can perform the priority ordering. The priority orderings on the backup center nodes are not necessarily identical, but each must guarantee that the production center node has the highest priority. The other nodes in the same address range as a given node may be set to a priority lower than that node's, while nodes at a different site from that node may be ordered according to the priority ordering configured by the user.
As shown in Fig. 7, in the method of this embodiment, after the HA service module receives a priority query request packet sent by another node, it encapsulates its own information into a feedback packet and sends it back; the flow specifically comprises the following steps:
Step 740: the HA service module sends the priority query feedback packet over the private network to the node that sent the priority query request packet, and the flow ends.
The method of multi-node application-level disaster recovery using the system has been illustrated above taking only the production center node 100 as an example. In fact, as those of ordinary skill in the art will appreciate, within the whole multi-node application-level disaster recovery system the servers of the backup center nodes 101 and 102 carry out roughly the same operations as the production center node 100 at the same time, which ensures that the system can switch over automatically to a backup center node when a failure occurs. For the detailed processing of the backup center nodes 101 and 102, refer to the description of the processing of the production center node 100, which is not repeated here.
The multi-node application-level disaster recovery system and method provided by the invention solve the problem in the existing prior art that multi-node application-level disaster recovery cannot be provided, and can effectively improve the overall reliability and availability of application systems. In data transmission, the invention can flexibly adopt different replication strategies according to actual conditions, combining the advantages of synchronous and asynchronous replication and effectively avoiding the shortcomings of each mode used alone: for example, synchronous replication is adopted between the two cabinets of a local active/standby pair, guaranteeing the real-time consistency of the data, while asynchronous replication is adopted between two remote centers, so that the transmission distance is unrestricted. As for the delta transmission strategy used in the real-time synchronization between the two ends after a file change is detected, the invention replaces the existing file-level or block-level strategy with a sector-level strategy that transmits only the sectors whose data changed, without copying the whole changed file or data block to the backup center, effectively reducing the total amount of data transmitted.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any variation or replacement that can readily be conceived by those familiar with the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.
Claims (10)
1. A multi-node application-level disaster recovery system, characterized by comprising a production center node and two or more backup center nodes, the two or more backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes, wherein:
the production center node provides the service running capability, sends data and/or operations to all backup center nodes during normal operation, and, when it fails, switches the services running on it to the backup center node with the highest priority;
each backup center node saves the data and/or operations sent by the production center node, sends data and/or operations to all remaining backup center nodes when it runs services in place of the production center node or another backup center node, and, if it fails while running those services, switches them to the backup center node with the next-lower priority so as to keep the services running normally.
2. The system as claimed in claim 1, characterized in that
the production center node and the backup center nodes also perform priority ordering, and when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority;
a backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node;
any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
3. The system as claimed in claim 2, characterized in that
when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
4. The system as claimed in claim 1, 2 or 3, characterized in that the production center node and each backup center node comprise an application program module, an IO management module, an IO filter driver module, an operation module, a local storage module, a data replication module, a local agent service module and an administration module, wherein:
the application program module is the application that the system protects and produces IO requests; the application program module of the production center node is running, while the application program module of a backup center node runs only when that node runs services in place of the production center node or another backup center node;
the IO management module is connected to the application program module and the data replication module and receives the IO requests sent by the application program module and the IO requests received by the data replication module;
the IO filter driver module is connected to the IO management module and the data replication module; it receives the IO requests sent by those two modules and forwards them to the operation module, and from the received IO requests it filters out those that may cause the operation module to perform modifying operations and sends them through the data replication module to the other backup center nodes;
the operation module is connected to the IO filter driver module and executes the operations requested by the received IO requests;
the local storage module is connected to the operation module and provides storage space;
the data replication module is connected to the IO management module and the IO filter driver module and receives IO requests and service data from other nodes and/or sends IO requests and service data to other nodes;
the local agent service module is connected to the administration module; it sends alarm information to the administration module when it detects a failure of the system and exchanges heartbeat messages with the other nodes;
the administration module is connected to the data replication module and the local agent service module; it receives, via the heartbeat messages, the address information of the other nodes and the priority information preconfigured by the user, orders the other nodes by priority, and, after receiving alarm information from the local agent service module, triggers the local agent service module to send heartbeat messages to the other nodes, notifying the other backup center nodes, in priority order from high to low, to take over all the services carried by the failed node.
5. The system as claimed in claim 4, characterized in that
the operation module also scans the sector data on the local storage module and, when it detects that the data of a sector has changed, triggers the data replication module, through the IO filter driver module, to copy the data of the changed sector to the other nodes.
6. A multi-node application-level disaster recovery method, characterized in that the system to which the method applies comprises a production center node providing the service running capability and two or more backup center nodes, the backup center nodes being connected to the production center node by a local area network and/or a wide area network, the backup center nodes being ordered by priority from high to low, and the priority of the production center node being higher than the priority of all backup center nodes, the method comprising:
during normal operation, the production center node sends data and/or operations to all backup center nodes, and the backup center nodes save the data and/or operations sent by the production center node;
when the production center node fails, the services running on it are switched to the backup center node with the highest priority, and when a backup center node runs services in place of the production center node, it sends data and/or operations to all remaining backup center nodes;
when a backup center node running services fails, the services are switched to the backup center node with the next-lower priority so as to keep the services running normally.
7. The method as claimed in claim 6, characterized in that
the production center node and each backup center node each maintain a priority ordering of the other nodes, and when a node running services fails, the services are switched, according to that node's priority ordering, to the backup center node with the next-lower priority;
a backup center node in the same address range as the production center node has a priority just below that of the production center node but above that of any backup center node not in the same address range as the production center node;
any other backup center node in the same address range as a given backup center node has a priority lower than that backup center node's.
8. The method as claimed in claim 7, characterized in that
when the production center node or a backup center node sends data and/or operations to other backup center nodes, the sending node uses synchronous replication toward nodes in the same address range as itself and asynchronous replication toward nodes in a different address range.
9. The method as claimed in claim 6, characterized in that
the center node operating normally and running the services periodically scans the sector data and, when it detects that the data of a sector has changed, copies the data of the changed sector to the other nodes.
10. The method as claimed in any one of claims 6-9, characterized in that the priority sorting method comprises: this node sends a priority query request packet to the other nodes and, after receiving the returned priority feedback packets, builds a priority list by performing the following operations on each priority feedback packet, each packet containing the address information of the node that sent it and the priority information set by the user:
(a) judge whether the node that sent the priority feedback packet is the production center node; if so, set that node's priority to the highest; if not, execute step (b);
(b) query the address information of the node and judge whether it is in the same address range as the production center node; if so, set the node's priority just below the production center node but above the other backup center nodes, and if several nodes are in the same address range as the production center node, order them among themselves by the priority information set by the user; if not, execute step (c);
(c) query whether the node's address information is in the same address range as this node; if so, set the node's priority below this node's priority or order by the priority information set by the user; otherwise, set the node's priority below this node's priority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102108091A CN101656624B (en) | 2008-08-18 | 2008-08-18 | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102108091A CN101656624B (en) | 2008-08-18 | 2008-08-18 | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method |
Publications (2)
Publication Number | Publication Date |
---|---
CN101656624A (en) | 2010-02-24
CN101656624B (en) | 2011-12-07
Family
ID=41710730
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN2008102108091A Active CN101656624B (en) | 2008-08-18 | 2008-08-18 | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101656624B (en) |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101621409B (en) * | 2009-07-22 | 2012-07-18 | 中兴通讯股份有限公司 | Service control method, service control device and broadband access servers |
CN101888405B (en) * | 2010-06-07 | 2013-03-06 | 北京高森明晨信息科技有限公司 | Cloud computing file system and data processing method |
CN102142008B (en) * | 2010-12-02 | 2013-04-17 | 华为技术有限公司 | Method and system for implementing distributed memory database, token controller and memory database |
CN102681911B (en) * | 2011-03-09 | 2016-03-02 | 腾讯科技(深圳)有限公司 | A kind of disaster tolerance system of configuration center and method |
CN102497615A (en) * | 2011-11-30 | 2012-06-13 | 清华大学 | Location-information-based clustering method for node mobile network |
CN103312753A (en) * | 2012-03-14 | 2013-09-18 | 中国移动通信集团公司 | Communication method and device of Internet of things |
CN102752093B (en) * | 2012-06-29 | 2016-02-10 | 中国联合网络通信集团有限公司 | Based on the data processing method of distributed file system, equipment and system |
CN103179204A (en) * | 2013-03-13 | 2013-06-26 | 广东新支点技术服务有限公司 | Double-proxy-based WAN (wide area network) disk image optimization method and device |
CN104080132B (en) * | 2013-03-25 | 2018-06-19 | 中国移动通信集团公司 | A kind of data processing method and equipment |
CN103324715B (en) * | 2013-06-20 | 2017-04-12 | 交通银行股份有限公司 | Disaster recovery backup system availability detection method and device |
CN103617096A (en) * | 2013-11-04 | 2014-03-05 | 华为技术有限公司 | Storage data copying method, equipment and system |
CN104636218B (en) * | 2013-11-15 | 2019-04-16 | 腾讯科技(深圳)有限公司 | Data reconstruction method and device |
EP3358466B1 (en) | 2013-12-12 | 2019-11-13 | Huawei Technologies Co., Ltd. | Data replication method and storage system |
CN103684720B (en) * | 2014-01-06 | 2017-12-19 | 迈普通信技术股份有限公司 | A kind of system of selection of active and standby service unit and device |
RU2641477C1 (en) | 2014-04-14 | 2018-01-17 | Хуавэй Текнолоджиз Ко., Лтд. | Method and device for configuration of providing solution in cloud protection computing architecture |
CN104182300B (en) * | 2014-08-19 | 2017-04-12 | 北京京东尚科信息技术有限公司 | Backup method and system of virtual machines in cluster |
CN104850628B (en) * | 2015-05-21 | 2018-06-15 | 中国工商银行股份有限公司 | The synchronous method and device of a kind of database data |
CN105141445A (en) * | 2015-07-24 | 2015-12-09 | 广州尚融网络科技有限公司 | Method and device for realizing multiple backups of multiple flow groups in high-availability cluster system |
CN105406980B (en) * | 2015-10-19 | 2018-06-05 | 浪潮(北京)电子信息产业有限公司 | A kind of multinode backup method and device |
CN106817239B (en) * | 2015-11-30 | 2020-01-31 | 华为软件技术有限公司 | site switching method, related device and system |
CN105786405B (en) | 2016-02-25 | 2018-11-13 | 华为技术有限公司 | A kind of online upgrading method, apparatus and system |
CN105827435A (en) * | 2016-03-09 | 2016-08-03 | 中国工商银行股份有限公司 | System for maintaining continuous business operation based on double center systems and method thereof |
CN105763386A (en) * | 2016-05-13 | 2016-07-13 | 中国工商银行股份有限公司 | Service processing system and method |
CN106101208A (en) * | 2016-06-10 | 2016-11-09 | 北京银信长远科技股份有限公司 | The method building cross-platform high-availability system based on Ethernet |
CN107688584A (en) * | 2016-08-05 | 2018-02-13 | 华为技术有限公司 | A kind of method, node and the system of disaster tolerance switching |
CN106649065B (en) * | 2016-12-09 | 2019-07-23 | 华北理工大学 | A kind of computer system and the faulty computer replacement method applied to the system |
CN108270814A (en) * | 2016-12-30 | 2018-07-10 | 北京优朋普乐科技有限公司 | A kind of method of data synchronization and device |
CN107196799B (en) * | 2017-05-26 | 2020-10-16 | 河南职业技术学院 | Data processing platform redundant server backup and switching operation control method |
CN109151613B (en) * | 2017-06-16 | 2022-12-02 | 中兴通讯股份有限公司 | Content distribution system and method |
CN107465537A (en) * | 2017-07-13 | 2017-12-12 | 深圳市盛路物联通讯技术有限公司 | The backup method and system of Internet of Things repeater |
CN107483228B (en) * | 2017-07-14 | 2020-02-18 | 深圳市盛路物联通讯技术有限公司 | Method and system for split backup of Internet of things repeater |
CN109274986B (en) * | 2017-07-17 | 2021-02-12 | 中兴通讯股份有限公司 | Multi-center disaster recovery method, system, storage medium and computer equipment |
CN107483542B (en) * | 2017-07-18 | 2020-09-04 | 深圳市盛路物联通讯技术有限公司 | Exception handling method and device for wireless sensor network |
CN107483330B (en) * | 2017-07-20 | 2020-07-03 | 深圳市盛路物联通讯技术有限公司 | Relay bridging method and gateway |
CN107404401B (en) * | 2017-07-21 | 2021-03-19 | 深圳市盛路物联通讯技术有限公司 | Wireless sensor network repeater exception handling method and device |
CN107529187A (en) * | 2017-07-27 | 2017-12-29 | 深圳市盛路物联通讯技术有限公司 | A kind of data back up method and device based on Internet of Things |
CN107465609B (en) * | 2017-07-31 | 2020-05-19 | 深圳市盛路物联通讯技术有限公司 | Terminal routing method based on Internet of things and Internet of things terminal |
CN107483236B (en) * | 2017-08-01 | 2021-03-19 | 深圳市盛路物联通讯技术有限公司 | Method and device for backing up access point of Internet of things |
CN107483234B (en) * | 2017-08-01 | 2021-06-22 | 深圳市盛路物联通讯技术有限公司 | Method and device for split backup of access point of Internet of things |
CN107612718A (en) * | 2017-08-29 | 2018-01-19 | 深圳市盛路物联通讯技术有限公司 | Forwarding unit switching method and device |
CN107708085B (en) * | 2017-08-29 | 2020-11-13 | 深圳市盛路物联通讯技术有限公司 | Repeater guaranteeing method and access point |
CN110635927B (en) * | 2018-06-21 | 2022-08-19 | 中兴通讯股份有限公司 | Node switching method, network node and network system |
CN109361769A (en) * | 2018-12-10 | 2019-02-19 | 浪潮(北京)电子信息产业有限公司 | A kind of disaster tolerance system and a kind of disaster recovery method |
CN109669816A (en) * | 2018-12-15 | 2019-04-23 | 无锡北方数据计算股份有限公司 | A kind of disaster tolerant backup system based on distributed structure/architecture |
CN110149366B (en) * | 2019-04-16 | 2022-03-18 | 平安科技(深圳)有限公司 | Method and device for improving availability of cluster system and computer equipment |
CN110932861A (en) * | 2019-10-17 | 2020-03-27 | 杭州安存网络科技有限公司 | Digital certificate management method, device, equipment and storage medium based on multiple CA |
CN111093249B (en) * | 2019-12-05 | 2022-06-21 | 合肥中感微电子有限公司 | Wireless local area network communication method, system and wireless transceiving equipment |
CN111130979B (en) * | 2019-12-09 | 2022-02-22 | 苏州浪潮智能科技有限公司 | Method and equipment for connecting branch node with central node in SDWAN (software development wide area network) scene |
CN111147567A (en) * | 2019-12-23 | 2020-05-12 | 中国银联股份有限公司 | Service calling method, device, equipment and medium |
CN111666179B (en) * | 2020-06-12 | 2023-03-28 | 重庆云海时代信息技术有限公司 | Intelligent replication system and server for multi-point data disaster tolerance |
CN112084200B (en) | 2020-08-24 | 2024-08-20 | 中国银联股份有限公司 | Data read-write processing method, data center, disaster recovery system and storage medium |
CN112416655A (en) * | 2020-11-26 | 2021-02-26 | 深圳市中博科创信息技术有限公司 | Storage disaster recovery system based on enterprise service portal and data copying method |
CN113595805B (en) * | 2021-08-23 | 2024-01-30 | 海南房小云科技有限公司 | Personal computer data sharing method for local area network |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1505315A (en) * | 2002-12-05 | 2004-06-16 | 华为技术有限公司 | A data disaster recovery solution method producing no interlinked data reproduction |
Also Published As
Publication number | Publication date |
---|---|
CN101656624A (en) | 2010-02-24 |
Similar Documents
Publication | Title
---|---
CN101656624B (en) | Multi-node application-level disaster recovery system and multi-node application-level disaster recovery method
CN101741536B (en) | Data level disaster-tolerant method and system and production center node
US9916113B2 | System and method for mirroring data
US9794232B2 | Method for data privacy in a fixed content distributed data storage
EP3522494B1 | Cloud storage based data processing method and device
US7254740B2 | System and method for state preservation in a stretch cluster
US7383463B2 | Internet protocol based disaster recovery of a server
US8103630B2 | Data processing system and storage subsystem provided in data processing system
JP2004532442A | Failover processing in a storage system
JP2010509686A | Primary cluster fast recovery
CN1838055A | Storage replication system with data tracking
US20100031081A1 | Data Storage System and Control Method Thereof
US20110228084A1 | Content management in a video surveillance system
US8527454B2 | Data replication using a shared resource
CN101552799A | Media node fault-tolerance method and device
CN111522499A | Operation and maintenance data reading device and reading method thereof
KR101466007B1 | A multiple duplexed network video recorder and the recording method thereof
JP2001045023A | Video server system and video data distribution method
CN1889418B | Network storing method and network storing system
CN116226093B | Real-time database system based on dual-activity high-availability architecture
JP2004094608A | Data backup method and data backup device
CN104281591B | The remote disaster tolerance technology integrated based on data particle
CN117201281A | Network equipment fault processing method and device, electronic equipment and storage medium
JP2009265811A | Subscriber information backup system and authentication server
Legal Events
Code | Title
---|---
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant