CN111831490B

CN111831490B - Method and system for synchronizing memories between redundant main and standby nodes

Info

Publication number: CN111831490B
Application number: CN202010614622.9A
Authority: CN
Inventors: 徐振朋; 李韦韦; 马若飞; 杨建�; 殷进勇; 尤长军; 唐道奎; 鲜于运香; 杨智; 梁丁
Original assignee: 716th Research Institute of CSIC
Current assignee: 716th Research Institute of CSIC
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2023-11-10
Anticipated expiration: 2040-06-30
Also published as: CN111831490A

Abstract

The application discloses a memory synchronization method and system between redundant main and standby nodes. The system comprises a main host node and a standby host node, and a memory synchronization component is deployed on the operating system of each. The memory synchronization component comprises a network data packet collecting module, a network data packet comparing module, a dirty page extracting module, a memory check point module, a check point transmission synchronization module and a data packet mirror image filtering module. The main host node and the standby host node have the same hardware, that is, identically configured CPUs, memories, network cards and hard disks. By adopting an event-driven memory synchronization triggering mechanism, the application can greatly improve the memory synchronization efficiency between redundant main and standby nodes while still allowing the standby node to seamlessly take over the application functions of the main node.

Description

Method and system for synchronizing memories between redundant main and standby nodes

Technical Field

The application relates to the technical field of high reliability of dual-machine hot standby fault tolerance, in particular to a memory synchronization method and a memory synchronization system between redundant main and standby nodes in a dual-machine hot standby system.

Background

Electronic information systems need to operate stably for long periods under varying conditions and have extremely high reliability requirements. As computer systems with complex structures and powerful functions are applied to information computing systems, the reliability of core control and services becomes increasingly important. To improve the reliability of a computing system, current fault-handling methods for systems such as servers are generally divided into fault avoidance and fault tolerance. Fault avoidance takes measures to eliminate faults before they occur, so as to improve the reliability of the system. Although fault avoidance can improve reliability to a certain extent, its effect is limited and its cost rises sharply as performance improves, so completely preventing faults from entering the system is almost impossible in practice. Fault tolerance, by contrast, refers to a system with a certain redundancy capability: when the system encounters a software or hardware fault, it can still continue to operate according to its original performance indicators and complete its given task, although fault tolerance requires certain auxiliary measures.

Redundancy is currently the main method for realizing system fault tolerance; it reduces or even eliminates the influence of faults by adding redundant resources to the system structure. Redundant dual-machine hot standby is an efficient way to improve the reliability of information computing systems. Fault-tolerance tools that only cover service resources such as databases cannot meet the operational reliability requirements of equipment information systems. Existing hot standby solutions for non-service information computing systems also rely on reliability mechanisms designed at the application layer: the application designer must consider both the functional requirements of the application software and the fault-tolerance mechanism of the system. This multiplies the development workload and increases system complexity, and it brings drawbacks such as extremely high demands on developers, poor reusability of the reliability mechanism, and impact on project schedules.

Disclosure of Invention

The application aims to provide a memory synchronization method and a memory synchronization system between redundant main and standby nodes, which solve the problems of high memory synchronization frequency of the main and standby nodes and large data volume of the memory synchronization of the main and standby nodes in the existing dual-machine hot standby system and reduce the influence of a hot standby mechanism on the system performance.

The technical solution for realizing the purpose of the application is as follows: a memory synchronization system between redundant main and standby nodes comprises two host nodes, a main node and a standby node, which communicate through Ethernet network interconnection equipment;

the main and standby nodes have the same hardware configuration, that is, the hosts have the same CPU, memory, network card and hard disk; the same application software runs on both nodes; during normal operation, the application software on the main node interacts and cooperates with external information systems by sending network data outward, while the application software on the standby node runs simultaneously but in a silent mode; when the main node suffers a hardware fault, the application software on the standby node switches to normal mode and takes over the function of interacting and cooperating with the external information system;

the main node and the standby node deploy the same memory synchronization assembly, wherein the memory synchronization assembly comprises a network data packet collecting module, a network data packet comparing module, a dirty page extracting module, a memory check point module, a check point transmission synchronization module and a data packet mirror image filtering module;

the network data packet collecting module dynamically monitors and intercepts the network data packet sent out by the standby host node through the network interconnection equipment, and sends the intercepted network data packet to the main node;

the network data packet comparison module is used for judging whether the data packets sent out by the main node and the standby node are consistent;

the dirty page extraction module is used for monitoring and recording the dirty pages of the memory data in the running process of the master node;

the memory check point module is mainly used for creating a check point of a dirty page of memory data of the main node;

the check point transmission synchronization module is mainly used for transmitting the memory check point of the main node to the standby node through a high-speed network and updating the memory page of the standby node;

the data packet mirror image filtering module mainly submits the external information system network data packet received by the main node to the main node for processing, and simultaneously sends the copied mirror image network data packet to the standby node for processing.

A memory synchronization method between redundant main and standby nodes comprises the following steps:

S1, when the main and standby nodes send Ethernet data packets outward, the network data packet collecting module monitors the outgoing data packets of the standby node, intercepts them, and redirects them to the main node;

S2, the main node uses the network data packet comparison module to judge whether the data packet of the standby node is consistent with the data packet it sent out itself; if they are consistent, the standby node is judged to be in an equivalent memory state; if they are inconsistent, the standby node is judged to be in a non-equivalent memory state;

S3, if the standby node is in a non-equivalent memory state, go to S4; if the standby node is in an equivalent memory state, return to S1;

S4, a memory synchronization process between the main node and the standby node is triggered, and the memory check point module of the main node creates a check point for the dirty memory pages in incremental check point mode;

S5, using the check point transmission synchronization module, the main node transmits the memory check point to the standby node through a high-speed network;

S6, the standby node updates the relevant memory pages using the received memory check point, notifies the main node, and returns to S1;

S7, after the main node receives a network data packet from the external information system, the data packet mirror image filtering module submits the data packet to the main node for processing and sends a copied mirror image of the packet to the standby node for processing.
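The event-driven trigger in steps S2 to S6 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the patented implementation: the `Node` class, the `sync_if_needed` function and the dictionary-based memory are hypothetical names introduced here, the network transfer is replaced by a direct dictionary update, and whole dirty pages are copied rather than only the modified data within them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a host node: memory maps page number -> bytes."""
    memory: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # pages modified since the last sync

    def write(self, page: int, data: bytes) -> None:
        # Every write marks the touched page as dirty.
        self.memory[page] = data
        self.dirty.add(page)


def sync_if_needed(main: Node, standby: Node,
                   main_pkt: bytes, standby_pkt: bytes) -> bool:
    """Run S2-S6 for one pair of outgoing packets.

    Returns True if a memory synchronization was triggered.
    """
    # S2/S3: identical outgoing packets are taken to mean that the standby
    # node is in an equivalent memory state, so nothing is transferred.
    if main_pkt == standby_pkt:
        return False
    # S4: build a checkpoint from the main node's dirty pages only
    # (simplification: whole pages, not the changed bytes inside them).
    checkpoint = {page: main.memory[page] for page in main.dirty}
    # S5/S6: "transmit" the checkpoint and update the standby's pages.
    standby.memory.update(checkpoint)
    main.dirty.clear()
    return True
```

The point of the design is visible in the early return: as long as both nodes emit the same packets, no memory data crosses the network at all.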

Compared with the prior art, the application has the beneficial effects that: (1) The event-driven memory synchronous trigger mechanism is adopted, so that the memory state updating frequency is low; (2) The increment check point mode is adopted, so that the memory state updating data quantity is small; (3) The main node and the standby node can maintain consistent data connection through the data packet filtering module; (4) supporting seamless handover of the primary and standby nodes; (5) By combining the fault detection and switching mechanism, the memory synchronization efficiency between redundant nodes in the main and standby fault-tolerant system can be greatly improved, and the influence of the hot standby mechanism on the system performance is reduced.

The application is described in further detail below with reference to the accompanying drawings.

Drawings

Fig. 1 is a block diagram of a first memory synchronization system between redundant main and standby nodes according to an embodiment of the present application.

Fig. 2 is a block diagram of a second memory synchronization system between redundant main and standby nodes according to an embodiment of the present application.

Fig. 3 is a diagram of software modules of a memory synchronization component in a memory synchronization system between redundant primary and standby nodes according to an embodiment of the present application.

Fig. 4 is a flowchart of a method for synchronizing memory between redundant master and slave nodes according to an embodiment of the present application.

Detailed Description

To better improve the reliability of computer systems, the application provides a memory synchronization method and system between redundant main and standby nodes in a dual-machine hot standby system, addressing the synchronization efficiency problem caused by large memory data volumes and frequent reads and writes. The incremental check point method adopted here effectively reduces the amount of data transferred during memory synchronization between the main and standby nodes, thereby improving memory synchronization efficiency.

The application discloses a memory synchronization system between redundant main and standby nodes, which comprises two host nodes, a main node and a standby node, that communicate through Ethernet interconnection equipment.

The main and standby nodes have the same hardware configuration, that is, the hosts have the same CPU, memory, network card and hard disk, and the same application software runs on both redundant nodes. During normal operation, the application software on the main node interacts and cooperates with external information systems by sending network data outward, while the application software on the standby node runs simultaneously but in a silent mode. When the main node suffers a hardware fault, the application software on the standby node switches to normal mode and takes over the function of interacting and cooperating with the external information system.

The main node and the standby node are respectively provided with the same memory synchronization assembly, and the memory synchronization assembly comprises a network data packet collecting module, a network data packet comparing module, a dirty page extracting module, a memory check point module, a check point transmission synchronization module and a data packet mirror image filtering module.

The network data packet collecting module dynamically monitors and intercepts the network data packet which is sent out by the standby host node through the network interconnection equipment, and sends the intercepted network data packet to the main node.

The network data packet comparison module is mainly used for judging whether the data packets sent out by the main node and the standby node are consistent.

The dirty page extraction module is mainly used for monitoring and recording the dirty pages of the memory data in the running process of the master node.

The memory check point module is mainly used for creating check points of the main node memory data dirty pages.

The check point transmission synchronization module is mainly used for transmitting the memory check point of the main node to the standby node through the high-speed network and updating the memory page of the standby node.

The application also provides a memory synchronization method between the redundant main and standby nodes, which comprises the following steps:

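S1, when the main and standby nodes send Ethernet data packets outward, the network data packet collecting module monitors the outgoing data packets of the standby node, intercepts them, and redirects them to the main node;

S2, the main node uses the network data packet comparison module to judge whether the data packet of the standby node is consistent with the data packet it sent out itself; if they are consistent, the standby node is judged to be in an equivalent memory state; if they are inconsistent, the standby node is judged to be in a non-equivalent memory state;

S3, if the standby node is in the non-equivalent memory state, go to S4; if the standby node is in the equivalent memory state, return to S1;

S4, a memory synchronization process between the main node and the standby node is triggered, and the memory check point module of the main node creates a check point for the dirty memory pages in incremental check point mode;

S5, using the check point transmission synchronization module, the main node transmits the memory check point to the standby node through a high-speed network;

S6, the standby node updates the relevant memory pages using the received memory check point, notifies the main node, and returns to S1;

S7, after the main node receives a network data packet from the external information system, the data packet mirror image filtering module submits the data packet to the main node for processing and sends a copied mirror image of the packet to the standby node for processing.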

As a preferred technical solution, the "outgoing data packet" in step S1 is a network data packet sent to the external information system through the Ethernet interface while the application software of the standby node is running.

As a preferred technical solution, the "outgoing data packet interception" in step S1 means that the standby node, through an I/O event filtering function, monitors the network data packets sent by the application software to the external information system and records the memory address where the packet content is located.

As a preferred technical solution, the "redirection" in step S1 means that the standby node changes the destination address of the intercepted outgoing data packet to the address of the main node.
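A minimal Python sketch of this redirection follows. The `Packet` fields, the `redirect` function and the address `10.0.0.1` are all hypothetical and chosen only for illustration; the source does not specify the packet layout or how addresses are represented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of an outgoing network packet."""
    src: str
    dst: str
    seq: int
    payload: bytes


MAIN_NODE_ADDR = "10.0.0.1"  # assumed address of the main node


def redirect(pkt: Packet, main_addr: str = MAIN_NODE_ADDR) -> Packet:
    """Return a copy of an intercepted packet re-addressed to the main node.

    Only the destination changes; sequence number and payload are kept so the
    main node can later compare the packet against its own output.
    """
    return replace(pkt, dst=main_addr)
```

For example, `redirect(Packet("10.0.0.2", "192.168.1.50", 7, b"x"))` yields a packet whose `dst` is `"10.0.0.1"` while the original object is left unchanged.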

As a preferred technical solution, the data packets in step S2 are judged "consistent" by checking whether network data packets with the same sequence number, sent to the external information system by the application software on the main and standby nodes, have the same content; if the content is the same, the data packets are consistent.
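The comparison rule can be sketched in a few lines of Python. The function name `packets_consistent` and the representation of each node's output as a dictionary from sequence number to payload are assumptions made for this illustration; the treatment of sequence numbers seen on only one side (ignored here) is also an assumption, since the source does not say how such packets are handled.

```python
def packets_consistent(main_sent: dict, standby_sent: dict) -> bool:
    """Compare payloads of packets that share a sequence number.

    Both arguments map sequence number -> payload bytes. Sequence numbers
    present on only one side are skipped (an assumption of this sketch).
    """
    shared = main_sent.keys() & standby_sent.keys()
    return all(main_sent[seq] == standby_sent[seq] for seq in shared)
```

A single mismatching payload among the shared sequence numbers is enough to report inconsistency, which in the method corresponds to the non-equivalent memory state.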

As a preferred technical solution, the "equivalent memory state" in step S2 means that the memory data of the main and standby nodes may differ in detail but remain relatively consistent, and the staged running results of the application software on the two nodes are consistent.

As a preferred technical solution, the "non-equivalent memory state" in step S2 means that the memory data of the main and standby nodes differ, and the staged running results of the application software on the two nodes cannot be kept consistent.

As a preferred technical solution, the "memory synchronization process" in step S4 transmits the latest memory data of the main node to the standby node, and the standby node updates its own memory pages so that the memory data of the two nodes are kept consistent.

As a preferred technical solution, the "dirty memory page" in step S4 is a page containing memory data that was modified during a period of operation of the main node; memory pages are the unit by which the operating system manages the computer's memory data.
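Dirty-page tracking can be modelled in Python as follows. This is a user-space simulation only: the `TrackedMemory` class and the 4096-byte `PAGE_SIZE` are assumptions for illustration, and a real system would rely on operating-system or hardware support rather than intercepting writes in application code.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class TrackedMemory:
    """Simulated memory that records which pages each write touches."""

    def __init__(self, num_pages: int) -> None:
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.dirty_pages = set()

    def write(self, addr: int, payload: bytes) -> None:
        end = addr + len(payload)
        if addr < 0 or end > len(self.data):
            raise ValueError("write outside simulated memory")
        self.data[addr:end] = payload
        if payload:
            # A write may straddle a page boundary, so mark every page
            # between the first and last byte written.
            first = addr // PAGE_SIZE
            last = (end - 1) // PAGE_SIZE
            self.dirty_pages.update(range(first, last + 1))
```

A three-byte write at address 10 marks only page 0, while a two-byte write starting at the last byte of page 1 marks both page 1 and page 2.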

As a preferred technical solution, the "incremental check point" in step S4 means that, for a period of operation of the main node, only the updated data within the dirty memory pages is extracted and created as the memory check point, instead of encapsulating entire pages into the memory check point.
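One way to extract only the updated data from a dirty page is to compare it against a snapshot taken at the previous synchronization and keep the changed byte runs. The Python sketch below assumes such a snapshot is available; the function names and the `(offset, bytes)` run format are hypothetical, and the source does not state how the changed data is located.

```python
def diff_page(old: bytes, new: bytes) -> list:
    """Return [(offset, changed_bytes), ...] for each contiguous run where
    `new` differs from `old`."""
    if len(old) != len(new):
        raise ValueError("pages must have the same size")
    runs = []
    start = None  # offset where the current run of differences began
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = i
        elif a == b and start is not None:
            runs.append((start, new[start:i]))
            start = None
    if start is not None:
        runs.append((start, new[start:]))
    return runs


def make_incremental_checkpoint(snapshot: dict, current: dict,
                                dirty_pages: set) -> dict:
    """Map page number -> changed runs, for dirty pages only."""
    return {p: diff_page(snapshot[p], current[p]) for p in sorted(dirty_pages)}
```

For a 4096-byte page in which only a handful of bytes changed, the checkpoint carries those bytes and their offsets rather than the full page, which is where the reduction in synchronization volume comes from.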

As a preferred technical solution, the "page update" in step S6 means that the standby node modifies the corresponding data areas in its own memory pages according to the memory check point of the main node, so that its pages match the memory pages of the main node.
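The page update on the standby node can be sketched as patching byte ranges in place. The checkpoint format used here, a dictionary from page number to a list of `(offset, bytes)` runs, is an assumption of this sketch and is not specified by the source; `apply_checkpoint` is likewise a hypothetical name.

```python
def apply_checkpoint(pages: dict, checkpoint: dict) -> None:
    """Patch standby pages in place.

    pages:      page number -> bytearray holding the standby's copy of the page
    checkpoint: page number -> [(offset, changed_bytes), ...] from the main node
    """
    for page_no, runs in checkpoint.items():
        page = pages[page_no]
        for offset, data in runs:
            if offset < 0 or offset + len(data) > len(page):
                raise ValueError("run does not fit inside the page")
            # Overwrite only the bytes that changed on the main node.
            page[offset:offset + len(data)] = data
```

Pages that do not appear in the checkpoint are left untouched, so the cost of the update scales with the amount of changed data rather than with total memory size.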

As a preferred technical solution, step S7 is specifically as follows: the main node receives a socket data packet through the Ethernet port, and the data packet mirror image filtering module then makes a copy of the socket data packet, modifies its MAC address, and sends the copy to the standby node for processing.
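The copy-and-readdress step can be sketched on a raw Ethernet frame, whose first six bytes are the destination MAC address and whose next six are the source MAC address. The sketch assumes that "modifies the MAC address" refers to the destination field; the source does not say which field is rewritten, and `mirror_frame` and the example addresses are hypothetical.

```python
def mirror_frame(frame: bytes, standby_mac: bytes) -> bytes:
    """Return a copy of an Ethernet frame addressed to the standby node.

    Assumption: only the 6-byte destination MAC (bytes 0-5) is replaced;
    source MAC, EtherType and payload are copied unchanged.
    """
    if len(standby_mac) != 6:
        raise ValueError("a MAC address is 6 bytes")
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    return standby_mac + frame[6:]
```

Because `bytes` objects are immutable, the original frame is untouched and can still be delivered to the main node's own application while the mirrored copy goes to the standby node.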

In order to make the method and the solution according to the present application better understood by those skilled in the art, the technical solution according to the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments.

Examples

Fig. 1 is a block diagram of a first memory synchronization system between redundant main and standby nodes provided by an embodiment of the present application. The system includes a main node 1, a standby node 2, and a network interconnection device 3, where the main node 1 and the standby node 2 are each configured with the same independent hard disk 4, CPU processor 5, network card 6, memory 7, operating system 8, and memory synchronization component 9. Fig. 2 is a block diagram of a second memory synchronization system between redundant main and standby nodes provided by an embodiment of the present application. This system includes a main node 1, a standby node 2, a network interconnection device 3, and a shared hard disk 10, where the main node 1 and the standby node 2 are each configured with the same CPU processor 5, network card 6, memory 7, operating system 8, and memory synchronization component 9.

Fig. 3 is a diagram of the software modules of the memory synchronization component in the memory synchronization system between redundant main and standby nodes according to an embodiment of the present application. The main node 1 and the standby node 2 are each provided with a memory synchronization component 9, which comprises a network data packet collecting module 11, a network data packet comparison module 12, a dirty page extraction module 13, a memory check point module 14, a check point transmission synchronization module 15 and a data packet mirror image filtering module 16. The network data packet collecting module 11 dynamically monitors and intercepts the network data packets sent out by the standby host node through the network interconnection equipment, and sends the intercepted packets to the main node. The network data packet comparison module 12 judges whether the data packets sent out by the main node and the standby node are consistent. The dirty page extraction module 13 monitors and records the dirty pages of memory data while the main node is running. The memory check point module 14 creates check points for the dirty memory pages of the main node. The check point transmission synchronization module 15 transmits the memory check point of the main node to the standby node through the high-speed network and updates the memory pages of the standby node. The data packet mirror image filtering module 16 submits the external information system network data packets received by the main node to the main node for processing, and simultaneously sends copied mirror image packets to the standby node for processing.

During normal operation, the application software on the main node interacts and cooperates with external information systems by sending network data outward, while the application software on the standby node runs simultaneously but in a silent mode. When the main node suffers a hardware fault, the application software on the standby node switches to normal mode and takes over the function of interacting and cooperating with the external information system.

Fig. 4 is a flowchart of a method for synchronizing memory between redundant master and slave nodes according to the present embodiment, where the method includes the following steps:

Step one, when the main and standby nodes send Ethernet data packets outward, the network data packet collecting module 11 monitors the outgoing data packets of the standby node 2, intercepts them, and redirects them to the main node 1;

Step two, the main node 1 uses the network data packet comparison module 12 to judge whether the data packet of the standby node 2 is consistent with the data packet it sent out itself; if they are consistent, the standby node 2 is judged to be in an equivalent memory state; if they are inconsistent, the standby node 2 is judged to be in a non-equivalent memory state;

Step three, if the standby node 2 is in the non-equivalent memory state, go to step four; if the standby node 2 is in the equivalent memory state, return to step one;

Step four, a memory synchronization process between the main node 1 and the standby node 2 is triggered, and the dirty page extraction module 13 and the memory check point module 14 of the main node 1 together create a check point for the dirty memory pages in incremental check point mode;

Step five, using the check point transmission synchronization module 15, the main node 1 transmits the memory check point to the standby node 2 through a high-speed network;

Step six, the standby node 2 updates the relevant memory pages using the received memory check point, notifies the main node 1, and returns to step one;

Step seven, after the main node 1 receives a network data packet from the external information system, the data packet mirror image filtering module 16 submits the data packet to the main node 1 for processing and sends a copied mirror image of the packet to the standby node 2 for processing.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and additions without departing from the method of the present application, and such modifications and additions are also to be considered within the scope of the present application.

Claims

1. A memory synchronization system between redundant main and standby nodes is characterized in that: the system comprises a main host node and a standby host node, wherein the two host nodes are communicated through Ethernet network interconnection equipment;

2. A method for synchronizing memory between redundant primary and backup nodes based on the system of claim 1, comprising the steps of:

3. The method for synchronizing memory between redundant master and slave nodes according to claim 2, wherein the outgoing packet in step S1 is a network packet sent to an external information system through an ethernet interface during the operation of the slave node application software.

4. The method for synchronizing internal memories between redundant master and slave nodes according to claim 2, wherein the outgoing packet interception in step S1 is that the slave node monitors a network packet sent by the application software to the external information system through an I/O event filtering function, and then records the internal memory address where the content of the packet is located.

5. The method for synchronizing memory between redundant main and standby nodes according to claim 2, wherein the redirection in step S1 is that the standby node changes the destination address of the intercepted outgoing data packet to the address of the main node.

6. The method for synchronizing memory between redundant main and standby nodes according to claim 2, wherein the data packets in step S2 are judged consistent by checking whether network data packets with the same sequence number, sent to the external information system by the application software on the main and standby nodes, have the same content, and if the content is the same, the data packets are consistent.

7. The method for synchronizing memory between redundant master and slave nodes according to claim 2, wherein the equivalent memory state in step S2 is that memory data between the master and slave nodes are kept consistent, and a result of staged operation of application software on the master and slave nodes is kept consistent; the non-equivalent memory state is that memory data between the main node and the standby node have differences, and the staged operation result of the application software on the main node and the standby node cannot be kept consistent.

8. The method for synchronizing memory between redundant master and slave nodes according to claim 2, wherein the memory synchronization in step S4 is to transmit the latest memory data of the master node to the slave node, and the slave node updates its own memory page so that the memory data between the master node and the slave node are consistent;

the dirty memory page in the step S4 is a page where the modified memory data is located in a period of time operated by the master node;

the incremental check point in step S4 means that, for a period of operation of the main node, only the updated data within the dirty memory pages is extracted and created as a memory check point, instead of packaging entire pages into the memory check point.

9. The method for synchronizing memory between redundant main and standby nodes according to claim 2, wherein the page update in step S6 is that the standby node modifies the corresponding data areas in its own memory pages according to the memory check point of the main node, so that its pages match the memory pages of the main node.

10. The method for synchronizing memory between redundant main and standby nodes according to claim 2, wherein step S7 is specifically: the main node receives a socket data packet through the Ethernet port, and the data packet mirror image filtering module then makes a copy of the socket data packet, modifies its MAC address, and sends the copy to the standby node for processing.