CN113282334A - Method and device for recovering software defects, computer equipment and storage medium - Google Patents
- Publication number
- CN113282334A (application CN202110630686.2A)
- Authority
- CN
- China
- Prior art keywords
- message
- node
- target
- cluster
- master node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3644—Software debugging by instrumenting at runtime
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/366—Software debugging using diagnostics
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Retry When Errors Occur (AREA)
Abstract
The application relates to a method and a device for recovering from software defects, a computer device, and a storage medium. The method comprises: receiving error information sent by a master node after a runtime error occurs in a target application; screening out, according to the error information, the target message that caused the runtime error from the candidate messages received in advance; discarding the target message; switching the target tracking node to be the current master node in the cluster; and receiving new messages through the current master node, and processing the new messages together with the candidate messages whose receiving time is after that of the target message. The method ensures high availability of the cluster and enables a distributed cluster system to recover automatically in a software defect scenario.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for recovering software defects, a computer device, and a storage medium.
Background
With the development of computer technology, clustering technology has emerged, in which a group of mutually independent computers interconnected through a high-speed network form a group that is managed as a single system. When a client interacts with a cluster, the cluster appears as a single stand-alone server. Clusters are configured to improve availability and scalability. Generally, a high-availability cluster includes a plurality of servers, each running the same application, and consistency is ensured by state machine replication: given the same initial state and the same message input, a deterministic processing procedure yields consistent output and internal state. When one of the servers fails, the remaining servers can continue to provide services, thereby achieving high availability of the cluster. However, state machine replication cannot solve application crashes caused by software defects. When processing a certain message triggers an application error, every replica processes the same message, so the application crashes across the whole cluster. Developers then need to locate and repair the defect, messages cannot be processed for a long time, and the high availability of the cluster cannot be guaranteed.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for recovering a software defect capable of ensuring high availability of a cluster.
A method for recovering from software defects, applied to a target tracking node in a cluster, wherein the cluster comprises a master node and at least one tracking node, a target application runs in the master node and in the tracking node, and the content and order of the messages received by the tracking node are consistent with those received by the master node, the method comprising:
receiving error information sent by the master node after a runtime error occurs in the target application;
screening out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed;
discarding the target message;
switching the target tracking node to be the current master node in the cluster;
and receiving new messages through the current master node, and processing the new messages together with the candidate messages whose receiving time is after that of the target message.
In one embodiment, before the receiving the error information sent by the master node after the target application program runs in error, the method further includes:
receiving message processing progress information that corresponds to a message and is sent by the master node after the master node has processed that message;
screening out messages corresponding to the message processing progress information from candidate messages which are received in advance and are not processed;
and processing the message corresponding to the message processing progress information.
In one embodiment, before the discarding the target message, the method further includes:
disabling the business processing logic corresponding to the message type of the target message; or
rolling back the target application from the current version to the previous version.
In one embodiment, the cluster further includes at least one standby node, and when the master node fails, a target standby node in the standby nodes is switched to a current master node in the cluster.
An apparatus for recovering from software defects, applied to a target tracking node in a cluster, wherein the cluster comprises a master node and at least one tracking node, a target application runs in the master node and in the tracking node, and the content and order of the messages received by the tracking node are consistent with those received by the master node, the apparatus comprising:
a receiving module, configured to receive error information sent by the master node after a runtime error occurs in the target application;
a screening module, configured to screen out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed;
a discarding module, configured to discard the target message;
a switching module, configured to switch the target tracking node to be the current master node in the cluster;
and a processing module, configured to receive new messages through the current master node and to process the new messages together with the candidate messages whose receiving time is after that of the target message.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
receiving error information sent by the master node after a runtime error occurs in the target application;
screening out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed;
discarding the target message;
switching the target tracking node to be the current master node in the cluster;
and receiving new messages through the current master node, and processing the new messages together with the candidate messages whose receiving time is after that of the target message.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
receiving error information sent by the master node after a runtime error occurs in the target application;
screening out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed;
discarding the target message;
switching the target tracking node to be the current master node in the cluster;
and receiving new messages through the current master node, and processing the new messages together with the candidate messages whose receiving time is after that of the target message.
With the above method, apparatus, computer device, and storage medium for recovering from software defects, error information sent by the master node after a runtime error occurs in the target application is received; according to the error information, the target message that caused the runtime error is screened out from the candidate messages that were received in advance and have not been processed; the target message is discarded; the target tracking node is switched to be the current master node in the cluster; and new messages are received through the current master node and processed together with the candidate messages whose receiving time is after that of the target message. The nodes are thus divided into a master node and tracking nodes. When processing a certain message causes a runtime error in the target application on the master node, the target tracking node can identify and discard the message that caused the error, then replace the failed master node as the new master node in the cluster and continue processing messages. Normal message processing is thereby ensured, and so is the high availability of the cluster.
A method for recovering from software defects, applied to a master node in a cluster, wherein the cluster comprises the master node and at least one tracking node, a target application runs in the master node and in the tracking node, and the content and order of the messages received by the tracking node are consistent with those received by the master node, the method comprising:
generating corresponding error information after a runtime error occurs in the target application;
sending the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and process the new messages together with the candidate messages whose receiving time is after that of the target message.
In one embodiment, before generating the corresponding error information after the target application program runs in an error, the method further includes:
acquiring a message and processing the message;
after the message is processed, generating message processing progress information corresponding to the message;
sending the message processing progress information to the target tracking node, to instruct the target tracking node to screen out the message corresponding to the message processing progress information from the candidate messages that were received in advance and have not been processed, and to process that message.
In one embodiment, the cluster further includes at least one standby node, and when the master node fails, a target standby node in the standby nodes is switched to a current master node in the cluster.
An apparatus for recovering from software defects, applied to a master node in a cluster, wherein the cluster comprises the master node and at least one tracking node, a target application runs in the master node and in the tracking node, and the content and order of the messages received by the tracking node are consistent with those received by the master node, the apparatus comprising:
a generating module, configured to generate corresponding error information after a runtime error occurs in the target application;
a sending module, configured to send the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and process the new messages together with the candidate messages whose receiving time is after that of the target message.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
generating corresponding error information after the target application program runs wrongly;
sending the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and process the new messages together with the candidate messages whose receiving time is after that of the target message.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
generating corresponding error information after the target application program runs wrongly;
sending the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and process the new messages together with the candidate messages whose receiving time is after that of the target message.
With the above method, apparatus, computer device, and storage medium for recovering from software defects, corresponding error information is generated after a runtime error occurs in the target application, and the error information is sent to a target tracking node among the tracking nodes. This instructs the target tracking node to screen out, according to the error information, the target message that caused the runtime error from the candidate messages that were received in advance and have not been processed; to discard the target message; to switch itself to be the current master node in the cluster; and to receive new messages through the current master node and process them together with the candidate messages whose receiving time is after that of the target message. The nodes are thus divided into a master node and tracking nodes. When processing a certain message causes a runtime error in the target application on the master node, the target tracking node can identify and discard the message that caused the error, then replace the failed master node as the new master node in the cluster and continue processing messages. Normal message processing is thereby ensured, and so is the high availability of the cluster.
Drawings
FIG. 1 is a diagram illustrating an application environment of a method for recovering from software defects in one embodiment;
FIG. 2 is a flowchart illustrating a method for recovering from a software defect according to an embodiment;
FIG. 3 is a flowchart illustrating a method for recovering from a software defect according to another embodiment;
FIG. 4 is a block diagram showing an exemplary embodiment of a device for recovering from a software defect;
FIG. 5 is a block diagram showing a configuration of a software defect recovery apparatus according to another embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for recovering from software defects provided by the present application can be applied to the application environment shown in FIG. 1. The application environment includes a master node 102 and at least one tracking node 104. The master node 102 communicates with the tracking node 104 via a network. The tracking node 104 may be implemented by a separate server or by a server cluster composed of a plurality of servers. Those skilled in the art will understand that the application environment shown in FIG. 1 is only a part of the scenario related to the present application, and does not constitute a limitation on the application environment of the present application.
The target tracking node 104 receives error information sent by the master node 102 after a runtime error occurs in the target application; screens out, according to the error information, the target message that caused the runtime error from the candidate messages received in advance; discards the target message; switches itself to be the current master node in the cluster; and receives new messages through the current master node and processes them together with the candidate messages whose receiving time is after that of the target message.
In one embodiment, as shown in FIG. 2, a method for recovering from software defects is provided. Taking the method as applied to the target tracking node 104 in FIG. 1 as an example, it includes the following steps:
S202, receiving error information sent by the master node after a runtime error occurs in the target application.
The master node is the node in the cluster that is currently responsible for providing services externally, and it receives and processes messages in real time. A tracking node is a node in the cluster whose message processing progress is controlled by the master node. The target tracking node is the tracking node that has the highest priority among the tracking nodes in a healthy state. A tracking node receives the same messages as the master node, but does not automatically process a received message in real time. Each time the master node finishes processing a message, it notifies the tracking node to process the corresponding message. The error information carries the message identifier of the message that caused the runtime error of the target application.
For example, the master node and the tracking node may both receive messages with message sequence numbers 1, 2, and 3, where the sequence numbers increase in the order in which the messages are received. The master node processes messages 1, 2, and 3 in real time, in order. After the master node finishes processing message 1, it notifies the tracking node to start processing message 1; after it finishes processing message 2, it notifies the tracking node to start processing message 2; after it finishes processing message 3, it notifies the tracking node to start processing message 3, and so on. The master node and the tracking node therefore do not process a message simultaneously: the tracking node processes a message only after receiving the corresponding notification from the master node.
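The notification-driven behaviour in this example can be illustrated with a minimal sketch. The class name `TrackingNode`, its method names, and the in-memory buffer are hypothetical and are not taken from the application; the sketch only shows that a tracking node buffers messages and processes one only after the master node reports it as finished.

```python
from dataclasses import dataclass, field


@dataclass
class TrackingNode:
    """Hypothetical tracking node: buffers messages, processes only on notice."""
    pending: dict = field(default_factory=dict)     # sequence number -> payload
    processed: list = field(default_factory=list)   # sequence numbers handled so far

    def receive(self, seq: int, payload: str) -> None:
        # Messages arrive in the same order as on the master but are only buffered.
        self.pending[seq] = payload

    def on_master_progress(self, seq: int) -> None:
        # The master reports that it finished message `seq`,
        # so the tracking node may now process its own copy.
        if seq in self.pending:
            self.pending.pop(seq)
            self.processed.append(seq)


node = TrackingNode()
for seq, payload in [(1, "a"), (2, "b"), (3, "c")]:
    node.receive(seq, payload)

# The master has finished messages 1 and 2, but not yet message 3.
node.on_master_progress(1)
node.on_master_progress(2)
```

In this sketch, message 3 stays in `pending` until the master reports it; if the master crashes on message 3, the tracking node has not yet touched it.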
Specifically, a target application runs in the master node, and while the master node processes messages, a certain message may trigger a runtime error in the target application. If the target application encounters a runtime error, the process responsible for running it fails. When that process fails, the master node can generate, through a monitoring process, error information corresponding to the message that caused the failure, and send the error information to the target tracking node. The target tracking node receives the error information sent by the master node.
S204, according to the error information, screening out the target message that caused the runtime error of the target application from the candidate messages that were received in advance and have not been processed.
A candidate message is a message received by the target tracking node before the runtime error occurred in the target application on the master node.
Specifically, the error information may carry the message identifier of the message that caused the runtime error of the target application. The target tracking node can extract this message identifier from the error information and screen out the target message corresponding to it from the candidate messages that were received in advance and have not been processed.
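As an illustration of step S204, the following sketch looks up the target message by the identifier carried in the error information. The field name `message_id` and the dictionary layout of the candidate messages are assumptions made for this example; the application does not specify a data format.

```python
def screen_target_message(error_info: dict, candidates: dict):
    """Return (message id, payload) of the candidate named in the error info, or None."""
    msg_id = error_info["message_id"]  # hypothetical field name
    if msg_id in candidates:
        return msg_id, candidates[msg_id]
    return None


# Unprocessed candidate messages on the target tracking node, keyed by message id.
candidates = {3: "order-3", 4: "order-4", 5: "order-5"}
target = screen_target_message({"message_id": 4}, candidates)
```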
S206, discarding the target message.
Specifically, the target tracking node may discard the target message causing the target application to operate incorrectly, i.e., the target tracking node may skip the target message and not process the target message.
S208, switching the target tracking node to be the current master node in the cluster.
Specifically, after the runtime error occurs in the target application on the master node, the master node can no longer provide the business service of the target application and is in a crashed state. To ensure that messages continue to be processed normally, the target tracking node is switched to be the current master node in the cluster.
S210, receiving a new message through the current master node, and performing message processing on the candidate message and the new message, the receiving time of which is after the receiving time of the target message.
Specifically, the candidate messages include messages whose receiving time is before that of the target message and messages whose receiving time is after that of the target message. To avoid processing messages repeatedly, after the target tracking node is switched to be the current master node in the cluster, it directly processes the candidate messages whose receiving time is after that of the target message, continues to receive new messages, and processes the new messages.
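Steps S206 to S210 can be sketched together as follows. The class, the `role` attribute, and the use of sequence numbers as a stand-in for receiving time are assumptions made for illustration; candidates received before the target message are simply left untouched here, following the text above.

```python
class TrackingNode:
    """Hypothetical tracking node holding unprocessed candidate messages."""

    def __init__(self, candidates):
        self.role = "tracking"
        self.candidates = dict(candidates)  # sequence number -> payload
        self.handled = []

    def handle(self, payload):
        # Stand-in for the target application's business logic.
        self.handled.append(payload)

    def recover(self, target_seq, new_messages):
        self.candidates.pop(target_seq, None)   # S206: discard the target message
        self.role = "master"                    # S208: take over as current master
        later = sorted(s for s in self.candidates if s > target_seq)
        for seq in later:                       # S210: candidates received after the target
            self.handle(self.candidates.pop(seq))
        for payload in new_messages:            # S210: newly received messages
            self.handle(payload)


node = TrackingNode({4: "poison", 5: "m5", 6: "m6"})
node.recover(target_seq=4, new_messages=["m7"])
```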
In the above method for recovering from software defects, error information sent by the master node after a runtime error occurs in the target application is received; according to the error information, the target message that caused the runtime error is screened out from the candidate messages that were received in advance and have not been processed; the target message is discarded; the target tracking node is switched to be the current master node in the cluster; and new messages are received through the current master node and processed together with the candidate messages whose receiving time is after that of the target message. The nodes are thus divided into a master node and tracking nodes. When processing a certain message causes a runtime error in the target application on the master node, the target tracking node can identify and discard the message that caused the error, then replace the failed master node as the new master node in the cluster and continue processing messages. Normal message processing is thereby ensured, and so is the high availability of the cluster.
In one embodiment, before step S202, that is, before receiving the error information sent by the master node after a runtime error occurs in the target application, the method for recovering from software defects further includes: receiving message processing progress information that corresponds to a message and is sent by the master node after it has processed that message; screening out the message corresponding to the message processing progress information from the candidate messages that were received in advance and have not been processed; and processing the message corresponding to the message processing progress information.
The message processing progress information represents how far the master node has progressed in processing messages.
Specifically, after the master node processes a message, it may generate message processing progress information corresponding to that message and send it to the target tracking node. The target tracking node receives the message processing progress information, screens out the corresponding message from the candidate messages that were received in advance and have not been processed, and processes that message.
Optionally, the messages in the master node and in the tracking node are marked with continuously increasing message sequence numbers in the order of their receiving time. After the master node finishes processing a message, it records the message sequence number of that message as its current message processing progress information.
In one embodiment, the target tracking node may continue to send message processing progress information to other tracking nodes in the cluster after becoming the master node.
In the above embodiment, the nodes are divided into a master node and tracking nodes. The master node is responsible for processing messages in real time and providing services externally. A tracking node is responsible for following the message processing progress of the master node and processes a message only after the master node has processed it. This avoids the situation in which all nodes in the cluster crash when a certain message triggers a runtime error in the target application. In this way, normal message processing is ensured, and so is the high availability of the cluster.
In one embodiment, before step S206, that is, before the step of discarding the target message, the method for recovering from software defects further includes: disabling the business processing logic corresponding to the message type of the target message; or rolling back the target application from the current version to the previous version.
Specifically, when a runtime error occurs in the target application on the master node, the target tracking node can synchronously block the target message that caused the error from being delivered to the application layer, where the application layer is the layer that processes messages. If it is determined that the business processing logic corresponding to the message type of the target message caused the runtime error, the target tracking node may disable that business processing logic. If it is determined that the current version of the target application caused the runtime error, the target tracking node may roll back the target application from the current version to the previous version.
In the above embodiment, disabling the business processing logic corresponding to the message type of the target message prevents the target tracking node from subsequently processing messages of the same type, and rolling back the version lets subsequent messages be processed by the previous version of the target application. Either measure prevents the target tracking node from crashing again.
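One way to picture disabling the business processing logic for a message type is a dispatcher with a per-type switch. This is only a sketch under assumed names (`Dispatcher`, `disable_type`, and the message types "pay" and "refund" are invented for the example); the application does not describe how the logic is disabled, and version rollback is not shown.

```python
class Dispatcher:
    """Hypothetical application-layer dispatcher with a per-type kill switch."""

    def __init__(self, handlers):
        self.handlers = dict(handlers)  # message type -> handler function
        self.disabled = set()           # message types whose logic is switched off

    def disable_type(self, msg_type):
        self.disabled.add(msg_type)

    def dispatch(self, msg_type, payload):
        if msg_type in self.disabled:
            return None                 # message is skipped, not delivered to the handler
        return self.handlers[msg_type](payload)


dispatcher = Dispatcher({
    "pay": lambda p: p.upper(),
    "refund": lambda p: 1 / 0,          # stand-in for defective logic that always fails
})
dispatcher.disable_type("refund")       # done before taking over as master
skipped = dispatcher.dispatch("refund", "r-1")
handled = dispatcher.dispatch("pay", "ok")
```

With the switch set, later "refund" messages no longer reach the defective handler, while other message types are processed as usual.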
In one embodiment, the cluster further includes at least one standby node, and in the case of a failure of the master node, a target standby node among the standby nodes is switched to be the current master node in the cluster.
The standby node is a node fully isomorphic with the master node: the content and order of the messages it receives, and its message processing progress, are consistent with those of the master node.
Optionally, the failure of the master node may specifically be a failure of software in the master node, or a failure of hardware of the master node itself.
Optionally, the target standby node may be determined from a prestored election configuration file, in which the election priority of each standby node is recorded. When the master node fails, the standby node that has the highest election priority in the election configuration file and is in a healthy state may be determined as the target standby node, and the target standby node is then switched to be the current master node in the cluster.
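The selection rule just described can be sketched as below. The layout of the election configuration file is not given in the application, so the `StandbyEntry` fields are assumptions; in particular, the sketch assumes that a larger number means a higher election priority, and it carries the health flag alongside the priority purely for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StandbyEntry:
    node_id: str
    priority: int   # assumption: larger value = higher election priority
    healthy: bool   # assumption: health status is known at election time


def elect_target_standby(entries: List[StandbyEntry]) -> Optional[StandbyEntry]:
    """Pick the healthy standby node with the highest election priority."""
    healthy = [entry for entry in entries if entry.healthy]
    if not healthy:
        return None  # no standby node can take over
    return max(healthy, key=lambda entry: entry.priority)


# Hypothetical contents of a parsed election configuration file.
election_config = [
    StandbyEntry("standby-a", priority=3, healthy=False),
    StandbyEntry("standby-b", priority=2, healthy=True),
    StandbyEntry("standby-c", priority=1, healthy=True),
]
target_standby = elect_target_standby(election_config)
```

Here `standby-a` has the highest priority but is unhealthy, so the next healthy node in priority order is chosen.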
In the above embodiment, when the master node fails, selecting one standby node to switch to the current master node in the cluster can ensure normal processing of the message, thereby further ensuring high availability of the cluster.
In one embodiment, as shown in fig. 3, a method for recovering from a software defect is provided. The method is described by taking its application to the master node 102 in fig. 1 as an example, and includes the following steps:
S302, after the target application program runs in error, generating corresponding error information.
Specifically, a target application program runs in the master node, and while the master node processes messages, a certain message may trigger the target application program to run in error. It will be appreciated that if the target application program runs in error, the process responsible for running it will fail. A monitoring process in the master node can monitor that process in real time, and when the process fails, the master node can generate, through the monitoring process, error information corresponding to the message that caused the failure.
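One check by such a monitoring process might look like the following sketch. It is only an illustration under stated assumptions: `ErrorInfo`, `check_application`, and the idea of tracking an "in-flight" message identifier are hypothetical names, and the `poll` callable is assumed to follow the same contract as Python's `subprocess.Popen.poll` (returning `None` while the process is alive and its exit code once it has terminated).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ErrorInfo:
    message_id: int   # identifier of the message being processed at failure time
    exit_code: int


def check_application(
    poll: Callable[[], Optional[int]],
    in_flight_message_id: Optional[int],
) -> Optional[ErrorInfo]:
    """Return ErrorInfo if the application process has failed, otherwise None."""
    exit_code = poll()
    if exit_code is None or exit_code == 0:
        return None  # still running, or exited cleanly
    if in_flight_message_id is None:
        return None  # failure not attributable to a message; outside this sketch
    return ErrorInfo(message_id=in_flight_message_id, exit_code=exit_code)


# Simulated process states instead of a real child process.
still_running = check_application(lambda: None, in_flight_message_id=42)
crashed = check_application(lambda: 1, in_flight_message_id=42)
```

In a real deployment the monitor would call this repeatedly and forward any resulting error information to the target tracking node.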
S304, sending the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and perform message processing on the new messages and on the candidate messages whose receiving time is after that of the target message.
Specifically, the master node may send the error information to the target tracking node, which receives it after the target application program has run in error. The error information may carry the message identifier of the message that caused the error. The target tracking node can extract this message identifier from the error information and screen out the corresponding target message from the candidate messages that were received in advance and have not been processed. The target tracking node may then discard the target message, that is, skip it without processing it. Once the target application program in the master node has run in error, the master node can no longer provide business services for the target application program and is effectively in a crashed state. To ensure that messages continue to be processed normally, the target tracking node may therefore be switched to be the current master node in the cluster. The candidate messages include messages received before the target message and messages received after it. To avoid processing messages repeatedly, after the target tracking node has been switched to be the current master node, it can directly process the candidate messages whose receiving time is after that of the target message, and then continue to receive and process new messages.
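The recovery sequence on the target tracking node (screen, discard, switch, continue) can be sketched as follows. This is a simplified, single-process illustration: the `TrackingNode` class and its fields are hypothetical, arrival order in the queue stands in for receiving time, and candidates received before the target message are simply dropped on the assumption that they have already been handled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass(frozen=True)
class Message:
    msg_id: int
    payload: str


@dataclass
class TrackingNode:
    process: Callable[[Message], str]   # stand-in for the target application
    candidates: Deque[Message] = field(default_factory=deque)  # unprocessed, in receiving order
    is_master: bool = False
    results: List[str] = field(default_factory=list)

    def receive(self, msg: Message) -> None:
        if self.is_master:
            self.results.append(self.process(msg))  # a master processes in real time
        else:
            self.candidates.append(msg)             # a tracking node only buffers

    def on_error_info(self, failed_msg_id: int) -> None:
        ordered = list(self.candidates)
        # Screen out the target message by the identifier carried in the error information.
        idx = next((i for i, m in enumerate(ordered) if m.msg_id == failed_msg_id), None)
        if idx is None:
            raise LookupError(f"message {failed_msg_id} is not among the candidates")
        later = ordered[idx + 1:]    # candidates received after the target message
        self.candidates.clear()      # the target message is discarded, never processed
        self.is_master = True        # switch to be the current master node
        for msg in later:
            self.results.append(self.process(msg))


node = TrackingNode(process=lambda m: m.payload.upper())
for i, payload in enumerate(["a", "boom", "c", "d"], start=1):
    node.receive(Message(i, payload))
node.on_error_info(failed_msg_id=2)   # message 2 crashed the master
node.receive(Message(5, "e"))         # a new message, now processed directly
```

After the switch, the node has skipped the faulty message, processed the later candidates in order, and handles new messages as the master.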
In this embodiment, corresponding error information is generated after the target application program runs in error. The error information is sent to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and perform message processing on the new messages and on the candidate messages whose receiving time is after that of the target message. In this way, the nodes are divided into a master node and tracking nodes. When processing a certain message causes the target application program on the master node to run in error, the target tracking node can screen out and discard the message that caused the error, and can then replace the failed master node as the new master node in the cluster and continue processing messages, which ensures normal message processing and high availability of the cluster.
In an embodiment, before step S302, that is, before the step of generating corresponding error information after the target application program runs in error, the method for recovering from a software defect further includes: acquiring a message and processing the message; after the message is processed, generating message processing progress information corresponding to the message; and sending the message processing progress information to the target tracking node, to instruct the target tracking node to screen out the message corresponding to the message processing progress information from the candidate messages that were received in advance and have not been processed, and to process that message.
Specifically, the master node may acquire a message and process it. After processing the message, the master node can generate message processing progress information corresponding to the message and send it to the target tracking node. The target tracking node receives this message processing progress information, screens out the corresponding message from the candidate messages that were received in advance and have not been processed, and then processes that message.
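The tracking side of this progress protocol can be sketched as follows. The `ProgressInfo` and `ProgressTracker` names are hypothetical, and the sketch assumes that the progress information identifies a message by its identifier, which the application does not state explicitly for progress information.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ProgressInfo:
    message_id: int   # assumption: progress is reported per message identifier


@dataclass
class ProgressTracker:
    process: Callable[[str], str]   # stand-in for the target application
    pending: Dict[int, str] = field(default_factory=dict)   # msg_id -> payload, unprocessed
    results: List[str] = field(default_factory=list)

    def receive(self, msg_id: int, payload: str) -> None:
        # Candidate messages are buffered, not processed, on arrival.
        self.pending[msg_id] = payload

    def on_progress(self, info: ProgressInfo) -> None:
        # Process a message only once the master reports it as processed.
        payload = self.pending.pop(info.message_id, None)
        if payload is None:
            return  # not received yet; a real node would need to wait or retry
        self.results.append(self.process(payload))


tracker = ProgressTracker(process=lambda p: p[::-1])
tracker.receive(1, "abc")
tracker.receive(2, "xyz")
tracker.on_progress(ProgressInfo(message_id=1))   # master finished message 1
```

Because the tracker only acts on messages the master has already survived, a message that crashes the master is still sitting unprocessed in `pending` when the error information arrives.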
In the above embodiment, the nodes are divided into a master node and tracking nodes. The master node is responsible for processing messages in real time and providing services externally. A tracking node follows the message processing progress of the master node and processes a given message only after the master node has finished processing it. This avoids the situation in which a single message that triggers a run error in the target application program crashes every node in the cluster. Normal message processing is thereby guaranteed, which in turn ensures high availability of the cluster.
In one embodiment, the cluster further includes at least one standby node, and in the case of a failure of the master node, a target standby node among the standby nodes is switched to be the current master node in the cluster.
In the foregoing embodiment, in the case that the master node fails, selecting one standby node to switch to the current master node in the cluster may ensure normal processing of the message, thereby further ensuring high availability of the cluster.
It should be understood that although the steps in fig. 2 and 3 are shown in sequence, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and 3 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times. These sub-steps or stages are not necessarily performed sequentially either; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a software defect recovery apparatus 400, applied to a target tracking node in a cluster, where the cluster includes a master node and at least one tracking node, a target application program runs in both the master node and the tracking node, and the content and order of the messages received by the tracking node are consistent with those received by the master node, the apparatus including: a receiving module 401, a screening module 402, a discarding module 403, a switching module 404, and a processing module 405, wherein:
the receiving module 401 is configured to receive error information sent by the master node after the target application program runs in error.
A screening module 402, configured to screen out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed.
A discarding module 403, configured to discard the target message.
A switching module 404, configured to switch the target tracking node to be the current master node in the cluster.
A processing module 405, configured to receive a new message through the current master node, and perform message processing on the candidate message and the new message whose receiving time is after the receiving time of the target message.
In an embodiment, the receiving module 401 is further configured to receive message processing progress information corresponding to a message, which is sent by the master node after the master node has processed the message. The screening module 402 is further configured to screen out the message corresponding to the message processing progress information from the candidate messages that were received in advance and have not been processed. The processing module 405 is further configured to perform message processing on the message corresponding to the message processing progress information.
In one embodiment, the discarding module 403 is further configured to close the business processing logic corresponding to the message type of the target message; or to roll back the target application program from the current version to the version immediately preceding it.
In one embodiment, the cluster further includes at least one standby node, and in the case of a failure of the master node, a target standby node among the standby nodes is switched to be the current master node in the cluster.
The software defect recovery apparatus receives the error information sent by the master node after the target application program runs in error; screens out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed; discards the target message; switches the target tracking node to be the current master node in the cluster; and receives new messages through the current master node and performs message processing on the new messages and on the candidate messages whose receiving time is after that of the target message. In this way, the nodes are divided into a master node and tracking nodes. When processing a certain message causes the target application program on the master node to run in error, the target tracking node can screen out and discard the message that caused the error, and can then replace the failed master node as the new master node in the cluster and continue processing messages, which ensures normal message processing and high availability of the cluster.
In one embodiment, as shown in fig. 5, there is provided a software defect recovery apparatus 500, applied to a master node in a cluster, where the cluster includes a master node and at least one tracking node, a target application program runs in both the master node and the tracking node, and the content and order of the messages received by the tracking node are consistent with those received by the master node, the apparatus including: a generating module 501 and a sending module 502, wherein:
the generating module 501 is configured to generate corresponding error information after the target application program runs in an error.
A sending module 502, configured to send the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and perform message processing on the new messages and on the candidate messages whose receiving time is after that of the target message.
In one embodiment, the generating module 501 is further configured to acquire a message and perform message processing on the message, and, after the message is processed, to generate message processing progress information corresponding to the message. The sending module 502 is further configured to send the message processing progress information to the target tracking node, to instruct the target tracking node to screen out the message corresponding to the message processing progress information from the candidate messages that were received in advance and have not been processed, and to process that message.
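The division of work between the generating module and the sending module on the master side can be sketched as follows. This is a hypothetical illustration only: the `MasterNode` class, the dictionary-shaped notifications, and the `send` callable are assumptions, and a Python exception stands in for the failure of the process running the target application program.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Message:
    msg_id: int
    payload: str


@dataclass
class MasterNode:
    process: Callable[[Message], None]          # stand-in for the target application
    send: Callable[[Dict[str, object]], None]   # hypothetical transport to the tracking node

    def handle(self, msg: Message) -> bool:
        try:
            self.process(msg)
        except Exception:
            # Generating module: error information carrying the message identifier.
            # Sending module: forward it to the target tracking node.
            self.send({"kind": "error", "message_id": msg.msg_id})
            return False
        # After successful processing, report progress for this message.
        self.send({"kind": "progress", "message_id": msg.msg_id})
        return True


def fake_application(msg: Message) -> None:
    if msg.payload == "boom":
        raise ValueError("simulated run error")


outbox: List[Dict[str, object]] = []
master = MasterNode(process=fake_application, send=outbox.append)
ok_first = master.handle(Message(1, "a"))
ok_second = master.handle(Message(2, "boom"))
```

The tracking node therefore sees one progress notification per successfully processed message, followed by a single error notification naming the message to discard.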
In one embodiment, the cluster further includes at least one standby node, and in the case of a failure of the master node, a target standby node among the standby nodes is switched to be the current master node in the cluster.
The software defect recovery apparatus generates corresponding error information after the target application program runs in error, and sends the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to: screen out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive new messages through the current master node and perform message processing on the new messages and on the candidate messages whose receiving time is after that of the target message. In this way, the nodes are divided into a master node and tracking nodes. When processing a certain message causes the target application program on the master node to run in error, the target tracking node can screen out and discard the message that caused the error, and can then replace the failed master node as the new master node in the cluster and continue processing messages, which ensures normal message processing and high availability of the cluster.
For specific limitations on the software defect recovery apparatus, reference may be made to the limitations on the method for recovering from a software defect above, which are not repeated here. Each module in the software defect recovery apparatus may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be the master node 102 or the target tracking node 104 of FIG. 1, and whose internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing message processing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of recovering from a software bug.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described method of recovering from a software defect. The steps of the method for recovering from a software defect herein may be the steps in the method for recovering from a software defect of the above-described respective embodiments.
In one embodiment, a computer readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the above-described method for recovering from a software bug. The steps of the method for recovering from a software defect herein may be the steps in the method for recovering from a software defect of the above-described respective embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as the combination involves no contradiction.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (10)
1. A method for recovering from a software defect, the method being applied to a target tracking node in a cluster, the cluster including a master node and at least one tracking node, the master node and the tracking node each having a target application program running therein, the content and order of the messages received by the tracking node being consistent with those received by the master node, the method comprising:
receiving error information sent by the master node after the target application program runs in error;
according to the error information, screening out target messages causing the operation errors of the target application program from the candidate messages which are received in advance and are not processed;
discarding the target message;
switching the target tracking node to be the current master node in the cluster;
and receiving a new message through the current master node, and performing message processing on the candidate message with the receiving time after the receiving time of the target message and the new message.
2. The method of claim 1, wherein before receiving the error information sent by the master node after the target application program runs in error, the method further comprises:
receiving message processing progress information that corresponds to a message and is sent by the master node after the master node has performed message processing on the message;
screening out messages corresponding to the message processing progress information from candidate messages which are received in advance and are not processed;
and processing the message corresponding to the message processing progress information.
3. The method of claim 1, wherein prior to said discarding said target message, said method further comprises:
closing the business processing logic corresponding to the message type of the target message; or,
rolling back the target application program from the current version to the version immediately preceding the current version.
4. The method of claim 1, wherein the cluster further comprises at least one standby node, and wherein in case of a failure of the master node, a target standby node in the standby nodes switches to a current master node in the cluster.
5. A method for recovering from a software defect, the method being applied to a master node in a cluster, the cluster including a master node and at least one tracking node, the master node and the tracking node each having a target application program running therein, the content and order of the messages received by the tracking node being consistent with those received by the master node, the method comprising:
generating corresponding error information after the target application program runs wrongly;
sending the error information to a target tracking node among the tracking nodes, to instruct the target tracking node to screen out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed; discard the target message; switch the target tracking node to be the current master node in the cluster; and receive a new message through the current master node, and perform message processing on the new message and on the candidate messages whose receiving time is after the receiving time of the target message.
6. The method of claim 5, wherein before generating the corresponding error message after the target application program runs in error, the method further comprises:
acquiring a message and processing the message;
after the message is processed, generating message processing progress information corresponding to the message;
sending the message processing progress information to the target tracking node, to instruct the target tracking node to screen out the message corresponding to the message processing progress information from the candidate messages that were received in advance and have not been processed, and to process the message corresponding to the message processing progress information.
7. The method of claim 5, wherein the cluster further comprises at least one standby node, and wherein in case of a failure of the master node, a target standby node in the standby nodes switches to a current master node in the cluster.
8. An apparatus for recovering from a software defect, the apparatus being applied to a target tracking node in a cluster, the cluster including a master node and at least one tracking node, the master node and the tracking node each having a target application program running thereon, the content and order of the messages received by the tracking node being consistent with those received by the master node, the apparatus comprising:
a receiving module for receiving error information sent by the master node after the target application program runs in error;
a screening module for screening out, according to the error information, the target message that caused the target application program to run in error from the candidate messages that were received in advance and have not been processed;
a discarding module for discarding the target message;
a switching module for switching the target tracking node to be the current master node in the cluster;
and a processing module for receiving a new message through the current master node, and performing message processing on the new message and on the candidate messages whose receiving time is after the receiving time of the target message.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented by the processor when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110630686.2A CN113282334A (en) | 2021-06-07 | 2021-06-07 | Method and device for recovering software defects, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113282334A true CN113282334A (en) | 2021-08-20 |
Family
ID=77283505
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110630686.2A Pending CN113282334A (en) | 2021-06-07 | 2021-06-07 | Method and device for recovering software defects, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113282334A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113687834A (en) * | 2021-10-27 | 2021-11-23 | 深圳华锐金融技术股份有限公司 | Distributed system node deployment method, device, equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103607297A (en) * | 2013-11-07 | 2014-02-26 | 上海爱数软件有限公司 | Fault processing method of computer cluster system |
US20180234336A1 (en) * | 2017-02-15 | 2018-08-16 | Intel Corporation | Compute node cluster based routing method and apparatus |
CN110908723A (en) * | 2019-11-29 | 2020-03-24 | 新华三大数据技术有限公司 | Main/standby switching method and device of operating system and related equipment |
CN111124755A (en) * | 2019-12-06 | 2020-05-08 | 中国联合网络通信集团有限公司 | Cluster node fault recovery method and device, electronic equipment and storage medium |
CN111198662A (en) * | 2020-01-03 | 2020-05-26 | 腾讯科技(深圳)有限公司 | Data storage method and device and computer readable storage medium |
CN112749178A (en) * | 2019-10-31 | 2021-05-04 | 华为技术有限公司 | Method for ensuring data consistency and related equipment |
Non-Patent Citations (1)
Title |
---|
黄伟健;胡怀湘;: "面向流数据的分布式时序同步系统的设计与实现", 软件, no. 02, 15 February 2017 (2017-02-15) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108710673B (en) | | Method, system, computer device and storage medium for realizing high availability of database |
CN109495312B (en) | | Method and system for realizing high-availability cluster based on arbitration disk and double links |
CN108897658B (en) | | Method and device for monitoring master database, computer equipment and storage medium |
CN110798375A (en) | | Monitoring method, system and terminal equipment for enhancing high availability of container cluster |
CN109308227B (en) | | Fault detection control method and related equipment |
CN108322533A (en) | | Configuration and synchronization method between distributed type assemblies node based on operation log |
CN111198921A (en) | | Database switching method and device, computer equipment and storage medium |
CN115994044B (en) | | Database fault processing method and device based on monitoring service and distributed cluster |
CN110290546B (en) | | Method and device for restarting and positioning base station, base station equipment and storage medium |
CN112558997A (en) | | Method and device for deploying applications |
CN110704166A (en) | | Service operation method and device and server |
US7373542B2 (en) | | Automatic startup of a cluster system after occurrence of a recoverable error |
CN111752488B (en) | | Management method and device of storage cluster, management node and storage medium |
CN113282334A (en) | | Method and device for recovering software defects, computer equipment and storage medium |
CN111324419A (en) | | Deployment method, device, equipment and storage medium of combined container |
CN108924772B (en) | | Short message sending method and device, computer equipment and storage medium |
CN111342986A (en) | | Distributed node management method and device, distributed system and storage medium |
CN115599310B (en) | | Method and device for controlling storage resources in storage node and storage node |
CN110489208B (en) | | Virtual machine configuration parameter checking method, system, computer equipment and storage medium |
CN113055203A (en) | | Method and device for recovering abnormity of SDN control plane |
CN113626240A (en) | | Cluster fault recovery method and device, computer equipment and storage medium |
CN111338848B (en) | | Failure application copy processing method and device, computer equipment and storage medium |
CN107959595B (en) | | Method, device and system for anomaly detection |
CN109240816B (en) | | System scheme switching method and device, computer equipment and storage medium |
CN114064717A (en) | | Data processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: Room 2301, building 5, Shenzhen new generation industrial park, 136 Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen City, Guangdong Province; Applicant after: Shenzhen Huarui Distributed Technology Co., Ltd.; Address before: Room 2301, building 5, Shenzhen new generation industrial park, 136 Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen City, Guangdong Province; Applicant before: SHENZHEN ARCHFORCE FINANCIAL TECHNOLOGY Co., Ltd. |
CB02 | Change of applicant information |