CN117914777B - Fault handling method, device, network equipment and storage medium - Google Patents

Info

Publication number
CN117914777B
Authority
CN
China
Prior art keywords
network device
data transmission
transmission direction
route
control signaling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410020936.4A
Other languages
Chinese (zh)
Other versions
CN117914777A (en
Inventor
单延晋
顾玮
乔伟
张亦帆
刘淳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lingrui Lanxin Technology Beijing Co ltd
Original Assignee
Lingrui Lanxin Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lingrui Lanxin Technology Beijing Co ltd filed Critical Lingrui Lanxin Technology Beijing Co ltd
Priority to CN202410020936.4A priority Critical patent/CN117914777B/en
Publication of CN117914777A publication Critical patent/CN117914777A/en
Application granted granted Critical
Publication of CN117914777B publication Critical patent/CN117914777B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H04L45/22 Alternate routing
    • H04L45/24 Multipath
    • H04L45/247 Multipath using M:N active or standby paths

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application relates to a fault handling method, apparatus, network device, and storage medium. The method includes: a first network device detects whether a second network device has failed, where the second network device is the next hop of the first network device in a first data transmission direction and is a primary device; if the second network device has failed, the first network device directs its route in the first data transmission direction to a third network device and sends first control signaling to the third network device, where the third network device is a backup device of the second network device, the first control signaling is used to control the route in the first data transmission direction in the third network device to point to a fourth network device, and the fourth network device is the next hop of the second network device in the first data transmission direction. The method can improve route switching speed.

Description

Fault processing method, device, network equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a fault processing method, a device, a network device, and a storage medium.
Background
In the current network environment, demand for reliable network transmission and service continuity keeps growing, which has given rise to the SHHS (SPOKE-HUB-HUB-SPOKE) architecture. In the SHHS architecture, the use of multiple HUB nodes avoids the risk of a single point of failure, improves the stability of data transmission, and distributes load, thereby increasing the reliability and stability of the network.
In the conventional technology, when a HUB node in the SHHS architecture fails, BGP can automatically converge and repair the routing, so that other devices can still communicate through the remaining HUB nodes. However, BGP convergence is slow, so network performance is still affected while the routes converge.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a failure processing method, apparatus, network device, and storage medium capable of improving the speed of route switching.
In a first aspect, the present application provides a fault handling method, the method being applied to a first network device, the method comprising:
Detecting whether a second network device is in fault, wherein the second network device is the next hop of the first network device in the first data transmission direction, and the second network device is a main device;
And under the condition that the second network equipment fails, directing the route in the first data transmission direction to third network equipment, and sending a first control signaling to the third network equipment, wherein the third network equipment represents standby equipment of the second network equipment, the first control signaling is used for controlling the route in the first data transmission direction in the third network equipment to be directed to fourth network equipment, and the fourth network equipment is the next hop in the first data transmission direction in the second network equipment.
In a possible implementation manner, the first control signaling is further used to control a route in a second data transmission direction in the third network device to point to the first network device, and to control a route in the second data transmission direction in the fourth network device to point to the third network device, where the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.
In one possible implementation, the method further includes:
Detecting whether the failure of the second network device is recovered;
And under the condition that the fault of the second network equipment is recovered and the recovery duration reaches a preset threshold value, directing the route in the first data transmission direction to the second network equipment, and sending a second control signaling to the second network equipment, wherein the second control signaling is used for controlling the route in the second network equipment in the first data transmission direction to be directed to the fourth network equipment and the route in the fourth network equipment in the second data transmission direction to be directed to the second network equipment.
In one possible implementation manner, the detecting whether the second network device fails includes:
Detecting a link between the first network device and the second network device;
Determining that the second network device has failed under the condition that the quality of a link between the first network device and the second network device does not meet a preset condition;
and/or,
determining that the second network device has failed in the event that the second network device is not reachable.
A second aspect provides a fault handling method, the method being applied to a third network device, the method comprising:
Receiving a first control signaling sent by a first network device, wherein the first control signaling is generated when a second network device fails, the second network device is the next hop of the first network device in a first data transmission direction, the second network device is a main device, and the third network device represents a standby device of the second network device;
and in response to the first control signaling, directing the route in the first data transmission direction to a fourth network device, the fourth network device being a next hop of the second network device in the first data transmission direction.
In one possible implementation, the method further includes:
in response to the first control signaling, directing a route in a second data transmission direction to the first network device, and sending third control signaling to the fourth network device, the third control signaling being used to control the route in the second data transmission direction in the fourth network device to the third network device, the second data transmission direction representing a data transmission direction opposite to the first data transmission direction.
In a third aspect, the present application further provides a fault handling apparatus, the apparatus being applied to a first network device, the apparatus comprising:
The first detection module is used for detecting whether a second network device has a fault or not, wherein the second network device is the next hop of the first network device in the first data transmission direction, and the second network device is a main device;
And the first routing module is used for directing the route in the first data transmission direction to third network equipment and sending a first control signaling to the third network equipment under the condition that the second network equipment fails, wherein the third network equipment represents standby equipment of the second network equipment, the first control signaling is used for controlling the route in the first data transmission direction in the third network equipment to be directed to fourth network equipment, and the fourth network equipment is the next hop in the second network equipment in the first data transmission direction.
In a possible implementation manner, the first control signaling is further used to control a route in a second data transmission direction in the third network device to point to the first network device, and to control a route in the second data transmission direction in the fourth network device to point to the third network device, where the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.
In one possible implementation, the apparatus further includes:
the second detection module is used for detecting whether the fault of the second network equipment is recovered or not;
and the second routing module is used for directing the route in the first data transmission direction to the second network equipment and sending a second control signaling to the second network equipment under the condition that the fault of the second network equipment is recovered and the recovery duration reaches a preset threshold value, wherein the second control signaling is used for controlling the route in the first data transmission direction in the second network equipment to be directed to the fourth network equipment and the route in the second data transmission direction in the fourth network equipment to be directed to the second network equipment.
In one possible implementation manner, the first detection module is further configured to:
Detecting a link between the first network device and the second network device;
Determining that the second network device has failed under the condition that the quality of a link between the first network device and the second network device does not meet a preset condition;
and/or,
determining that the second network device has failed in the event that the second network device is not reachable.
In a fourth aspect, the present application further provides a fault handling apparatus, the apparatus being applied to a third network device, the apparatus comprising:
A receiving module, configured to receive a first control signaling sent by a first network device, where the first control signaling is generated when a second network device has a fault, the second network device is a next hop of the first network device in a first data transmission direction, the second network device is a primary device, and the third network device represents a standby device of the second network device;
And the first routing module is used for responding to the first control signaling and directing the route in the first data transmission direction to fourth network equipment, wherein the fourth network equipment is the next hop of the second network equipment in the first data transmission direction.
In one possible implementation, the apparatus further includes:
And a second routing module, configured to respond to the first control signaling, direct a route in a second data transmission direction to the first network device, and send a third control signaling to the fourth network device, where the third control signaling is used to control the route in the second data transmission direction in the fourth network device to be directed to the third network device, and the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.
In a fifth aspect, the present application further provides a network device, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements the fault handling method of the first aspect or the second aspect, or of any possible implementation of the first aspect or the second aspect.
In a sixth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the fault handling method of the first aspect or the second aspect, or of any possible implementation of the first aspect or the second aspect.
In a seventh aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the fault handling method of the first aspect or the second aspect, or of any possible implementation of the first aspect or the second aspect.
When the first network device detects that the second network device serving as the next hop breaks down, the first network device directs its own route to the standby device of the second network device, and controls the route in the first data transmission direction on the standby device to direct to the next hop of the second network device by sending a control signaling to the standby device. In this way, the standby equipment is directly indicated to carry out route direction change through the control signaling, so that the route direction change does not depend on the convergence speed of BGP (border gateway protocol) routes, the route switching speed is effectively improved, and the influence of equipment faults on the network performance is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 illustrates an architecture diagram of an exemplary SHHS architecture provided by an embodiment of the present application;
FIG. 2 shows an interactive flow chart of a fault handling method provided by an embodiment of the present application;
FIG. 3 is a flow diagram of a fault handling method in one embodiment;
Fig. 4 shows an application schematic diagram of a fault handling method according to an embodiment of the present application;
fig. 5 shows an application schematic diagram of a fault handling method according to an embodiment of the present application;
fig. 6 shows an application schematic diagram of a fault handling method according to an embodiment of the present application;
FIG. 7 is a block diagram of a fault handling apparatus in one embodiment;
FIG. 8 is a block diagram of a fault handling apparatus in one embodiment;
Fig. 9 is an internal structural diagram of a network device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the SHHS (SPOKE-HUB-HUB-SPOKE) architecture, the risk of a single point of failure is avoided by using multiple HUB nodes. Specifically, the SHHS architecture includes SPOKE nodes and HUB nodes, and communication between SPOKE nodes may be performed through the HUB nodes. The HUB nodes can be divided into main HUB nodes and redundant HUB nodes, and the redundant HUB nodes serve as backups of the main HUB nodes, so that the risk of a single point of failure is avoided. In the SHHS architecture, a SPOKE node may be a Customer Edge (CE) or a Provider Edge (PE). Similarly, a HUB node may be a CE or a PE.
Fig. 1 shows an architecture diagram of an exemplary SHHS architecture provided by an embodiment of the present application. As shown in fig. 1, the SHHS architecture includes two CEs (i.e., C1 and C2) and four PEs (i.e., P1, P2, P3, and P4). Wherein, two CEs are SPOKE nodes, four PEs are HUB nodes, P1 and P2 are main HUB nodes, P3 is redundant HUB node of P1, and P4 is redundant HUB node of P2. As shown in fig. 1, the HUB nodes may communicate directly, and the SPOKE nodes may communicate through the HUB nodes. A, B, C, D, E, F, G, H, I and J in fig. 1 are communication links. In the case where the network device and the communication link are not failed, the network device and the communication link through which the data transmission path from C1 to C2 passes are C1, A, P1, B, P2, C, and C2 in this order.
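By way of illustration only, the example of fig. 1 can be written down as a small Python data structure. The names `ROLE`, `STANDBY_OF`, `NEXT_HOP` and `path` are hypothetical and do not appear in the application; the nodes, the main/redundant pairing and the fault-free C1-to-C2 path are taken from the description above, while the reverse path is assumed to mirror the forward path.

```python
# Hypothetical encoding of the fig. 1 example; an illustration, not the claimed method.
ROLE = {"C1": "SPOKE", "C2": "SPOKE",
        "P1": "HUB", "P2": "HUB", "P3": "HUB", "P4": "HUB"}

# Redundant HUB node that backs up each main HUB node.
STANDBY_OF = {"P1": "P3", "P2": "P4"}

# Fault-free next hops, keyed by (node, data transmission direction).
# The C2->C1 entries are an assumption: the reverse of the stated C1->C2 path.
NEXT_HOP = {
    ("C1", "C1->C2"): "P1",
    ("P1", "C1->C2"): "P2",
    ("P2", "C1->C2"): "C2",
    ("C2", "C2->C1"): "P2",
    ("P2", "C2->C1"): "P1",
    ("P1", "C2->C1"): "C1",
}


def path(src, dst, direction, table=NEXT_HOP):
    """Follow the next hops for one direction from src until dst is reached."""
    hops = [src]
    while hops[-1] != dst:
        hops.append(table[(hops[-1], direction)])
    return hops


print(path("C1", "C2", "C1->C2"))  # ['C1', 'P1', 'P2', 'C2']
```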
It should be noted that the foregoing is merely exemplary of the SHHS architecture, and the SHHS architecture may include more or fewer SPOKE nodes and HUB nodes, which is not a limitation of the embodiments of the present application.
Fig. 2 shows an interactive flowchart of a fault handling method provided by an embodiment of the present application. The method may be applied in SHHS architecture, for example, SHHS architecture shown in fig. 1. As shown in fig. 2, the method may include:
in step S201, the first network device detects whether the second network device has failed.
The first network device may represent any network device in SHHS architecture, and the first network device may be a SPOKE node or a HUB node. For example, the first network device may be C1, C2, P1, or P2 shown in fig. 1.
The second network device may be a next hop of the first network device in the first data transmission direction, and the second network device is a master device. That is, the second network device is a HUB node and is a primary HUB node. For example, referring to fig. 1, assuming that the first data transmission direction is from C1 to C2, the second network device is P1 in the case where the first network device is C1, and is P2 in the case where the first network device is P1.
The first data transmission direction may represent any one of the data transmission directions. For example, referring to fig. 1, the first data transmission direction may be from C1 to C2, or from C2 to C1. The second data transmission direction may represent the data transmission direction opposite to the first data transmission direction. Referring to fig. 1, if the first data transmission direction is from C1 to C2, the second data transmission direction is from C2 to C1, and if the first data transmission direction is from C2 to C1, the second data transmission direction is from C1 to C2.
That is, in the embodiment of the present application, the first network device may detect whether the next hop in each data transmission direction is faulty. As shown in fig. 1, assuming that P1 is the first network device, P1 may detect whether C1 has failed or whether P2 has failed.
In a possible implementation manner, step S201 may include detecting a link between the first network device and the second network device, and in a case where a quality of the link between the first network device and the second network device does not meet a preset condition, the first network device may determine that the second network device has failed.
The preset condition may be used to measure link quality and may be set as needed. For example, the preset condition may include one or more of: a packet loss rate smaller than a certain threshold, a bandwidth larger than a certain threshold, jitter smaller than a certain threshold, and a time delay smaller than a certain threshold, which is not limited in this embodiment of the present application. If the quality of the link between the first network device and the second network device does not satisfy the preset condition, the link quality is poor and data packets are transmitted poorly between the two devices, so the first network device can determine that the second network device has failed and needs to bypass the link between the first network device and the second network device when receiving or transmitting data packets.
In one possible implementation, step S201 may include detecting a link between the first network device and the second network device, and determining that the second network device has failed if the second network device is not reachable.
In the event that the second network device is not reachable, the first network device may determine that the second network device has failed, such that the first network device needs to bypass a link between the first network device and the second network device to receive or transmit the data packet.
In one possible implementation, the first network device may detect the quality of the link with the second network device and/or detect whether the second network device is reachable by sending a ping message or heartbeat, etc.
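The detection in step S201 may be sketched as follows, purely as an example. The class names, field names and threshold values are assumptions chosen for illustration; the application only states that the preset condition may combine packet loss rate, bandwidth, jitter and delay thresholds, and that an unreachable device is treated as failed. How the measurements are collected (ping, heartbeat, etc.) is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSample:
    """One measurement of the link to the next hop (hypothetical structure)."""
    reachable: bool
    loss_rate: float       # fraction of probes lost, 0.0 to 1.0
    delay_ms: float
    jitter_ms: float
    bandwidth_mbps: float


@dataclass(frozen=True)
class PresetCondition:
    """Illustrative thresholds; the actual values are set as needed."""
    max_loss_rate: float = 0.05
    max_delay_ms: float = 200.0
    max_jitter_ms: float = 50.0
    min_bandwidth_mbps: float = 10.0


def next_hop_failed(sample, cond=PresetCondition()):
    """Return True if the next hop is unreachable or the link misses the preset condition."""
    if not sample.reachable:
        return True
    meets_condition = (
        sample.loss_rate < cond.max_loss_rate
        and sample.delay_ms < cond.max_delay_ms
        and sample.jitter_ms < cond.max_jitter_ms
        and sample.bandwidth_mbps > cond.min_bandwidth_mbps
    )
    return not meets_condition


good = LinkSample(True, 0.01, 20.0, 5.0, 100.0)
lossy = LinkSample(True, 0.20, 20.0, 5.0, 100.0)
down = LinkSample(False, 1.0, 0.0, 0.0, 0.0)
print(next_hop_failed(good), next_hop_failed(lossy), next_hop_failed(down))  # False True True
```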
In step S202, in the case where the second network device fails, the route in the first data transmission direction is directed to the third network device.
In step S203, the first network device sends a first control signaling to the third network device.
Wherein the third network device may represent a standby device of the second network device. As shown in fig. 1, assuming that the second network device is P1, the third network device is P3, and assuming that the second network device is P2, the third network device is P4.
In the case of a failure of the second network device, the first network device needs to bypass the second network device when transmitting data, so as to ensure reliable transmission of the data. At this time, the first network device may direct the route in the first data transmission direction to the third network device. Thus, when the first network device receives a data packet (which may be simply referred to as a first data packet) in the first data transmission direction, the first network device may send the first data packet to the third network device according to the routing indication, so as to bypass the second network device for transmission.
The first network device may also send first control signaling to the third network device in the event of a failure of the second network device. The first control signaling may be used to control routing in the first data transmission direction in the third network device to the fourth network device.
Wherein the fourth network device may be the next hop of the second network device in the first data transmission direction.
Before the second network device fails, the first network device first sends the first data packet to the second network device, and then the second network device sends the first data packet to the fourth network device.
After the second network device fails, the first network device no longer sends the first data packet to the second network device but sends it to the third network device instead, and the third network device then needs to forward the first data packet to the fourth network device. The first network device may therefore send the first control signaling to the third network device so that the third network device directs its route in the first data transmission direction to the fourth network device. After the third network device receives the first data packet, it can then send the packet to the fourth network device. That is, in the case where the second network device fails, the first network device first transmits the first data packet to the third network device, and then the third network device transmits the first data packet to the fourth network device.
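A minimal sketch of steps S202 and S203 is given below, taking C1 as the first network device, P1 as the failed second network device, P3 as the third network device and P2 as the fourth network device. The dictionary layout, the message fields and the function names are hypothetical, and the sketch assumes that the first network device already knows the next hop of the failed device; the application does not specify the signaling format or how that knowledge is obtained.

```python
FIRST_DIR = "C1->C2"

# Per-device routing tables (direction -> next hop); only the entries needed here.
routes = {
    "C1": {FIRST_DIR: "P1"},
    "P1": {FIRST_DIR: "P2"},
    "P3": {},
}
STANDBY_OF = {"P1": "P3"}


def handle_first_control_signaling(device, signaling):
    """Third network device: point the route for the direction at the stated next hop."""
    routes[device][signaling["direction"]] = signaling["next_hop"]


def on_next_hop_failure(first, second, direction):
    """First network device: reroute to the standby device and instruct it."""
    third = STANDBY_OF[second]
    fourth = routes[second][direction]   # assumed to be known to the first device
    routes[first][direction] = third     # step S202
    signaling = {"type": "first_control", "direction": direction, "next_hop": fourth}
    # A direct function call stands in for sending the signaling over the network (step S203).
    handle_first_control_signaling(third, signaling)
    return signaling


on_next_hop_failure("C1", "P1", FIRST_DIR)
print(routes["C1"][FIRST_DIR], routes["P3"][FIRST_DIR])  # P3 P2
```

In this sketch the route on the standby device is changed by an explicit instruction rather than by waiting for a routing protocol to converge, which is the point the description makes about switching speed.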
In one possible implementation, the first control signaling is further used to control routing in the second data transmission direction in the third network device to the first network device and to control routing in the second data transmission direction in the fourth network device to the third network device.
Since the first data transmission direction and the second data transmission direction are opposite data transmission directions, the second network device is the next hop of the first network device in the first data transmission direction, and the first network device is the next hop of the second network device in the second data transmission direction. Likewise, the fourth network device is the next hop of the second network device in the first data transmission direction, and the second network device is the next hop of the fourth network device in the second data transmission direction.
Before the second network device fails, the fourth network device first sends the second data packet (i.e., the data packet in the second data transmission direction) to the second network device, and then the second network device sends the second data packet to the first network device.
After the second network device fails, the fourth network device first sends the second data packet to the third network device, and then the third network device sends the second data packet to the first network device.
Comparing the two data transmission directions, after the second network device fails, the transmission path of the first data packet in the first data transmission direction is the first network device, the third network device and the fourth network device, and the transmission path of the second data packet in the second data transmission direction is the fourth network device, the third network device and the first network device. Therefore, in the embodiment of the application, the path is modified in both directions: after the fault occurs, the paths for sending data and for receiving data are modified together, which avoids the network problem of inconsistent forward and return paths.
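The symmetry of the two paths can be checked with a short example. The table below is a hypothetical snapshot of the routes after failover for the case where C1 is the first network device, P3 the third network device and P2 the fourth network device; the function name `trace` is likewise an assumption.

```python
FIRST_DIR, SECOND_DIR = "C1->C2", "C2->C1"

# Routes after the failover, keyed by (device, direction).
after_failover = {
    ("C1", FIRST_DIR): "P3",   # first device now points at the standby device
    ("P3", FIRST_DIR): "P2",   # set by the first control signaling
    ("P2", SECOND_DIR): "P3",  # set on the fourth device
    ("P3", SECOND_DIR): "C1",  # set by the first control signaling
}


def trace(start, end, direction, table):
    """List the devices a packet visits from start to end in one direction."""
    hops = [start]
    while hops[-1] != end:
        hops.append(table[(hops[-1], direction)])
    return hops


forward = trace("C1", "P2", FIRST_DIR, after_failover)
backward = trace("P2", "C1", SECOND_DIR, after_failover)
print(forward, backward)  # ['C1', 'P3', 'P2'] ['P2', 'P3', 'C1']
```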
According to the fault processing method, when the first network device detects that the second network device serving as the next hop breaks down, the first network device directs its own route to the standby device of the second network device, and controls the route in the first data transmission direction on the standby device to direct to the next hop of the second network device by sending a control signaling to the standby device. In this way, the standby equipment is directly indicated to carry out route direction change through the control signaling, so that the route direction change does not depend on the convergence speed of BGP (border gateway protocol) routes, the route switching speed is effectively improved, and the influence of equipment faults on the network performance is reduced.
In step S204, the third network device directs the route in the first data transmission direction to the fourth network device in response to the first control signaling.
As is known from step S201 and step S202, the first control signaling is generated in the case where the second network device fails. The second network device may be the next hop of the first network device in the first data transmission direction, and the second network device is a primary device. The third network device represents a standby device of the second network device. That is, in the case where the first network device detects a failure of the second network device, the first network device generates the first control signaling and transmits it to the third network device, which is the standby device of the second network device, so that the third network device performs the data transmission work in place of the second network device.
Wherein the fourth network device may represent a next hop of the second network device in the first data transmission direction.
In step S205, the third network device directs the route in the second data transmission direction to the first network device in response to the first control signaling, and sends the third control signaling to the fourth network device.
Wherein the third control signaling may be used to control routing in the second data transmission direction in the fourth network device to the third network device. Here, the route in the first data transmission direction in the fourth network device does not need to be changed, but still follows the previous path.
In step S206, the fourth network device directs the route in the second data transmission direction to the third network device in response to the third control signaling.
Thus, after the fourth network device receives the second data packet, the second data packet may be sent to the third network device, and then the third network device sends the second data packet to the first network device.
In addition, if the fourth network device is a HUB node, after step S206 the fourth network device may continue to send the third control signaling to its next-hop device in the first data transmission direction, so that this device directs its route in the second data transmission direction to the fourth network device. If the fourth network device is a SPOKE node, it does not need to continue sending the third control signaling after step S206. That is, the third control signaling is sent hop by hop through the network along the first data transmission direction, starting from the fourth network device (i.e., the next hop of the failed second network device in the first data transmission direction), until a SPOKE node is reached.
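The hop-by-hop forwarding of the third control signaling can be illustrated as follows, again for the case where P1 has failed, P3 is the third network device and P2 is the fourth network device. All names are hypothetical, and a loop stands in for the devices forwarding the signaling to one another.

```python
ROLE = {"C1": "SPOKE", "P3": "HUB", "P2": "HUB", "C2": "SPOKE"}

# Next hop of each device in the first data transmission direction (C1 -> C2).
first_dir_next_hop = {"P3": "P2", "P2": "C2"}

# Routes in the second data transmission direction before the signaling arrives.
second_dir_route = {"P2": "P1", "C2": "P2"}


def propagate_third_control_signaling(sender, receiver):
    """Each receiver points its second-direction route at the sender, then
    forwards the signaling along the first direction until a SPOKE node is reached."""
    visited = []
    while True:
        second_dir_route[receiver] = sender
        visited.append(receiver)
        if ROLE[receiver] == "SPOKE":
            return visited
        sender, receiver = receiver, first_dir_next_hop[receiver]


order = propagate_third_control_signaling("P3", "P2")
print(order, second_dir_route)  # ['P2', 'C2'] {'P2': 'P3', 'C2': 'P2'}
```

In this example the route on C2 already pointed at P2, so only the route on P2 actually changes.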
In step S207, the first network device detects whether the failure of the second network device is recovered.
The first network device may determine a failure recovery of the second network device when a quality of a link between the first network device and the second network device satisfies a preset condition or the second network device is reachable. Specific details of step S201 may be referred to and will not be described herein.
In step S208, the first network device directs the route in the first data transmission direction to the second network device in the case that the failure of the second network device is recovered and the duration reaches the preset threshold.
The first network device may start timing when it detects that the second network device has recovered from the failure, and, if the timed duration reaches the preset threshold, direct the route in the first data transmission direction to the second network device. If the first network device detects that the second network device fails again before the timed duration reaches the preset threshold, the first network device clears the timer, and restarts timing only after it again detects that the second network device has recovered from the failure.
The preset threshold may be set as needed, for example, the preset threshold may be set to 300 seconds. When the preset threshold value is larger, the probability of repeated switching of the routing line is lower. When the preset threshold value is smaller, the route line recovery time is shorter.
In the embodiment of the application, the route is switched after the second network equipment is recovered for a period of time, so that the influence of the route switching back and forth on the traffic transmission caused by the repeated faults of the links or the equipment can be prevented.
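The timing behavior described above (start timing on recovery, clear the timer on a renewed failure, switch back once the threshold is reached) can be sketched as a small hold-down timer. The class name `RecoveryHoldTimer` and the method `observe` are hypothetical; timestamps are passed in explicitly so the example does not depend on a real clock, and the 300-second default follows the example value given in the text.

```python
from typing import Optional


class RecoveryHoldTimer:
    """Hypothetical hold-down timer for switching back after a recovery."""

    def __init__(self, threshold_s: float = 300.0) -> None:
        self.threshold_s = threshold_s
        self._healthy_since: Optional[float] = None  # None means no timing in progress

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one probe result taken at time `now` (in seconds).

        Returns True when the second network device has stayed recovered
        for at least `threshold_s`, i.e. when the route may be switched back.
        """
        if not healthy:
            # A renewed failure clears the timer.
            self._healthy_since = None
            return False
        if self._healthy_since is None:
            # First healthy probe after a failure: start timing.
            self._healthy_since = now
        return now - self._healthy_since >= self.threshold_s


timer = RecoveryHoldTimer(threshold_s=300.0)
print(timer.observe(True, now=0.0))     # False: timing has just started
print(timer.observe(False, now=250.0))  # False: failed again, timer cleared
print(timer.observe(True, now=260.0))   # False: timing restarted
print(timer.observe(True, now=560.0))   # True: recovered for 300 s
```

A larger threshold makes repeated switching less likely, while a smaller one shortens the time until the original route is restored, which matches the trade-off stated above.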
In step S209, the first network device sends a second control signaling to the second network device.
Wherein the second control signaling may be used to control the route in the first data transmission direction in the second network device to point to the fourth network device, and the route in the second data transmission direction in the fourth network device to point to the second network device.
In step S210, the second network device directs the route in the first data transmission direction to the fourth network device in response to the second control signaling.
In step S211, the second network device sends fourth control signaling to the fourth network device in response to the second control signaling.
Wherein the fourth control signaling is for controlling the fourth network device to direct the route in the second data transmission direction to the second network device.
In step S212, the fourth network device directs the route in the second data transmission direction to the second network device in response to the fourth control signaling.
After the failure recovery of the second network device, the transmission path in the first data transmission direction becomes the first network device, the second network device, and the fourth network device, and the transmission path in the second data transmission direction becomes the fourth network device, the second network device, and the first network device.
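The route changes of steps S208 to S212 and the resulting paths can be checked with a minimal sketch. The per-device route dictionary, the keys `first_dir` and `second_dir`, and the helper names are hypothetical. The sketch also assumes that the second network device's route in the second data transmission direction still points to the first network device, since the steps above do not change it.

```python
from typing import Dict, List

# routes[device][direction] -> next hop of that device in that direction
Routes = Dict[str, Dict[str, str]]


def switch_back(routes: Routes, first: str, second: str, fourth: str) -> None:
    """Apply the route changes of steps S208, S210 and S212."""
    routes[first]["first_dir"] = second    # S208: first device points back to the second device
    routes[second]["first_dir"] = fourth   # S210: effect of the second control signaling
    routes[fourth]["second_dir"] = second  # S212: effect of the fourth control signaling


def walk(routes: Routes, start: str, direction: str, hops: int) -> List[str]:
    """Follow next hops from `start` for a fixed number of hops."""
    path = [start]
    for _ in range(hops):
        path.append(routes[path[-1]][direction])
    return path


# State during the failure: traffic detours through the third (backup) device.
routes: Routes = {
    "first": {"first_dir": "third"},
    "second": {"second_dir": "first"},  # assumed unchanged during the failure
    "third": {"first_dir": "fourth", "second_dir": "first"},
    "fourth": {"second_dir": "third"},
}
switch_back(routes, "first", "second", "fourth")
print(walk(routes, "first", "first_dir", 2))    # ['first', 'second', 'fourth']
print(walk(routes, "fourth", "second_dir", 2))  # ['fourth', 'second', 'first']
```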
In one possible implementation, each network device in the embodiments of the present application may change the route direction by filtering routes through a prefix list. For example, the first network device may direct the route in the first data transmission direction to the third network device by rejecting the route of the second network device in the local prefix list for the first data transmission direction. The fourth network device may direct the route in the second data transmission direction to the third network device by rejecting the route of the second network device in the local prefix list for the second data transmission direction. Of course, the network devices may also change the route direction by other means, such as modifying a forwarding table or a routing table, which the embodiments of the present application do not limit.
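The effect of the list-based filtering described above can be illustrated with a deliberately simplified sketch: candidate next hops are kept in preference order, and rejecting the primary device makes the backup device the selected next hop. The function `select_next_hop` and the device labels are hypothetical; real filter lists on routers match route prefixes and are configured in vendor-specific syntax, which this sketch does not attempt to reproduce.

```python
from typing import List, Optional, Set


def select_next_hop(candidates: List[str], rejected: Set[str]) -> Optional[str]:
    """Return the most preferred candidate next hop that is not rejected.

    `candidates` is ordered from most to least preferred. `rejected` plays the
    role of the local filter list for one data transmission direction.
    """
    for hop in candidates:
        if hop not in rejected:
            return hop
    return None  # every candidate is filtered out


# First network device, first data transmission direction:
candidates = ["second_device", "third_device"]  # primary first, backup second

print(select_next_hop(candidates, rejected=set()))              # second_device
print(select_next_hop(candidates, rejected={"second_device"}))  # third_device
```

Clearing the rejection entry restores the original next hop, which is one way the switch back after recovery could be expressed in this model.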
In the embodiment of the application, the first network equipment can detect whether the second network equipment fails or not in real time through the probe, and once the second network equipment fails, the route switching is automatically triggered without manual intervention, so that the automation of the whole switching process is realized.
After a failure in the current SHHS architecture, automatic BGP convergence repairs route selection in only one direction, which causes problems such as asymmetric routes or inconsistent forward and return paths. In the embodiment of the application, route selection is modified in both directions at the same time, which ensures the stability and reliability of the route.
In the embodiment of the application, the route modification is instructed through control signaling, so that the switching speed of the route selection is greatly increased. Compared with traditional automatic route convergence, the time from the probe triggering the task to the completion of the route switch is kept at the millisecond level, which improves the response speed and availability of the network.
After a network line or HUB failure in the current SHHS architecture is recovered, BGP re-converges and then restores the original route. During this process, the link may keep fluctuating and availability may flap repeatedly. In the embodiment of the application, after the faulty line or device recovers, it is probed continuously for a period of time to determine whether it is working normally, and only then is traffic switched back to the original line, which ensures the stability and availability of the whole network.
In summary, the embodiment of the application improves the availability and usability of the architecture and considerably strengthens the advantages of the whole architecture.
In an exemplary embodiment, fig. 3 shows a flowchart of a fault handling method provided by an embodiment of the present application. As shown in fig. 3, the method may include:
In step S301, the first network device uses a probe to continuously detect the underlying line.
In step S302, the first network device detects whether the second network device has failed. If yes, go to step S303, otherwise, go to step S307.
In step S303, the first network device modifies the local routing.
In step S304, the first network device sends a first control signaling to the third network device.
In step S305, the third network device modifies the local routing in response to the first control signaling.
In step S306, the first network device determines whether the failure of the second network device is recovered. If yes, go to step S301, otherwise, go to step S307.
Step S303 to step S306 may refer to step S202 to step S210, which are not described herein.
In step S307, the first network device performs normal data forwarding.
Application example
Fig. 4 shows an application schematic diagram of a fault handling method according to an embodiment of the present application. As shown in fig. 4, on the basis of fig. 1, if C1 (i.e., the first network device) detects that A (i.e., the link between the first network device and the second network device) fails, C1 modifies its route from C1 to C2 to point to P3, and sends the first control signaling to P3 (i.e., the third network device). P3 modifies its route from C1 to C2 to point to P2 in response to the first control signaling, and sends the third control signaling to P2. P2 modifies its route from C2 to C1 to point to P3 in response to the third control signaling. As shown in fig. 4, in the case of an A failure, the data transmission path from C1 to C2 passes through C1, D, P3, H, P2, C, and C2 in order.
Fig. 5 shows an application schematic diagram of a fault handling method according to an embodiment of the present application. As shown in fig. 5, on the basis of fig. 1, if P1 detects that B fails, P1 modifies its route from C1 to C2 to point to P4, and sends the first control signaling to P4. P4 modifies its route from C1 to C2 to point to C2 in response to the first control signaling, and sends the third control signaling to C2. C2 modifies its route from C2 to C1 to point to P4 in response to the third control signaling. As shown in fig. 5, in the case of a B failure, the data transmission path from C1 to C2 passes through C1, A, P1, G, P4, F, and C2 in order.
Fig. 6 shows an application schematic diagram of a fault handling method according to an embodiment of the present application. As shown in fig. 6, if C1 detects that P1 fails, C1 modifies its route from C1 to C2 to point to P3, and sends the first control signaling to P3. P3 modifies its route from C1 to C2 to point to P2 in response to the first control signaling, and sends the third control signaling to P2. P2 modifies its route from C2 to C1 to point to P3 in response to the third control signaling. As shown in fig. 6, in the case of a P1 failure, the data transmission path from C1 to C2 passes through C1, D, P3, H, P2, C, and C2 in order.
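The paths listed in the application examples can be reproduced with a small next-hop model. Several things here are assumptions rather than statements from the application: the link endpoints in `LINKS` are inferred from the order of devices and link letters in the listed paths, the pre-failure route C1, P1, P2, C2 is assumed, and the helper `trace` is hypothetical.

```python
from typing import Dict, List

# Link letter for each pair of adjacent devices (inferred from the listed paths).
LINKS = {
    frozenset(("C1", "P1")): "A",
    frozenset(("P1", "P2")): "B",
    frozenset(("P2", "C2")): "C",
    frozenset(("C1", "P3")): "D",
    frozenset(("P4", "C2")): "F",
    frozenset(("P1", "P4")): "G",
    frozenset(("P3", "P2")): "H",
}


def trace(next_hop: Dict[str, str], src: str, dst: str) -> List[str]:
    """List devices and link letters along the route from src to dst."""
    path = [src]
    node = src
    visited = {src}
    while node != dst:
        nxt = next_hop[node]
        if nxt in visited:
            raise ValueError("routing loop detected")
        path.append(LINKS[frozenset((node, nxt))])
        path.append(nxt)
        visited.add(nxt)
        node = nxt
    return path


# Assumed routes in the C1 -> C2 direction before any failure.
normal = {"C1": "P1", "P1": "P2", "P2": "C2"}

# Fig. 4 and fig. 6: C1 points to P3, and P3 points to P2.
detour_via_p3 = {**normal, "C1": "P3", "P3": "P2"}
# Fig. 5: P1 points to P4, and P4 points to C2.
detour_via_p4 = {**normal, "P1": "P4", "P4": "C2"}

print(trace(detour_via_p3, "C1", "C2"))  # ['C1', 'D', 'P3', 'H', 'P2', 'C', 'C2']
print(trace(detour_via_p4, "C1", "C2"))  # ['C1', 'A', 'P1', 'G', 'P4', 'F', 'C2']
```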
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a fault processing device for realizing the fault processing method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the fault handling device provided below may refer to the limitation of the fault handling method described above, and will not be repeated here.
In an exemplary embodiment, as shown in fig. 7, there is provided a fault handling apparatus 700, which may be applied to a first network device, the apparatus 700 may include a first detection module 701 and a first routing module 702, wherein:
The first detection module is used for detecting whether a second network device has a fault or not, wherein the second network device is the next hop of the first network device in the first data transmission direction, and the second network device is a main device;
And the first routing module is used for directing the route in the first data transmission direction to a third network device and sending a first control signaling to the third network device in the case that the second network device fails, wherein the third network device represents a standby device of the second network device, the first control signaling is used for controlling the route in the first data transmission direction in the third network device to point to a fourth network device, and the fourth network device is the next hop of the second network device in the first data transmission direction.
In a possible implementation manner, the first control signaling is further used to control a route in a second data transmission direction in the third network device to point to the first network device, and to control a route in the second data transmission direction in the fourth network device to point to the third network device, where the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.
In one possible implementation, the apparatus further includes:
the second detection module is used for detecting whether the fault of the second network equipment is recovered or not;
and the second routing module is used for directing the route in the first data transmission direction to the second network equipment and sending a second control signaling to the second network equipment under the condition that the fault of the second network equipment is recovered and the recovery duration reaches a preset threshold value, wherein the second control signaling is used for controlling the route in the first data transmission direction in the second network equipment to be directed to the fourth network equipment and the route in the second data transmission direction in the fourth network equipment to be directed to the second network equipment.
In one possible implementation manner, the first detection module is further configured to:
Detecting a link between the first network device and the second network device;
Determining that the second network device has failed under the condition that the quality of a link between the first network device and the second network device does not meet a preset condition;
And/or,
And determining that the second network device has failed in the event that the second network device is not reachable.
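The two failure criteria above (link quality below a preset condition, and/or the device being unreachable) can be sketched as a single predicate. The application does not say which quality metrics or limits make up the preset condition, so the packet-loss and round-trip-time fields, their default limits, and the names `ProbeResult` and `has_failed` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Hypothetical result of one probe toward the second network device."""
    reachable: bool
    loss_pct: float  # packet loss on the link, in percent
    rtt_ms: float    # round-trip time on the link, in milliseconds


def has_failed(probe: ProbeResult,
               max_loss_pct: float = 5.0,
               max_rtt_ms: float = 200.0) -> bool:
    """Return True if the second network device should be treated as failed.

    The device counts as failed when it is unreachable, or when the link
    quality does not meet the (assumed) preset condition.
    """
    if not probe.reachable:
        return True
    return probe.loss_pct > max_loss_pct or probe.rtt_ms > max_rtt_ms


print(has_failed(ProbeResult(reachable=True, loss_pct=0.1, rtt_ms=20.0)))   # False
print(has_failed(ProbeResult(reachable=True, loss_pct=30.0, rtt_ms=20.0)))  # True
print(has_failed(ProbeResult(reachable=False, loss_pct=0.0, rtt_ms=0.0)))   # True
```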
In an exemplary embodiment, as shown in fig. 8, there is provided a fault handling apparatus 800, which may be applied to a third network device, the apparatus 800 may include a receiving module 801 and a first routing module 802, wherein:
A receiving module, configured to receive a first control signaling sent by a first network device, where the first control signaling is generated when a second network device has a fault, the second network device is a next hop of the first network device in a first data transmission direction, the second network device is a primary device, and the third network device represents a standby device of the second network device;
And the first routing module is used for responding to the first control signaling and directing the route in the first data transmission direction to fourth network equipment, wherein the fourth network equipment is the next hop of the second network equipment in the first data transmission direction.
In one possible implementation, the apparatus further includes:
And a second routing module, configured to respond to the first control signaling, direct a route in a second data transmission direction to the first network device, and send a third control signaling to the fourth network device, where the third control signaling is used to control the route in the second data transmission direction in the fourth network device to be directed to the third network device, and the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.
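The behavior of the receiving and routing modules on the third network device can be sketched as one handler. The message layout, the dictionary keys, and the function name `on_first_control_signaling` are hypothetical; the application does not define a message format, so this only illustrates the order of the actions.

```python
from typing import Dict


def on_first_control_signaling(local_routes: Dict[str, str],
                               self_name: str,
                               first_device: str,
                               fourth_device: str) -> Dict[str, str]:
    """Handle the first control signaling on the third (standby) network device.

    Updates the local routes in both directions and returns the third control
    signaling that would be sent to the fourth network device.
    """
    # First data transmission direction: forward traffic on to the fourth device.
    local_routes["first_dir"] = fourth_device
    # Second data transmission direction: send return traffic to the first device.
    local_routes["second_dir"] = first_device
    # Ask the fourth device to point its second-direction route at this device.
    return {
        "type": "third_control_signaling",
        "to": fourth_device,
        "point_second_dir_to": self_name,
    }


routes_p3: Dict[str, str] = {}
message = on_first_control_signaling(routes_p3, "P3", first_device="C1", fourth_device="P2")
print(routes_p3)  # {'first_dir': 'P2', 'second_dir': 'C1'}
print(message["to"], message["point_second_dir_to"])  # P2 P3
```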
Each of the modules in the fault handling apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or independent of a processor in the network device, or may be stored in software in a memory in the network device, so that the processor may call and execute operations corresponding to the above modules.
In an exemplary embodiment, a network device is provided, which may be a server, and an internal structure thereof may be as shown in fig. 9. The network device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the network device is configured to provide computing and control capabilities. The memory of the network device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the network device is for storing routing data. The input/output interface of the network device is used to exchange information between the processor and the external device. The communication interface of the network device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a fault handling method.
It will be appreciated by those skilled in the art that the architecture shown in fig. 9 is merely a block diagram of a portion of the architecture associated with the inventive arrangements and is not limiting as to the network device to which the inventive arrangements are applied, and that a particular network device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an exemplary embodiment, a network device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the computer program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (8)

1. A fault handling method, characterized in that the method is applied to a first network device, and the method comprises:
detecting whether a second network device fails, wherein the second network device is the next hop of the first network device in a first data transmission direction, and the second network device is a primary device;
in the case that the second network device fails, directing the route in the first data transmission direction to a third network device, and sending a first control signaling to the third network device, wherein the third network device represents a backup device of the second network device, the first control signaling is used to control the route in the first data transmission direction in the third network device to point to a fourth network device, and the fourth network device is the next hop of the second network device in the first data transmission direction; the first control signaling is also used to control the route in a second data transmission direction in the third network device to point to the first network device, and to control the route in the second data transmission direction in the fourth network device to point to the third network device, wherein the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.

2. The method according to claim 1, characterized in that the method further comprises:
detecting whether the failure of the second network device is recovered;
in the case that the failure of the second network device is recovered and the recovery duration reaches a preset threshold, directing the route in the first data transmission direction to the second network device, and sending a second control signaling to the second network device, wherein the second control signaling is used to control the route in the first data transmission direction in the second network device to point to the fourth network device and the route in the second data transmission direction in the fourth network device to point to the second network device.

3. The method according to claim 1, characterized in that the detecting whether the second network device fails comprises:
detecting a link between the first network device and the second network device;
in the case that the quality of the link between the first network device and the second network device does not meet a preset condition, determining that the second network device has failed;
and/or,
in the case that the second network device is unreachable, determining that the second network device has failed.

4. A fault handling method, characterized in that the method is applied to a third network device, and the method comprises:
receiving a first control signaling sent by a first network device, wherein the first control signaling is generated in the case that a second network device fails, the second network device is the next hop of the first network device in a first data transmission direction, the second network device is a primary device, and the third network device represents a backup device of the second network device;
in response to the first control signaling, directing the route in the first data transmission direction to a fourth network device, directing the route in a second data transmission direction to the first network device, and sending a third control signaling to the fourth network device, wherein the third control signaling is used to control the route in the second data transmission direction in the fourth network device to point to the third network device, the second data transmission direction represents a data transmission direction opposite to the first data transmission direction, and the fourth network device is the next hop of the second network device in the first data transmission direction.

5. A fault handling apparatus, characterized in that the apparatus is applied to a first network device, and the apparatus comprises:
a first detection module, configured to detect whether a second network device fails, wherein the second network device is the next hop of the first network device in a first data transmission direction, and the second network device is a primary device;
a first routing module, configured to, in the case that the second network device fails, direct the route in the first data transmission direction to a third network device and send a first control signaling to the third network device, wherein the third network device represents a backup device of the second network device, the first control signaling is used to control the route in the first data transmission direction in the third network device to point to a fourth network device, the fourth network device is the next hop of the second network device in the first data transmission direction, the first control signaling is also used to control the route in a second data transmission direction in the third network device to point to the first network device, and to control the route in the second data transmission direction in the fourth network device to point to the third network device, and the second data transmission direction represents a data transmission direction opposite to the first data transmission direction.

6. A fault handling apparatus, characterized in that the apparatus is applied to a third network device, and the apparatus comprises:
a receiving module, configured to receive a first control signaling sent by a first network device, wherein the first control signaling is generated in the case that a second network device fails, the second network device is the next hop of the first network device in a first data transmission direction, the second network device is a primary device, and the third network device represents a backup device of the second network device;
a first routing module, configured to, in response to the first control signaling, direct the route in the first data transmission direction to a fourth network device, direct the route in a second data transmission direction to the first network device, and send a third control signaling to the fourth network device, wherein the third control signaling is used to control the route in the second data transmission direction in the fourth network device to point to the third network device, the second data transmission direction represents a data transmission direction opposite to the first data transmission direction, and the fourth network device is the next hop of the second network device in the first data transmission direction.

7. A network device, comprising a memory and a processor, wherein the memory stores a computer program, characterized in that when the processor executes the computer program, the processor implements the steps of the method according to any one of claims 1 to 3, or implements the steps of the method according to claim 4.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 3, or implements the steps of the method according to claim 4.

CN202410020936.4A 2024-01-05 2024-01-05 Fault handling method, device, network equipment and storage medium Active CN117914777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410020936.4A CN117914777B (en) 2024-01-05 2024-01-05 Fault handling method, device, network equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117914777A CN117914777A (en) 2024-04-19
CN117914777B true CN117914777B (en) 2025-02-11

Family

ID=90688253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410020936.4A Active CN117914777B (en) 2024-01-05 2024-01-05 Fault handling method, device, network equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117914777B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112995029A (en) * 2018-06-30 2021-06-18 华为技术有限公司 Method, device and system for processing transmission path fault

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006025296A1 (en) * 2004-08-31 2006-03-09 Nec Corporation Failure recovery method, network device, and program
CN110267285B (en) * 2019-06-28 2021-02-09 京信通信系统(中国)有限公司 Main/standby link switching method and device and digital switch
CN113472645A (en) * 2020-03-30 2021-10-01 华为技术有限公司 Method, device and equipment for sending route and processing route

Also Published As

Publication number Publication date
CN117914777A (en) 2024-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant