US20180351855A1 - All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups - Google Patents
- Publication number
- US20180351855A1 (U.S. application Ser. No. 15/611,283)
- Authority
- US
- United States
- Prior art keywords
- standby
- node
- active
- links
- common endpoint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/24—Multipath
- H04L45/245—Link aggregation, e.g. trunking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/28—Routing or path finding of packets in data switching networks using route fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/66—Layer 2 routing, e.g. in Ethernet based MAN's
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0811—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
Systems and methods utilize an all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network. A standby node in the MC-LAG network can perform the steps of remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; monitoring frames transmitted by the common endpoint to the standby node over the standby links; and determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon.
Description
- The present disclosure generally relates to networking systems and methods. More particularly, the present disclosure relates to systems and methods performing an all-or-none switchover to address split-brain problems in Multi-Chassis Link Aggregation Groups (MC-LAGs).
- Link aggregation relates to combining various network connections in parallel to increase throughput beyond what a single connection could sustain and to provide redundancy between the links. Link aggregation, including the Link Aggregation Control Protocol (LACP) for Ethernet, is defined in IEEE 802.1AX, IEEE 802.1aq, and IEEE 802.3ad, as well as in various proprietary solutions. IEEE 802.1AX-2008 and IEEE 802.1AX-2014 are entitled Link Aggregation, the contents of which are incorporated by reference. IEEE 802.1aq-2012 is entitled Shortest Path Bridging, the contents of which are incorporated by reference. IEEE 802.3ad-2000 is entitled Link Aggregation, the contents of which are incorporated by reference. Multi-Chassis Link Aggregation Group (MC-LAG) is a type of LAG with constituent ports that terminate on separate chassis, primarily for the purpose of providing nodal redundancy in the event one of the chassis fails. The relevant standards for LAG do not mention MC-LAG, but do not preclude it; MC-LAG implementation varies by vendor.
- LAG is a technique for inverse multiplexing over multiple Ethernet links, thereby increasing bandwidth and providing redundancy. IEEE 802.1AX-2008 states "Link Aggregation allows one or more links to be aggregated together to form a Link Aggregation Group, such that a MAC (Media Access Control) client can treat the Link Aggregation Group as if it were a single link." This Layer 2 transparency is achieved by LAG using a single MAC address for all the device's ports in the LAG group. LAG can be configured as either static or dynamic. Dynamic LAG uses a peer-to-peer protocol for control, called the Link Aggregation Control Protocol (LACP); this protocol is also defined within the IEEE 802.1AX-2008 standard, the entirety of which is incorporated herein by reference.
- LAG can be implemented in multiple ways, namely LAG N and LAG N+N/M+N. LAG N is the load-sharing mode of LAG, and LAG N+N/M+N provides the redundancy. The LAG N protocol automatically distributes and load balances the traffic across the working links within a LAG, thus maximizing the use of the group if Ethernet links go down or come back up, providing improved resilience and throughput. For a different style of resilience between two nodes, a complete implementation of the LACP protocol supports separate worker/standby LAG subgroups. For LAG N+N, the worker links as a group will fail over to the standby links if any one or more, or all, of the links in the worker group fail. Note, LACP marks links as in standby mode using an "out of sync" flag.
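- For illustration only, a minimal Python sketch of the two modes just described, under the simplifying assumption that a link is chosen per-flow by hashing (real LACP distribution functions are implementation-specific, and all names here are hypothetical): LAG N hashes each flow onto a working link, while LAG N+N fails the worker group over to the standby group as a whole.

```python
import hashlib

def pick_link(flow_id, links):
    """LAG N: hash a flow onto one of the currently working links."""
    up = [l for l in links if l["up"]]
    if not up:
        raise RuntimeError("no working links in the LAG")
    digest = hashlib.sha256(flow_id.encode()).digest()
    return up[digest[0] % len(up)]["name"]

def select_group(worker, standby):
    """LAG N+N: the worker links fail over to the standby links *as a
    group* if any one or more (or all) of the worker links fail."""
    return worker if all(l["up"] for l in worker) else standby

# One worker link down -> the whole standby group carries the traffic.
worker = [{"name": "w1", "up": True}, {"name": "w2", "up": False}]
standby = [{"name": "s1", "up": True}, {"name": "s2", "up": True}]
print(pick_link("src-ip:dst-ip:port", select_group(worker, standby)))
```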
- Advantages of Link Aggregation include increased throughput/bandwidth (physical link capacity * number of physical links), load balancing across aggregated links, and link-level redundancy (failure of a link does not result in a traffic drop; rather, standby links can take over the active role for traffic distribution). One of the limitations of Link Aggregation is that it does not provide node-level redundancy: if one end of a LAG fails, it leads to a complete traffic drop as there is no other data path available for the data traffic to be switched to the other node. To solve this problem, the "Multi-Chassis" Link Aggregation Group (MC-LAG) was introduced, which provides node-level redundancy in addition to the link-level redundancy and other merits provided by LAG.
- MC-LAG allows two or more nodes (referred to herein as a Redundant Group (RG)) to share a common LAG endpoint (Dual Homing Device (DHD)). The multiple nodes present a single logical LAG to the remote end. Note that MC-LAG implementations are vendor-specific, but cooperating chassis remain externally compliant to the IEEE 802.1AX-2008 standard. Nodes in an MC-LAG cluster communicate to synchronize and negotiate automatic switchovers (failover). Some implementations may support administrator-initiated (manual) switchovers.
- The multiple nodes in the redundant group maintain some form of adjacency with one another, such as the Inter-Chassis Communication Protocol (ICCP). Since the redundant group requires the adjacency to operate the MC-LAG, a loss in the adjacency (for any reason including a link fault, a nodal fault, etc.) results in a so-called split-brain problem where all peers in the redundant group attempt to take an active role considering corresponding peers as operationally down. This can lead to the introduction of loops in the MC-LAG network and result in the rapid duplication of packets.
- Thus, there is a need for a solution to the split-brain problem that is implemented solely between the RG members, is interoperable with any vendor supporting standard LACP on the DHD, and does not increase switchover time.
- There are some conventional solutions for addressing this problem. One conventional solution introduces configuration changes on the common LAG endpoint, where the DHD detects the split-brain and configures packet flow accordingly. However, this is a proprietary solution requiring the DHD to participate in the MC-LAG. It would be advantageous to avoid configuration on the DHD due to the split-brain problem since the DHD may or may not be aware of the MC-LAG; preferably, the DHD may simply think it is participating in a conventional LAG supporting standard LACP. Another conventional solution includes changing the system MACs on the RG members during a split-brain, along with the use of an out-of-band management channel as a backup to verify communication between the RG members. However, this solution may lead to a significant switchover time since the underlying LACP would have to re-converge with the new system MACs.
- In an embodiment, a method utilizing all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network implemented by a standby node includes remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; monitoring frames transmitted by the common endpoint to the standby node over the standby links; and determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon. The method can further include determining based on the monitoring frames whether less than all of the active links have failed and remaining in the standby state and continuing monitoring the frames transmitted by the common endpoint over the standby links based thereon. The monitoring can check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links. The common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
- The common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node. The common endpoint can be unaware the active node and the standby node are in separate network elements. The loss of adjacency with the active node can be based on a failure or fault on a link between the active node and the standby node used for coordination of the active node and the standby node in a Redundant Group, while the active node and the standby node are both operational.
- In another embodiment, a standby node in a Multi-Chassis Link Aggregation Group (MC-LAG) network configured with all-or-none switchover to prevent split-brain problems includes a plurality of ports in a logical Link Aggregation Group (LAG) with an active node, wherein the plurality of ports form standby links with a common endpoint; a communication link with an active node; and a switching fabric between the plurality of ports, wherein the standby node is configured to remain in a standby state responsive to a loss of the communication link, wherein, in the standby state, all the standby links are non-distributing; monitor frames transmitted by the common endpoint to the standby node over the standby links; and determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
- The standby node can be further configured to determine based on the monitoring frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon. The frames can be monitored to check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links. The common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology. The common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node. The common endpoint can be unaware the active node and the standby node are in separate network elements. The loss of adjacency with the active node can be based on a failure or fault on the communication link, while the active node and the standby node are both operational.
- In a further embodiment, an apparatus configured for all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network located at a standby node includes circuitry configured to remain in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; circuitry configured to monitor frames transmitted by the common endpoint to the standby node over the standby links; and circuitry configured to determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
- The apparatus can further include circuitry configured to determine based on the monitored frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon. The circuitry configured to monitor can check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links. The common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology. The common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node. The common endpoint can be unaware the active node and the standby node are in separate network elements.
- The proposed solution is illustrated and described herein with reference to the various drawings, in which like reference numbers are used to denote like system components/method steps, as appropriate, and in which:
- FIG. 1 illustrates an active/standby Multi-Chassis Link Aggregation Group (MC-LAG);
- FIG. 2 illustrates the MC-LAG of FIG. 1 with a fault and associated node-level redundancy;
- FIG. 3 illustrates the MC-LAG of FIG. 1 with the Inter-Chassis Communication Protocol (ICCP) link failed and associated operation with no other faults;
- FIG. 4 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on one of the active links causing the split-brain problem of the prior art;
- FIG. 5 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on any but the last active link in an all-or-none (AON) switchover to prevent the split-brain problem in accordance with an embodiment of the proposed solution;
- FIG. 6 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on all of the active links in the AON switchover in accordance with an embodiment of the proposed solution;
- FIG. 7 illustrates a flowchart of an AON switchover process in accordance with an embodiment of the proposed solution implemented by the standby Redundant Group (RG) member node subsequent to the loss of connectivity with the active RG member node, such as due to the fault on the ICCP link; and
- FIG. 8 illustrates an example network element for the proposed systems and methods described herein.
- In various embodiments, the present disclosure relates to systems and methods performing an all-or-none switchover to address split-brain problems in Multi-Chassis Link Aggregation Groups (MC-LAGs). In particular, the systems and methods solve the split-brain problem in an active/standby MC-LAG in a triangle topology (a DHD connected to a plurality of RG members). The proposed systems and methods are implemented between the RG members only, without the involvement of the DHD; thus, the systems and methods can interoperate with any vendor's DHD. Also, the systems and methods do not change system MAC addresses, thereby avoiding increased switchover time.
- FIG. 1 illustrates an active/standby MC-LAG 10. MC-LAG 10 simply means dual-homing an endpoint to two or more upstream devices, i.e., allowing two or more upstream nodes to share a common endpoint, thereby providing node-level redundancy. The MC-LAG 10 includes a Redundant Group (RG) 12 which includes RG member nodes 14, 16, which are the two or more upstream devices. The common endpoint is a Dual Homing Device (DHD) 18. The nodes 14, 16 and the DHD 18 can be Ethernet switches, routers, packet-optical devices, etc. supporting Layer 2 connectivity. The multiple nodes 14, 16 in the RG 12 present a single logical LAG interface 20, which is an MC-LAG, to a DHD LAG 22. Specifically, the nodes 14, 16 each have a separate LAG 24, 26, which are logically operated as the logical LAG interface 20 based on adjacency and coordination between the nodes 14, 16. Thus, the RG 12 can appear to the DHD 18 as a single node with the logical LAG interface 20.
- In order to present the RG 12 as the logical LAG interface 20, the nodes 14, 16 rely on LACP as an underlying communication protocol between one another. For example, the nodes 14, 16 can exchange their configuration and dynamic state data over an Inter-Chassis Communication Protocol (ICCP) link 28. The nodes 14, 16 are different physical network elements which can be in the same location or in different locations. The nodes 14, 16 are interconnected via a network 30, such as a G.8032 Ethernet network, a Multiprotocol Label Switching (MPLS) network, or the like. The ICCP link 28 can be a physical connection in the network 30. Also, the ICCP link 28 can be a dedicated link between the nodes 14, 16, such as when they are in the same location or chassis.
- RG 12 implementation is typically vendor-specific, i.e., not specified by the relevant LAG standards. However, in general, the objective of the RG 12 is to present the nodes 14, 16 and the logical LAG interface 20 as a single virtual endpoint to a standards-based LAG DHD 18. Various vendors use different terminology for the MC-LAG, including MLAG, distributed split multi-link trunking, multi-chassis trunking, etc. The proposed systems and methods described herein can apply to any implementation of the RG 12 and seek to avoid coordination with the DHD 18 such that the RG 12 appears to any LAG-compliant DHD 18 as the single logical LAG interface 20. Also, other terminology may be used for the ICCP link 28, but the objective is the same: to enable adjacency and coordination between the nodes 14, 16.
- The ICCP link 28 can be monitored via keep-alive message exchanges that deem this link operational. For faster ICCP link failure detection/recovery, Connectivity Fault Management (CFM) or Bidirectional Forwarding Detection (BFD) services can be configured across the RG member nodes 14, 16.
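- As a rough sketch of this keep-alive monitoring (the one-second interval and multiplier of three are hypothetical defaults; actual ICCP/CFM/BFD timers are configuration-specific), the ICCP link is deemed operational only while heartbeats keep arriving within the timeout:

```python
import time

class KeepAliveMonitor:
    """Deems the ICCP link operational only while keep-alives keep
    arriving within interval_s * multiplier seconds."""
    def __init__(self, interval_s=1.0, multiplier=3):
        # Hypothetical defaults; real timers are configuration-specific.
        self.timeout_s = interval_s * multiplier
        self.last_rx = time.monotonic()

    def on_keepalive(self):
        self.last_rx = time.monotonic()

    def link_operational(self):
        return (time.monotonic() - self.last_rx) < self.timeout_s
```

- Note that, as the disclosure discusses below, if the two RG member nodes were configured with different heartbeat intervals or timeout multipliers, one side could declare the ICCP link down while the other still considers it up.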
- In the example of FIG. 1, the DHD 18 includes four ports 32 into the LAG 22: two ports 34 that are active and connected to the LAG 26, and two ports 36 that are standby and connected to the LAG 24. In this manner, the MC-LAG 10 is an active/standby MC-LAG. From the perspective of the DHD 18, the four ports 32 appear as a standard LAG, and the DHD 18 is unaware that the ports 34, 36 terminate on separate nodes 14, 16. The ICCP link 28 coordination between the RG member nodes 14, 16 causes them to appear as a single node from the DHD 18's perspective.
- FIG. 2 illustrates the MC-LAG 10 with a fault 50 and associated node-level redundancy. Specifically, FIG. 2 illustrates two states 52, 54 to show how node-level redundancy is performed. In the state 52, the ports 34 are active such that the node 14 is the active RG member node, and the ports 36 are standby such that the node 16 is the standby RG member node. In LACP, the ports 34, 36 include sending frames (LACPDUs - LACP Protocol Data Units) between the DHD 18 and the nodes 14, 16 with SYNC bits. Prior to the fault 50, the ports 34 have the LACPDU SYNC bits set to 1, indicating the ports 34 are active, and the ports 36 have the LACPDU SYNC bits set to 0, indicating the ports 36 are standby.
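- In LACP, the SYNC bit referenced throughout this disclosure is the Synchronization flag carried in the Actor State octet of each LACPDU, alongside the Collecting and Distributing flags defined by IEEE 802.1AX. A minimal sketch of testing these flags, assuming the Actor State octet has already been extracted from a received LACPDU:

```python
# IEEE 802.1AX Actor/Partner State flag positions (one octet per LACPDU).
SYNC = 1 << 3          # Synchronization: the "SYNC bit" used herein
COLLECTING = 1 << 4    # port is collecting incoming frames
DISTRIBUTING = 1 << 5  # port is distributing outgoing frames

def sync_set(actor_state):
    """True if the received LACPDU advertises SYNC=1 (active/in sync)."""
    return bool(actor_state & SYNC)

assert sync_set(0x3D)       # SYNC, COLLECTING, DISTRIBUTING all set
assert not sync_set(0x02)   # SYNC=0: the link is standby ("out of sync")
```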
- At step 60-1, assume the node 14 fails; the active RG member node's failure causes protection switching of traffic to the standby RG member node 16. As soon as the standby RG member node 16 loses connectivity with the active RG member node 14 (the ICCP link 28 failure in step 60-2 due to the fault 50), the standby RG member node 16 takes the active role by setting the SYNC bit=1 on all its member ports 36 at step 60-3. Since the DHD 18 also gets a link failure for all active links on the ports 34 at step 60-4, all the standby links on the DHD 18 take the active role by setting their SYNC bit=1 at step 60-5. This makes the backup links "distributing" and hence, traffic switches to the new active RG member node 16 (node-level redundancy).
- An MC-LAG supports triangle, square, and mesh topologies. Particularly, the disclosure herein focuses on the split-brain problem and solution in the MC-LAG triangle topology, such that the DHD 18 is not required to participate in the diagnosis or correction and such that the ports 34, 36 do not require new MAC addresses.
- The split-brain problem is an industry-wide known problem that happens in the case of dual homing. It may occur when communication between the two MC-LAG nodes 14, 16 is lost (i.e., the ICCP link 28 failed/operationally down) while both the nodes 14, 16 are still up and operational. In such a case, both the nodes 14, 16, being no longer aware of each other's existence, try to take the active role, considering the other one as operationally down. This can lead to the introduction of loops in the MC-LAG 10 network and can result in rapid duplication of packets at the DHD 18.
- The ICCP link 28 communication can be lost between the nodes 14, 16 for various reasons, such as misconfigurations, network congestion, network errors, hardware failures, etc. For misconfigurations, example problems can include configuring or administratively enabling the ICCP link 28 only on one RG member node 14, 16; configuring different ICCP heartbeat intervals or timeout multipliers on the RG member nodes 14, 16; incorrectly configuring CFM or BFD monitoring over the ICCP link 28; configuring CFM Maintenance End Points (MEPs) incorrectly, which may result in MEP faults (MEP faults will be propagated to the ICCP link 28, deeming the ICCP link 28 operationally down); etc. Network congestion may lead to CFM/BFD/ICCP frame loss that in turn may cause the ICCP link 28 to appear operationally down while some data traffic may still be switched across. For network errors, high bit errors may result in CFM/BFD/ICCP packet drops. For hardware failures, Operations, Administration, and Maintenance (OAM) engine failures may result in faults in the ICCP link 28 monitoring. For example, the OAM engine may be implemented in hardware as a Field Programmable Gate Array (FPGA), a Network Processor Unit (NPU), an Application Specific Integrated Circuit (ASIC), etc.
- FIG. 3 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with no other faults. At step 100-1, there is a fault 102 that causes the ICCP link 28 to fail. The reason for the fault 102 is irrelevant. At step 100-2, since the ICCP link 28 connectivity is lost between the RG member nodes 14, 16, both the RG member nodes 14, 16 try to take the active role by setting the SYNC bit to 1 on all their member ports 34, 36. Here, the node 14 already is the active node, so the node 14 does not change the SYNC bit, but the node 16 is in standby and goes into standalone active at step 100-3.
- This scenario, however, does not cause the split-brain problem to occur because of the configured link-level redundancy (N:N) on the DHD 18. Since all N links on the ports 34 from the active RG member node 14 are active, the DHD 18 does not set its SYNC bit on the N standby links on the ports 36 at step 100-4. This prevents the standby path from going to the distribution state even though the standby RG member node 16 (after taking the new active role) sets the SYNC bit to 1 on the backup path.
- FIG. 4 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on one of the active links (34) causing the split-brain problem. At step 150-1, there is a fault 102 that causes the ICCP link 28 to fail. Again, the fault 102 could be for any reason. At step 150-2, since the ICCP link 28 connectivity is lost between the RG member nodes 14, 16, both of the RG member nodes 14, 16 go standalone active, setting the SYNC bit to 1 on their member ports 34, 36.
- An issue, however, arises if any distributing link fails on the ports 34 between the DHD 18 and the active RG member node 14. At step 150-3, the fault 104 causes a failure on one of the ports 34, so the SYNC bit is 0 and cannot be sent on that port. In this scenario, the DHD 18, unaware of the fault 102 affecting the ICCP link 28, selects one of the standby links on the ports 36 to take an active role and sets its SYNC bit to 1 at step 150-4.
- The SYNC bit has already been set to 1 on the standby RG member node 16 because of the ICCP link 28 fault 102. Thus, the backup path on the ports 36 goes to the distribution state. Since there is at least one link distributing from the DHD 18 to both the RG member nodes 14, 16, traffic flows between the DHD 18 and each RG member node 14, 16, creating a loop between the DHD 18 and each RG member node 14, 16, i.e., the split-brain problem.
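- The failure condition itself reduces to a simple predicate; the check below is an illustrative restatement of the FIG. 4 outcome rather than anything defined by the disclosure: split-brain exists whenever at least one link is distributing toward each RG member node.

```python
def split_brain(distributing_to_node14: int, distributing_to_node16: int) -> bool:
    # Both nodes believe they are active and each has a forwarding link to the DHD.
    return distributing_to_node14 > 0 and distributing_to_node16 > 0

print(split_brain(1, 1))  # True: the FIG. 4 scenario
print(split_brain(2, 0))  # False: only the active node is distributing (FIG. 3)
```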
- FIGS. 5 and 6 illustrate the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on one of the active links, with an all-or-none (AON) switchover to prevent the split-brain problem in accordance with the proposed solution. Specifically, FIG. 5 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on any but the last active link (34) in the AON switchover. FIG. 6 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on all of the active links in the AON switchover.
- The AON switchover can be implemented by each of the RG member nodes 14, 16. With the AON switchover, the standby RG member node 16 will only take the active role when all of the active links (34) on the active RG member node 14 fail. Of course, the RG member nodes 14, 16 cannot communicate with one another given the fault 102 and the lack of adjacency. Instead, this is achieved by making optimal use of the SYNC bit as employed by the DHD 18. When the ICCP link 28 goes down operationally, the standby RG member node 16 will not set its members' SYNC bits to 1 immediately, but rather rely on the SYNC bits from the DHD 18 ports in order to set its members' SYNC bits. The standby RG member node 16 will set its ports' SYNC bits to 1 only if it receives SYNC bit=1 on all the operational ports from the DHD 18.
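- Stated as a predicate, the all-or-none rule above might look like the following sketch, assuming the standby node can read the partner (DHD) SYNC bit most recently received on each of its operational standby ports; the names are illustrative.

```python
def standby_should_go_active(partner_sync_bits: list) -> bool:
    """AON rule: go active only if the DHD shows SYNC=1 on every operational standby port."""
    # An empty list (no operational standby ports) keeps the node in standby.
    return len(partner_sync_bits) > 0 and all(partner_sync_bits)

print(standby_should_go_active([True, False]))  # False: only some active links failed
print(standby_should_go_active([True, True]))   # True: all N active links failed
```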
LAG 10 is configured with N:N link redundancy and there is no link failure on the standby path (on the ports 36). With the AON switchover, when theICCP link 28 fails, the standbyRG member node 16 will not go active and will keep the SYNC Bits to FALSE (0) and will keep monitoring the SYNC bits coming from theDHD 18. Again, theDHD 18 may not know it is in the MC-LAG but rather assume this is a standard LAG. This AON switchover approach does not require theDHD 18 to have a special configuration, but rather operate standard LACP. Further, the AON switchover does not require new MAC addresses and/or re-convergence. - If
RG member nodes RG member node 16 should be upgraded first (before active RG member node 14). -
- FIG. 7 is a flowchart of an AON switchover process 300 implemented by the standby RG member node 16 subsequent to the loss of connectivity with the active RG member node 14, such as due to the fault 102 on the ICCP link 28. The standby RG member node 16 performs the AON switchover process 300 to eliminate the chance that the split-brain problem causes a loop. The standby RG member node 16 begins the AON switchover process 300 subsequent to the loss of adjacency with the active RG member node 14 (step 302). Subsequent to the loss of adjacency (the ICCP link 28 failure), the standby RG member node 16 remains in the standby state on all of the ports 36, keeping the SYNC bits set to 0, while monitoring LACPDUs from the DHD 18 for their associated SYNC bit (step 304). Specifically, this monitoring does not require the DHD 18 to make changes, but simply assumes the DHD 18 operates standard LACP in an N:N link-level redundancy scheme.
- The standby RG member node 16 can infer the operational status of the active ports 34 based on the SYNC bits from the DHD 18 on the standby ports 36. Specifically, the standby RG member node 16 knows the value of N (N:N) and can infer the number of active/failed links on the ports 34 based on the number of SYNC bit values equal to 1 coming from the DHD 18 on the ports 36. Thus, the AON switchover process 300 operates in a triangle MC-LAG with N:N active/standby configurations.
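- The inference reduces to a count, as in this illustrative helper: in the N:N triangle, each SYNC=1 received from the DHD on a standby port implies exactly one failed active link.

```python
def inferred_failed_active_links(partner_sync_bits: list) -> int:
    # One DHD-promoted standby port (SYNC=1) per failed active link on the ports 34.
    return sum(1 for sync in partner_sync_bits if sync)

assert inferred_failed_active_links([False, False]) == 0  # all active links up
assert inferred_failed_active_links([True, False]) == 1   # one of N=2 active links down
```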
- Based on the monitoring, the standby RG member node 16 can determine if any active links have failed (step 306). Specifically, no active links have failed if none of the ports 36 have the SYNC bit set to 1 coming from the DHD 18; in that case, the standby RG member node 16 remains in the standby state on all of the ports 36, keeping the SYNC bits set to 0 (step 304), and monitors LACPDUs from the DHD 18 for their associated SYNC bit (step 306).
- There are failed active links if any link on the ports 36 has the SYNC bit set to 1 coming from the DHD 18 (step 306). The standby RG member node 16 determines whether all of the active links have failed or whether some, but not all, of the active links have failed (step 306). The standby RG member node 16 will only become active when all of the active links (34) have failed. This prevents loops and does not require coordination with the DHD 18 or changes to system MAC addresses.
- The standby RG member node 16 can determine whether or not all of the active links have failed by counting the number of links on the ports 36 from the DHD 18 which are showing the SYNC bit as 1. That is, if all of the ports 36 are showing LACPDUs from the DHD 18 with the SYNC bit as 1, then all of the active links (34) have failed, i.e., if all N links on the ports 36 show SYNC=1 from the DHD 18, then the N links on the ports 34 have failed.
- If not all of the active links have failed (step 306), then the standby RG member node 16 remains in the standby state on all ports, keeping the SYNC bits set to 0, and continues to monitor LACPDUs from the DHD 18 (step 304). If all of the active links (34) have failed (step 306), the standby RG member node 16 enters the active state on all ports 36, changing the SYNC bits to 1 (step 308). This results in the backup path going to the distribution state, and traffic resumes after protection switching.
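- Putting steps 302 through 308 together, a hedged end-to-end sketch of the AON switchover process 300 might read as follows, reusing standby_should_go_active() from the earlier sketch; receive_lacpdu_sync_bits and set_local_sync_bits are hypothetical hooks into the LACP machinery, not real APIs.

```python
def aon_switchover_process(receive_lacpdu_sync_bits, set_local_sync_bits):
    # Step 302: entered upon loss of adjacency with the active RG member node.
    while True:
        set_local_sync_bits(False)             # step 304: remain standby, SYNC=0 on ports 36
        partner = receive_lacpdu_sync_bits()   # step 304: monitor LACPDUs from the DHD
        if standby_should_go_active(partner):  # step 306: have all active links failed?
            set_local_sync_bits(True)          # step 308: go active, SYNC=1 on ports 36
            return
```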
- Again, the AON switchover process 300 is implemented on the RG 12 and therefore is interoperable with any vendor's DHD 18 supporting standard LACP, and the switchover time is not compromised since no re-convergence is required. Also, the AON switchover process 300 can be configurable and selectively enabled/disabled on both of the member nodes 14, 16.
- Referring back to FIGS. 5 and 6, an operation of the AON switchover process 300 is illustrated. In FIG. 5, similar to FIG. 4, at step 350-1, there is a fault 102 that causes the ICCP link 28 to fail. Again, the fault 102 could be for any reason. At step 350-2, the member nodes 14, 16 detect the ICCP link 28 failure and report the same to the MC-LAG 10. At step 350-3, the active RG member node 14 goes to standalone (active), and the SYNC bit remains at 1 on the operational links in the ports 34. Also at step 350-3, if the standby RG member node 16 is configured with the AON switchover process 300 enabled, the standby RG member node 16 goes to a standalone mode, but non-distributing, keeping the SYNC bits set at 0 for all links in the ports 36.
- Now, in the standalone mode, but non-distributing, the standby RG member node 16 monitors the LACPDUs from the DHD 18 on the ports 36. At step 350-4, the DHD 18 detects the fault 104 on the ports 34 and, since this is N:N redundancy, the DHD 18 selects a standby port as active on the ports 36, setting the SYNC bit to 1. Note, since the standby RG member node 16 is operating the AON switchover process 300, the standby RG member node 16 remains in the standalone mode, but non-distributing, with all links in the ports 36 transmitting SYNC=0 to the DHD 18.
- In FIG. 6, at step 350-5, the last link in the ports 34 fails. The active RG member node 14 goes into standalone, non-distributing, and the SYNC bits are 0 on all links on the ports 34. At step 350-6, the DHD 18 selects another standby port of the ports 36 to set as active and sets the SYNC bit to 1. At step 350-7, the standby RG member node 16 determines that all of the active links (34) have failed. In this example, this is due to the DHD 18 sending SYNC=1 on two ports of the ports 36 (N=2 here). At this point (step 350-7), the standby RG member node 16 sets the SYNC bit to 1 on all of the ports 36 since the DHD 18 also has the SYNC bit set to 1 on all of the ports 36, and the ports 36 go into distribution, such that the traffic switches from the ports 34 to the ports 36.
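- Driving the sketch above through the FIG. 5/FIG. 6 sequence with N=2 shows the standby node's SYNC bits flipping to 1 only at step 350-7, once the DHD reports SYNC=1 on both standby ports; the event list is a stand-in for real LACPDU reception.

```python
events = iter([
    [False, False],  # step 350-3: ICCP down, no active-link faults yet
    [True, False],   # step 350-4: fault 104, DHD promotes one standby port
    [True, True],    # step 350-6: last active link fails, DHD promotes the other
])

sync_log = []
aon_switchover_process(
    receive_lacpdu_sync_bits=lambda: next(events),
    set_local_sync_bits=sync_log.append,
)
print(sync_log)  # [False, False, False, True]: SYNC goes to 1 only at step 350-7
```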
- FIG. 8 illustrates an example network element 400 for the systems and methods described herein. In this embodiment, the network element 400 is an Ethernet, MPLS, IP, etc. network switch, but those of ordinary skill in the art will recognize the systems and methods described herein can operate with other types of network elements and other implementations. Specifically, the network element 400 can be the RG member nodes 14, 16, and the network element 400 can be the DHD 18 as well. In this embodiment, the network element 400 includes a plurality of blades 402, 404 interconnected via an interface 406. The blades 402, 404 implement functions of the network element 400, and each of the blades 402, 404 can include various components. Note, the network element 400 is illustrated in an oversimplified manner and may include other components and functionality.
- Two blades are illustrated with line blades 402 and control blades 404. The line blades 402 include data ports 408, such as a plurality of Ethernet ports. For example, the line blade 402 can include a plurality of physical ports disposed on an exterior of the blade 402 for receiving ingress/egress connections. Additionally, the line blades 402 can include switching components to form a switching fabric via the interface 406 between all of the data ports 408, allowing data traffic to be switched between the data ports 408 on the various line blades 402. The switching fabric is a combination of hardware, software, firmware, etc. that moves data coming into the network element 400 out by the correct port 408 to the next network element 400. "Switching fabric" includes switching units, or individual boxes, in a node; integrated circuits contained in the switching units; and programming that allows switching paths to be controlled. Note, the switching fabric can be distributed on the blades 402, 404, and the line blades 402 can include an Ethernet manager (i.e., a processor) and a Network Processor (NP)/Application Specific Integrated Circuit (ASIC).
- The control blades 404 include a microprocessor 410, memory 412, software 414, and a network interface 416. Specifically, the microprocessor 410, the memory 412, and the software 414 can collectively control, configure, provision, monitor, etc. the network element 400. The network interface 416 may be utilized to communicate with an element manager, a network management system, etc. Additionally, the control blades 404 can include a database 420 that tracks and maintains provisioning, configuration, operational data, and the like. In this embodiment, the network element 400 includes two control blades 404 which may operate in a redundant or protected configuration such as 1:1, 1+1, etc. In general, the control blades 404 maintain dynamic system information including packet forwarding databases, protocol state machines, and the operational status of the ports 408 within the network element 400.
- When operating as the standby RG member node 16, the various components of the network element 400 can be configured to implement the AON switchover process 300.
- It will be appreciated that some embodiments described herein may include one or more generic or specialized processors ("one or more processors") such as microprocessors; Central Processing Units (CPUs); Digital Signal Processors (DSPs); customized processors such as Network Processors (NPs) or Network Processing Units (NPUs), Graphics Processing Units (GPUs), or the like; Field Programmable Gate Arrays (FPGAs); and the like along with unique stored program instructions (including both software and firmware) for control thereof to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the methods and/or systems described herein. Alternatively, some or all functions may be implemented by a state machine that has no stored program instructions, or in one or more Application Specific Integrated Circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic or circuitry. Of course, a combination of the aforementioned approaches may be used. For some of the embodiments described herein, a corresponding device in hardware and optionally with software, firmware, and a combination thereof can be referred to as "circuitry configured or adapted to," "logic configured or adapted to," etc., to perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. on digital and/or analog signals as described herein for the various embodiments.
- Moreover, some embodiments may include a non-transitory computer-readable storage medium having computer readable code stored thereon for programming a computer, server, appliance, device, processor, circuit, etc. each of which may include a processor to perform functions as described and claimed herein. Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory), Flash memory, and the like. When stored in the non-transitory computer readable medium, software can include instructions executable by a processor or device (e.g., any type of programmable circuitry or logic) that, in response to such execution, cause a processor or the device to perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. as described herein for the various exemplary embodiments.
- Although the present disclosure has been illustrated and described herein with reference to preferred embodiments and specific examples thereof, it will be readily apparent to those of ordinary skill in the art that other embodiments and examples may perform similar functions and/or achieve like results. All such equivalent embodiments and examples are within the spirit and scope of the present disclosure, are contemplated thereby, and are intended to be covered by the following claims.
Claims (20)
1. A method utilizing all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network implemented by a standby node, the method comprising:
remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing;
monitoring frames transmitted by the common endpoint to the standby node over the standby links; and
determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon.
2. The method of claim 1, further comprising:
determining based on the monitoring frames whether less than all of the active links have failed and remaining in the standby state and continuing monitoring the frames transmitted by the common endpoint over the standby links based thereon.
3. The method of claim 1, wherein the monitoring checks for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
4. The method of claim 1, wherein the common endpoint is communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
5. The method of claim 1, wherein the common endpoint is configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
6. The method of claim 1, wherein the common endpoint is unaware the active node and the standby node are in separate network elements.
7. The method of claim 1, wherein the loss of adjacency with the active node is based on a failure or fault on a link between the active node and the standby node used for coordination of the active node and the standby node in a Redundant Group, while the active node and the standby node are both operational.
8. A standby node in a Multi-Chassis Link Aggregation Group (MC-LAG) network configured with all-or-none switchover to prevent split-brain problems, the standby node comprising:
a plurality of ports in a logical Link Aggregation Group (LAG) with an active node, wherein the plurality of ports form standby links with a common endpoint;
a communication link with an active node; and
a switching fabric between the plurality of ports,
wherein the standby node is configured to
remain in a standby state responsive to a loss of the communication link, wherein, in the standby state, all the standby links are non-distributing;
monitor frames transmitted by the common endpoint to the standby node over the standby links; and
determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
9. The standby node of claim 8, wherein the standby node is further configured to
determine based on the monitoring frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon.
10. The standby node of claim 8, wherein the frames are monitored to check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
11. The standby node of claim 8, wherein the common endpoint is communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
12. The standby node of claim 8, wherein the common endpoint is configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
13. The standby node of claim 8, wherein the common endpoint is unaware the active node and the standby node are in separate network elements.
14. The standby node of claim 8, wherein the loss of adjacency with the active node is based on a failure or fault on the communication link, while the active node and the standby node are both operational.
15. An apparatus configured for all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network located at a standby node, the apparatus comprising:
circuitry configured to remain in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing;
circuitry configured to monitor frames transmitted by the common endpoint to the standby node over the standby links; and
circuitry configured to determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
16. The apparatus of claim 15, further comprising:
circuitry configured to determine based on the monitored frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon.
17. The apparatus of claim 15, wherein the circuitry configured to monitor checks for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
18. The apparatus of claim 15, wherein the common endpoint is communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
19. The apparatus of claim 15, wherein the common endpoint is configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
20. The apparatus of claim 15, wherein the common endpoint is unaware the active node and the standby node are in separate network elements.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/611,283 US10164873B1 (en) | 2017-06-01 | 2017-06-01 | All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180351855A1 true US20180351855A1 (en) | 2018-12-06 |
US10164873B1 US10164873B1 (en) | 2018-12-25 |
Family ID=64459042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/611,283 Active 2037-09-01 US10164873B1 (en) | 2017-06-01 | 2017-06-01 | All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups |
Country Status (1)
Country | Link |
---|---|
US (1) | US10164873B1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110535791B (en) * | 2019-06-25 | 2022-03-08 | 南京邮电大学 | Data center network based on prism structure |
US11290422B1 (en) | 2020-12-07 | 2022-03-29 | Ciena Corporation | Path-aware NAPT session management scheme with multiple paths and priorities in SD-WAN |
US11659002B2 (en) | 2021-05-04 | 2023-05-23 | Ciena Corporation | Extending Media Access Control Security (MACsec) to Network-to-Network Interfaces (NNIs) |
US12113696B2 (en) | 2022-02-01 | 2024-10-08 | Bank Of America Corporation | System and method for monitoring network processing optimization |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8817594B2 (en) | 2010-07-13 | 2014-08-26 | Telefonaktiebolaget L M Ericsson (Publ) | Technique establishing a forwarding path in a network system |
US8488608B2 (en) * | 2010-08-04 | 2013-07-16 | Alcatel Lucent | System and method for traffic distribution in a multi-chassis link aggregation |
US8902738B2 (en) * | 2012-01-04 | 2014-12-02 | Cisco Technology, Inc. | Dynamically adjusting active members in multichassis link bundle |
US8885562B2 (en) | 2012-03-28 | 2014-11-11 | Telefonaktiebolaget L M Ericsson (Publ) | Inter-chassis redundancy with coordinated traffic direction |
US20160323179A1 (en) * | 2015-04-29 | 2016-11-03 | Telefonaktiebolaget L M Ericsson (Publ) | Bng subscribers inter-chassis redundancy using mc-lag |
US10305796B2 (en) | 2015-06-01 | 2019-05-28 | Ciena Corporation | Enhanced forwarding database synchronization for media access control addresses learned in interconnected layer-2 architectures |
US9992102B2 (en) | 2015-08-28 | 2018-06-05 | Ciena Corporation | Methods and systems to select active and standby ports in link aggregation groups |
Cited By (119)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10749711B2 (en) | 2013-07-10 | 2020-08-18 | Nicira, Inc. | Network-link method useful for a last-mile connectivity in an edge-gateway multipath system |
US11804988B2 (en) | 2013-07-10 | 2023-10-31 | Nicira, Inc. | Method and system of overlay flow control |
US11212140B2 (en) | 2013-07-10 | 2021-12-28 | Nicira, Inc. | Network-link method useful for a last-mile connectivity in an edge-gateway multipath system |
US11050588B2 (en) | 2013-07-10 | 2021-06-29 | Nicira, Inc. | Method and system of overlay flow control |
US20210160318A1 (en) * | 2014-06-04 | 2021-05-27 | Pure Storage, Inc. | Scale out storage platform having active failover |
US11374904B2 (en) | 2015-04-13 | 2022-06-28 | Nicira, Inc. | Method and system of a cloud-based multipath routing protocol |
US11677720B2 (en) | 2015-04-13 | 2023-06-13 | Nicira, Inc. | Method and system of establishing a virtual private network in a cloud service for branch networking |
US11444872B2 (en) | 2015-04-13 | 2022-09-13 | Nicira, Inc. | Method and system of application-aware routing with crowdsourcing |
US10805272B2 (en) | 2015-04-13 | 2020-10-13 | Nicira, Inc. | Method and system of establishing a virtual private network in a cloud service for branch networking |
US11706127B2 (en) | 2017-01-31 | 2023-07-18 | Vmware, Inc. | High performance software-defined core network |
US10992568B2 (en) | 2017-01-31 | 2021-04-27 | Vmware, Inc. | High performance software-defined core network |
US12058030B2 (en) | 2017-01-31 | 2024-08-06 | VMware LLC | High performance software-defined core network |
US12034630B2 (en) | 2017-01-31 | 2024-07-09 | VMware LLC | Method and apparatus for distributed data network traffic optimization |
US11606286B2 (en) | 2017-01-31 | 2023-03-14 | Vmware, Inc. | High performance software-defined core network |
US11121962B2 (en) | 2017-01-31 | 2021-09-14 | Vmware, Inc. | High performance software-defined core network |
US11252079B2 (en) | 2017-01-31 | 2022-02-15 | Vmware, Inc. | High performance software-defined core network |
US11700196B2 (en) | 2017-01-31 | 2023-07-11 | Vmware, Inc. | High performance software-defined core network |
US11706126B2 (en) | 2017-01-31 | 2023-07-18 | Vmware, Inc. | Method and apparatus for distributed data network traffic optimization |
US11349722B2 (en) | 2017-02-11 | 2022-05-31 | Nicira, Inc. | Method and system of connecting to a multipath hub in a cluster |
US10778528B2 (en) | 2017-02-11 | 2020-09-15 | Nicira, Inc. | Method and system of connecting to a multipath hub in a cluster |
US12047244B2 (en) | 2017-02-11 | 2024-07-23 | Nicira, Inc. | Method and system of connecting to a multipath hub in a cluster |
US11206197B2 (en) | 2017-04-05 | 2021-12-21 | Ciena Corporation | Scaling operations, administration, and maintenance sessions in packet networks |
US10938693B2 (en) | 2017-06-22 | 2021-03-02 | Nicira, Inc. | Method and system of resiliency in cloud-delivered SD-WAN |
US10523539B2 (en) | 2017-06-22 | 2019-12-31 | Nicira, Inc. | Method and system of resiliency in cloud-delivered SD-WAN |
US11533248B2 (en) | 2017-06-22 | 2022-12-20 | Nicira, Inc. | Method and system of resiliency in cloud-delivered SD-WAN |
US10574519B2 (en) * | 2017-09-19 | 2020-02-25 | Cisco Technology, Inc. | Detection and configuration of a logical channel |
US20190089590A1 (en) * | 2017-09-19 | 2019-03-21 | Cisco Technology, Inc. | Detection and Configuration of a Logical Channel |
US11115480B2 (en) | 2017-10-02 | 2021-09-07 | Vmware, Inc. | Layer four optimization for a virtual network defined over public cloud |
US11606225B2 (en) | 2017-10-02 | 2023-03-14 | Vmware, Inc. | Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SAAS provider |
US10666460B2 (en) | 2017-10-02 | 2020-05-26 | Vmware, Inc. | Measurement based routing through multiple public clouds |
US11005684B2 (en) | 2017-10-02 | 2021-05-11 | Vmware, Inc. | Creating virtual networks spanning multiple public clouds |
US10608844B2 (en) | 2017-10-02 | 2020-03-31 | Vmware, Inc. | Graph based routing through multiple public clouds |
US10999165B2 (en) | 2017-10-02 | 2021-05-04 | Vmware, Inc. | Three tiers of SaaS providers for deploying compute and network infrastructure in the public cloud |
US11089111B2 (en) | 2017-10-02 | 2021-08-10 | Vmware, Inc. | Layer four optimization for a virtual network defined over public cloud |
US11102032B2 (en) | 2017-10-02 | 2021-08-24 | Vmware, Inc. | Routing data message flow through multiple public clouds |
US11516049B2 (en) | 2017-10-02 | 2022-11-29 | Vmware, Inc. | Overlay network encapsulation to forward data message flows through multiple public cloud datacenters |
US10999100B2 (en) | 2017-10-02 | 2021-05-04 | Vmware, Inc. | Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SAAS provider |
US10778466B2 (en) | 2017-10-02 | 2020-09-15 | Vmware, Inc. | Processing data messages of a virtual network that are sent to and received from external service machines |
US10686625B2 (en) | 2017-10-02 | 2020-06-16 | Vmware, Inc. | Defining and distributing routes for a virtual network |
US11894949B2 (en) | 2017-10-02 | 2024-02-06 | VMware LLC | Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SaaS provider |
US11895194B2 (en) | 2017-10-02 | 2024-02-06 | VMware LLC | Layer four optimization for a virtual network defined over public cloud |
US10805114B2 (en) | 2017-10-02 | 2020-10-13 | Vmware, Inc. | Processing data messages of a virtual network that are sent to and received from external service machines |
US11855805B2 (en) | 2017-10-02 | 2023-12-26 | Vmware, Inc. | Deploying firewall for virtual network defined over public cloud infrastructure |
US10594516B2 (en) | 2017-10-02 | 2020-03-17 | Vmware, Inc. | Virtual network provider |
US10841131B2 (en) | 2017-10-02 | 2020-11-17 | Vmware, Inc. | Distributed WAN security gateway |
US10958479B2 (en) | 2017-10-02 | 2021-03-23 | Vmware, Inc. | Selecting one node from several candidate nodes in several public clouds to establish a virtual network that spans the public clouds |
US10959098B2 (en) | 2017-10-02 | 2021-03-23 | Vmware, Inc. | Dynamically specifying multiple public cloud edge nodes to connect to an external multi-computer node |
US10992558B1 (en) | 2017-11-06 | 2021-04-27 | Vmware, Inc. | Method and apparatus for distributed data network traffic optimization |
US11323307B2 (en) | 2017-11-09 | 2022-05-03 | Nicira, Inc. | Method and system of a dynamic high-availability mode based on current wide area network connectivity |
US20190140889A1 (en) * | 2017-11-09 | 2019-05-09 | Nicira, Inc. | Method and system of a high availability enhancements to a computer network |
US11902086B2 (en) * | 2017-11-09 | 2024-02-13 | Nicira, Inc. | Method and system of a dynamic high-availability mode based on current wide area network connectivity |
US11223514B2 (en) * | 2017-11-09 | 2022-01-11 | Nicira, Inc. | Method and system of a dynamic high-availability mode based on current wide area network connectivity |
US20220131740A1 (en) * | 2017-11-09 | 2022-04-28 | Nicira, Inc. | Method and system of a dynamic high-availability mode based on current wide area network connectivity |
US11539551B2 (en) * | 2018-01-11 | 2022-12-27 | Huawei Technologies Co., Ltd. | Data transmission method, device, and network system |
US20230073291A1 (en) * | 2018-01-11 | 2023-03-09 | Huawei Technologies Co., Ltd. | Data transmission method, device, and network system |
US12034568B2 (en) * | 2018-01-11 | 2024-07-09 | Huawei Technologies Co., Ltd. | Data transmission method, device, and network system |
US20200007666A1 (en) * | 2018-06-27 | 2020-01-02 | T-Mobile Usa, Inc. | Micro-level network node failover system |
US10972588B2 (en) * | 2018-06-27 | 2021-04-06 | T-Mobile Usa, Inc. | Micro-level network node failover system |
US10771317B1 (en) * | 2018-11-13 | 2020-09-08 | Juniper Networks, Inc. | Reducing traffic loss during link failure in an ethernet virtual private network multihoming topology |
US11502943B2 (en) * | 2019-05-14 | 2022-11-15 | Hewlett Packard Enterprise Development Lp | Distributed neighbor state management for networked aggregate peers |
US20220247631A1 (en) * | 2019-05-28 | 2022-08-04 | Nippon Telegraph And Telephone Corporation | Network management apparatus and method |
US11012369B2 (en) * | 2019-07-05 | 2021-05-18 | Dell Products L.P. | Aggregated switch path optimization system |
US11310102B2 (en) | 2019-08-02 | 2022-04-19 | Ciena Corporation | Retaining active operations, administration, and maintenance (OAM) sessions across multiple devices operating as a single logical device |
WO2021025826A1 (en) * | 2019-08-02 | 2021-02-11 | Ciena Corporation | Retaining active operations, administration, and maintenance (oam) sessions across multiple devices operating as a single logical device |
US11018995B2 (en) | 2019-08-27 | 2021-05-25 | Vmware, Inc. | Alleviating congestion in a virtual network deployed over public clouds for an entity |
US11252106B2 (en) | 2019-08-27 | 2022-02-15 | Vmware, Inc. | Alleviating congestion in a virtual network deployed over public clouds for an entity |
US11171885B2 (en) | 2019-08-27 | 2021-11-09 | Vmware, Inc. | Providing recommendations for implementing virtual networks |
US11212238B2 (en) | 2019-08-27 | 2021-12-28 | Vmware, Inc. | Providing recommendations for implementing virtual networks |
US11831414B2 (en) | 2019-08-27 | 2023-11-28 | Vmware, Inc. | Providing recommendations for implementing virtual networks |
US11153230B2 (en) | 2019-08-27 | 2021-10-19 | Vmware, Inc. | Having a remote device use a shared virtual network to access a dedicated virtual network defined over public clouds |
US12132671B2 (en) | 2019-08-27 | 2024-10-29 | VMware LLC | Providing recommendations for implementing virtual networks |
US11606314B2 (en) | 2019-08-27 | 2023-03-14 | Vmware, Inc. | Providing recommendations for implementing virtual networks |
US11252105B2 (en) | 2019-08-27 | 2022-02-15 | Vmware, Inc. | Identifying different SaaS optimal egress nodes for virtual networks of different entities |
US10999137B2 (en) | 2019-08-27 | 2021-05-04 | Vmware, Inc. | Providing recommendations for implementing virtual networks |
US11258728B2 (en) | 2019-08-27 | 2022-02-22 | Vmware, Inc. | Providing measurements of public cloud connections |
US11310170B2 (en) | 2019-08-27 | 2022-04-19 | Vmware, Inc. | Configuring edge nodes outside of public clouds to use routes defined through the public clouds |
US11121985B2 (en) | 2019-08-27 | 2021-09-14 | Vmware, Inc. | Defining different public cloud virtual networks for different entities based on different sets of measurements |
US11044190B2 (en) | 2019-10-28 | 2021-06-22 | Vmware, Inc. | Managing forwarding elements at edge nodes connected to a virtual network |
US11611507B2 (en) | 2019-10-28 | 2023-03-21 | Vmware, Inc. | Managing forwarding elements at edge nodes connected to a virtual network |
US11716286B2 (en) | 2019-12-12 | 2023-08-01 | Vmware, Inc. | Collecting and analyzing data regarding flows associated with DPI parameters |
US11394640B2 (en) | 2019-12-12 | 2022-07-19 | Vmware, Inc. | Collecting and analyzing data regarding flows associated with DPI parameters |
US11489783B2 (en) | 2019-12-12 | 2022-11-01 | Vmware, Inc. | Performing deep packet inspection in a software defined wide area network |
US11088963B2 (en) * | 2019-12-16 | 2021-08-10 | Dell Products L.P. | Automatic aggregated networking device backup link configuration system |
US11722925B2 (en) | 2020-01-24 | 2023-08-08 | Vmware, Inc. | Performing service class aware load balancing to distribute packets of a flow among multiple network links |
US11606712B2 (en) | 2020-01-24 | 2023-03-14 | Vmware, Inc. | Dynamically assigning service classes for a QOS aware network link |
US12041479B2 (en) | 2020-01-24 | 2024-07-16 | VMware LLC | Accurate traffic steering between links through sub-path path quality metrics |
US11438789B2 (en) | 2020-01-24 | 2022-09-06 | Vmware, Inc. | Computing and using different path quality metrics for different service classes |
US11689959B2 (en) | 2020-01-24 | 2023-06-27 | Vmware, Inc. | Generating path usability state for different sub-paths offered by a network link |
US11418997B2 (en) | 2020-01-24 | 2022-08-16 | Vmware, Inc. | Using heart beats to monitor operational state of service classes of a QoS aware network link |
US20230103537A1 (en) * | 2020-02-27 | 2023-04-06 | Nippon Telegraph And Telephone Corporation | Communication system, network relay device, network relay method, and program |
US11477127B2 (en) | 2020-07-02 | 2022-10-18 | Vmware, Inc. | Methods and apparatus for application aware hub clustering techniques for a hyper scale SD-WAN |
US11245641B2 (en) | 2020-07-02 | 2022-02-08 | Vmware, Inc. | Methods and apparatus for application aware hub clustering techniques for a hyper scale SD-WAN |
US11709710B2 (en) | 2020-07-30 | 2023-07-25 | Vmware, Inc. | Memory allocator for I/O operations |
US11363124B2 (en) | 2020-07-30 | 2022-06-14 | Vmware, Inc. | Zero copy socket splicing |
US11575591B2 (en) | 2020-11-17 | 2023-02-07 | Vmware, Inc. | Autonomous distributed forwarding plane traceability based anomaly detection in application traffic for hyper-scale SD-WAN |
US11444865B2 (en) | 2020-11-17 | 2022-09-13 | Vmware, Inc. | Autonomous distributed forwarding plane traceability based anomaly detection in application traffic for hyper-scale SD-WAN |
US11575600B2 (en) | 2020-11-24 | 2023-02-07 | Vmware, Inc. | Tunnel-less SD-WAN |
US11601356B2 (en) | 2020-12-29 | 2023-03-07 | Vmware, Inc. | Emulating packet flows to assess network links for SD-WAN |
US11929903B2 (en) | 2020-12-29 | 2024-03-12 | VMware LLC | Emulating packet flows to assess network links for SD-WAN |
US12137140B2 (en) * | 2021-01-13 | 2024-11-05 | Pure Storage, Inc. | Scale out storage platform having active failover |
US11792127B2 (en) | 2021-01-18 | 2023-10-17 | Vmware, Inc. | Network-aware load balancing |
US11979325B2 (en) | 2021-01-28 | 2024-05-07 | VMware LLC | Dynamic SD-WAN hub cluster scaling with machine learning |
US11582144B2 (en) | 2021-05-03 | 2023-02-14 | Vmware, Inc. | Routing mesh to provide alternate routes through SD-WAN edge forwarding nodes based on degraded operational states of SD-WAN hubs |
US11509571B1 (en) | 2021-05-03 | 2022-11-22 | Vmware, Inc. | Cost-based routing mesh for facilitating routing through an SD-WAN |
US12009987B2 (en) | 2021-05-03 | 2024-06-11 | VMware LLC | Methods to support dynamic transit paths through hub clustering across branches in SD-WAN |
US11637768B2 (en) | 2021-05-03 | 2023-04-25 | Vmware, Inc. | On demand routing mesh for routing packets through SD-WAN edge forwarding nodes in an SD-WAN |
US11388086B1 (en) | 2021-05-03 | 2022-07-12 | Vmware, Inc. | On demand routing mesh for dynamically adjusting SD-WAN edge forwarding node roles to facilitate routing through an SD-WAN |
US11381499B1 (en) | 2021-05-03 | 2022-07-05 | Vmware, Inc. | Routing meshes for facilitating routing through an SD-WAN |
US11729065B2 (en) | 2021-05-06 | 2023-08-15 | Vmware, Inc. | Methods for application defined virtual network service among multiple transport in SD-WAN |
US11489720B1 (en) | 2021-06-18 | 2022-11-01 | Vmware, Inc. | Method and apparatus to evaluate resource elements and public clouds for deploying tenant deployable elements based on harvested performance metrics |
US12015536B2 (en) | 2021-06-18 | 2024-06-18 | VMware LLC | Method and apparatus for deploying tenant deployable elements across public clouds based on harvested performance metrics of types of resource elements in the public clouds |
US12047282B2 (en) | 2021-07-22 | 2024-07-23 | VMware LLC | Methods for smart bandwidth aggregation based dynamic overlay selection among preferred exits in SD-WAN |
US11375005B1 (en) | 2021-07-24 | 2022-06-28 | Vmware, Inc. | High availability solutions for a secure access service edge application |
CN113676405A (en) * | 2021-08-18 | 2021-11-19 | 上海晨驭信息科技有限公司 | Load sharing-based rapid link master-slave switching distributed system and method |
US11943146B2 (en) | 2021-10-01 | 2024-03-26 | VMware LLC | Traffic prioritization in SD-WAN |
US11909815B2 (en) | 2022-06-06 | 2024-02-20 | VMware LLC | Routing based on geolocation costs |
US12057993B1 (en) | 2023-03-27 | 2024-08-06 | VMware LLC | Identifying and remediating anomalies in a self-healing network |
US12034587B1 (en) | 2023-03-27 | 2024-07-09 | VMware LLC | Identifying and remediating anomalies in a self-healing network |
US12066907B1 (en) | 2023-04-28 | 2024-08-20 | Netapp, Inc. | Collection of state information by nodes in a cluster to handle cluster management after master-node failover |
Also Published As
Publication number | Publication date |
---|---|
US10164873B1 (en) | 2018-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10164873B1 (en) | All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups | |
US10257019B2 (en) | Link aggregation split-brain detection and recovery | |
US9172662B2 (en) | Virtual chassis system control protocols | |
US8027246B2 (en) | Network system and node apparatus | |
US10454809B2 (en) | Automatic network topology detection for merging two isolated networks | |
US20080215910A1 (en) | High-Availability Networking with Intelligent Failover | |
US9148390B2 (en) | System and method for virtual chassis split prevention | |
US20130094357A1 (en) | Fhrp optimizations for n-way gateway load balancing in fabric path switching networks | |
US8873369B2 (en) | Fiber channel 1:N redundancy | |
KR101691759B1 (en) | Virtual chassis system control protocols | |
US10230540B2 (en) | Method, device and system for communicating in a ring network | |
EP2918054B1 (en) | System and method for a pass thru mode in a virtual chassis system | |
US20160308753A1 (en) | Packet network linear protection systems and methods in a dual home or multi-home configuration | |
US9807051B1 (en) | Systems and methods for detecting and resolving split-controller or split-stack conditions in port-extended networks | |
US8477598B2 (en) | Method and system for implementing network element-level redundancy | |
US8711681B2 (en) | Switch redundancy in systems with dual-star backplanes | |
US10862706B2 (en) | Detection of node isolation in subtended ethernet ring topologies | |
US9577872B2 (en) | Fiber channel 1:N redundancy | |
US8144574B1 (en) | Distributed control packet processing | |
WO2014074546A1 (en) | Network node and method in a node operable in a virtual chassis system wherein it is determined whether to issue a warning that an administrative action triggers a virtual chassis split | |
JP2007027954A (en) | Packet network and layer 2 switch | |
Park et al. | Toward control path high availability for software-defined networks | |
US20150271057A1 (en) | Method for running a computer network | |
CN116248581B (en) | Cloud scene gateway cluster master-slave switching method and system based on SDN | |
US8477599B2 (en) | Method and system for implementing network element-level redundancy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: CIENA CORPORATION, MARYLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SOOD, ANKIT;BAHERI, HOSSEIN;GUDIMETLA, LEELA SANKAR;AND OTHERS;REEL/FRAME:042569/0883 Effective date: 20170531 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |