US20180351855A1 - All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups - Google Patents


Info

Publication number
US20180351855A1
Authority
US
United States
Prior art keywords
standby
node
active
links
common endpoint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/611,283
Other versions
US10164873B1 (en)
Inventor
Ankit SOOD
Hossein BAHERI
Leela Sankar GUDIMETLA
Vijay Mohan CHANDRA MOHAN
Wei-Chiuan Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ciena Corp
Original Assignee
Ciena Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ciena Corp
Priority to US15/611,283
Assigned to CIENA CORPORATION. Assignors: BAHERI, HOSSEIN; CHANDRA MOHAN, VIJAY MOHAN; CHEN, WEI-CHIUAN; GUDIMETLA, LEELA SANKAR; SOOD, ANKIT
Publication of US20180351855A1
Application granted
Publication of US10164873B1
Legal status: Active (granted)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/24: Multipath
    • H04L 45/245: Link aggregation, e.g. trunking
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0668: Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L 45/28: Routing or path finding of packets in data switching networks using route fault recovery
    • H04L 45/66: Layer 2 routing, e.g. in Ethernet based MANs
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0805: Monitoring or testing based on specific metrics by checking availability
    • H04L 43/0811: Monitoring or testing based on specific metrics by checking connectivity
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • the present disclosure generally relates to networking systems and methods. More particularly, the present disclosure relates to systems and methods performing an all-or-none switchover to address split-brain problems in Multi-Chassis Link Aggregation Groups (MC-LAGs).
  • Link aggregation relates to combining various network connections in parallel to increase throughput, beyond what a single connection could sustain, and to provide redundancy between the links.
  • Link aggregation including the Link Aggregation Control Protocol (LACP) for Ethernet is defined in IEEE 802.1AX, IEEE 802.1aq, IEEE 802.3ad, as well in various proprietary solutions.
  • IEEE 802.1AX-2008 and IEEE 802.1AX-2014 are entitled Link Aggregation, the contents of which are incorporated by reference.
  • IEEE 802.1aq-2012 is entitled Shortest Path Bridging, the contents of which are incorporated by reference.
  • IEEE 802.3ad-2000 is entitled Link Aggregation, the contents of which are incorporated by reference.
  • Multi-Chassis Link Aggregation Group is a type of LAG with constituent ports that terminate on separate chassis, primarily for the purpose of providing nodal redundancy in the event one of the chassis fails.
  • the relevant standards for LAG do not mention MC-LAG, but do not preclude it.
  • MC-LAG implementation varies by vendor.
  • LAG is a technique for inverse multiplexing over multiple Ethernet links, thereby increasing bandwidth and providing redundancy.
  • IEEE 802.1AX-2008 states “Link Aggregation allows one or more links to be aggregated together to form a Link Aggregation Group, such that a MAC (Media Access Control) client can treat the Link Aggregation Group as if it were a single link.”
  • This layer 2 transparency is achieved by LAG using a single MAC address for all the device's ports in the LAG group.
  • LAG can be configured as either static or dynamic.
  • Dynamic LAG uses a peer-to-peer protocol for control, called Link Aggregation Control Protocol (LACP). This LACP protocol is also defined within the 802.1AX-2008 standard the entirety of which is incorporated herein by reference.
  • LAG can be implemented in multiple ways, namely LAG N and LAG N+N/M+N.
  • LAG N is the load sharing mode of LAG and LAG N+N/M+N provides the redundancy.
  • the LAG N protocol automatically distributes and load balances the traffic across the working links within a LAG, thus maximizing the use of the group if Ethernet links go down or come back up, providing improved resilience and throughput.
  • a complete implementation of the LACP protocol supports separate worker/standby LAG subgroups. For LAG N+N, the worker links as a group will fail over to the standby links if any one or more or all of the links in the worker group fail. Note that LACP marks links in standby mode using an “out of sync” flag.
  • Link Aggregation includes increased throughput/bandwidth (physical link capacity*number of physical links), load balancing across aggregated links and link-level redundancy (failure of a link does not result in a traffic drop; rather standby links can take over as active role for traffic distribution).
  • One of the limitations of Link Aggregation is that it does not provide node-level redundancy. If one end of a LAG fails, it leads to a complete traffic drop as there is no other data path available for the data traffic to be switched to the other node.
  • “Multi-Chassis” Link Aggregation Group (MC-LAG) is introduced, that provides a node-level redundancy in addition to link-level redundancy and other merits provided by LAG.
  • MC-LAG allows two or more nodes (referred to herein as a Redundant Group (RG)) to share a common LAG endpoint (Dual Homing Device (DHD)).
  • the multiple nodes present a single logical LAG to the remote end.
  • MC-LAG implementations are vendor-specific, but cooperating chassis remain externally compliant to the IEEE 802.1AX-2008 standard.
  • Nodes in an MC-LAG cluster communicate to synchronize and negotiate automatic switchovers (failover).
  • Some implementations may support administrator-initiated (manual) switchovers.
  • the multiple nodes in the redundant group maintain some form of adjacency with one another, such as the Inter-Chassis Communication Protocol (ICCP).
  • the redundant group requires the adjacency to operate the MC-LAG, a loss in the adjacency (for any reason including a link fault, a nodal fault, etc.) results in a so-called split-brain problem where all peers in the redundant group attempt to take an active role considering corresponding peers as operationally down. This can lead to the introduction of loops in the MC-LAG network and result in the rapid duplication of packets.
  • a method utilizing all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network implemented by a standby node includes remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; monitoring frames transmitted by the common endpoint to the standby node over the standby links; and determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon.
  • the method can further include determining based on the monitoring frames whether less than all of the active links have failed and remaining in the standby state and continuing monitoring the frames transmitted by the common endpoint over the standby links based thereon.
  • the monitoring can check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
  • the common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
  • the common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
  • the common endpoint can be unaware the active node and the standby node are in separate network elements.
  • the loss of adjacency with the active node can be based on a failure or fault on a link between the active node and the standby node used for coordination of the active node and the standby node in a Redundant Group, while the active node and the standby node are both operational.
  • a standby node in a Multi-Chassis Link Aggregation Group (MC-LAG) network configured with all-or-none switchover to prevent split-brain problems
  • the standby node can be further configured to determine based on the monitoring frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon.
  • the frames can be monitored to check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
  • the common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
  • the common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
  • the common endpoint can be unaware the active node and the standby node are in separate network elements.
  • the loss of adjacency with the active node can be based on a failure or fault on the communication link, while the active node and the standby node are both operational.
  • the apparatus can further include circuitry configured to determine based on the monitored frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon.
  • the circuitry configured to monitor can check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
  • the common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
  • the common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
  • the common endpoint can be unaware the active node and the standby node are in separate network elements.
  • FIG. 1 illustrates an active/standby Multi-Chassis Link Aggregation Group (MC-LAG);
  • FIG. 2 illustrates the MC-LAG of FIG. 1 with a fault and associated node-level redundancy
  • FIG. 3 illustrates the MC-LAG of FIG. 1 with the Inter-Chassis Communication Protocol (ICCP) link failed and associated operation with no other faults;
  • FIG. 4 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on one of the active links causing the split-brain problem of the prior art
  • FIG. 5 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on any but the last active link in an all-or-none (AON) switchover to prevent the split-brain problem in accordance with an embodiment of the proposed solution;
  • FIG. 6 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on all of the active links in the AON switchover in accordance with an embodiment of the proposed solution;
  • FIG. 7 illustrates a flowchart of an AON switchover process in accordance with an embodiment of the proposed solution implemented by the standby RG member node subsequent to the loss of connectivity with the active Redundant Group (RG) member node such as due to the fault on the ICCP link; and
  • FIG. 8 illustrates an example network element for the proposed systems and methods described herein.
  • the present disclosure relates to systems and methods performing an all-or-none switchover to address split-brain problems in Multi-Chassis Link Aggregation Groups (MC-LAGs).
  • the systems and method solve the split-brain problem in an active/standby MC-LAG in a triangle topology (a DHD connected to a plurality of RG members).
  • the proposed systems and methods are implemented between the RG members only without the involvement of the DHD; thus, the systems and methods can interoperate with any vendor's DHD.
  • the systems and methods do not change system MAC addresses thereby avoiding increased switchover time.
  • FIG. 1 illustrates an active/standby MC-LAG 10 .
  • MC-LAG 10 simply means dual-homing an endpoint to two or more upstream devices, i.e., allowing two or more upstream nodes to share a common endpoint thereby providing node-level redundancy.
  • the MC-LAG 10 includes a Redundant Group (RG) 12 which includes RG member nodes 14 , 16 which are the two or more upstream devices.
  • the common endpoint is a Dual Homing Device (DHD) 18 .
  • the nodes 14 , 16 and the DHD 18 can be Ethernet switches, routers, packet-optical devices, etc. supporting Layer 2 connectivity.
  • the multiple nodes 14 , 16 in the RG 12 present a single logical LAG interface 20 which is an MC-LAG to a DHD LAG 22 .
  • the nodes 14 , 16 each have a separate LAG 24 , 26 which are logically operated as the logical LAG interface 20 based on adjacency and coordination between the nodes 14 , 16 .
  • the RG 12 can appear to the DHD 18 as a single node with the logical LAG interface 20 .
  • the nodes 14 , 16 rely on LACP as an underlying communication protocol between one another.
  • the nodes 14 , 16 can exchange their configuration and dynamic state data over an Inter-Chassis Communication Protocol (ICCP) link 28 .
  • the nodes 14 , 16 are different physical network elements which can be in the same location or in different locations.
  • the nodes 14 , 16 are interconnected via a network 30 , such as a G.8032 Ethernet network, a Multiprotocol Label Switching (MPLS) network, or the like.
  • the ICCP link 28 can be a physical connection in the network 30 .
  • the ICCP link 28 can be a dedicated link between the nodes 14 , 16 such as when they are in the same location or chassis.
  • RG 12 implementation is typically vendor-specific, i.e., not specified by the relevant LAG standards.
  • the objective of the RG 12 is to present the nodes 14 , 16 and the logical LAG interface 20 as a single virtual endpoint to a standards-based LAG DHD 18 .
  • Various vendors use different terminology for the MC-LAG which include: MLAG, distributed split multi-link trunking, multi-chassis trunking, MLAG, etc.
  • the proposed systems and methods described herein can apply to any implementation of the RG 12 and seek to avoid coordination with the DHD 18 such that the RG 12 appears to any LAG-compliant DHD 18 as the single logical LAG interface 20 .
  • other terminology may be used for the ICCP link 28 , but the objective is the same—to enable adjacency and coordination between the nodes 14 , 16 .
  • the ICCP link 28 can be monitored via keep-alive message exchanges that deem this link operational.
  • Connectivity Fault Management (CFM) or Bidirectional Forwarding Detection (BFD) services can be configured across the RG member nodes 14 , 16 .
  • the DHD 18 includes four ports 32 into the LAG 22 , two ports 34 are active and connected to the LAG 26 and two ports 36 that are standby connected to the LAG 24 .
  • the MC-LAG 10 is an active/standby MC-LAG.
  • the four ports 32 appear as a standard LAG, and the DHD 18 is unaware that the ports 34 , 36 terminate on separate nodes 14 , 16 .
  • the ICCP link 28 coordination between the RG member nodes 14 , 16 cause them to appear as a single node from the DHD 18 's perspective.
  • FIG. 2 illustrates the MC-LAG 10 with a fault 50 and associated node-level redundancy.
  • FIG. 2 illustrates two states 52 , 54 shown to illustrate how node-level redundancy is performed.
  • the ports 34 are active such that the node 14 is the active RG member node and the ports 36 are standby such that the node 16 is the standby RG member node.
  • the ports 34 , 36 include sending frames (LACPDUs—LACP Protocol Data Units) between the DHD 18 and the nodes 14 , 16 with SYNC bits.
  • Prior to the fault 50 , the ports 34 have the LACPDU SYNC bits set to 1 indicating the ports 34 are active and the ports 36 have the LACPDU SYNC bits set to 0 indicating the ports 36 are standby.
  • step 60 - 1 assume the node 14 fails, and the active RG member node's failure causes protection switching of traffic to the standby RG member node 16 .
  • An MC-LAG supports a triangle, square, and mesh topology. Particularly, the disclosure herein focuses on the split-brain problem and solution in the MC-LAG triangle topology such that the DHD 18 is not required to participate in the diagnosis or correction and such that the ports 34 , 36 do not require new MAC addresses.
  • the split-brain problem is an industry-wide known problem that happens in the case of dual homing. It may occur when communication between two MC-LAG nodes 14 , 16 is lost (i.e., the ICCP link 28 failed/operational down) while both the nodes 14 , 16 are still up and operational.
  • both the nodes 14 , 16 being no longer aware of each other's existence, try to take active role considering the other one as operationally down. This can lead to the introduction of loops in MC-LAG 10 network and can result in rapid duplication of packets at the DHD 18 .
  • the ICCP link 28 communication can be lost between the nodes 14 , 16 for various reasons, such as misconfigurations, network congestion, network errors, hardware failures, etc.
  • example problems can include configuring or administratively enabling the ICCP link 28 only on one RG member node 14 , 16 , configuring different ICCP heartbeat interval or timeout multiplier on the RG member nodes 14 , 16 , incorrectly configuring CFM or BFD Monitoring over the ICCP link 28 , configuring CFM Maintenance End Points (MEPs) incorrectly that may result in MEP Faults (MEP Faults will be propagated to the ICCP link 28 deeming the ICCP link 28 operationally down), etc.
  • FIG. 3 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with no other faults.
  • step 100 - 1 there is a fault 102 that causes the ICCP link 28 to fail. The reason for fault 102 is irrelevant.
  • step 100 - 2 since the ICCP link 28 connectivity is lost between the RG member nodes 14 , 16 , both the RG member nodes 14 , 16 try to take the active role by setting the SYNC bit to 1 on all their member ports 34 , 36 .
  • the node 14 already is the active node, so the node 14 does not change the SYNC bit, but the node 16 is in standby and goes into standalone active at step 100 - 3 .
  • FIG. 4 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on one of the active links ( 34 ) causing the split-brain problem.
  • step 150 - 1 there is fault 102 that causes the ICCP link 28 to fail. Again, the fault 102 could be for any reason.
  • step 150 - 2 since the ICCP link 28 connectivity is lost between the RG member nodes 14 , 16 , both the RG member nodes 14 , 16 try to take the active role by setting the SYNC bit to 1 on all their member ports 34 , 36 .
  • any distributing link fails on the ports 34 between the DHD 18 and the active RG member node 14 .
  • the fault 104 causes a failure on one of the ports 34 ; on this failed port, the SYNC bit is 0 and no frames can be sent.
  • the DHD 18 unaware of the fault 102 affecting the ICCP link 28 , selects one of the standby links on the ports 36 to take an active role and sets its SYNC Bit to 1 at step 150 - 4 .
  • the SYNC bit has already been set to 1 on the standby RG member node 16 because of the ICCP link 28 fault 102 .
  • the backup path on the ports 36 goes to the distribution state. Since there is at least one link distributing from the DHD 18 to both the RG member nodes 14 , 16 , a loop is formed, resulting in packet duplication towards the DHD at step 150 - 5 .
  • the result is the split-brain problem where the member nodes 14 , 16 cause the loop due to their lack of adjacency and coordination.
  • the split-brain problem can only occur when there is more than one physical port between the DHD 18 and each RG member node 14 , 16 .
  • with a single port per node, the DHD 18 's 1:1 redundancy will ensure that only one port can be active at any point in time, thus preventing an active-active situation from happening.
  • however, N:N/M:N redundancy is desired over 1:1 redundancy, and employing N:N/M:N redundancy exposes the arrangement to the split-brain problem.
  • FIGS. 5 and 6 illustrate the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on one of the active links with an all-or-none (AON) switchover to prevent the split-brain problem in accordance with the proposed solution.
  • FIG. 5 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on any but the last active link ( 34 ) in the AON switchover.
  • FIG. 6 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with fault 104 on all of the active links in the AON switchover.
  • the AON switchover can be implemented by each of the RG member nodes 14 , 16 with the restriction that the standby RG member node 16 will only take the active role when all of the active links ( 34 ) on the active RG member node 14 fail.
  • the RG member nodes 14 , 16 cannot coordinate this with one another due to the fault 102 and the lack of adjacency. Instead, this is achieved by making optimal use of the SYNC bit as employed by DHD 18 .
  • the standby RG member node 16 will not set its members' SYNC bits to 1 immediately, but rather relies on the SYNC bits received from the DHD 18 ports to decide when to set its own members' SYNC bits.
  • the AON switchover eliminates a loop during a split-brain situation where the MC-LAG 10 is configured with N:N link redundancy and there is no link failure on the standby path (on the ports 36 ).
  • the standby RG member node 16 will not go active; it will keep the SYNC bits set to FALSE (0) and will keep monitoring the SYNC bits coming from the DHD 18 .
  • the DHD 18 may not know it is in the MC-LAG but rather assume this is a standard LAG.
  • This AON switchover approach does not require the DHD 18 to have a special configuration, but rather operate standard LACP. Further, the AON switchover does not require new MAC addresses and/or re-convergence.
  • if the RG member nodes 14 , 16 are runtime upgraded to employ the functionality of the proposed solution, preferably the standby RG member node 16 should be upgraded first (before the active RG member node 14 ).
  • FIG. 7 is a flowchart of an AON switchover process 300 implemented by the standby RG member node 16 subsequent to the loss of connectivity with the active RG member node 14 such as due to the fault 102 on the ICCP link 28 .
  • the standby RG member node 16 performs the AON switchover process 300 to eliminate chances that the split-brain problem may cause a loop.
  • the standby RG member node 16 begins the AON switchover process 300 subsequent to the loss of adjacency with the active RG member node 14 (step 302 ).
  • the standby RG member node 16 remains in the standby state on all of the ports 36 keeping the SYNC bits set to 0 with the standby RG member node 16 monitoring LACPDUs from the DHD 18 for their associated SYNC bit (step 304 ). Specifically, this monitoring does not require the DHD 18 to make changes, but simply assumes DHD 18 to operate standard LACP in an N:N link-level redundancy scheme.
  • the standby RG member node 16 can infer the operational status of the active ports 34 based on the SYNC bits from the DHD 18 on the standby ports 36 . Specifically, the standby RG member node 16 knows the value of N (N:N) and can infer the number of active/failed links on the ports 34 based on the number of SYNC bit values equal to 1 coming from the DHD 18 on the ports 36 . Thus, the AON switchover process 300 operates in a triangle MC-LAG with N:N active/standby configurations.
  • the standby RG member node 16 can determine if any active links have failed (step 306 ). Specifically, no active links have failed if none of the ports 36 have the SYNC bit set to 1 coming from the DHD 18 , in which case the standby RG member node 16 remains (step 304 ) in the standby state on all of the ports 36 , keeping the SYNC bits set to 0 , and monitors LACPDUs from the DHD 18 for their associated SYNC bit (step 306 ).
  • the standby RG member node 16 determines whether all of the active links have failed or whether some, but not all of the active links have failed (step 306 ). The standby RG member node 16 will only become active when all of the active links ( 34 ) have failed. This prevents the loops and does not require coordination with the DHD 18 or changes to system MAC addresses.
  • if not all of the active links have failed (step 306 ), then the standby RG member node 16 remains in the standby state on all ports, keeping the SYNC bits set to 0 , and continues to monitor LACPDUs from the DHD 18 (step 304 ). If all of the active links ( 34 ) have failed (step 306 ), the standby RG member node 16 enters the active state on all ports 36 , changing the SYNC bits to 1 (step 308 ). This will result in the backup path going to the distribution state, and traffic will resume after protection switching.
  • the AON switchover process 300 is implemented on the RG 12 and therefore is interoperable with any vendor's DHD 18 supporting standard LACP and the switchover time is not compromised since no re-convergence is required. Also, the AON switchover process 300 can be configurable and selectively enabled/disabled on both of the member nodes 14 , 16 .
  • in FIG. 5 , similar to FIG. 4 , at step 350 - 1 , there is a fault 102 that causes the ICCP link 28 to fail. Again, the fault 102 could be for any reason.
  • the member nodes 14 , 16 detect the ICCP link 28 failure and report the same to the MC-LAG 10 .
  • the active RG member node 14 goes to standalone (active), and the SYNC bit remains at 1 on the operational links in the ports 34 .
  • step 350 - 3 if the standby RG member node 16 is configured with the AON switchover process 300 enabled, the standby RG member node 16 goes to a standalone mode, but non-distributing, keeping the SYNC bits set at 0 for all links in the ports 36 .
  • the standby RG member node 16 monitors the LACPDUs from the DHD 18 on the ports 36 .
  • the DHD 18 determines the fault 104 on the ports 34 and since this is N:N redundancy, the DHD 18 selects a standby port as active on the ports 36 setting the SYNC bit to 1.
  • the last link in the ports 34 fails.
  • the active RG member node 14 goes into standalone, non-distributing and the SYNC bits are 0 on all links on the ports 34 .
  • the DHD 18 selects another standby port of the ports 36 to set as active and sets the SYNC bit to 1.
  • the standby RG member node 16 sets the SYNC bit to 1 on all of the ports 36 since the DHD 18 also has the SYNC bit set to 1 on all of the ports 36 and the ports 36 go into distribution, such that the traffic switches from the ports 34 to the ports 36 .
  • FIG. 8 illustrates an example network element 400 for the systems and methods described herein.
  • the network element 400 is an Ethernet, MPLS, IP, etc. network switch, but those of ordinary skill in the art will recognize the systems and methods described herein can operate with other types of network elements and other implementations.
  • the network element 400 can be the RG member nodes 14 , 16 .
  • the network element 400 can be the DHD 18 as well.
  • the network element 400 includes a plurality of blades 402 , 404 interconnected via an interface 406 .
  • the blades 402 , 404 are also known as line cards, line modules, circuit packs, pluggable modules, etc. and generally refer to components mounted on a chassis, shelf, etc.
  • Each of the blades 402 , 404 can include numerous electronic devices and optical devices mounted on a circuit board along with various interconnects including interfaces to the chassis, shelf, etc.
  • the network element 400 is illustrated in an oversimplified manner and may include other components and functionality.
  • the line blades 402 include data ports 408 such as a plurality of Ethernet ports.
  • the line blade 402 can include a plurality of physical ports disposed on an exterior of the blade 402 for receiving ingress/egress connections.
  • the line blades 402 can include switching components to form a switching fabric via the interface 406 between all of the data ports 408 allowing data traffic to be switched between the data ports 408 on the various line blades 402 .
  • the switching fabric is a combination of hardware, software, firmware, etc. that moves data coming into the network element 400 out by the correct port 408 to the next network element 400 .
  • Switching fabric includes switching units, or individual boxes, in a node; integrated circuits contained in the switching units; and programming that allows switching paths to be controlled. Note, the switching fabric can be distributed on the blades 402 , 404 , in a separate blade (not shown), or a combination thereof.
  • the line blades 402 can include an Ethernet manager (i.e., a processor) and a Network Processor (NP)/Application Specific Integrated Circuit (ASIC).
  • the control blades 404 include a microprocessor 410 , memory 412 , software 414 , and a network interface 416 .
  • the microprocessor 410 , the memory 412 , and the software 414 can collectively control, configure, provision, monitor, etc. the network element 400 .
  • the network interface 416 may be utilized to communicate with an element manager, a network management system, etc.
  • the control blades 404 can include a database 420 that tracks and maintains provisioning, configuration, operational data and the like.
  • the network element 400 includes two control blades 404 which may operate in a redundant or protected configuration such as 1:1, 1+1, etc.
  • the control blades 404 maintain dynamic system information including packet forwarding databases, protocol state machines, and the operational status of the ports 408 within the network element 400 .
  • the various components of the network element 400 can be configured to implement the AON switchover process 300 .
  • some embodiments described herein may include processors such as microprocessors; Central Processing Units (CPUs); Digital Signal Processors (DSPs); customized processors such as Network Processors (NPs) or Network Processing Units (NPUs), Graphics Processing Units (GPUs), or the like; Field Programmable Gate Arrays (FPGAs); and the like, along with unique stored program instructions (including both software and firmware) for control thereof to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the methods and/or systems described herein.
  • some embodiments may include a non-transitory computer-readable storage medium having computer readable code stored thereon for programming a computer, server, appliance, device, processor, circuit, etc. each of which may include a processor to perform functions as described and claimed herein.
  • Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory), Flash memory, and the like.
  • software can include instructions executable by a processor or device (e.g., any type of programmable circuitry or logic) that, in response to such execution, cause a processor or the device to perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. as described herein for the various exemplary embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Systems and methods utilize an all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network. A standby node in the MC-LAG network can perform the steps of remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; monitoring frames transmitted by the common endpoint to the standby node over the standby links; and determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon.

Description

    FIELD OF THE DISCLOSURE
  • The present disclosure generally relates to networking systems and methods. More particularly, the present disclosure relates to systems and methods performing an all-or-none switchover to address split-brain problems in Multi-Chassis Link Aggregation Groups (MC-LAGs).
  • BACKGROUND OF THE DISCLOSURE
  • Link aggregation relates to combining various network connections in parallel to increase throughput, beyond what a single connection could sustain, and to provide redundancy between the links. Link aggregation, including the Link Aggregation Control Protocol (LACP) for Ethernet, is defined in IEEE 802.1AX, IEEE 802.1aq, and IEEE 802.3ad, as well as in various proprietary solutions. IEEE 802.1AX-2008 and IEEE 802.1AX-2014 are entitled Link Aggregation, the contents of which are incorporated by reference. IEEE 802.1aq-2012 is entitled Shortest Path Bridging, the contents of which are incorporated by reference. IEEE 802.3ad-2000 is entitled Link Aggregation, the contents of which are incorporated by reference. Multi-Chassis Link Aggregation Group (MC-LAG) is a type of LAG with constituent ports that terminate on separate chassis, primarily for the purpose of providing nodal redundancy in the event one of the chassis fails. The relevant standards for LAG do not mention MC-LAG, but do not preclude it. MC-LAG implementation varies by vendor.
  • LAG is a technique for inverse multiplexing over multiple Ethernet links, thereby increasing bandwidth and providing redundancy. IEEE 802.1AX-2008 states “Link Aggregation allows one or more links to be aggregated together to form a Link Aggregation Group, such that a MAC (Media Access Control) client can treat the Link Aggregation Group as if it were a single link.” This layer 2 transparency is achieved by LAG using a single MAC address for all the device's ports in the LAG group. LAG can be configured as either static or dynamic. Dynamic LAG uses a peer-to-peer protocol for control, called the Link Aggregation Control Protocol (LACP). The LACP protocol is also defined within the IEEE 802.1AX-2008 standard, the entirety of which is incorporated herein by reference.
  • LAG can be implemented in multiple ways, namely LAG N and LAG N+N/M+N. LAG N is the load-sharing mode of LAG, and LAG N+N/M+N provides the redundancy. The LAG N protocol automatically distributes and load balances the traffic across the working links within a LAG, thus maximizing the use of the group if Ethernet links go down or come back up, providing improved resilience and throughput. For a different style of resilience between two nodes, a complete implementation of the LACP protocol supports separate worker/standby LAG subgroups. For LAG N+N, the worker links as a group will fail over to the standby links if any one or more or all of the links in the worker group fail. Note that LACP marks links in standby mode using an “out of sync” flag.
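  • As a rough illustration of the N+N worker/standby behavior described above, the following sketch picks which subgroup should be distributing; the names are hypothetical and the degraded-case tie-break is an assumption, not taken from the standard or this disclosure:

```python
# Illustrative sketch of LAG N+N worker/standby selection (hypothetical names,
# not an implementation of IEEE 802.1AX): the worker subgroup distributes while
# all of its links are up; on any worker-link failure the whole group fails
# over to the standby subgroup. The degraded-case tie-break is an assumption.

def select_distributing_subgroup(worker_links_up, standby_links_up):
    """Return which subgroup should be distributing, given per-link states."""
    if all(worker_links_up):
        return "worker"
    if all(standby_links_up):
        return "standby"
    # Degraded case (failures on both sides): prefer the subgroup with more
    # usable links.
    return "worker" if sum(worker_links_up) >= sum(standby_links_up) else "standby"

# Example: one worker link down in a 2+2 group -> fail over to the standby subgroup.
print(select_distributing_subgroup([True, False], [True, True]))  # -> "standby"
```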
  • Advantages of Link Aggregation include increased throughput/bandwidth (physical link capacity*number of physical links), load balancing across aggregated links, and link-level redundancy (failure of a link does not result in a traffic drop; rather, standby links can take over the active role for traffic distribution). One of the limitations of Link Aggregation is that it does not provide node-level redundancy. If one end of a LAG fails, it leads to a complete traffic drop as there is no other data path available for the data traffic to be switched to another node. To solve this problem, the “Multi-Chassis” Link Aggregation Group (MC-LAG) was introduced, which provides node-level redundancy in addition to the link-level redundancy and other merits provided by LAG.
  • MC-LAG allows two or more nodes (referred to herein as a Redundant Group (RG)) to share a common LAG endpoint (Dual Homing Device (DHD)). The multiple nodes present a single logical LAG to the remote end. Note that MC-LAG implementations are vendor-specific, but cooperating chassis remain externally compliant to the IEEE 802.1AX-2008 standard. Nodes in an MC-LAG cluster communicate to synchronize and negotiate automatic switchovers (failover). Some implementations may support administrator-initiated (manual) switchovers.
  • The multiple nodes in the redundant group maintain some form of adjacency with one another, such as the Inter-Chassis Communication Protocol (ICCP). Since the redundant group requires the adjacency to operate the MC-LAG, a loss in the adjacency (for any reason including a link fault, a nodal fault, etc.) results in a so-called split-brain problem where all peers in the redundant group attempt to take an active role considering corresponding peers as operationally down. This can lead to the introduction of loops in the MC-LAG network and result in the rapid duplication of packets.
  • Thus, there is a need for a solution to the split-brain problem that is implemented solely between the RG members, is interoperable with any vendor's DHD supporting standard LACP, and does not increase switchover time.
  • BRIEF SUMMARY OF THE DISCLOSURE
  • There are some conventional solutions to this problem. One conventional solution introduces configuration changes on the common LAG endpoint where the DHD detects the split-brain and configures packet flow accordingly. However, this is a proprietary solution requiring the DHD to participate in the MC-LAG. It would be advantageous to avoid configuration changes on the DHD due to the split-brain problem since the DHD may or may not be aware of the MC-LAG; preferably, the DHD simply regards itself as participating in a conventional LAG supporting standard LACP. Another conventional solution includes changing the system MACs on RG members during a split-brain along with the use of an out-of-band management channel as a backup to verify communication between the RG members. However, this solution may lead to a significant switchover time since the underlying LACP would have to re-converge with the new system MACs.
  • In an embodiment, a method utilizing all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network implemented by a standby node includes remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; monitoring frames transmitted by the common endpoint to the standby node over the standby links; and determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon. The method can further include determining based on the monitoring frames whether less than all of the active links have failed and remaining in the standby state and continuing monitoring the frames transmitted by the common endpoint over the standby links based thereon. The monitoring can check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links. The common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
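  • To make the all-or-none decision concrete, the following is a minimal sketch of the standby node's behavior after the loss of adjacency, assuming an N:N active/standby triangle and a standards-based LACP common endpoint; all class and method names are illustrative and not taken from this disclosure:

```python
# Minimal sketch of the all-or-none (AON) decision at the standby node after
# adjacency with the active node is lost. Assumes an N:N active/standby
# triangle and standard LACP at the common endpoint (DHD); names are
# illustrative, not taken from this disclosure.

class AonStandby:
    def __init__(self, n_links):
        self.n = n_links          # N active links and N standby links (N:N)
        self.state = "STANDBY"    # standby links non-distributing, SYNC = 0

    def on_dhd_lacpdu(self, dhd_sync_bits):
        """dhd_sync_bits: SYNC bit seen from the DHD on each standby link.

        Each SYNC=1 from the DHD on a standby link implies the DHD has moved
        one failed active link over to the standby side.
        """
        failed_active = sum(1 for bit in dhd_sync_bits if bit)
        if failed_active >= self.n:
            self.state = "ACTIVE"  # all active links failed: start distributing
            return 1               # advertise SYNC = 1 on every standby link
        return 0                   # otherwise stay standby, keep SYNC = 0

node = AonStandby(n_links=2)
print(node.on_dhd_lacpdu([1, 0]))  # one active link failed -> stay standby (0)
print(node.on_dhd_lacpdu([1, 1]))  # all active links failed -> go active (1)
```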
  • The common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node. The common endpoint can be unaware the active node and the standby node are in separate network elements. The loss of adjacency with the active node can be based on a failure or fault on a link between the active node and the standby node used for coordination of the active node and the standby node in a Redundant Group, while the active node and the standby node are both operational.
  • In another embodiment, a standby node in a Multi-Chassis Link Aggregation Group (MC-LAG) network configured with all-or-none switchover to prevent split-brain problems includes a plurality of ports in a logical Link Aggregation Group (LAG) with an active node, wherein the plurality of ports form standby links with a common endpoint; a communication link with an active node; and a switching fabric between the plurality of ports, wherein the standby node is configured to remain in a standby state responsive to a loss of the communication link, wherein, in the standby state, all the standby links are non-distributing; monitor frames transmitted by the common endpoint to the standby node over the standby links; and determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
  • The standby node can be further configured to determine based on the monitoring frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon. The frames can be monitored to check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links. The common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology. The common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node. The common endpoint can be unaware the active node and the standby node are in separate network elements. The loss of adjacency with the active node can be based on a failure or fault on the communication link, while the active node and the standby node are both operational.
  • In a further embodiment, an apparatus configured for all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network located at a standby node includes circuitry configured to remain in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing; circuitry configured to monitor frames transmitted by the common endpoint to the standby node over the standby links; and circuitry configured to determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
  • The apparatus can further include circuitry configured to determine based on the monitored frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon. The circuitry configured to monitor can check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links. The common endpoint can be communicatively coupled to both the active node and the standby node in an active/standby triangle topology. The common endpoint can be configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node. The common endpoint can be unaware the active node and the standby node are in separate network elements.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The proposed solution is illustrated and described herein with reference to the various drawings, in which like reference numbers are used to denote like system components/method steps, as appropriate, and in which:
  • FIG. 1 illustrates an active/standby Multi-Chassis Link Aggregation Group (MC-LAG);
  • FIG. 2 illustrates the MC-LAG of FIG. 1 with a fault and associated node-level redundancy;
  • FIG. 3 illustrates the MC-LAG of FIG. 1 with the Inter-Chassis Communication Protocol (ICCP) link failed and associated operation with no other faults;
  • FIG. 4 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on one of the active links causing the split-brain problem of the prior art;
  • FIG. 5 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on any but the last active link in an all-or-none (AON) switchover to prevent the split-brain problem in accordance with an embodiment of the proposed solution;
  • FIG. 6 illustrates the MC-LAG of FIG. 1 with the ICCP link failed and associated operation with a fault on all of the active links in the AON switchover in accordance with an embodiment of the proposed solution;
  • FIG. 7 illustrates a flowchart of an AON switchover process in accordance with an embodiment of the proposed solution implemented by the standby RG member node subsequent to the loss of connectivity with the active Redundant Group (RG) member node such as due to the fault on the ICCP link; and
  • FIG. 8 illustrates an example network element for the proposed systems and methods described herein.
  • DETAILED DESCRIPTION OF THE DISCLOSURE
  • In various embodiments, the present disclosure relates to systems and methods performing an all-or-none switchover to address split-brain problems in Multi-Chassis Link Aggregation Groups (MC-LAGs). In particular, the systems and methods solve the split-brain problem in an active/standby MC-LAG in a triangle topology (a DHD connected to a plurality of RG members). The proposed systems and methods are implemented between the RG members only, without the involvement of the DHD; thus, the systems and methods can interoperate with any vendor's DHD. Also, the systems and methods do not change system MAC addresses, thereby avoiding increased switchover time.
  • Active/Standby MC-LAG
  • FIG. 1 illustrates an active/standby MC-LAG 10. MC-LAG 10 simply means dual-homing an endpoint to two or more upstream devices, i.e., allowing two or more upstream nodes to share a common endpoint thereby providing node-level redundancy. The MC-LAG 10 includes a Redundant Group (RG) 12 which includes RG member nodes 14, 16 which are the two or more upstream devices. The common endpoint is a Dual Homing Device (DHD) 18. The nodes 14, 16 and the DHD 18 can be Ethernet switches, routers, packet-optical devices, etc. supporting Layer 2 connectivity. The multiple nodes 14, 16 in the RG 12 present a single logical LAG interface 20 which is an MC-LAG to a DHD LAG 22. Specifically, the nodes 14, 16 each have a separate LAG 24, 26 which are logically operated as the logical LAG interface 20 based on adjacency and coordination between the nodes 14, 16. In this manner, the RG 12 can appear to the DHD 18 as a single node with the logical LAG interface 20.
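  • The triangle topology of FIG. 1 can be pictured as a small data model; the sketch below is illustrative only, and its names and structure are assumptions rather than anything defined by this disclosure:

```python
# Illustrative data model of the triangle topology in FIG. 1 (hypothetical
# names, not from this disclosure): the DHD's LAG member ports terminate on
# two RG member nodes, while the RG presents a single logical LAG to the DHD.
from dataclasses import dataclass, field

@dataclass
class LagPort:
    name: str
    rg_node: str   # which RG member node the port terminates on
    role: str      # "active" or "standby"

@dataclass
class DhdLag:
    ports: list = field(default_factory=list)

    def active_ports(self):
        return [p for p in self.ports if p.role == "active"]

# Two active ports toward the active RG member and two standby ports toward
# the standby RG member, as in FIG. 1.
dhd_lag = DhdLag(ports=[
    LagPort("p1", rg_node="active-member", role="active"),
    LagPort("p2", rg_node="active-member", role="active"),
    LagPort("p3", rg_node="standby-member", role="standby"),
    LagPort("p4", rg_node="standby-member", role="standby"),
])
print([p.name for p in dhd_lag.active_ports()])  # -> ['p1', 'p2']
```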
  • In order to present the RG 12 as the logical LAG interface 20, the nodes 14, 16 rely on LACP as an underlying communication protocol between one another. The nodes 14, 16 can exchange their configuration and dynamic state data over an Inter-Chassis Communication Protocol (ICCP) link 28. Again, the nodes 14, 16 are different physical network elements which can be in the same location or in different locations. In either situation, the nodes 14, 16 are interconnected via a network 30, such as a G.8032 Ethernet network, a Multiprotocol Label Switching (MPLS) network, or the like. The ICCP link 28 can be a physical connection in the network 30. Also, the ICCP link 28 can be a dedicated link between the nodes 14, 16 such as when they are in the same location or chassis.
  • RG 12 implementation is typically vendor-specific, i.e., not specified by the relevant LAG standards. However, in general, the objective of the RG 12 is to present the nodes 14, 16 and the logical LAG interface 20 as a single virtual endpoint to a standards-based LAG DHD 18. Various vendors use different terminology for the MC-LAG, including MLAG, distributed split multi-link trunking, multi-chassis trunking, etc. The proposed systems and methods described herein can apply to any implementation of the RG 12 and seek to avoid coordination with the DHD 18 such that the RG 12 appears to any LAG-compliant DHD 18 as the single logical LAG interface 20. Also, other terminology may be used for the ICCP link 28, but the objective is the same: to enable adjacency and coordination between the nodes 14, 16.
  • The ICCP link 28 can be monitored via keep-alive message exchanges that deem this link operational. For faster ICCP Link Failure detection/recovery, Connectivity Fault Management (CFM) or Bidirectional Forwarding Detection (BFD) services can be configured across the RG member nodes 14, 16.
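  • As one possible illustration of such keep-alive monitoring, the ICCP adjacency could be declared down after a configurable number of missed heartbeats; the interval, timeout multiplier, and names below are assumptions rather than this disclosure's implementation:

```python
# Illustrative keep-alive monitor for the ICCP adjacency (hypothetical names
# and timer values, not this disclosure's implementation): the adjacency is
# declared down after a configurable number of missed heartbeats, which is
# the event that triggers the standby node's all-or-none behavior.
import time

class IccpMonitor:
    def __init__(self, interval_s=1.0, timeout_multiplier=3):
        self.timeout = interval_s * timeout_multiplier
        self.last_rx = time.monotonic()

    def on_keepalive(self):
        """Call whenever an ICCP keep-alive is received from the peer."""
        self.last_rx = time.monotonic()

    def adjacency_up(self):
        """False once the peer has been silent longer than the timeout."""
        return (time.monotonic() - self.last_rx) < self.timeout
```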
  • In the example of FIG. 1, the DHD 18 includes four ports 32 in the LAG 22: two ports 34 that are active and connected to the LAG 26, and two ports 36 that are standby and connected to the LAG 24. In this manner, the MC-LAG 10 is an active/standby MC-LAG. From the perspective of the DHD 18, the four ports 32 appear as a standard LAG, and the DHD 18 is unaware that the ports 34, 36 terminate on separate nodes 14, 16. The ICCP link 28 coordination between the RG member nodes 14, 16 causes them to appear as a single node from the DHD 18's perspective.
  • FIG. 2 illustrates the MC-LAG 10 with a fault 50 and associated node-level redundancy. Specifically, FIG. 2 illustrates two states 52, 54 to show how node-level redundancy is performed. At the state 52, the ports 34 are active such that the node 14 is the active RG member node, and the ports 36 are standby such that the node 16 is the standby RG member node. In LACP, the ports 34, 36 exchange frames (LACPDUs, LACP Protocol Data Units) carrying SYNC bits between the DHD 18 and the nodes 14, 16. Prior to the fault 50, the ports 34 have the LACPDU SYNC bits set to 1 indicating the ports 34 are active, and the ports 36 have the LACPDU SYNC bits set to 0 indicating the ports 36 are standby.
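  • For reference, the SYNC bit referred to here is the Synchronization flag carried in the actor-state octet of each LACPDU (bit 3 per IEEE 802.1AX, with Collecting and Distributing in bits 4 and 5); a minimal sketch of reading and setting it, with illustrative helper names, is:

```python
# The SYNC bit is the Synchronization flag in the actor-state octet of an
# LACPDU (bit 3 per IEEE 802.1AX; Collecting and Distributing are bits 4 and
# 5). Helper names are illustrative, not from this disclosure.

SYNC_BIT = 1 << 3
COLLECTING_BIT = 1 << 4
DISTRIBUTING_BIT = 1 << 5

def sync_is_set(actor_state: int) -> bool:
    """True if the link is advertised as in sync (active), False if standby."""
    return bool(actor_state & SYNC_BIT)

def set_sync(actor_state: int, active: bool) -> int:
    """Return the actor-state octet with SYNC=1 (active) or SYNC=0 (standby)."""
    return actor_state | SYNC_BIT if active else actor_state & ~SYNC_BIT

print(sync_is_set(0b00111101))  # active link example  -> True
print(sync_is_set(0b00000101))  # standby link example -> False
```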
  • At step 60-1, assume the node 14 fails; the active RG member node's failure causes protection switching of traffic to the standby RG member node 16. As soon as the standby RG member node 16 loses connectivity with the active RG member node 14 (the ICCP link 28 failure in step 60-2 due to the fault 50), the standby RG member node 16 takes the active role by setting the SYNC bit=1 on all its member ports 36 at step 60-3. Since the DHD 18 also gets a link failure for all active links on the ports 34 at step 60-4, all the standby links on the DHD 18 take the active role by setting their SYNC bit=1 at step 60-5. This makes the backup links “distributing” and, hence, traffic switches to the new active RG member node 16 (node-level redundancy).
  • Split-Brain in Active/Standby MC-LAG Triangle Topology
  • An MC-LAG supports triangle, square, and mesh topologies. Particularly, the disclosure herein focuses on the split-brain problem and solution in the MC-LAG triangle topology such that the DHD 18 is not required to participate in the diagnosis or correction and such that the ports 34, 36 do not require new MAC addresses.
  • The split-brain problem is an industry-wide known problem that happens in the case of dual homing. It may occur when communication between the two MC-LAG nodes 14, 16 is lost (i.e., the ICCP link 28 is failed/operationally down) while both the nodes 14, 16 are still up and operational. When the split-brain problem happens, both the nodes 14, 16, being no longer aware of each other's existence, try to take the active role, each considering the other to be operationally down. This can lead to the introduction of loops in the MC-LAG 10 network and can result in rapid duplication of packets at the DHD 18.
  • The ICCP link 28 communication can be lost between the nodes 14, 16 for various reasons, such as misconfigurations, network congestion, network errors, hardware failures, etc. For misconfigurations, example problems can include configuring or administratively enabling the ICCP link 28 only on one RG member node 14, 16; configuring a different ICCP heartbeat interval or timeout multiplier on the RG member nodes 14, 16; incorrectly configuring CFM or BFD monitoring over the ICCP link 28; or configuring CFM Maintenance End Points (MEPs) incorrectly, which may result in MEP faults that are propagated to the ICCP link 28, deeming the ICCP link 28 operationally down. Network congestion may lead to CFM/BFD/ICCP frame loss that in turn may cause the ICCP link 28 to appear operationally down while some data traffic may still be switched across. For network errors, high bit errors may result in CFM/BFD/ICCP packet drops. For hardware failures, Operations, Administration, and Maintenance (OAM) engine failures may result in faults in the ICCP link 28 monitoring. For example, the OAM engine may be implemented in hardware as a Field Programmable Gate Array (FPGA), a Network Processor Unit (NPU), an Application Specific Integrated Circuit (ASIC), etc.
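  • Several of the misconfiguration causes above are detectable by comparing the ICCP parameters provisioned on the two RG member nodes 14, 16. The sketch below is purely illustrative; the configuration field names (iccp_enabled, heartbeat_interval, timeout_multiplier) are hypothetical and do not correspond to any specific product's configuration model:

    def iccp_config_mismatches(local_cfg, peer_cfg):
        """Illustrative sanity check for misconfiguration causes of ICCP link
        failure; returns a list of mismatches that could make the ICCP
        link 28 appear operationally down."""
        problems = []
        if not (local_cfg.get("iccp_enabled") and peer_cfg.get("iccp_enabled")):
            problems.append("ICCP enabled on only one RG member node")
        if local_cfg.get("heartbeat_interval") != peer_cfg.get("heartbeat_interval"):
            problems.append("different ICCP heartbeat intervals")
        if local_cfg.get("timeout_multiplier") != peer_cfg.get("timeout_multiplier"):
            problems.append("different ICCP timeout multipliers")
        return problems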
  • FIG. 3 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with no other faults. At step 100-1, there is a fault 102 that causes the ICCP link 28 to fail. The reason for the fault 102 is irrelevant. At step 100-2, since the ICCP link 28 connectivity is lost between the RG member nodes 14, 16, both the RG member nodes 14, 16 try to take the active role by setting the SYNC bit to 1 on all their member ports 34, 36. The node 14 is already the active node, so the node 14 does not change the SYNC bit, but the node 16 is in standby and goes into a standalone active state at step 100-3.
  • This scenario, however, does not cause the split-brain problem to occur because of the configured link-level redundancy (N:N) on the DHD 18. Since all N links on the ports 34 from the active RG member node 14 are active, the DHD 18 does not set its SYNC bit on the N standby links on the ports 36 at step 100-4. This prevents the standby path from going to the distribution state even though the standby RG member node 16 (after taking the new active role) sets the SYNC bit to 1 on the backup path.
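  • The DHD 18 behavior that keeps the standby path non-distributing here is ordinary LACP N:N link-level redundancy and requires nothing special from the DHD 18. A hedged model of that selection logic is sketched below; active_links, standby_links, up, and sync_bit are hypothetical names used for illustration, not any vendor's implementation:

    def dhd_select_standby_links(active_links, standby_links, n):
        """Hypothetical model of the DHD's N:N selection: keep n links in
        service.  A standby link is promoted (SYNC=1) only when fewer than
        n active links remain operational, so with all active links up no
        standby link is promoted (as at step 100-4)."""
        needed = n - sum(1 for link in active_links if link["up"])
        for link in standby_links:
            link["sync_bit"] = 0
        for link in [l for l in standby_links if l["up"]][:max(needed, 0)]:
            link["sync_bit"] = 1
        return standby_links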
  • FIG. 4 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on one of the active links (34) causing the split-brain problem. At step 150-1, there is a fault 102 that causes the ICCP link 28 to fail. Again, the fault 102 could be for any reason. At step 150-2, since the ICCP link 28 connectivity is lost between the RG member nodes 14, 16, both the RG member nodes 14, 16 try to take the active role by setting the SYNC bit to 1 on all their member ports 34, 36.
  • An issue, however, arises if any distributing link fails on the ports 34 between the DHD 18 and the active RG member node 14. At step 150-3, the fault 104 causes a failure on one of the ports 34; the SYNC bit is 0 on this port and can no longer be sent over it. In this scenario, the DHD 18, unaware of the fault 102 affecting the ICCP link 28, selects one of the standby links on the ports 36 to take an active role and sets its SYNC bit to 1 at step 150-4.
  • The SYNC bit has already been set to 1 on the standby RG member node 16 because of the ICCP link 28 fault 102. Thus, the backup path on the ports 36 goes to the distribution state. Since there is at least one link distributing from the DHD 18 to each of the RG member nodes 14, 16, a loop forms, resulting in packet duplication towards the DHD 18 at step 150-5. The result is the split-brain problem where the member nodes 14, 16 cause the loop due to their lack of adjacency and coordination. The split-brain problem can only occur when there is more than one physical port between the DHD 18 and each RG member node 14, 16. In case there is only one physical port between the DHD 18 and each RG member node 14, 16, the DHD 18's 1:1 redundancy will ensure that only one port can be active at any point in time, thus preventing an active-active situation from happening. However, N:N/M:N redundancy is desired over 1:1 redundancy, and employing N:N/M:N redundancy exposes the arrangement to the split-brain problem.
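  • The loop condition described above can be stated compactly: a split-brain loop is possible only when at least one link towards each RG member node 14, 16 is distributing at the same time. The check below is a hedged sketch under the simplifying assumption that a link is distributing when both ends advertise SYNC=1; tx_sync and rx_sync are hypothetical field names:

    def split_brain_loop_possible(links_to_node_14, links_to_node_16):
        """Illustrative check of the loop condition at step 150-5: both RG
        member nodes have at least one distributing link to the DHD."""
        def distributing(links):
            return any(l["tx_sync"] == 1 and l["rx_sync"] == 1 for l in links)
        return distributing(links_to_node_14) and distributing(links_to_node_16)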
  • All-or-None Switchover in Split-Brain in Active/Standby MC-LAG Triangle Topology
  • FIGS. 5 and 6 illustrate the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on one of the active links with an all-or-none (AON) switchover to prevent the split-brain problem in accordance with the proposed solution. Specifically, FIG. 5 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with a fault 104 on any but the last active link (34) in the AON switchover. FIG. 6 illustrates the MC-LAG 10 with the ICCP link 28 failed and associated operation with the fault 104 on all of the active links in the AON switchover.
  • The AON switchover can be implemented by each of the RG member nodes 14, 16 with the restriction that the standby RG member node 16 will only take the active role when all of the active links (34) on the active RG member node 14 fail. Of course, the RG member nodes 14, 16 cannot coordinate this with one another due to the fault 102 and the lack of adjacency. Instead, this is achieved by making optimal use of the SYNC bit as employed by the DHD 18. When the ICCP link 28 goes down operationally, the standby RG member node 16 will not set its members' SYNC bits to 1 immediately, but rather rely on the SYNC bits received from the DHD 18's ports in order to set its own members' SYNC bits. The standby RG member node 16 will set its ports' SYNC bits to 1 only if it receives SYNC bit=1 on all the operational ports from the DHD 18.
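  • The core all-or-none rule can be captured in a few lines. The sketch below is a hedged illustration of that rule only, not the claimed implementation; each standby port is assumed to expose its operational state in up and the last SYNC bit received from the DHD 18 in rx_sync (both hypothetical names):

    def aon_should_go_active(standby_ports):
        """All-or-none rule: the standby RG member node takes the active
        role only when SYNC=1 has been received from the DHD on every
        operational standby port, implying all active links have failed."""
        operational = [p for p in standby_ports if p["up"]]
        return bool(operational) and all(p["rx_sync"] == 1 for p in operational)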
  • The AON switchover eliminates a loop during a split-brain situation where the MC-LAG 10 is configured with N:N link redundancy and there is no link failure on the standby path (on the ports 36). With the AON switchover, when the ICCP link 28 fails, the standby RG member node 16 will not go active; it will keep the SYNC bits set to FALSE (0) and will keep monitoring the SYNC bits coming from the DHD 18. Again, the DHD 18 may not know it is in the MC-LAG but rather assume this is a standard LAG. This AON switchover approach does not require the DHD 18 to have a special configuration, but rather to operate standard LACP. Further, the AON switchover does not require new MAC addresses and/or re-convergence.
  • If the RG member nodes 14, 16 are upgraded at runtime to employ the functionality of the proposed solution, the standby RG member node 16 should preferably be upgraded first (before the active RG member node 14).
  • FIG. 7 is a flowchart of an AON switchover process 300 implemented by the standby RG member node 16 subsequent to the loss of connectivity with the active RG member node 14, such as due to the fault 102 on the ICCP link 28. The standby RG member node 16 performs the AON switchover process 300 to eliminate the chance that the split-brain problem may cause a loop. The standby RG member node 16 begins the AON switchover process 300 subsequent to the loss of adjacency with the active RG member node 14 (step 302). Subsequent to the loss of adjacency (the ICCP link 28 failure), the standby RG member node 16 remains in the standby state on all of the ports 36, keeping the SYNC bits set to 0, with the standby RG member node 16 monitoring LACPDUs from the DHD 18 for their associated SYNC bit (step 304). Specifically, this monitoring does not require the DHD 18 to make changes, but simply assumes the DHD 18 operates standard LACP in an N:N link-level redundancy scheme.
  • The standby RG member node 16 can infer the operational status of the active ports 34 based on the SYNC bits from the DHD 18 on the standby ports 36. Specifically, the standby RG member node 16 knows the value of N (N:N) and can infer the number of active/failed links on the ports 34 based on the number of SYNC bit values equal to 1 coming from the DHD 18 on the ports 36. Thus, the AON switchover process 300 operates in a triangle MC-LAG with N:N active/standby configurations.
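  • This inference can be illustrated with a short sketch (hedged, for explanation only); standby_ports and rx_sync are hypothetical names, and n is the number of active links configured in the N:N scheme:

    def inferred_failed_active_links(standby_ports, n):
        """Illustrative inference for an N:N triangle MC-LAG: each SYNC=1
        received from the DHD on a standby port implies the DHD has
        promoted one standby link, i.e., one active link toward the active
        RG member node has failed."""
        promoted = sum(1 for p in standby_ports if p["rx_sync"] == 1)
        return min(promoted, n)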
  • Based on the monitoring, the standby RG member node 16 can determine if any active links have failed (step 306). Specifically, no active links have failed if none of the ports 36 have the SYNC bit set to 1 coming from the DHD 18; in that case, the standby RG member node 16 remains in the standby state on all of the ports 36, keeping the SYNC bits set to 0, and continues to monitor LACPDUs from the DHD 18 for their associated SYNC bit (step 304).
  • Active links have failed if any link on the ports 36 has the SYNC bit set to 1 coming from the DHD 18 (step 306). The standby RG member node 16 determines whether all of the active links have failed or whether some, but not all, of the active links have failed (step 306). The standby RG member node 16 will only become active when all of the active links (34) have failed. This prevents the loops and does not require coordination with the DHD 18 or changes to system MAC addresses.
  • The standby RG member node 16 can determine whether or not all of the active links have failed by determining the number of links on the ports 36 from the DHD 18 which are showing the SYNC bit as 1. That is, if all of the ports 36 are showing LACPDUs from the DHD 18 with the SYNC bit as 1, then all of the active links (34) have failed, i.e., if N links on the ports 36 show SYNC=1 from the DHD 18, then the N links on the ports 34 have failed.
  • If not all of the active links have failed (step 306), then the standby RG member node 16 remains in the standby state on all ports, keeping the SYNC bits set to 0, and continues to monitor LACPDUs from the DHD 18 (step 304). If all of the active links (34) have failed (step 306), the standby RG member node 16 enters the active state on all of the ports 36, changing the SYNC bits to 1 (step 308). This will result in the backup path going to the distribution state, and traffic will resume after protection switching.
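  • Putting steps 304-308 together, one iteration of the AON switchover process 300 can be sketched as the function below. This is a hedged schematic of the flowchart only, not the claimed implementation; the port fields up, rx_sync, and tx_sync are hypothetical, and n is the configured N of the N:N scheme:

    def aon_switchover_step(standby_ports, n):
        """One pass of the AON switchover process 300: stay standby (SYNC=0)
        and keep monitoring unless all N active links are inferred to have
        failed, in which case enter the active state (SYNC=1) on all ports."""
        promoted = sum(1 for p in standby_ports if p["up"] and p["rx_sync"] == 1)
        if promoted >= n:                  # step 306/308: all active links failed
            for p in standby_ports:
                p["tx_sync"] = 1           # backup path goes distributing
            return "active"
        for p in standby_ports:            # step 304: remain standby, keep monitoring
            p["tx_sync"] = 0
        return "standby"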
  • Again, the AON switchover process 300 is implemented on the RG 12 and therefore is interoperable with any vendor's DHD 18 supporting standard LACP and the switchover time is not compromised since no re-convergence is required. Also, the AON switchover process 300 can be configurable and selectively enabled/disabled on both of the member nodes 14, 16.
  • Referring back to FIGS. 5 and 6, an operation of the AON switchover process 300 is illustrated. In FIG. 5, similar to FIG. 4, at step 350-1, there is a fault 102 that causes the ICCP link 28 to fail. Again, the fault 102 could be for any reason. At step 350-2, the member nodes 14, 16 detect the ICCP link 28 failure and report the same to the MC-LAG 10. At step 350-3, the active RG member node 14 goes to standalone (active), and the SYNC bit remains at 1 on the operational links in the ports 34. Also at step 350-3, if the standby RG member node 16 is configured with the AON switchover process 300 enabled, the standby RG member node 16 goes to a standalone mode, but non-distributing, keeping the SYNC bits set at 0 for all links in the ports 36.
  • Now, in the standalone mode, but non-distributing, the standby RG member node 16 monitors the LACPDUs from the DHD 18 on the ports 36. At step 350-4, the DHD 18 determines the fault 104 on the ports 34 and since this is N:N redundancy, the DHD 18 selects a standby port as active on the ports 36 setting the SYNC bit to 1. Note, since the standby RG member node 16 is operating the AON switchover process 300, the standby RG member node 16 remains in the standalone mode, but non-distributing with all links in the ports 36 transmitting SYNC=0 to the DHD 18.
  • In FIG. 6, at step 350-5, the last link in the ports 34 fails. The active RG member node 14 goes into standalone, non-distributing, and the SYNC bits are 0 on all links on the ports 34. At step 350-6, the DHD 18 selects another standby port of the ports 36 to set as active and sets the SYNC bit to 1. At step 350-7, the standby RG member node 16 determines that all of the active links (34) have failed. In this example, this is due to the DHD 18 sending SYNC=1 on two of the ports 36 (N=2 here). At this point (step 350-7), the standby RG member node 16 sets the SYNC bit to 1 on all of the ports 36, since the DHD 18 also has the SYNC bit set to 1 on all of the ports 36, and the ports 36 go into distribution, such that the traffic switches from the ports 34 to the ports 36.
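  • As a usage illustration of the aon_switchover_step() sketch above, the FIG. 5/FIG. 6 sequence with N=2 can be walked through as follows (again, a hypothetical model for explanation only):

    # Two standby ports 36, ICCP link 28 already down, AON enabled.
    ports = [{"up": True, "rx_sync": 0, "tx_sync": 0},
             {"up": True, "rx_sync": 0, "tx_sync": 0}]

    ports[0]["rx_sync"] = 1                 # step 350-4: first active link fails
    print(aon_switchover_step(ports, n=2))  # -> 'standby' (not all active links failed)

    ports[1]["rx_sync"] = 1                 # step 350-6: last active link fails
    print(aon_switchover_step(ports, n=2))  # -> 'active' (step 350-7: ports 36 distribute)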
  • Network Element
  • FIG. 8 illustrates an example network element 400 for the systems and methods described herein. In this embodiment, the network element 400 is an Ethernet, MPLS, IP, etc. network switch, but those of ordinary skill in the art will recognize the systems and methods described herein can operate with other types of network elements and other implementations. Specifically, the network element 400 can be either of the RG member nodes 14, 16, and the network element 400 can also be the DHD 18. In this embodiment, the network element 400 includes a plurality of blades 402, 404 interconnected via an interface 406. The blades 402, 404 are also known as line cards, line modules, circuit packs, pluggable modules, etc. and generally refer to components mounted on a chassis, shelf, etc. of a data switching device, i.e., the network element 400. Each of the blades 402, 404 can include numerous electronic devices and optical devices mounted on a circuit board along with various interconnects including interfaces to the chassis, shelf, etc. Those skilled in the art will recognize that the network element 400 is illustrated in an oversimplified manner and may include other components and functionality.
  • Two blades are illustrated with line blades 402 and control blades 404. The line blades 402 include data ports 408 such as a plurality of Ethernet ports. For example, the line blade 402 can include a plurality of physical ports disposed on an exterior of the blade 402 for receiving ingress/egress connections. Additionally, the line blades 402 can include switching components to form a switching fabric via the interface 406 between all of the data ports 408 allowing data traffic to be switched between the data ports 408 on the various line blades 402. The switching fabric is a combination of hardware, software, firmware, etc. that moves data coming into the network element 400 out by the correct port 408 to the next network element 400. “Switching fabric” includes switching units, or individual boxes, in a node; integrated circuits contained in the switching units; and programming that allows switching paths to be controlled. Note, the switching fabric can be distributed on the blades 402, 404, in a separate blade (not shown), or a combination thereof. The line blades 402 can include an Ethernet manager (i.e., a processor) and a Network Processor (NP)/Application Specific Integrated Circuit (ASIC).
  • The control blades 404 include a microprocessor 410, memory 412, software 414, and a network interface 416. Specifically, the microprocessor 410, the memory 412, and the software 414 can collectively control, configure, provision, monitor, etc. the network element 400. The network interface 416 may be utilized to communicate with an element manager, a network management system, etc. Additionally, the control blades 404 can include a database 420 that tracks and maintains provisioning, configuration, operational data and the like. In this embodiment, the network element 400 includes two control blades 404 which may operate in a redundant or protected configuration such as 1:1, 1+1, etc. In general, the control blades 404 maintain dynamic system information including packet forwarding databases, protocol state machines, and the operational status of the ports 408 within the network element 400.
  • When operating as the standby RG member node 16, the various components of the network element 400 can be configured to implement the AON switchover process 300.
  • It will be appreciated that some embodiments described herein may include one or more generic or specialized processors (“one or more processors”) such as microprocessors; Central Processing Units (CPUs); Digital Signal Processors (DSPs); customized processors such as Network Processors (NPs) or Network Processing Units (NPUs), Graphics Processing Units (GPUs), or the like; Field Programmable Gate Arrays (FPGAs); and the like along with unique stored program instructions (including both software and firmware) for control thereof to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the methods and/or systems described herein. Alternatively, some or all functions may be implemented by a state machine that has no stored program instructions, or in one or more Application Specific Integrated Circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic or circuitry. Of course, a combination of the aforementioned approaches may be used. For some of the embodiments described herein, a corresponding device in hardware and optionally with software, firmware, and a combination thereof can be referred to as “circuitry configured or adapted to,” “logic configured or adapted to,” etc. perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. on digital and/or analog signals as described herein for the various embodiments.
  • Moreover, some embodiments may include a non-transitory computer-readable storage medium having computer readable code stored thereon for programming a computer, server, appliance, device, processor, circuit, etc. each of which may include a processor to perform functions as described and claimed herein. Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory), Flash memory, and the like. When stored in the non-transitory computer readable medium, software can include instructions executable by a processor or device (e.g., any type of programmable circuitry or logic) that, in response to such execution, cause a processor or the device to perform a set of operations, steps, methods, processes, algorithms, functions, techniques, etc. as described herein for the various exemplary embodiments.
  • Although the present disclosure has been illustrated and described herein with reference to preferred embodiments and specific examples thereof, it will be readily apparent to those of ordinary skill in the art that other embodiments and examples may perform similar functions and/or achieve like results. All such equivalent embodiments and examples are within the spirit and scope of the present disclosure, are contemplated thereby, and are intended to be covered by the following claims.

Claims (20)

What is claimed is:
1. A method utilizing all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network implemented by a standby node, the method comprising:
remaining in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing;
monitoring frames transmitted by the common endpoint to the standby node over the standby links; and
determining based on the monitoring frames whether all active links between the active node and the common endpoint have failed and entering an active state with all the standby links distributing based thereon.
2. The method of claim 1, further comprising:
determining based on the monitoring frames whether less than all of the active links have failed and remaining in the standby state and continuing monitoring the frames transmitted by the common endpoint over the standby links based thereon.
3. The method of claim 1, wherein the monitoring checks for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
4. The method of claim 1, wherein the common endpoint is communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
5. The method of claim 1, wherein the common endpoint is configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
6. The method of claim 1, wherein the common endpoint is unaware the active node and the standby node are in separate network elements.
7. The method of claim 1, wherein the loss of adjacency with the active node is based on a failure or fault on a link between the active node and the standby node used for coordination of the active node and the standby node in a Redundant Group, while the active node and the standby node are both operational.
8. A standby node in a Multi-Chassis Link Aggregation Group (MC-LAG) network configured with all-or-none switchover to prevent split-brain problems, the standby node comprising:
a plurality of ports in a logical Link Aggregation Group (LAG) with an active node, wherein the plurality of ports form standby links with a common endpoint;
a communication link with an active node; and
a switching fabric between the plurality of ports,
wherein the standby node is configured to
remain in a standby state responsive to a loss of the communication link, wherein, in the standby state, all the standby links are non-distributing;
monitor frames transmitted by the common endpoint to the standby node over the standby links; and
determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
9. The standby node of claim 8, wherein the standby node is further configured to
determine based on the monitoring frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon.
10. The standby node of claim 8, wherein the frames are monitored to check for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
11. The standby node of claim 8, wherein the common endpoint is communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
12. The standby node of claim 8, wherein the common endpoint is configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
13. The standby node of claim 8, wherein the common endpoint is unaware the active node and the standby node are in separate network elements.
14. The standby node of claim 8, wherein the loss of adjacency with the active node is based on a failure or fault on the communication link, while the active node and the standby node are both operational.
15. An apparatus configured for all-or-none switchover to prevent split-brain problems in a Multi-Chassis Link Aggregation Group (MC-LAG) network located at a standby node, the apparatus comprising:
circuitry configured to remain in a standby state responsive to a loss of adjacency with an active node, wherein, in the standby state, all standby links between the standby node and a common endpoint are non-distributing;
circuitry configured to monitor frames transmitted by the common endpoint to the standby node over the standby links; and
circuitry configured to determine based on the monitored frames whether all active links between the active node and the common endpoint have failed and enter an active state with all the standby links distributing based thereon.
16. The apparatus of claim 15, further comprising:
circuitry configured to determine based on the monitored frames whether less than all of the active links have failed and remain in the standby state and continue monitoring the frames transmitted by the common endpoint over the standby links based thereon.
17. The apparatus of claim 15, wherein the circuitry configured to monitor checks for a presence of SYNC bits from the common endpoint with each SYNC bit set to TRUE indicative of a switch by the common endpoint of one of the active links to one of the standby links.
18. The apparatus of claim 15, wherein the common endpoint is communicatively coupled to both the active node and the standby node in an active/standby triangle topology.
19. The apparatus of claim 15, wherein the common endpoint is configured to operate Link Aggregation Control Protocol (LACP) and an N:N link-level redundancy between the active node and the standby node.
20. The apparatus of claim 15, wherein the common endpoint is unaware the active node and the standby node are in separate network elements.
US15/611,283 2017-06-01 2017-06-01 All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups Active 2037-09-01 US10164873B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/611,283 US10164873B1 (en) 2017-06-01 2017-06-01 All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/611,283 US10164873B1 (en) 2017-06-01 2017-06-01 All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups

Publications (2)

Publication Number Publication Date
US20180351855A1 true US20180351855A1 (en) 2018-12-06
US10164873B1 US10164873B1 (en) 2018-12-25

Family

ID=64459042

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/611,283 Active 2037-09-01 US10164873B1 (en) 2017-06-01 2017-06-01 All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups

Country Status (1)

Country Link
US (1) US10164873B1 (en)

Cited By (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190089590A1 (en) * 2017-09-19 2019-03-21 Cisco Technology, Inc. Detection and Configuration of a Logical Channel
US20190140889A1 (en) * 2017-11-09 2019-05-09 Nicira, Inc. Method and system of a high availability enhancements to a computer network
US10523539B2 (en) 2017-06-22 2019-12-31 Nicira, Inc. Method and system of resiliency in cloud-delivered SD-WAN
US20200007666A1 (en) * 2018-06-27 2020-01-02 T-Mobile Usa, Inc. Micro-level network node failover system
US10594516B2 (en) 2017-10-02 2020-03-17 Vmware, Inc. Virtual network provider
US10749711B2 (en) 2013-07-10 2020-08-18 Nicira, Inc. Network-link method useful for a last-mile connectivity in an edge-gateway multipath system
US10771317B1 (en) * 2018-11-13 2020-09-08 Juniper Networks, Inc. Reducing traffic loss during link failure in an ethernet virtual private network multihoming topology
US10778528B2 (en) 2017-02-11 2020-09-15 Nicira, Inc. Method and system of connecting to a multipath hub in a cluster
US10805272B2 (en) 2015-04-13 2020-10-13 Nicira, Inc. Method and system of establishing a virtual private network in a cloud service for branch networking
WO2021025826A1 (en) * 2019-08-02 2021-02-11 Ciena Corporation Retaining active operations, administration, and maintenance (oam) sessions across multiple devices operating as a single logical device
US10959098B2 (en) 2017-10-02 2021-03-23 Vmware, Inc. Dynamically specifying multiple public cloud edge nodes to connect to an external multi-computer node
US10992558B1 (en) 2017-11-06 2021-04-27 Vmware, Inc. Method and apparatus for distributed data network traffic optimization
US10992568B2 (en) 2017-01-31 2021-04-27 Vmware, Inc. High performance software-defined core network
US10999100B2 (en) 2017-10-02 2021-05-04 Vmware, Inc. Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SAAS provider
US10999165B2 (en) 2017-10-02 2021-05-04 Vmware, Inc. Three tiers of SaaS providers for deploying compute and network infrastructure in the public cloud
US10999137B2 (en) 2019-08-27 2021-05-04 Vmware, Inc. Providing recommendations for implementing virtual networks
US11012369B2 (en) * 2019-07-05 2021-05-18 Dell Products L.P. Aggregated switch path optimization system
US20210160318A1 (en) * 2014-06-04 2021-05-27 Pure Storage, Inc. Scale out storage platform having active failover
US11044190B2 (en) 2019-10-28 2021-06-22 Vmware, Inc. Managing forwarding elements at edge nodes connected to a virtual network
US11050588B2 (en) 2013-07-10 2021-06-29 Nicira, Inc. Method and system of overlay flow control
US11088963B2 (en) * 2019-12-16 2021-08-10 Dell Products L.P. Automatic aggregated networking device backup link configuration system
US11089111B2 (en) 2017-10-02 2021-08-10 Vmware, Inc. Layer four optimization for a virtual network defined over public cloud
US11115480B2 (en) 2017-10-02 2021-09-07 Vmware, Inc. Layer four optimization for a virtual network defined over public cloud
US11121962B2 (en) 2017-01-31 2021-09-14 Vmware, Inc. High performance software-defined core network
CN113676405A (en) * 2021-08-18 2021-11-19 上海晨驭信息科技有限公司 Load sharing-based rapid link master-slave switching distributed system and method
US11206197B2 (en) 2017-04-05 2021-12-21 Ciena Corporation Scaling operations, administration, and maintenance sessions in packet networks
US11245641B2 (en) 2020-07-02 2022-02-08 Vmware, Inc. Methods and apparatus for application aware hub clustering techniques for a hyper scale SD-WAN
US11252079B2 (en) 2017-01-31 2022-02-15 Vmware, Inc. High performance software-defined core network
US11363124B2 (en) 2020-07-30 2022-06-14 Vmware, Inc. Zero copy socket splicing
US11375005B1 (en) 2021-07-24 2022-06-28 Vmware, Inc. High availability solutions for a secure access service edge application
US11374904B2 (en) 2015-04-13 2022-06-28 Nicira, Inc. Method and system of a cloud-based multipath routing protocol
US11381499B1 (en) 2021-05-03 2022-07-05 Vmware, Inc. Routing meshes for facilitating routing through an SD-WAN
US11394640B2 (en) 2019-12-12 2022-07-19 Vmware, Inc. Collecting and analyzing data regarding flows associated with DPI parameters
US20220247631A1 (en) * 2019-05-28 2022-08-04 Nippon Telegraph And Telephone Corporation Network management apparatus and method
US11418997B2 (en) 2020-01-24 2022-08-16 Vmware, Inc. Using heart beats to monitor operational state of service classes of a QoS aware network link
US11444872B2 (en) 2015-04-13 2022-09-13 Nicira, Inc. Method and system of application-aware routing with crowdsourcing
US11444865B2 (en) 2020-11-17 2022-09-13 Vmware, Inc. Autonomous distributed forwarding plane traceability based anomaly detection in application traffic for hyper-scale SD-WAN
US11489720B1 (en) 2021-06-18 2022-11-01 Vmware, Inc. Method and apparatus to evaluate resource elements and public clouds for deploying tenant deployable elements based on harvested performance metrics
US11489783B2 (en) 2019-12-12 2022-11-01 Vmware, Inc. Performing deep packet inspection in a software defined wide area network
US11502943B2 (en) * 2019-05-14 2022-11-15 Hewlett Packard Enterprise Development Lp Distributed neighbor state management for networked aggregate peers
US11539551B2 (en) * 2018-01-11 2022-12-27 Huawei Technologies Co., Ltd. Data transmission method, device, and network system
US11575600B2 (en) 2020-11-24 2023-02-07 Vmware, Inc. Tunnel-less SD-WAN
US11601356B2 (en) 2020-12-29 2023-03-07 Vmware, Inc. Emulating packet flows to assess network links for SD-WAN
US11606286B2 (en) 2017-01-31 2023-03-14 Vmware, Inc. High performance software-defined core network
US20230103537A1 (en) * 2020-02-27 2023-04-06 Nippon Telegraph And Telephone Corporation Communication system, network relay device, network relay method, and program
US11706127B2 (en) 2017-01-31 2023-07-18 Vmware, Inc. High performance software-defined core network
US11706126B2 (en) 2017-01-31 2023-07-18 Vmware, Inc. Method and apparatus for distributed data network traffic optimization
US11729065B2 (en) 2021-05-06 2023-08-15 Vmware, Inc. Methods for application defined virtual network service among multiple transport in SD-WAN
US11792127B2 (en) 2021-01-18 2023-10-17 Vmware, Inc. Network-aware load balancing
US11909815B2 (en) 2022-06-06 2024-02-20 VMware LLC Routing based on geolocation costs
US11943146B2 (en) 2021-10-01 2024-03-26 VMware LLC Traffic prioritization in SD-WAN
US11979325B2 (en) 2021-01-28 2024-05-07 VMware LLC Dynamic SD-WAN hub cluster scaling with machine learning
US12009987B2 (en) 2021-05-03 2024-06-11 VMware LLC Methods to support dynamic transit paths through hub clustering across branches in SD-WAN
US12015536B2 (en) 2021-06-18 2024-06-18 VMware LLC Method and apparatus for deploying tenant deployable elements across public clouds based on harvested performance metrics of types of resource elements in the public clouds
US12034587B1 (en) 2023-03-27 2024-07-09 VMware LLC Identifying and remediating anomalies in a self-healing network
US12047282B2 (en) 2021-07-22 2024-07-23 VMware LLC Methods for smart bandwidth aggregation based dynamic overlay selection among preferred exits in SD-WAN
US12057993B1 (en) 2023-03-27 2024-08-06 VMware LLC Identifying and remediating anomalies in a self-healing network
US12066907B1 (en) 2023-04-28 2024-08-20 Netapp, Inc. Collection of state information by nodes in a cluster to handle cluster management after master-node failover
US12137140B2 (en) * 2021-01-13 2024-11-05 Pure Storage, Inc. Scale out storage platform having active failover

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110535791B (en) * 2019-06-25 2022-03-08 南京邮电大学 Data center network based on prism structure
US11290422B1 (en) 2020-12-07 2022-03-29 Ciena Corporation Path-aware NAPT session management scheme with multiple paths and priorities in SD-WAN
US11659002B2 (en) 2021-05-04 2023-05-23 Ciena Corporation Extending Media Access Control Security (MACsec) to Network-to-Network Interfaces (NNIs)
US12113696B2 (en) 2022-02-01 2024-10-08 Bank Of America Corporation System and method for monitoring network processing optimization

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8817594B2 (en) 2010-07-13 2014-08-26 Telefonaktiebolaget L M Ericsson (Publ) Technique establishing a forwarding path in a network system
US8488608B2 (en) * 2010-08-04 2013-07-16 Alcatel Lucent System and method for traffic distribution in a multi-chassis link aggregation
US8902738B2 (en) * 2012-01-04 2014-12-02 Cisco Technology, Inc. Dynamically adjusting active members in multichassis link bundle
US8885562B2 (en) 2012-03-28 2014-11-11 Telefonaktiebolaget L M Ericsson (Publ) Inter-chassis redundancy with coordinated traffic direction
US20160323179A1 (en) * 2015-04-29 2016-11-03 Telefonaktiebolaget L M Ericsson (Publ) Bng subscribers inter-chassis redundancy using mc-lag
US10305796B2 (en) 2015-06-01 2019-05-28 Ciena Corporation Enhanced forwarding database synchronization for media access control addresses learned in interconnected layer-2 architectures
US9992102B2 (en) 2015-08-28 2018-06-05 Ciena Corporation Methods and systems to select active and standby ports in link aggregation groups

Cited By (119)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10749711B2 (en) 2013-07-10 2020-08-18 Nicira, Inc. Network-link method useful for a last-mile connectivity in an edge-gateway multipath system
US11804988B2 (en) 2013-07-10 2023-10-31 Nicira, Inc. Method and system of overlay flow control
US11212140B2 (en) 2013-07-10 2021-12-28 Nicira, Inc. Network-link method useful for a last-mile connectivity in an edge-gateway multipath system
US11050588B2 (en) 2013-07-10 2021-06-29 Nicira, Inc. Method and system of overlay flow control
US20210160318A1 (en) * 2014-06-04 2021-05-27 Pure Storage, Inc. Scale out storage platform having active failover
US11374904B2 (en) 2015-04-13 2022-06-28 Nicira, Inc. Method and system of a cloud-based multipath routing protocol
US11677720B2 (en) 2015-04-13 2023-06-13 Nicira, Inc. Method and system of establishing a virtual private network in a cloud service for branch networking
US11444872B2 (en) 2015-04-13 2022-09-13 Nicira, Inc. Method and system of application-aware routing with crowdsourcing
US10805272B2 (en) 2015-04-13 2020-10-13 Nicira, Inc. Method and system of establishing a virtual private network in a cloud service for branch networking
US11706127B2 (en) 2017-01-31 2023-07-18 Vmware, Inc. High performance software-defined core network
US10992568B2 (en) 2017-01-31 2021-04-27 Vmware, Inc. High performance software-defined core network
US12058030B2 (en) 2017-01-31 2024-08-06 VMware LLC High performance software-defined core network
US12034630B2 (en) 2017-01-31 2024-07-09 VMware LLC Method and apparatus for distributed data network traffic optimization
US11606286B2 (en) 2017-01-31 2023-03-14 Vmware, Inc. High performance software-defined core network
US11121962B2 (en) 2017-01-31 2021-09-14 Vmware, Inc. High performance software-defined core network
US11252079B2 (en) 2017-01-31 2022-02-15 Vmware, Inc. High performance software-defined core network
US11700196B2 (en) 2017-01-31 2023-07-11 Vmware, Inc. High performance software-defined core network
US11706126B2 (en) 2017-01-31 2023-07-18 Vmware, Inc. Method and apparatus for distributed data network traffic optimization
US11349722B2 (en) 2017-02-11 2022-05-31 Nicira, Inc. Method and system of connecting to a multipath hub in a cluster
US10778528B2 (en) 2017-02-11 2020-09-15 Nicira, Inc. Method and system of connecting to a multipath hub in a cluster
US12047244B2 (en) 2017-02-11 2024-07-23 Nicira, Inc. Method and system of connecting to a multipath hub in a cluster
US11206197B2 (en) 2017-04-05 2021-12-21 Ciena Corporation Scaling operations, administration, and maintenance sessions in packet networks
US10938693B2 (en) 2017-06-22 2021-03-02 Nicira, Inc. Method and system of resiliency in cloud-delivered SD-WAN
US10523539B2 (en) 2017-06-22 2019-12-31 Nicira, Inc. Method and system of resiliency in cloud-delivered SD-WAN
US11533248B2 (en) 2017-06-22 2022-12-20 Nicira, Inc. Method and system of resiliency in cloud-delivered SD-WAN
US10574519B2 (en) * 2017-09-19 2020-02-25 Cisco Technology, Inc. Detection and configuration of a logical channel
US20190089590A1 (en) * 2017-09-19 2019-03-21 Cisco Technology, Inc. Detection and Configuration of a Logical Channel
US11115480B2 (en) 2017-10-02 2021-09-07 Vmware, Inc. Layer four optimization for a virtual network defined over public cloud
US11606225B2 (en) 2017-10-02 2023-03-14 Vmware, Inc. Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SAAS provider
US10666460B2 (en) 2017-10-02 2020-05-26 Vmware, Inc. Measurement based routing through multiple public clouds
US11005684B2 (en) 2017-10-02 2021-05-11 Vmware, Inc. Creating virtual networks spanning multiple public clouds
US10608844B2 (en) 2017-10-02 2020-03-31 Vmware, Inc. Graph based routing through multiple public clouds
US10999165B2 (en) 2017-10-02 2021-05-04 Vmware, Inc. Three tiers of SaaS providers for deploying compute and network infrastructure in the public cloud
US11089111B2 (en) 2017-10-02 2021-08-10 Vmware, Inc. Layer four optimization for a virtual network defined over public cloud
US11102032B2 (en) 2017-10-02 2021-08-24 Vmware, Inc. Routing data message flow through multiple public clouds
US11516049B2 (en) 2017-10-02 2022-11-29 Vmware, Inc. Overlay network encapsulation to forward data message flows through multiple public cloud datacenters
US10999100B2 (en) 2017-10-02 2021-05-04 Vmware, Inc. Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SAAS provider
US10778466B2 (en) 2017-10-02 2020-09-15 Vmware, Inc. Processing data messages of a virtual network that are sent to and received from external service machines
US10686625B2 (en) 2017-10-02 2020-06-16 Vmware, Inc. Defining and distributing routes for a virtual network
US11894949B2 (en) 2017-10-02 2024-02-06 VMware LLC Identifying multiple nodes in a virtual network defined over a set of public clouds to connect to an external SaaS provider
US11895194B2 (en) 2017-10-02 2024-02-06 VMware LLC Layer four optimization for a virtual network defined over public cloud
US10805114B2 (en) 2017-10-02 2020-10-13 Vmware, Inc. Processing data messages of a virtual network that are sent to and received from external service machines
US11855805B2 (en) 2017-10-02 2023-12-26 Vmware, Inc. Deploying firewall for virtual network defined over public cloud infrastructure
US10594516B2 (en) 2017-10-02 2020-03-17 Vmware, Inc. Virtual network provider
US10841131B2 (en) 2017-10-02 2020-11-17 Vmware, Inc. Distributed WAN security gateway
US10958479B2 (en) 2017-10-02 2021-03-23 Vmware, Inc. Selecting one node from several candidate nodes in several public clouds to establish a virtual network that spans the public clouds
US10959098B2 (en) 2017-10-02 2021-03-23 Vmware, Inc. Dynamically specifying multiple public cloud edge nodes to connect to an external multi-computer node
US10992558B1 (en) 2017-11-06 2021-04-27 Vmware, Inc. Method and apparatus for distributed data network traffic optimization
US11323307B2 (en) 2017-11-09 2022-05-03 Nicira, Inc. Method and system of a dynamic high-availability mode based on current wide area network connectivity
US20190140889A1 (en) * 2017-11-09 2019-05-09 Nicira, Inc. Method and system of a high availability enhancements to a computer network
US11902086B2 (en) * 2017-11-09 2024-02-13 Nicira, Inc. Method and system of a dynamic high-availability mode based on current wide area network connectivity
US11223514B2 (en) * 2017-11-09 2022-01-11 Nicira, Inc. Method and system of a dynamic high-availability mode based on current wide area network connectivity
US20220131740A1 (en) * 2017-11-09 2022-04-28 Nicira, Inc. Method and system of a dynamic high-availability mode based on current wide area network connectivity
US11539551B2 (en) * 2018-01-11 2022-12-27 Huawei Technologies Co., Ltd. Data transmission method, device, and network system
US20230073291A1 (en) * 2018-01-11 2023-03-09 Huawei Technologies Co., Ltd. Data transmission method, device, and network system
US12034568B2 (en) * 2018-01-11 2024-07-09 Huawei Technologies Co., Ltd. Data transmission method, device, and network system
US20200007666A1 (en) * 2018-06-27 2020-01-02 T-Mobile Usa, Inc. Micro-level network node failover system
US10972588B2 (en) * 2018-06-27 2021-04-06 T-Mobile Usa, Inc. Micro-level network node failover system
US10771317B1 (en) * 2018-11-13 2020-09-08 Juniper Networks, Inc. Reducing traffic loss during link failure in an ethernet virtual private network multihoming topology
US11502943B2 (en) * 2019-05-14 2022-11-15 Hewlett Packard Enterprise Development Lp Distributed neighbor state management for networked aggregate peers
US20220247631A1 (en) * 2019-05-28 2022-08-04 Nippon Telegraph And Telephone Corporation Network management apparatus and method
US11012369B2 (en) * 2019-07-05 2021-05-18 Dell Products L.P. Aggregated switch path optimization system
US11310102B2 (en) 2019-08-02 2022-04-19 Ciena Corporation Retaining active operations, administration, and maintenance (OAM) sessions across multiple devices operating as a single logical device
WO2021025826A1 (en) * 2019-08-02 2021-02-11 Ciena Corporation Retaining active operations, administration, and maintenance (oam) sessions across multiple devices operating as a single logical device
US11018995B2 (en) 2019-08-27 2021-05-25 Vmware, Inc. Alleviating congestion in a virtual network deployed over public clouds for an entity
US11252106B2 (en) 2019-08-27 2022-02-15 Vmware, Inc. Alleviating congestion in a virtual network deployed over public clouds for an entity
US11171885B2 (en) 2019-08-27 2021-11-09 Vmware, Inc. Providing recommendations for implementing virtual networks
US11212238B2 (en) 2019-08-27 2021-12-28 Vmware, Inc. Providing recommendations for implementing virtual networks
US11831414B2 (en) 2019-08-27 2023-11-28 Vmware, Inc. Providing recommendations for implementing virtual networks
US11153230B2 (en) 2019-08-27 2021-10-19 Vmware, Inc. Having a remote device use a shared virtual network to access a dedicated virtual network defined over public clouds
US12132671B2 (en) 2019-08-27 2024-10-29 VMware LLC Providing recommendations for implementing virtual networks
US11606314B2 (en) 2019-08-27 2023-03-14 Vmware, Inc. Providing recommendations for implementing virtual networks
US11252105B2 (en) 2019-08-27 2022-02-15 Vmware, Inc. Identifying different SaaS optimal egress nodes for virtual networks of different entities
US10999137B2 (en) 2019-08-27 2021-05-04 Vmware, Inc. Providing recommendations for implementing virtual networks
US11258728B2 (en) 2019-08-27 2022-02-22 Vmware, Inc. Providing measurements of public cloud connections
US11310170B2 (en) 2019-08-27 2022-04-19 Vmware, Inc. Configuring edge nodes outside of public clouds to use routes defined through the public clouds
US11121985B2 (en) 2019-08-27 2021-09-14 Vmware, Inc. Defining different public cloud virtual networks for different entities based on different sets of measurements
US11044190B2 (en) 2019-10-28 2021-06-22 Vmware, Inc. Managing forwarding elements at edge nodes connected to a virtual network
US11611507B2 (en) 2019-10-28 2023-03-21 Vmware, Inc. Managing forwarding elements at edge nodes connected to a virtual network
US11716286B2 (en) 2019-12-12 2023-08-01 Vmware, Inc. Collecting and analyzing data regarding flows associated with DPI parameters
US11394640B2 (en) 2019-12-12 2022-07-19 Vmware, Inc. Collecting and analyzing data regarding flows associated with DPI parameters
US11489783B2 (en) 2019-12-12 2022-11-01 Vmware, Inc. Performing deep packet inspection in a software defined wide area network
US11088963B2 (en) * 2019-12-16 2021-08-10 Dell Products L.P. Automatic aggregated networking device backup link configuration system
US11722925B2 (en) 2020-01-24 2023-08-08 Vmware, Inc. Performing service class aware load balancing to distribute packets of a flow among multiple network links
US11606712B2 (en) 2020-01-24 2023-03-14 Vmware, Inc. Dynamically assigning service classes for a QOS aware network link
US12041479B2 (en) 2020-01-24 2024-07-16 VMware LLC Accurate traffic steering between links through sub-path path quality metrics
US11438789B2 (en) 2020-01-24 2022-09-06 Vmware, Inc. Computing and using different path quality metrics for different service classes
US11689959B2 (en) 2020-01-24 2023-06-27 Vmware, Inc. Generating path usability state for different sub-paths offered by a network link
US11418997B2 (en) 2020-01-24 2022-08-16 Vmware, Inc. Using heart beats to monitor operational state of service classes of a QoS aware network link
US20230103537A1 (en) * 2020-02-27 2023-04-06 Nippon Telegraph And Telephone Corporation Communication system, network relay device, network relay method, and program
US11477127B2 (en) 2020-07-02 2022-10-18 Vmware, Inc. Methods and apparatus for application aware hub clustering techniques for a hyper scale SD-WAN
US11245641B2 (en) 2020-07-02 2022-02-08 Vmware, Inc. Methods and apparatus for application aware hub clustering techniques for a hyper scale SD-WAN
US11709710B2 (en) 2020-07-30 2023-07-25 Vmware, Inc. Memory allocator for I/O operations
US11363124B2 (en) 2020-07-30 2022-06-14 Vmware, Inc. Zero copy socket splicing
US11575591B2 (en) 2020-11-17 2023-02-07 Vmware, Inc. Autonomous distributed forwarding plane traceability based anomaly detection in application traffic for hyper-scale SD-WAN
US11444865B2 (en) 2020-11-17 2022-09-13 Vmware, Inc. Autonomous distributed forwarding plane traceability based anomaly detection in application traffic for hyper-scale SD-WAN
US11575600B2 (en) 2020-11-24 2023-02-07 Vmware, Inc. Tunnel-less SD-WAN
US11601356B2 (en) 2020-12-29 2023-03-07 Vmware, Inc. Emulating packet flows to assess network links for SD-WAN
US11929903B2 (en) 2020-12-29 2024-03-12 VMware LLC Emulating packet flows to assess network links for SD-WAN
US12137140B2 (en) * 2021-01-13 2024-11-05 Pure Storage, Inc. Scale out storage platform having active failover
US11792127B2 (en) 2021-01-18 2023-10-17 Vmware, Inc. Network-aware load balancing
US11979325B2 (en) 2021-01-28 2024-05-07 VMware LLC Dynamic SD-WAN hub cluster scaling with machine learning
US11582144B2 (en) 2021-05-03 2023-02-14 Vmware, Inc. Routing mesh to provide alternate routes through SD-WAN edge forwarding nodes based on degraded operational states of SD-WAN hubs
US11509571B1 (en) 2021-05-03 2022-11-22 Vmware, Inc. Cost-based routing mesh for facilitating routing through an SD-WAN
US12009987B2 (en) 2021-05-03 2024-06-11 VMware LLC Methods to support dynamic transit paths through hub clustering across branches in SD-WAN
US11637768B2 (en) 2021-05-03 2023-04-25 Vmware, Inc. On demand routing mesh for routing packets through SD-WAN edge forwarding nodes in an SD-WAN
US11388086B1 (en) 2021-05-03 2022-07-12 Vmware, Inc. On demand routing mesh for dynamically adjusting SD-WAN edge forwarding node roles to facilitate routing through an SD-WAN
US11381499B1 (en) 2021-05-03 2022-07-05 Vmware, Inc. Routing meshes for facilitating routing through an SD-WAN
US11729065B2 (en) 2021-05-06 2023-08-15 Vmware, Inc. Methods for application defined virtual network service among multiple transport in SD-WAN
US11489720B1 (en) 2021-06-18 2022-11-01 Vmware, Inc. Method and apparatus to evaluate resource elements and public clouds for deploying tenant deployable elements based on harvested performance metrics
US12015536B2 (en) 2021-06-18 2024-06-18 VMware LLC Method and apparatus for deploying tenant deployable elements across public clouds based on harvested performance metrics of types of resource elements in the public clouds
US12047282B2 (en) 2021-07-22 2024-07-23 VMware LLC Methods for smart bandwidth aggregation based dynamic overlay selection among preferred exits in SD-WAN
US11375005B1 (en) 2021-07-24 2022-06-28 Vmware, Inc. High availability solutions for a secure access service edge application
CN113676405A (en) * 2021-08-18 2021-11-19 上海晨驭信息科技有限公司 Load sharing-based rapid link master-slave switching distributed system and method
US11943146B2 (en) 2021-10-01 2024-03-26 VMware LLC Traffic prioritization in SD-WAN
US11909815B2 (en) 2022-06-06 2024-02-20 VMware LLC Routing based on geolocation costs
US12057993B1 (en) 2023-03-27 2024-08-06 VMware LLC Identifying and remediating anomalies in a self-healing network
US12034587B1 (en) 2023-03-27 2024-07-09 VMware LLC Identifying and remediating anomalies in a self-healing network
US12066907B1 (en) 2023-04-28 2024-08-20 Netapp, Inc. Collection of state information by nodes in a cluster to handle cluster management after master-node failover

Also Published As

Publication number Publication date
US10164873B1 (en) 2018-12-25

Similar Documents

Publication Publication Date Title
US10164873B1 (en) All-or-none switchover to address split-brain problems in multi-chassis link aggregation groups
US10257019B2 (en) Link aggregation split-brain detection and recovery
US9172662B2 (en) Virtual chassis system control protocols
US8027246B2 (en) Network system and node apparatus
US10454809B2 (en) Automatic network topology detection for merging two isolated networks
US20080215910A1 (en) High-Availability Networking with Intelligent Failover
US9148390B2 (en) System and method for virtual chassis split prevention
US20130094357A1 (en) Fhrp optimizations for n-way gateway load balancing in fabric path switching networks
US8873369B2 (en) Fiber channel 1:N redundancy
KR101691759B1 (en) Virtual chassis system control protocols
US10230540B2 (en) Method, device and system for communicating in a ring network
EP2918054B1 (en) System and method for a pass thru mode in a virtual chassis system
US20160308753A1 (en) Packet network linear protection systems and methods in a dual home or multi-home configuration
US9807051B1 (en) Systems and methods for detecting and resolving split-controller or split-stack conditions in port-extended networks
US8477598B2 (en) Method and system for implementing network element-level redundancy
US8711681B2 (en) Switch redundancy in systems with dual-star backplanes
US10862706B2 (en) Detection of node isolation in subtended ethernet ring topologies
US9577872B2 (en) Fiber channel 1:N redundancy
US8144574B1 (en) Distributed control packet processing
WO2014074546A1 (en) Network node and method in a node operable in a virtual chassis system wherein it is determined whether to issue a warning that an administrative action triggers a virtual chassis split
JP2007027954A (en) Packet network and layer 2 switch
Park et al. Toward control path high availability for software-defined networks
US20150271057A1 (en) Method for running a computer network
CN116248581B (en) Cloud scene gateway cluster master-slave switching method and system based on SDN
US8477599B2 (en) Method and system for implementing network element-level redundancy

Legal Events

Date Code Title Description
AS Assignment

Owner name: CIENA CORPORATION, MARYLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SOOD, ANKIT;BAHERI, HOSSEIN;GUDIMETLA, LEELA SANKAR;AND OTHERS;REEL/FRAME:042569/0883

Effective date: 20170531

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4