
WO2013017017A1 - Load balancing in link aggregation - Google Patents

Load balancing in link aggregation

Info

Publication number
WO2013017017A1
WO2013017017A1 (application PCT/CN2012/078855)
Authority
WO
WIPO (PCT)
Prior art keywords
port
member port
congestion
processor
hash value
Prior art date
Application number
PCT/CN2012/078855
Other languages
French (fr)
Inventor
Yanjun YANG
Original Assignee
Hangzhou H3C Technologies Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou H3C Technologies Co., Ltd filed Critical Hangzhou H3C Technologies Co., Ltd
Publication of WO2013017017A1 publication Critical patent/WO2013017017A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/74: Address processing for routing
    • H04L 45/745: Address table lookup; Address filtering
    • H04L 45/7453: Address table lookup; Address filtering using hashing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/24: Multipath
    • H04L 45/245: Link aggregation, e.g. trunking
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/12: Avoiding congestion; Recovering from congestion
    • H04L 47/125: Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for load balancing in a network device supporting link aggregation. A first hash value is determined for a received packet. Based on the first hash value, a first member port of an aggregation group corresponding to a destination address of the packet is determined. If there is congestion at the first member port, a second hash value is determined for the received packet. Based on the second hash value, a second member port in the aggregation group is determined and the received packet is sent through the second member port. A network device and a processor for load balancing are also disclosed.

Description

Load Balancing in Link Aggregation
Background
The bandwidth of a single port in a network device has increased from 10 Mbps to 10 Gbps in recent years. Despite the rapid increase, the bandwidth of a single port is still inadequate to meet the bandwidth requirements of various network applications. For example, the bandwidth requirement for an uplink port is generally at least 100 Gbps.
Brief Description of Drawings
Non-limiting example(s) will be described with reference to the following drawings, in which:
Fig. 1 is a block diagram of an example network supporting link aggregation; Fig. 2 is a block diagram of an example network device;
Fig. 3 is a block diagram of the example network device in Fig. 2 during packet transmission;
Fig. 4 is a flowchart of an example method for load balancing in a network device supporting link aggregation;
Fig. 5 is a detailed flowchart of one of the blocks in Fig. 4 for determining and storing status information of ports of a network device;
Fig. 6 is a flowchart of an example structure of a status notification message; and
Fig. 7 is a block diagram of an example structure of a line card of the example network device in Fig. 2 and Fig. 3.
Detailed Description
Fig. 1 shows an example network 100 in which a network device 110 connects source devices 120 to destination devices 130 via multiple network links 140, 150. The source devices 120 are connected to ingress ports 112 of the network device 110, while the destination devices 130 are connected to egress ports 114 of the network device 110. Each physical link 140, 150 terminates at an ingress port 112 or egress port 114 of the network device 110.
The network device 110 supports link aggregation, which is also known as port aggregation and trunking. Link aggregation is used to aggregate the network links 140, 150 as logical links 160, 170. Link aggregation increases bandwidth and reliability, and also provides redundancy in case of failure or congestion at one network link or port. The logical links 160, 170 each represent an aggregation group, in which: (a) Aggregation groups LAG1 to LAG3 are each formed by network links 140 connecting a source device 120 with the network device 110; and
(b) Aggregation groups LAG4 to LAG6 are each formed by network links 150 connecting a destination device 130 with the network device 110.
The corresponding ingress 112 or egress 114 ports of each aggregation group are known as their "member ports" or "trunk members". During packet transmission, the network device 110 receives, from a source device 120, packets through one of the member ingress ports 112 of an aggregation group, and determines the egress port 114 through which the packets are sent to the destination device 130. The destination device 130 to which the packets are sent is not necessarily their final destination.
The network device 110 may be any device for forwarding packets in the network 100, such as a switch, router etc. The source 120 and destination 130 devices may be switches, routers, hosts, computers etc. In one example, the network device 110 may be a core switch connecting multiple access switches (source devices 120) to a router (destination device 130). Although multiple source 120 and destination 130 devices are shown, it will be appreciated that the network device 110 may connect to one source device 120 and/or one destination device 130. The network links 140, 150 may be any suitable links, such as optical fibres, copper wires etc.
Referring now to Fig. 2, an example network device 110 in the form of a switch 200 will be explained in more detail. The network device 110 includes multiple line cards, such as three ingress line cards (LC1, LC2 and LC3) and three egress line cards (LC4, LC5 and LC6) in this example. The line cards are generally interconnected via internal forwarding paths such as a switching fabric 210. Throughout this disclosure, the term "line card" is used to generally refer to a network interface, network interface card etc. for transmitting and receiving packets, frames, etc.
Each line card (LC1-LC6) further includes one or more processors that are connected to the ports (labelled Port 1 ... N) on the network device 110. For simplicity, one processor is provided on each line card (LC1-LC6) in the example in Fig. 2. The processors (P1-P6) are interconnected via internal forwarding paths on the network device 110, such as the switching fabric 210 in Fig. 2. The processors (P1-P6) are also known as "packet processors" and "forwarding chips".
Referring also to the block diagram 300 in Fig. 3, there are two aggregation groups in the example in Fig. 2:
(a) Aggregation group LAG1 whose members are port 1 of processor P4, port 1 of processor P5, and port 1 of processor P6; and
(b) Aggregation group LAG2 whose members are port 2 of processor P1, and port 1 of processor P2.
The ports connect the switch 110 to one or more other devices (120, 130) in the network 100, as explained using Fig. 1. The ports on the ingress line cards LC1 to LC3 are known as ingress ports, and the ports on the egress line cards LC4 to LC6 are known as egress ports.
In the example in Fig. 3, when packets 310 arrive at an ingress port of a line card (e.g. Port 1 of LC2) of the switch 200, the processor P2 determines the appropriate egress line card and egress port (e.g. Port 1 of LC4) through which the received packets are sent.
The appropriate egress line card and egress port are generally determined based on a hash value computed from the received packet(s). As such, it is possible that packets received at port 2 of processor P1 are also forwarded to port 1 of processor P4, possibly causing load imbalance and congestion at port 1 of processor P4.
Fig. 4 is a flowchart of an example method 400 for load balancing in network device 110 supporting link aggregation:
At block 410, the network device 110 determines and stores status information of member ports of each aggregation group.
At block 420, the network device 110 determines a first hash value for one or more received packets. At block 430, based on the first hash value, the network device 110 determines a first member port of an aggregation group for sending the received packets. At block 440, the network device 110 determines whether there is congestion at the first member port. In one example, the determination may be based on the status information stored at block 410.
At block 450, if the determination at block 440 is not affirmative (no congestion), the network device 110 sends the received packets through the first member port. At block 460, if the determination at block 440 is affirmative (congested), the network device 110 determines a second hash value.
At block 470, based on the second hash value, the network device 110 determines a second member port of the aggregation group through which the received packets are sent.
At block 480, the received packets are sent through the second member port instead of the first member port. Using the example method, if there is congestion at the first member port, traffic or load is diverted from the first member port to a second member port of the aggregation group. This reduces load imbalance among the redundant links, thereby reducing link resource wastage. The example method will be explained in more detail below.
Port status information
At block 410 in Fig. 4, the network device 110 determines and stores status information of member ports of each aggregation group configured on the network device 110.
An example of this process 410 is shown in Fig. 5, in which each processor (P1-P6) on the network device 110 determines the status of its physical ports and notifies the other processors accordingly.
(i) At block 412 in Fig. 5, the processors (P1-P6) on the network device 110 each determine and store status information of the ports on the network device 110. The status information sets out the relationship between member ports and aggregation groups, and the status of each member port. In one implementation, the status information may be in the form of a forwarding table, an example of which is Table 1.
Table 1: Initial port status information
Aggregation group ID | Number of member ports | Member port information | Congestion indicator
1                    | 3                      | (Device 4, Port 1)      | 0
                     |                        | (Device 5, Port 1)      | 0
                     |                        | (Device 6, Port 1)      | 0
2                    | 2                      | (Device 1, Port 2)      | 0
                     |                        | (Device 2, Port 1)      | 0
The status information includes the identifier of each aggregation group; number of member ports in each aggregation group; the identifier of each member port; and a congestion indicator of each member port. The identifier of each member port of an aggregation group includes an identifier of the processor ("Device") connected to the member port, and an identifier of the port.
In this example, there are three member ports in aggregation group 1: port 1 of processor P4 ("Device 4"); port 1 of P5 ("Device 5"); and port 1 of P6 ("Device 6"). The congestion indicator, or "congestion factor", is set to an initial value of zero to indicate no congestion.
It will be appreciated that the port status information may be stored in other suitable form(s), such as using an individual forwarding table for each aggregation group.
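As an illustration only, the status information of Table 1 could be held in a simple in-memory structure such as the following sketch; the names and types (MemberPort, AggGroup, status_table) are assumptions for this example and are not part of the disclosure.

```python
# Illustrative sketch of per-processor port status information (Table 1).
# All names are hypothetical; the disclosure only requires the fields shown.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemberPort:
    device_id: int                       # identifier of the processor ("Device") owning the port
    port_id: int                         # identifier of the port on that processor
    congestion_indicator: float = 0.0    # 0 = no congestion; non-zero random value = congested

@dataclass
class AggGroup:
    group_id: int
    members: List[MemberPort] = field(default_factory=list)

    @property
    def num_members(self) -> int:
        return len(self.members)

# Initial status information corresponding to Table 1 (all indicators zero).
status_table = {
    1: AggGroup(1, [MemberPort(4, 1), MemberPort(5, 1), MemberPort(6, 1)]),
    2: AggGroup(2, [MemberPort(1, 2), MemberPort(2, 1)]),
}
```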
(ii) At block 414 in Fig. 5, the processors (P1-P6) determine the status of their respective ports, and notify other processors of the network device 110 accordingly. The notification may be in the form of a status notification message shown in Fig. 6 that includes:
(a) A 10-bit sender identifier, which identifies the processor that sends the status notification message.
(b) A 64-bit port status field (or port congestion state), where each bit represents the status of a port connected to the processor. Value 0 represents no congestion, whereas value 1 represents the detection of congestion.
(c) A 54-bit reserved value that is currently not in use.
The status notification message may be broadcast to other processors on the network device 110 via the switching fabric 210. The status may be determined periodically, and/or when triggered by a trigger signal received from other processors on the network device 110 or other nodes in the network. In the example in Fig. 3, processor P4 of line card LC4 determines the status of its ports (1 to N) and detects congestion at port 1 (see label 320). P4 then generates a status notification message with the first bit set to 1, which represents congestion at port 1. The status notification message is sent to the other processors (i.e. P1-P3 and P5-P6) on the network device 110.
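For illustration, the 128-bit message of Fig. 6 (10-bit sender identifier, 64-bit port congestion bitmap, 54-bit reserved field) might be packed and parsed as in the sketch below; the exact bit ordering and the helper names are assumptions, not part of the disclosure.

```python
# Hypothetical packing of the status notification message:
# 10-bit sender identifier | 64-bit port status bitmap | 54-bit reserved = 128 bits.
def pack_status_notification(sender_id: int, congested_ports: set) -> bytes:
    assert 0 <= sender_id < (1 << 10)
    bitmap = 0
    for port in congested_ports:              # bit i set to 1 => congestion at port i+1 (assumed ordering)
        assert 1 <= port <= 64
        bitmap |= 1 << (port - 1)
    message = (sender_id << 118) | (bitmap << 54)   # 54 reserved bits left as zero
    return message.to_bytes(16, "big")

def unpack_status_notification(raw: bytes):
    value = int.from_bytes(raw, "big")
    sender_id = value >> 118
    bitmap = (value >> 54) & ((1 << 64) - 1)
    congested = {i + 1 for i in range(64) if bitmap & (1 << i)}
    return sender_id, congested

# Example from Fig. 3: processor P4 reports congestion at its port 1.
msg = pack_status_notification(sender_id=4, congested_ports={1})
```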
(iii) At block 416 in Fig. 5, a processor receives the status notification message reporting congestion at one or more member ports, and updates its port status information accordingly. In the example in Fig. 3, processor P2 receives a status notification message from processor P4. From the sender identifier and port status information of the message, processor P2 learns that there is congestion at port 1 of processor P4. Processor P2 then retrieves its copy of the status information of ports on the network device 110 from a memory on line card LC2.
Processor P2 searches for the entry in Table 1 that is associated with port 1 of P4, and updates its congestion indicator from a value that indicates no congestion (zero) to a new value that indicates otherwise. The new value may be in any suitable form, such as a random value determined by the processor P2. The updated status information is shown in Table 2, in which the congestion indicator of Port 1 of P4 is set to a random value 0.52.
Table 2: Updated status information
Aggregation group ID | Number of member ports | Member port information | Congestion indicator
1                    | 3                      | (Device 4, Port 1)      | 0.52
                     |                        | (Device 5, Port 1)      | 0
                     |                        | (Device 6, Port 1)      | 0
2                    | 2                      | (Device 1, Port 2)      | 0
                     |                        | (Device 2, Port 1)      | 0
Once port 1 of processor P4 recovers from congestion, processor P4 similarly sends a status notification message to other processors on the network device 110. This time, the bit that represents port 1 of processor P4 in the 64-bit port status information is set to zero instead of a one.
Upon receiving the status notification message from processor P4, processor P2 updates the status information of port 1 of processor P4 from the random value to a value that represents no congestion, i.e. zero in this example.
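Continuing the earlier sketch, a receiving processor such as P2 might apply a notification to its copy of the table as shown below; the handler name is hypothetical, and random.random() merely stands in for whatever non-zero random value the processor actually picks.

```python
import random

# Hypothetical handler run by a receiving processor (e.g. P2) for a status
# notification from another processor (e.g. P4). Reuses the earlier sketches
# (status_table and unpack_status_notification).
def handle_status_notification(status_table, raw_message: bytes) -> None:
    sender_id, congested_ports = unpack_status_notification(raw_message)
    for group in status_table.values():
        for member in group.members:
            if member.device_id != sender_id:
                continue
            if member.port_id in congested_ports:
                # Congestion reported: replace 0 with a random non-zero value (e.g. 0.52).
                if member.congestion_indicator == 0:
                    member.congestion_indicator = random.random()
            else:
                # Port not reported as congested (or recovered): reset to 0.
                member.congestion_indicator = 0.0
```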
First hash value
At block 420 in Fig. 4, a processor determines a first hash value for a packet received through an ingress port of the network device 110. The first hash value may also be determined for a packet flow, but for simplicity, a packet is used here.
In the example in Fig. 3, processor P2 receives a packet 310 through ingress port 1. After ingress processing, since the destination address of the received packet corresponds to an aggregation group (or aggregated links), processor P2 performs hash computation on the received packet to determine the first hash value.
First member port
At block 430 in Fig. 4, based on the first hash value, the processor determines a first member port of an aggregation group for sending the received packet. In the example in Fig. 3, processor P2 determines the number of member ports of the aggregation group, and determines the first member port based on the hash value and the number of member ports:
Index 1 = First Hash Value % Number of Member Ports, where Index 1 (or Trunk Select Index 1) is the index of the first member port in the aggregation group; First Hash Value (or Trunk Hash Value) is the hash value of the received packet; and Number of Member Ports (or Number of Trunk Members) is the number of members in the aggregation group.
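A minimal sketch of this selection (blocks 420-430) follows; the hash inputs and algorithm are not specified by the disclosure, so a generic header-tuple hash is assumed purely for illustration.

```python
# Minimal sketch of first member port selection (blocks 420-430).
# Assumes the AggGroup/MemberPort sketch above; the hash function is illustrative only.
def first_member_port(group, packet_headers: tuple):
    first_hash = hash(packet_headers) & 0xFFFFFFFF    # First Hash Value (assumed 32-bit)
    index1 = first_hash % group.num_members           # Index1 = First Hash Value % Number of Member Ports
    return first_hash, group.members[index1]
```

Called with the status_table sketch above, first_member_port(status_table[1], headers) would return one of the three members of aggregation group 1.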
Congestion determination
At block 440 in Fig. 4, the processor determines whether there is congestion at the first member port.
In one example, the determination is based on the status information stored at block 410 in Fig. 4. If the value of the congestion indicator of the first member port is zero, this indicates no congestion at the first member port. Otherwise, if the congestion indicator has a random (non-zero) value, this indicates that there is congestion at the first member port.
At block 450 in Fig. 4, if there is no congestion at the first member port, the received packet is sent through the first member port.
Second hash value
At block 460 in Fig. 4, if there is congestion at the first member port, the processor determines a second hash value. In one example implementation, the second hash value is determined based on the first hash value and an "additional value".
The congestion indicator of the first member port, which has a random value if there is congestion, may be used as the "additional value". It will be appreciated that adding the additional value to the first hash value allows the same hash computation algorithm to be used, causing less disturbance compared to changing the algorithm.
In the example in Fig. 3, processor P2 retrieves Table 2 from its memory to determine whether there is congestion at the first member port (i.e. port 1 of processor P4). Processor P2 searches for the entry that corresponds to the first member port and its aggregation group.
In Table 2, the first member port is port 1 of processor P4 in aggregation group 1, and its congestion indicator has a random value of 0.52. This value is used to determine the second hash value, such as by adding the random value to the first hash value, as follows:
Second Hash Value = First Hash Value + Random Value
Second member port
At block 470 in Fig. 4, based on the second hash value, the processor determines a second member port of the same aggregation group through which the packet is sent from the network device 110.
In the example in Fig. 3, processor P2 determines the second member port as follows:
Index2 = (First Hash Value + Random Factor) % Number of Member Ports, where Index2 (or Trunk Select Index2) is the index of the second member port in the aggregation group of the first member port; First Hash Value (or Trunk Hash Value) is the hash value of the received packet; Random Factor is a random value; and Number of Member Ports (or Number of Trunk Members) is the number of members in the aggregation group.
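A small worked example of the Index2 computation, under the assumption that the random factor is applied as an integer offset so that the resulting index stays integral (a fractional indicator such as the 0.52 in Table 2 would be rounded or scaled in practice):

```python
# Worked example of Index1 and Index2 (assumed integer random factor).
first_hash = 7
num_members = 3
index1 = first_hash % num_members                     # 7 % 3 = 1 -> first member port
random_factor = 1                                     # assumed integer offset derived from the indicator
index2 = (first_hash + random_factor) % num_members   # (7 + 1) % 3 = 2 -> second member port
# Note: a different random factor (e.g. a multiple of num_members) could map back to the same port.
```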
At block 480 in Fig. 4, the received packet is sent through the second member port instead of the first member port, thereby diverting the traffic away from the first member port to reduce the likelihood of further aggravating the congestion.
It will be appreciated that blocks 460 and 470 may be repeated if the second member port is also congested based on the status information of the second member port. In this case, a third hash value is calculated to determine a third member port through which the received packet is sent, and so on.
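Putting the pieces together, a congestion-aware selection loop along the lines described above (blocks 430-480, including the further re-hash when the second member port is also congested) might look like the following sketch; the retry cap and the integer scaling of the indicator are assumptions added to keep the example well defined.

```python
# Illustrative congestion-aware member port selection (blocks 430-480).
# Assumes the AggGroup/MemberPort sketch above; retry cap and scaling are assumptions.
def select_member_port(group, first_hash: int, max_retries: int = None):
    if max_retries is None:
        max_retries = group.num_members               # assumed bound, not from the disclosure
    hash_value = first_hash
    member = group.members[hash_value % group.num_members]
    for _ in range(max_retries):
        if member.congestion_indicator == 0:          # no congestion: use this member port
            return member
        # Congested: add the (non-zero) indicator as the extra offset and re-select.
        hash_value += max(1, int(member.congestion_indicator * 100))   # assumed integer scaling
        member = group.members[hash_value % group.num_members]
    return member                                     # all candidates congested: fall back
```

In this sketch the congested port's own indicator supplies the offset, so the same modulo selection is reused unchanged, mirroring the point above that the hash computation algorithm itself need not change.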
Network Device 110
Fig. 7 shows a block diagram of an example line card 700 in the example network device 110 in Fig. 2 and Fig. 3. The line card 700 includes one or more processors 710 (labelled P1 to Pn) that are each connected to a subset of ports 720 (also labelled a to n). The processors 710 are interconnected to each other via internal paths 750, and connected to a central processing unit (CPU) 730 and memory 740. Each processor 710 may be connected to any number of ports 720, and this number may vary from one processor 710 to another. An aggregation group may include ports 720 from the same processor 710 or different ones, as well as ports on the same line card 700 or different line cards of the network device 110.
The CPU 730 programs the processors 710 with machine-readable instructions 742 to analyse a packet received at the line card 700 and to determine an egress port of the network device 110 through which the packet flow is forwarded. The machine-readable instructions 742 are stored in the memory 740. Other information required for load balancing, such as the status information in Table 1 and Table 2, is also stored in the memory 740. The methods, processes and functional units described herein may be implemented by hardware (including hardware logic circuitry), software or firmware or a combination thereof. The term 'processor' is to be interpreted broadly to include a processing unit, ASIC, logic unit, or programmable gate array etc. The processes, methods and functional units may all be performed by the one or more processors 710; reference in this disclosure or the claims to a 'processor' should thus be interpreted to mean 'one or more processors'.
Further, the processes, methods and functional units described in this disclosure may be implemented in the form of a computer software product. The computer software product is stored in a storage medium and comprises a plurality of instructions for causing a processor to implement the methods recited in the examples of the present disclosure.
The figures are only illustrations of an example, wherein the units or procedures shown in the figures are not necessarily essential for implementing the present disclosure. Those skilled in the art will understand that the units in the device in the example can be arranged in the device in the examples as described, or can be alternatively located in one or more devices different from that in the examples. The units in the examples described can be combined into one module or further divided into a plurality of sub-units.
Although the flowcharts described show a specific order of execution, the order of execution may differ from that which is depicted. For example, the order of execution of two or more blocks may be changed relative to the order shown. Also, two or more blocks shown in succession may be executed concurrently or with partial concurrence. All such variations are within the scope of the present disclosure. It will be appreciated that numerous variations and/or modifications may be made to the processes, methods and functional units as shown in the examples without departing from the scope of the disclosure as broadly described. The examples are, therefore, to be considered in all respects as illustrative and not restrictive.

Claims

Claims
1. A method for load balancing in a network device supporting link aggregation, the method comprising a first processor of the network device:
determining a first hash value for a received packet;
based on the first hash value, determining a first member port of an aggregation group corresponding to a destination address of the packet;
if there is congestion at the first member port, determining a second hash value for the received packet; and
based on the second hash value, determining a second member port in the aggregation group and sending the received packet through the second member port.
2. The method of claim 1, wherein the second hash value for the received packet is determined based on the first hash value and an additional value.
3. The method of claim 2, wherein the additional value is a congestion indicator of the first member port, the congestion indicator having a random value if there is congestion at the first member port.
4. The method of claim 3, further comprising:
receiving, from a second processor connected to the first member port of the aggregation group, a status notification message reporting congestion at the first member port; and
updating the congestion indicator of the first member port from a value that represents no congestion to the random value.
5. The method of claim 3 or 4, further comprising:
receiving, from the second processor connected to the first member port of the aggregation group, a status notification message reporting recovery of the first member port from congestion; and
updating the congestion indicator of the first member port from the random value to a value that represents no congestion.
6. The method of claim 4 or 5, wherein the status notification message includes information identifying the second processor, and status of each port connected to the second processor.
7. The method of any one of the preceding claims, wherein the second member port is determined based on the second hash value and number of members in the aggregation group.
8. A network device for load balancing, the network device supporting link aggregation and comprising a first processor to:
determine a first hash value for a received packet;
based on the first hash value, determine a first member port of an aggregation group corresponding to a destination address of the packet;
if there is congestion at the first member port, determine a second hash value for the received packet; and
based on the second hash value, determine a second member port in the aggregation group and send the received packet through the second member port.
9. The network device of claim 8, wherein the second hash value for the received packet is determined based on the first hash value and an additional value.
10. The network device of claim 8 or 9, wherein the additional value is a congestion indicator of the first member port, the congestion indicator having a random value if there is congestion at the first member port.
11. The network device of claim 10, wherein the first processor is further to:
receive, from a second processor connected to the first member port of the aggregation group, a status notification message reporting congestion at the first member port; and
update the congestion indicator of the first member port from a value that represents no congestion to the random value.
12. The network device of claim 10 or 11, wherein the first processor is further to: receive, from a second processor connected to the first member port of the aggregation group, a status notification message reporting recovery of the first member port from congestion; and
update the congestion indicator in the status information of the first member port from the random value to a value that represents no congestion.
13. The network device of claim 11 or 12, wherein the status notification message includes information identifying the second processor, and status of each port connected to the second processor.
14. The network device of any one of claims 8 to 13, wherein the second member port is determined based on the second hash value and number of members in the aggregation group.
15. A processor for load balancing in a network device supporting link aggregation, the processor is to:
determine a first hash value for a received packet;
based on the first hash value, determine a first member port of an aggregation group corresponding to a destination address of the packet;
if there is congestion at the first member port, determine a second hash value for the received packet; and
based on the second hash value, determine a second member port in the aggregation group and send the received packet through the second member port.
PCT/CN2012/078855 2011-08-03 2012-07-19 Load balancing in link aggregation WO2013017017A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110221835.6 2011-08-03
CN201110221835.6A CN102263697B (en) 2011-08-03 2011-08-03 Method and device for sharing aggregated link traffic

Publications (1)

Publication Number Publication Date
WO2013017017A1 true WO2013017017A1 (en) 2013-02-07

Family

ID=45010163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/078855 WO2013017017A1 (en) 2011-08-03 2012-07-19 Load balancing in link aggregation

Country Status (2)

Country Link
CN (1) CN102263697B (en)
WO (1) WO2013017017A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016074498A1 (en) * 2014-11-12 2016-05-19 中兴通讯股份有限公司 Multicast device and method for managing bandwidth of internet group management protocol snooping multicast stream
EP3125479A1 (en) * 2014-03-27 2017-02-01 Huawei Technologies Co., Ltd Packet forwarding method, system, and apparatus
US9866470B2 (en) 2014-01-24 2018-01-09 Red Hat, Inc. Multiple active link aggregators
US9906592B1 (en) * 2014-03-13 2018-02-27 Marvell Israel (M.I.S.L.) Ltd. Resilient hash computation for load balancing in network switches
US10244047B1 (en) 2008-08-06 2019-03-26 Marvell Israel (M.I.S.L) Ltd. Hash computation for network switches
US10243857B1 (en) 2016-09-09 2019-03-26 Marvell Israel (M.I.S.L) Ltd. Method and apparatus for multipath group updates
EP3477893A4 (en) * 2016-06-22 2019-05-01 Huawei Technologies Co., Ltd. A data transmission method and device, and network element
US10432556B1 (en) 2010-05-26 2019-10-01 Marvell International Ltd. Enhanced audio video bridging (AVB) methods and apparatus
CN112187540A (en) * 2020-09-28 2021-01-05 新华三信息安全技术有限公司 Issuing method of aggregation port configuration and network equipment
CN113014502A (en) * 2021-02-08 2021-06-22 北京星网锐捷网络技术有限公司 Load balancing method and device based on line card
US20220400080A1 (en) * 2021-06-11 2022-12-15 Fujitsu Limited Packet processing apparatus and packet processing method

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102263697B (en) * 2011-08-03 2014-12-10 杭州华三通信技术有限公司 Method and device for sharing aggregated link traffic
CN103023815B (en) * 2012-12-26 2015-05-13 杭州华三通信技术有限公司 Aggregation link load sharing method and device
GB2535264B (en) * 2014-08-29 2021-10-06 Pismo Labs Technology Ltd Methods and systems for transmitting packets through an aggregated connection
CN105939283B (en) * 2016-03-17 2019-03-15 杭州迪普科技股份有限公司 The method and device of network flow quantity shunting
CN109525501B (en) * 2018-12-27 2022-05-24 新华三技术有限公司 Method and device for adjusting forwarding path
CN112737956A (en) * 2019-10-28 2021-04-30 华为技术有限公司 Message sending method and first network equipment
CN111314236A (en) * 2020-04-14 2020-06-19 杭州迪普科技股份有限公司 Message forwarding method and device
CN114268589B (en) * 2020-09-16 2024-05-03 北京华为数字技术有限公司 Traffic forwarding method, device and storage medium
CN113347230B (en) * 2021-05-13 2022-09-06 长沙星融元数据技术有限公司 Load balancing method, device, equipment and medium based on programmable switch
CN113645145B (en) * 2021-08-02 2024-07-19 迈普通信技术股份有限公司 Load balancing method, load balancing device, network equipment and computer readable storage medium
WO2023168657A1 (en) * 2022-03-10 2023-09-14 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for selecting lag port for ip flow

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1571354A (en) * 2003-07-12 2005-01-26 华为技术有限公司 A method for implementing link aggregation
CN1622532A (en) * 2003-11-25 2005-06-01 华为技术有限公司 A dynamic equilibrium distributing method for port data flow
CN1809021A (en) * 2005-01-17 2006-07-26 华为技术有限公司 Ethernet link converging method
CN102263697A (en) * 2011-08-03 2011-11-30 杭州华三通信技术有限公司 Method and device for sharing aggregated link traffic

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7715384B2 (en) * 2004-11-30 2010-05-11 Broadcom Corporation Unicast trunking in a network device
JP4983438B2 (en) * 2007-06-29 2012-07-25 富士通株式会社 Packet transmission load balancing control method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1571354A (en) * 2003-07-12 2005-01-26 华为技术有限公司 A method for implementing link aggregation
CN1622532A (en) * 2003-11-25 2005-06-01 华为技术有限公司 A dynamic equilibrium distributing method for port data flow
CN1809021A (en) * 2005-01-17 2006-07-26 华为技术有限公司 Ethernet link converging method
CN102263697A (en) * 2011-08-03 2011-11-30 杭州华三通信技术有限公司 Method and device for sharing aggregated link traffic

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10244047B1 (en) 2008-08-06 2019-03-26 Marvell Israel (M.I.S.L) Ltd. Hash computation for network switches
US10432556B1 (en) 2010-05-26 2019-10-01 Marvell International Ltd. Enhanced audio video bridging (AVB) methods and apparatus
US9866470B2 (en) 2014-01-24 2018-01-09 Red Hat, Inc. Multiple active link aggregators
US9906592B1 (en) * 2014-03-13 2018-02-27 Marvell Israel (M.I.S.L.) Ltd. Resilient hash computation for load balancing in network switches
EP3125479A1 (en) * 2014-03-27 2017-02-01 Huawei Technologies Co., Ltd Packet forwarding method, system, and apparatus
JP2017511065A (en) * 2014-03-27 2017-04-13 ホアウェイ・テクノロジーズ・カンパニー・リミテッド Packet forwarding method, system, and apparatus
EP3125479A4 (en) * 2014-03-27 2017-04-26 Huawei Technologies Co., Ltd. Packet forwarding method, system, and apparatus
US10447599B2 (en) 2014-03-27 2019-10-15 Huawei Technologies Co., Ltd. Packet forwarding method, system, and apparatus
WO2016074498A1 (en) * 2014-11-12 2016-05-19 中兴通讯股份有限公司 Multicast device and method for managing bandwidth of internet group management protocol snooping multicast stream
EP3477893A4 (en) * 2016-06-22 2019-05-01 Huawei Technologies Co., Ltd. A data transmission method and device, and network element
US10904139B2 (en) 2016-06-22 2021-01-26 Huawei Technologies Co., Ltd. Data transmission method and apparatus and network element
US10243857B1 (en) 2016-09-09 2019-03-26 Marvell Israel (M.I.S.L) Ltd. Method and apparatus for multipath group updates
CN112187540A (en) * 2020-09-28 2021-01-05 新华三信息安全技术有限公司 Issuing method of aggregation port configuration and network equipment
CN112187540B (en) * 2020-09-28 2022-04-26 新华三信息安全技术有限公司 Issuing method of aggregation port configuration and network equipment
CN113014502A (en) * 2021-02-08 2021-06-22 北京星网锐捷网络技术有限公司 Load balancing method and device based on line card
CN113014502B (en) * 2021-02-08 2022-08-19 北京星网锐捷网络技术有限公司 Load balancing method and device based on line card
US20220400080A1 (en) * 2021-06-11 2022-12-15 Fujitsu Limited Packet processing apparatus and packet processing method

Also Published As

Publication number Publication date
CN102263697A (en) 2011-11-30
CN102263697B (en) 2014-12-10

Similar Documents

Publication Publication Date Title
WO2013017017A1 (en) Load balancing in link aggregation
US20240022515A1 (en) Congestion-aware load balancing in data center networks
US8948004B2 (en) Fault tolerant communication in a trill network
JP6576006B2 (en) Control device detection in networks with separate control and forwarding devices
EP2449735B1 (en) Inter-node link aggregation method and node
CN104335537B (en) For the system and method for the multicast multipath of layer 2 transmission
US9049131B2 (en) Network system and load balancing method
EP2252015B1 (en) Method and apparatus for providing fast reroute of a packet that may be forwarded on one of a plurality of equal cost multipath routes through a network
US9577956B2 (en) System and method for supporting multi-homed fat-tree routing in a middleware machine environment
Kanagevlu et al. SDN controlled local re-routing to reduce congestion in cloud data center
CN105634823B (en) A kind of data center network fault recovery method based on multirouting configuration
EP2252013A1 (en) Method and apparatus for maintaining port state tables in a forwarding plane of a network element
US10027571B2 (en) Load balancing
WO2015066367A1 (en) Network topology of hierarchical ring with recursive shortcuts
US9838298B2 (en) Packetmirror processing in a stacking system
US11228524B1 (en) Methods and apparatus for efficient use of link aggregation groups
CN114024969B (en) Load balancing method, device and system
EP2987286A1 (en) Topology discovery in a stacked switches system
CN102891800A (en) Scalable forwarding table with overflow address learning
WO2022012145A1 (en) Load balancing method, apparatus and system
KR20150080953A (en) Method and Apparatus for fault recovery in Fat-Tree network
Reinemo et al. Multi-homed fat-tree routing with InfiniBand
CN101888344A (en) Method, device and switch for flooding route

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12820067

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12820067

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC OF 130814

122 Ep: pct application non-entry in european phase

Ref document number: 12820067

Country of ref document: EP

Kind code of ref document: A1