FCoE Handbook, First Edition, Revision A (Ebook)
© 2010 Brocade Communications Systems, Inc. All Rights Reserved. Brocade, the B-wing symbol, BigIron, DCX, Fabric OS, FastIron, IronView, NetIron, SAN Health, ServerIron, and TurboIron are registered trademarks, and Brocade Assurance, DCFM, Extraordinary Networks, and Brocade NET Health are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. Other brands, products, or service names mentioned are or may be trademarks or service marks of their respective owners.

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.

FCoE Handbook First Edition, April 2010; revision A, June 2010

Written by Ahmad Zamer and Chip Copper
Edited by Victoria Thomas
Illustrated by David Lehmann, Jim Heuser, and Victoria Thomas
Design and layout by Victoria Thomas
Ahmad Zamer
Ahmad is responsible for new technology protocols at Brocade, helping evangelize and drive the adoption of new networking and storage technologies such as FCoE, DCB, and TRILL. Ahmad is a high tech veteran with more than 25 years of global computer networking and networked storage industry experience. Most recently, at Intel, he led the introduction of iSCSI to the marketplace and helped drive market adoption. He is a patent holder and an author with more than 50 published articles covering a wide range of technology topics and a frequent speaker at industry events.
Chip Copper
As a Brocade Solutioneer, Chip tracks the SAN market and serves as a resource to customers, partners, and integrators--helping them solve real-world business problems through SANs. Copper has over 30 years of experience in program management and information systems with a broad technical background. He has a Ph.D. from the University of Pittsburgh in the area of Distributed Computing.
FCoE Handbook
Page 2
CONTENTS
CHAPTER 1: OVERVIEW ..... 5
  Introduction ..... 5
  The Challenges ..... 6
  The Solution ..... 8
  Ethernet and Fibre Channel Remain ..... 10
CHAPTER 2: FCOE & DCB INDUSTRY STANDARDS BODIES ..... 11
  INCITS Technical Committee T11--FCoE ..... 11
  IEEE--Data Center Bridging ..... 12
    802.1Qbb: Priority-based Flow Control (PFC) ..... 12
    802.1Qaz: Enhanced Transmission Selection (ETS) ..... 12
    802.1Qau: Congestion Notification (QCN) ..... 13
  IETF TRILL ..... 13
  Fibre Channel over Ethernet (FCoE) ..... 14
    FCoE Encapsulation ..... 14
    The FCoE Protocol Stack ..... 15
    FCoE Initialization Protocol (FIP) ..... 16
CHAPTER 3: ARCHITECTURAL MODELS ..... 17
  Logical vs. Physical Topologies ..... 19
  The FCoE Controller ..... 21
  FIP and MAC Addresses ..... 22
  FPMA and SPMA ..... 24
  Making Ethernet Lossless ..... 25
  FC Transport Requirements ..... 27
  The Ethernet PAUSE Function ..... 27
  Priority-based Flow Control ..... 29
  Enhanced Transmission Selection ..... 31
    Data Center Bridge eXchange ..... 33
  Building the DCB Cloud ..... 36
  Congestion Notification ..... 38
    802.1Qau: Congestion Notification (QCN) ..... 38
CHAPTER 4: TRILL--ADDING MULTI-PATHING TO LAYER 2 NETWORKS ..... 39
  Why Do We Need It? ..... 39
  Introducing TRILL ..... 43
  The TRILL Protocol ..... 45
    TRILL Encapsulation ..... 45
    Link-State Protocols ..... 46
    Routing Bridges ..... 46
  Moving TRILL Data ..... 46
  Summary ..... 48
CHAPTER 5: DELIVERING DCB/FCOE ..... 50
  Converged Network Adapters (CNAs) ..... 50
  DCB/FCoE Switches ..... 52
CHAPTER 6: FCOE IN THE DATA CENTER ..... 54
  Where Will FCoE Be Deployed? ..... 56
    Top of Rack ..... 57
    End of Row ..... 59
CHAPTER 1: OVERVIEW
INTRODUCTION
Data networking and storage networking technologies have evolved on parallel but separate paths. Ethernet has emerged as the technology of choice for enterprise data networks, while Fibre Channel (FC) became the dominant choice for enterprise shared Storage Area Networks (SANs). Ethernet and Fibre Channel continue to evolve--Ethernet is poised to reach speeds of 40 Gigabits per second (Gbps) and Fibre Channel will achieve 16 Gbps--on their way to higher speeds and new features. Most large organizations invested in both technologies for data center needs. Ethernet provided the front-end Local Area Networks (LANs) linking users to enterprise servers, and Fibre Channel provided the back-end SAN links between server and storage.

Today, server virtualization requires powerful CPUs and servers, placing higher demands on server networking and storage I/O interconnects. Maintaining separate data and storage networks also adds to the complexity of managing and maintaining data centers. As enterprises embrace virtualization to increase operational efficiencies and multiple applications are consolidated onto a smaller number of powerful servers, the next step appears to be simplifying server I/O links by converging the data and storage networks onto a common fabric.

New industry standards that enable the convergence of data and storage networking interconnects now exist. The new Fibre Channel over Ethernet (FCoE) protocol enables the transport of FC storage traffic over a new lossless Ethernet medium. FCoE is an encapsulation protocol and is not a replacement technology for Fibre Channel. In fact, FCoE builds on the success of FC in the data center and utilizes FC along with new lossless Ethernet to solve server I/O challenges facing IT professionals in the data center.
This book investigates the implementation and uses of FCoE. It discusses the technologies of FCoE and lossless Ethernet, and shows how these can be combined to produce solutions resulting in a simplified physical infrastructure. This new infrastructure will consolidate I/O traffic for storage and data networks over common lossless Ethernet links. Before discussing the new technology, let's start with a discussion of why FCoE is being considered at all. By understanding the challenges presented by consolidation, it's easier to understand the choice of technologies selected for the solution.
THE CHALLENGES
Throughout the enterprise and especially in the data center, there is a continuing need to drive down costs as much as possible. Budgets for capital expenses (CapEx) and operating expenses (OpEx) are almost always being squeezed and rarely increased. Data center managers and architects continue to look for new and creative ways to make infrastructures and operations more cost effective. One common theme for achieving cost reduction is consolidation. An example of a successful consolidation technology is Storage Area Networks. Before SANs, each server had a certain amount of storage directly attached to it (see Figure 1). The storage on each server had to be managed independently, and available disk space on one server could not be effectively deployed on another. Data protection, backups, and migration were difficult and time consuming, since manipulating the data on storage devices meant also dealing with servers.
Figure 1. Direct-Attached Storage (DAS, left) and a simple SAN (right).

In a SAN solution, a specialized network is installed between the servers and the storage devices. This network has protocols and hardware to guarantee that storage traffic sent across it will arrive in a timely fashion and that data will be received in order, non-duplicated, and non-corrupted. By consolidating the storage, larger, more cost-effective solutions can be deployed, and the overall management of storage can be simplified.

The deployment of a SAN solution involves the implementation of a separate network in the data center. In addition to the Ethernet network used for peer-to-peer or client-server traffic, the SAN uses separate network controllers called host bus adapters (HBAs), distinct cabling, and specialized switches--which all have to be configured and maintained. Because of the timing, reliability, and management differences between Ethernet networks and SANs, these two infrastructures have been deployed separately. This means that a server had to have separate storage and networking I/O adapters, and that more cabling and more networking devices had to be used.

The opportunity for cost reduction using FCoE is the reduction in the amount of hardware required to deploy a data center solution for both IP networking and storage traffic. This approach reduces not only the initial CapEx required to purchase and deploy the equipment, but also optimizes ongoing OpEx, such as electricity and cooling.
THE SOLUTION
The FCoE solution for server I/O consolidation and for reducing the amount of cabling in the data center recognizes several fundamental principles:

- TCP/IP is the protocol of choice for peer-to-peer and client-server networking.
- Fibre Channel is the protocol of choice for storage traffic in the data center.
- Ethernet is a ubiquitous networking technology.

Although traditional Ethernet has several characteristics that do not work well with native storage applications, the volume of Ethernet equipment and expertise deployed worldwide makes it a natural candidate for the solution.
These factors influenced the development of FCoE. The idea behind FCoE is to encapsulate FC frames into Ethernet frames, so that they can be transported over a lossless Ethernet medium. To achieve that, Ethernet had to be enhanced with new features so that it could natively provide the lossless delivery of frames containing FC data. In creating the new FCoE stack, the higher levels of the Fibre Channel protocol are layered on top of the new lossless Ethernet. By isolating the changes to Fibre Channel to the lowest layers (FC-0 and FC-1, see Figure 4), the upper constructs of FC are preserved, allowing FCoE to seamlessly integrate into existing Fibre Channel SANs without disrupting installed resources. Although this results in a change of transport, there are no fundamental differences in the behavior of Fibre Channel whether deployed natively or across lossless Ethernet.
[Figure 2 contrasts the protocol stacks deployed today--iSCSI over TCP/IP over Ethernet, and FCP over native Fibre Channel--with the converged stack, in which FCP rides over Fibre Channel over Ethernet (FCoE) on DCB-capable Ethernet.]
Figure 2. Protocols today and converged over DCB.

The changes, or enhancements, made to Ethernet to support convergence do not prevent the simultaneous deployment of a TCP/IP stack. More importantly, since the new lossless Ethernet can be beneficial to the flow control aspects of TCP/IP, the lossless behavior of the new Ethernet can be turned on or off for LAN TCP/IP traffic. That gives data center professionals the ability to deploy LANs using all of the following on one converged medium:

- Traditional Ethernet
- Lossless Ethernet with DCB features
- FCoE with DCB features enabled
The combination of DCB and FCoE technologies provides a solution for the challenges of physical infrastructure reduction and cabling simplicity. Subsequent chapters describe how this was accomplished.
CHAPTER 2: FCOE & DCB INDUSTRY STANDARDS BODIES

INCITS TECHNICAL COMMITTEE T11--FCOE

The FCoE protocol was developed by the INCITS Technical Committee T11 as part of the T11 FC-BB-5 project. The FCoE protocol and the FCoE Initialization Protocol (FIP) are defined in FC-BB-5, which describes how Fibre Channel is transported and mapped over other network technologies. The T11 committee completed its technical work on FC-BB-5 in June 2009 and forwarded the draft standard to INCITS for approval and publication. The INCITS public review was completed with no comments, which means that the standard will soon be published by INCITS as an industry standard. The new FCoE standard defines an encapsulation protocol that wraps FC storage data in Ethernet frames, which enables them to be transported over a new lossless Ethernet medium.
IEEE--DATA CENTER BRIDGING

The Data Center Bridging (DCB) effort undertaken by the IEEE 802.1 work group is aimed at adding new extensions to bridging and Ethernet, so that Ethernet becomes capable of converging LAN and storage traffic on a single link. You often hear that the new DCB features will make Ethernet more like Fibre Channel. That is true, because the new features being added to Ethernet solve issues that FC faced in the past and successfully resolved. IEEE is expected to complete its work on the components of DCB in the second half of 2010. The new enhancements are PFC, ETS, and DCBX.
DCB incorporates Data Center Bridging eXchange (DCBX), a discovery and initialization protocol that discovers the resources connected to the DCB cloud and establishes cloud limits. DCBX distributes the local configuration and detects misconfiguration of ETS and PFC between peers. It also provides the capability to configure a remote peer with PFC, ETS, and application parameters. The application parameter informs the end station which priority to use for a given application type (for example, FCoE or iSCSI). DCBX leverages the capabilities of the IEEE 802.1AB Link Layer Discovery Protocol (LLDP).
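To make the application-priority idea concrete, here is a minimal Python sketch of the kind of table DCBX conveys: it maps a traffic type (identified here by EtherType, one of the selectors DCBX supports) to an 802.1p priority. The entries and the choice of priority 3 for FCoE are illustrative assumptions, not values mandated by the standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AppPriorityEntry:
    # Simplified model of one DCBX Application Priority entry:
    # select traffic by EtherType and assign an 802.1p priority (0-7).
    ethertype: int
    priority: int

# Example advertisement a switch might push to a server adapter:
# FCoE (0x8906) and FIP (0x8914) mapped to priority 3 (a common default).
advertised = [
    AppPriorityEntry(ethertype=0x8906, priority=3),
    AppPriorityEntry(ethertype=0x8914, priority=3),
]

def priority_for(ethertype: int, table: list) -> int:
    """Return the configured priority for a frame type, else best effort (0)."""
    for entry in table:
        if entry.ethertype == ethertype:
            return entry.priority
    return 0
```

With this table in place, the end station tags every FCoE frame with priority 3, so PFC can later pause the storage class without touching LAN traffic.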
IETF TRILL
The Internet Engineering Task Force (IETF) is developing a new shortest-path frame routing protocol for multi-hop environments. The new protocol is called Transparent Interconnection of Lots of Links, or TRILL for short, and is expected to be completed in the second half of 2010. TRILL provides a Layer 2 (L2) multi-path alternative to the Spanning Tree Protocol (STP) currently deployed in data center networks, which permits only a single active path and thereby limits network bandwidth. TRILL will also deliver L2 multi-hop routing capabilities, which are essential for expanding the deployment of DCB/FCoE solutions beyond access-layer server I/O consolidation and into larger data center networks.
[Figure: FCoE encapsulation. An Ethernet frame (DA, SA, TYPE, Data, CRC) carries the FCoE header (FCoE-HDR) and the encapsulated FC frame with its own FC header (FC-HDR) as its payload.]
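The encapsulation shown above can be sketched in a few lines of Python. This is a simplified illustration only: it places the FCoE EtherType (0x8906) after the MAC addresses and treats the FC frame as an opaque payload, omitting the FCoE version/SOF header, the EOF trailer, padding, and the FCS that a real implementation must include.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE data frames

def encapsulate_fcoe(dest_mac: bytes, src_mac: bytes, fc_frame: bytes) -> bytes:
    """Sketch: wrap an FC frame in an Ethernet frame (FCoE header/trailer omitted)."""
    # A real FCoE frame inserts a version/SOF header before the FC frame and
    # an EOF/padding trailer after it; this sketch keeps only the raw payload.
    return dest_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE) + fc_frame

# Illustrative addresses only.
frame = encapsulate_fcoe(bytes.fromhex("0efc00010203"),
                         bytes.fromhex("00051eaabbcc"),
                         b"FC-FRAME-BYTES")
assert frame[12:14] == b"\x89\x06"  # the TYPE field marks the frame as FCoE
```

Because the FC frame travels as an opaque payload, nothing above FC-2 has to change: the fabric sees ordinary FC traffic once the Ethernet wrapper is removed.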
This same concept is true for F_Ports and E_Ports. Consider a Fibre Channel switch with 32 ports. Because there are 32 connectors on the switch, it is commonly understood that the device can support up to a total of 32 F_Ports and E_Ports, and that any particular port can take on either personality. There can be, however, no more than a total of 32 F_Ports and E_Ports, because the logical functionality of a port is associated with the physical connector.

This one-to-one mapping of ports to logical functionality does not exist once the lower layers of Fibre Channel are replaced with lossless Ethernet. One physical lossless Ethernet port is capable of supporting the functionality of multiple logical Fibre Channel ports. A single server with dual lossless Ethernet ports and controllers may be capable of supporting many more instances of logical Fibre Channel ports through the FCoE Entity layer.

In order to distinguish between the physical ports and the logical functionality, a new nomenclature is used in FCoE. A P prefix on a port type name refers to the physical entity, and a V prefix refers to the virtual functionality. For example, a VN_Port represents the logical functionality of a Fibre Channel N_Port on an FCoE-capable server. In this example, there is no corresponding PN_Port, since all of the VN_Ports' traffic would flow through a lossless Ethernet port. This convention provides much more clarity when describing FCoE implementation and architecture. Certain characteristics such as MAC addresses can be associated not only with lossless Ethernet ports but also with the VN_Ports in a particular server. By differentiating between the physical and logical, it is clear which entity is being referenced by an attribute such as a MAC address.

In the FCoE environment, the equivalent of a Fibre Channel node is an Enode. By identifying something as an Enode, the use of FCoE for FC protocol connectivity is implied.
Similarly, an FCoE switch is called a Fibre Channel Forwarder (FCF). An FCF performs all of the functions and has all of the services of an FC switch, and is equipped with one or more FCoE ports. Additionally, an FCF may contain one or more lossless Ethernet bridges and possibly native Fibre Channel ports, but these elements are not necessary.
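The physical-versus-virtual naming described above can be modeled with a small Python sketch: one physical Ethernet port hosting many virtual FC ports. The class and method names are illustrative; the enforced rule (all virtual ports on one FCF Ethernet port must share a type) is the FC-BB-5 constraint the text returns to in the port model discussion.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LosslessEthernetPort:
    """One physical port that can host many logical FC ports (sketch)."""
    mac: str
    virtual_ports: List[str] = field(default_factory=list)  # "VN_Port", "VF_Port", or "VE_Port"

    def add_virtual_port(self, port_type: str) -> None:
        # On an FCF, a single physical port may not mix VE_Ports and VF_Ports.
        if self.virtual_ports and port_type != self.virtual_ports[0]:
            raise ValueError("all virtual ports on one physical port must share a type")
        self.virtual_ports.append(port_type)

fcf_port = LosslessEthernetPort(mac="00:05:1e:00:00:01")
fcf_port.add_virtual_port("VF_Port")
fcf_port.add_virtual_port("VF_Port")  # many logical ports, one connector
```

Contrast this with native Fibre Channel, where the same model would allow exactly one logical port per connector.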
[Figure 5 depicts two stacks, each consisting of FC-3/FC-4s over a VN_Port (FC-2V) over an FCoE_LEP, contained within one FCoE entity that shares a single Ethernet port.]
Figure 5. FCoE port model.

Although the FCoE specification allows multiple virtual connections from a single physical port on an FCF, it requires all of the connections from a single physical port to be of the same port type. While a particular lossless Ethernet port is being used to support VE_Port connections, it cannot be used for VF_Ports, and vice versa. This is because of the way Ethernet MAC addresses are used in the discovery process, as discussed in the section Establishing Virtual Connections.
This characteristic of not requiring a physical connection for each logical connection has significant ramifications for the design and behavior of a fabric. Although a single Enode may be able to attach to many switches logically, it is not clear that this is a desired behavior. For example, if there are 10 FCFs in a fabric, an administrator may not want a single Enode to establish a connection with each of them. This and additional topics relating to the merging of Ethernet and Fibre Channel are discussed later.
The FCoE Controller uses the FCoE Initialization Protocol (FIP) to communicate with other FCoE Controllers for connection management. The FIP protocol is not really an extension of Fibre Channel or FCoE, but rather exists as an independently operating protocol. It has a unique EtherType, meaning that Ethernet switches that do not understand FCoE or Fibre Channel can still distinguish FIP messages from FCoE traffic. Once an FCoE Controller has identified a peer of an appropriate type (using a mechanism to be described shortly), it establishes a connection with that peer and creates an FCoE Entity representing that connection. The partner FCoE Controller will create an FCoE Entity representing the opposite side of the connection for use by its local Fibre Channel protocol stack. Because multiple eligible partners may be discovered, more than one FCoE Entity may be created. The FCoE Controller is also responsible for ongoing maintenance of the connections. It will tear down connections when appropriate, and can optionally send keep-alive messages to ensure that ongoing connectivity is maintained.
An FCoE Controller on an FCF can represent VE_Ports or VF_Ports. (Remember that the standard doesn't allow a single FCF port to support both types simultaneously, although a single port can support multiple logical ports of the same type at the same time.) When the FCoE Controller represents a VE_Port, it periodically multicasts an announcement specific to other VE_Port-configured FCoE Controllers, indicating its availability. This announcement includes the MAC address of the sending controller so that receiving stations can reply with a unicast request for connection if they want to. This periodic multicast also serves as a keep-alive message to other stations currently logically connected to this FCF.
[Figure 6 diagrams the exchange between an Enode and an FCF: FIP carries the VLAN Request, VLAN Notification, Discovery Solicitation, and Discovery Advertisement (Discovery), and the FLOGI Request and FLOGI Accept (Login); FCoE then carries the PLOGI to the Directory Server, the PLOGI Accept, and subsequent data transfer.]
Figure 6. Roles of FIP and FCoE in discovery and data transmission.

When an FCoE Entity representing VE_Ports comes online, it multicasts a request for other VE_Port-configured FCoE Controllers. All VE_Port FCoE Controllers receiving this request respond with a unicast message if they are interested in establishing a logical connection. This response includes their MAC address so that the receiving station can contact them directly.
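The discovery-and-login exchange in Figure 6 can be written out as an ordered list. The step names below are descriptive labels taken from the figure, not wire-format field names; the split between the two protocols follows FC-BB-5, where FIP (its own EtherType, 0x8914) carries discovery and fabric login, and ordinary FCoE frames carry everything afterward.

```python
# Each tuple is (carrying protocol, step). Comments note the direction.
FIP_DISCOVERY_AND_LOGIN = [
    ("FIP",  "VLAN Request"),             # Enode -> FCF: which VLAN carries FCoE?
    ("FIP",  "VLAN Notification"),        # FCF -> Enode
    ("FIP",  "Discovery Solicitation"),   # Enode -> all FCFs (multicast)
    ("FIP",  "Discovery Advertisement"),  # FCF -> Enode (unicast reply)
    ("FIP",  "FLOGI Request"),            # fabric login is carried over FIP
    ("FIP",  "FLOGI Accept"),             # FCF assigns the VN_Port its address
    ("FCoE", "PLOGI (Directory Server)"), # later logins run as normal FCoE
    ("FCoE", "PLOGI Accept"),
    ("FCoE", "Data Transfer"),
]
```

The ordering matters: only after the FLOGI Accept does the Enode have the addressing it needs (see the FPMA discussion) to send regular FCoE frames.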
If an FCF happens to have multiple lossless Ethernet MACs, then each of the FCoE Controllers associated with those MACs will perform independent advertisements and solicitations. Similarly, when a new VF_Port comes online, it also sends a multicast announcement. This multicast is sent to an address representing all FCoE Controllers for Enodes on the network. It serves as a keep-alive message to all Enodes currently logically connected to that MAC and can be used by other Enodes to build a list of reachable FCFs. A new Enode coming online sends a multicast to all FCFs. FCF MACs receiving this multicast respond with a unicast message containing their MAC addresses to allow the Enode to contact them directly. When an FCF creates a new FCoE Entity, that new entity continues to be addressed using the MAC address of the lossless Ethernet controller. In the case of an Enode, however, different rules apply. The FCoE Entities in Enodes are permitted to have unique MAC addresses of their own.
FPMA also allows quick differentiation between multiple separate streams between an Enode and a single FCF. An Enode can create several logical VN_Ports, and each of those ports can have a logical link with the same FCF. By assigning a distinct MAC address to each VN_Port, multiple streams can be differentiated by the Ethernet headers alone.

With SPMA, the Enode device is responsible for the management and assignment of MAC addresses to FCoE Controllers and FCoE Entities. To be sure that duplicate MAC addresses do not appear on the network, these MAC addresses must be globally defined and so cannot be dynamically created. A node implementing SPMA may choose to use the same MAC address for multiple VN_Ports and the FCoE Controllers, so it's impossible to distinguish between different streams by looking only at the MAC addresses. The frames require deeper inspection to determine the unique fabric identifiers from the encapsulated FC frame.
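The FPMA scheme is easy to show concretely: the fabric builds a 48-bit MAC address by concatenating the 24-bit FC-MAP prefix (default value 0E-FC-00) with the 24-bit FC_ID assigned to the VN_Port at fabric login. A short Python sketch:

```python
DEFAULT_FC_MAP = 0x0EFC00  # default FC-MAP value defined in FC-BB-5

def fpma(fc_id: int, fc_map: int = DEFAULT_FC_MAP) -> str:
    """Build a Fabric Provided MAC Address: FC-MAP (24 bits) || FC_ID (24 bits)."""
    mac48 = (fc_map << 24) | (fc_id & 0xFFFFFF)
    # Format the 48-bit value as the usual colon-separated byte string.
    return ":".join(f"{(mac48 >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

# A VN_Port that logged in and received FC_ID 0x010203 gets this MAC:
assert fpma(0x010203) == "0e:fc:00:01:02:03"
```

Because the low 24 bits are the fabric-assigned FC_ID, the MAC address is guaranteed unique within the fabric without any global registration, which is exactly what SPMA cannot do dynamically.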
about the same time. This station would then continue to transmit for a period to be sure that the other station was also aware of the collision but would then abort the frame. After a random time interval, the station would reattempt to send the frame. By using this approach, a station can be confident that a frame was sent correctly but not whether the frame was received correctly.

Ethernet implementations have moved from this shared-media approach to one in which each segment of media is shared by only two Ethernet stations. Dual unidirectional data paths allow the two stations to communicate with each other simultaneously without fear of collisions. Although this approach addresses how frames are delivered between Ethernet stations, it doesn't change the behavior of how frames are treated once they're received. The rules of Ethernet allow a station to throw away frames for a variety of reasons. For example, if a frame arrives with errors, it's discarded. If a non-forwarding station receives a frame not intended for it, it discards the frame. But most significantly, if a station receives an Ethernet frame and it has no data buffer in which to put it, according to the rules of Ethernet, it can discard the frame. It can do this because it's understood that stations implementing the Ethernet layer all have this behavior. If a higher-level protocol requires a lossless transmission, another protocol must be layered on top of Ethernet to provide it.

Consider an implementation of the FTP protocol running across an Ethernet network. FTP is part of the TCP/IP tool suite. This means that from the bottom layer up, FTP is based on Ethernet, IP, TCP, and finally FTP itself. Ethernet does not guarantee that frames will not be lost and neither does IP. The TCP layer is responsible for monitoring data transmitted between the FTP client and server, and if any data is lost, corrupted, duplicated, or arrives out of order, TCP will detect and correct it.
It will request the retransmission of data if necessary, using the IP and Ethernet layers below it to move the data from station to station. It will continue to monitor, send, and request transmissions until all the necessary data has been received reliably by the FTP application.
FC TRANSPORT REQUIREMENTS
The architecture of the Fibre Channel protocol is different. Ethernet only guarantees the best-effort delivery of frames and allows frames to be discarded under certain circumstances. Fibre Channel, however, requires reliable delivery of frames at the equivalent level of the Ethernet layer. At this layer, a Fibre Channel switch or host is not allowed to discard frames because it does not have room for them. It accomplishes this by using a mechanism called buffer credits. A buffer credit represents a guarantee that sufficient buffer space exists in a Fibre Channel node to receive an FC frame.

When a Fibre Channel node initializes, it examines its available memory space and determines how many incoming frames it can accommodate. It expresses this quantity as a number of buffer credits. A Fibre Channel node wishing to send a frame to an adjacent node must first obtain a buffer credit from that node. This is a guarantee that the frame will not be discarded on arrival because of a lack of buffer space. The rules of Fibre Channel also require a node to retain a frame until it has been reliably passed to another node or it has been delivered to a higher-level protocol.

As discussed previously, implementations of FCoE replace the lower layers of Fibre Channel with Ethernet. Since the lower layers of Fibre Channel are responsible for guaranteeing the reliable delivery of frames throughout the network, that role must now fall to Ethernet. The behavior of Ethernet must therefore be changed to accommodate this new responsibility.
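The buffer-credit mechanism just described can be sketched as a small Python model. The class and method names are illustrative; in real Fibre Channel the receiver returns credits with R_RDY primitives, and the key property is that a sender holding zero credits waits rather than transmits, so frames are never dropped for lack of buffers.

```python
class BufferCreditLink:
    """Minimal sketch of FC buffer-to-buffer credit flow control."""

    def __init__(self, credits: int):
        self.credits = credits  # BB_Credit value advertised at link initialization

    def can_send(self) -> bool:
        return self.credits > 0

    def send_frame(self) -> None:
        if not self.can_send():
            raise RuntimeError("no credit: the sender must wait, never drop")
        self.credits -= 1  # one credit is consumed per frame sent

    def receive_r_rdy(self) -> None:
        self.credits += 1  # the receiver freed a buffer and returned a credit

link = BufferCreditLink(credits=2)
link.send_frame()
link.send_frame()
assert not link.can_send()  # the sender stalls here instead of dropping frames
link.receive_r_rdy()
assert link.can_send()
```

Contrast this with classic Ethernet, where the equivalent situation (a full receive buffer) simply results in the frame being discarded.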
Using this approach, lossless behavior can be provided if a receiving station issues PAUSE requests when it does not have any buffer space available to receive frames. It assumes that by the time the PAUSE request expires, there will be sufficient buffer space available. If not, it is the responsibility of the receiving station to issue ongoing PAUSE requests until sufficient buffer space becomes available.

The PAUSE command provides a mechanism for lossless behavior between Ethernet stations, but it is only suited for links carrying one type of data flow. Recall that one of the goals of FCoE is to allow for I/O consolidation, with TCP/IP and Fibre Channel traffic converged onto the same media. If the PAUSE command is used to guarantee that Fibre Channel frames are not dropped as that protocol requires, then as a side effect, TCP/IP frames will also be stopped once a PAUSE command is issued. The PAUSE command doesn't differentiate traffic based on protocols. It pauses all traffic on the link between two stations, even control commands. So there is a conflict between what must be done to accommodate storage traffic in FCoE and what should happen to TCP/IP traffic--both of which need to coexist on the same segment of media. Problems could arise because one type of network traffic may interfere with the other.

Suppose, for example, that storage traffic is delayed because of a slow storage device. In order not to lose any frames relating to the storage traffic, a PAUSE command is issued for a converged link carrying both FCoE and TCP/IP traffic. Even though the TCP/IP streams may not need to be delayed, they will be delayed as a side effect of having all traffic on the link stopped. This in turn could cause TCP time-outs and may even make the situation worse, as retransmit requests for TCP streams add additional traffic to the already congested I/O link.
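For reference, the standard 802.3x PAUSE frame that creates this all-or-nothing behavior is simple to construct. The sketch below builds one in Python: a MAC control frame (EtherType 0x8808, opcode 0x0001) sent to the reserved multicast address 01-80-C2-00-00-01, carrying a single 16-bit pause time measured in quanta of 512 bit times.

```python
import struct

PAUSE_DA = bytes.fromhex("0180c2000001")  # reserved MAC-control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001

def pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build an IEEE 802.3x PAUSE frame; one quantum = 512 bit times."""
    body = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)
    frame = PAUSE_DA + src_mac + body
    return frame.ljust(60, b"\x00")  # pad to minimum Ethernet size (FCS not shown)

# Illustrative source address; 0xFFFF means "pause for the maximum interval".
f = pause_frame(bytes.fromhex("00051eaabbcc"), quanta=0xFFFF)
assert len(f) == 60 and f[12:14] == b"\x88\x08"
```

Note that the frame carries exactly one timer for the whole link: there is no field anywhere in it that could single out one traffic class, which is precisely the limitation PFC addresses.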
The solution to this problem is to enable Ethernet to differentiate between different types of traffic and to allow different types of traffic to be paused individually if required.
[Figure 7 shows the fields of the IEEE 802.1Q VLAN tag: the TPID (16 bits), which identifies the frame as 802.1Q tagged; the PCP (3 bits), the priority value used by DCB for Priority Flow Control (PFC); the CFI (1 bit); and the VID, the VLAN ID. For proper FCoE traffic, Brocade 8000 DCB ports are set to converged mode to handle tagged frames carrying a PFC priority value.]
Figure 7. Priority Flow Control in IEEE 802.1Q VLAN.

In addition, the Ethernet PAUSE command has a sufficient number of bytes available to allow an individual pause interval to be specified for each of the eight levels, or classes, of traffic. FCoE and TCP/IP traffic types can therefore be converged on the same link but placed into separate traffic
classes. The FCoE traffic can be paused in order to guarantee the reliable delivery of frames, while TCP/IP frames are allowed to continue to flow. Not only can different traffic types coexist, but best practices for each can be implemented in a non-intrusive manner.
Figure 8. Eight priorities per link using PFC. From another perspective, consider that PFC attempts to emulate the Virtual Channel (VC) technology widely deployed in current Brocade Fibre Channel SANs. While borrowing the lossless aspect of VCs, PFC retains the option of being configured as lossy or lossless. PFC is an enhancement to the current link-level Ethernet flow control mechanism defined in IEEE 802.3x (PAUSE). Current Ethernet protocols support the capability to assign different priorities to different applications, but the existing standard PAUSE mechanism ignores the priority information in the Ethernet frame. Triggering the PAUSE command results in all traffic on the link being halted, which impacts all applications even when only a single application is causing congestion. The current PAUSE is therefore not suitable for links shared by FCoE storage and networking applications, because congestion caused by any one application shouldn't disrupt the rest of the application traffic.
IEEE 802.1Qbb is tasked with enhancing the existing PAUSE protocol to include the priority of the frames contributing to congestion. PFC establishes eight priorities using the Priority Code Point field in the IEEE 802.1Q tag (see Figure 8), which enables the control of individual data flows based on a frame's priority. Using the priority information, the peer (server or switch) stops sending traffic for that specific application, or priority flow, while the data flows of other applications continue without disruption on the shared link. The new PFC feature allows FC storage traffic encapsulated in FCoE frames to receive lossless service from a link that is shared with traditional, loss-tolerant LAN traffic. In other words, separate data flows can share a common lossless Ethernet link, while each is protected from flow control problems of the other flows. Note that LAN traffic priorities can be configured with PFC off, allowing for lossy or lossless LAN transmissions.
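The PFC frame itself is a small MAC Control payload: an opcode, a priority-enable vector, and one pause timer per priority. The sketch below (an illustration, not vendor code) builds that payload; the opcode value 0x0101 and the quanta-based timers follow the 802.1Qbb frame layout:

```python
import struct

PFC_OPCODE = 0x0101  # MAC Control opcode for Priority-based Flow Control

def build_pfc_payload(pause_times):
    """Build the MAC Control payload of an 802.1Qbb PFC frame.

    pause_times: dict mapping priority (0-7) -> pause duration in quanta
    (one quantum = 512 bit times). Priorities absent from the dict are
    left unpaused (enable bit clear, timer zero).
    """
    enable_vector = 0
    timers = [0] * 8
    for prio, quanta in pause_times.items():
        enable_vector |= 1 << prio
        timers[prio] = quanta
    return struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)

# Pause only priority 3 (the FCoE class in this example) for the maximum
# duration; the LAN priorities keep flowing untouched.
payload = build_pfc_payload({3: 0xFFFF})
```

This is exactly the behavior the text describes: one class is held back while the other seven continue.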
This situation does not occur if separate channels are used for storage and non-storage traffic. A Fibre Channel-attached server could access its block traffic independent of the messages traveling across an Ethernet TCP/IP connection. Competition for bandwidth occurs only when these two ordinarily independent streams share a common link. In order to ensure that all types of traffic are given the appropriate amount of bandwidth, a mechanism called Enhanced Transmission Selection (ETS) is used with Priority Flow Control. ETS establishes priorities and bandwidth limitations to ensure that all types of traffic receive the priority and bandwidth they require for the proper operation of the server and all applications.

ETS establishes Priority Groups, or traffic class groups. A Priority Group is a collection of priorities as established in PFC. For example, all of the priorities associated with Inter-Process Communication (IPC) can be allocated to one Priority Group (traffic class group). All priorities assigned to FCoE can be assigned to a second traffic class group, and all IP traffic can be assigned to a third group, as shown in Figure 9. Each Priority Group has an integer identifier called the Priority Group ID (PGID) assigned to it. The value of the PGID is either 15 or a number in the range of 0 through 7. If the PGID for a Priority Group is 15, all traffic in that group is handled on a strict priority basis. That is, if traffic becomes available, it is handled before traffic in all other Priority Groups without regard for the amount of bandwidth it takes. A PGID of 15 should be used only with protocols requiring either an extremely high priority or very low latency. Examples of traffic in this category include management traffic, IPC, or audio/video bridging (AVB). The other traffic class groups, with PGID identifiers between 0 and 7, are assigned a bandwidth allocation (PG%). The sum of all bandwidth allocations should equal 100%.
The bandwidth allocation assigned to a traffic class group is the guaranteed minimum bandwidth for that group, assuming high utilization of the link. For example, if the PG% for the traffic class group containing all storage traffic is 60%, it is guaranteed that at
least 60% of the bandwidth available after all PGID 15 traffic has been processed will be allocated to the storage traffic Priority Group. The specification for ETS allows a traffic class group to take advantage of unused bandwidth available on the link. For example, if the storage traffic class group has been allocated 60% of the bandwidth and the IP traffic class group has been allocated 30%, the storage group can use more than 60% if the IP traffic class group does not require the entire 30%.
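The guarantee-plus-redistribution behavior just described can be sketched in a few lines. This is a simplified model (not the scheduler's actual implementation): each group is granted up to its guaranteed share, and whatever a group leaves unused is offered to groups that still have demand:

```python
def ets_share(allocations, demands, link_bw):
    """Distribute link bandwidth among traffic class groups, ETS-style.

    allocations: dict PGID -> guaranteed percentage (should sum to 100)
    demands:     dict PGID -> offered load in Gbps
    link_bw:     bandwidth in Gbps remaining after strict-priority (PGID 15) traffic
    Returns dict PGID -> granted Gbps.
    """
    granted = {}
    spare = 0.0
    for pgid, pct in allocations.items():
        guarantee = link_bw * pct / 100.0
        granted[pgid] = min(demands.get(pgid, 0.0), guarantee)
        spare += guarantee - granted[pgid]  # capacity this group left unused
    # Hand the leftover bandwidth to groups whose demand exceeds their guarantee.
    for pgid in allocations:
        extra = min(demands.get(pgid, 0.0) - granted[pgid], spare)
        if extra > 0:
            granted[pgid] += extra
            spare -= extra
    return granted

# Storage guaranteed 60%, LAN 30%, IPC 10% on a 10 Gbps link.
# LAN offers only 1 Gbps, so storage can grow past its 6 Gbps guarantee.
shares = ets_share({1: 60, 2: 30, 3: 10}, {1: 8.0, 2: 1.0, 3: 1.0}, 10.0)
```

With these inputs the storage group receives its full 8 Gbps demand, illustrating the "use more than 60% if the IP group does not require its 30%" case from the text.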
Figure 9. Example Priority Group allocation: Priority Group 1, storage, 60%; Priority Group 2, LAN, 30%; Priority Group 3, IPC, 10%.
Today, nearly all Ethernet devices are equipped to support the Link Layer Discovery Protocol (LLDP). LLDP is a mechanism whereby each switch periodically broadcasts information about itself to all of its neighbors. It's a one-way protocol, meaning that there is no acknowledgement of any of the data transmitted. Broadcast information includes a chassis identifier, a port identifier, a time-to-live (TTL) field, and other information about the state and configuration of the device. Information in an LLDP data unit (LLDPDU) is encoded using a type-length-value (TLV) convention: Each unit of information in the LLDPDU starts with a type field that tells the receiver what that information block contains. The next field, the length field, allows a receiver to determine where the next unit of information begins. By using this field, a receiver can skip over any TLVs that it either doesn't understand or doesn't want to process. The third element of the TLV is the value of that information unit.
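The skip-what-you-don't-understand property of TLV encoding is easy to demonstrate. The sketch below (illustrative; LLDP packs the type and length into a 16-bit header, 7 bits of type and 9 bits of length) walks an LLDPDU and steps over a TLV type it does not recognize:

```python
import struct

def iter_lldp_tlvs(data):
    """Walk the TLVs in an LLDP data unit.

    Each TLV starts with a 16-bit header: 7 bits of type, 9 bits of length.
    The length field lets a receiver find the next TLV even when it does
    not understand the current type. Type 0 (End of LLDPDU) stops the walk.
    """
    offset = 0
    while offset + 2 <= len(data):
        header, = struct.unpack_from("!H", data, offset)
        tlv_type = header >> 9
        tlv_len = header & 0x1FF
        if tlv_type == 0:  # End of LLDPDU
            break
        yield tlv_type, data[offset + 2 : offset + 2 + tlv_len]
        offset += 2 + tlv_len

def make_tlv(tlv_type, value):
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value

# Chassis ID (type 1), an org-specific TLV (type 127) the receiver may not
# understand, then End of LLDPDU (type 0). Values here are made up.
pdu = make_tlv(1, b"\x04switch-a") + make_tlv(127, b"\x00\x01\x02") + make_tlv(0, b"")
tlvs = list(iter_lldp_tlvs(pdu))
```

A station that does not implement DCBX simply never matches the DCBX organizationally specific type and skips it by length, which is exactly why DCBX can ride on LLDP without disturbing older neighbors.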
The LLDP standard defines a number of required and optional TLVs. It also allows for an organizationally specific TLV type, which permits organizations to define their own additional TLVs as required. By taking advantage of this feature, DCBX can build on LLDP to allow two stations to exchange information about their ability to support PFC and ETS. Stations that do not support PFC and ETS are not negatively impacted by the inclusion of this information in the LLDPDU; they can simply skip over it. The absence of DCBX-specific information from an LLDPDU informs an adjacent station that the sender is not capable of supporting those protocols. DCBX also enhances the capabilities of LLDP by including additional information that allows the two stations to be better informed about what the other station has learned, keeping the two stations in sync. For example, the addition of sequence numbers in the DCBX TLV allows each of the two stations to know that it has received the latest information from its peer and that its peer has received the latest information from it.
There are currently three different subclasses of information exchanged by DCBX. The first subclass is control traffic for the DCBX protocol itself. By using this subtype, state information and updates can be exchanged reliably between two peers. The second subtype allows the bandwidth for Traffic Class Groups to be exchanged. The first part of this data unit identifies the PGID for each of the eight message priorities. (For a review of message priorities, Priority Groups, PGIDs, and PG%, see the previous several sections.) The second part of the data unit identifies the bandwidth allocation that is assigned to each of the PGIDs 0 through 7. Recall that PGID 15 is a special group that always gets priority over the others, independent of any bandwidth allocation. The final part of the subtype allows a station to identify how many traffic classes it supports on this port. You can think of a traffic class as a collection of different types of traffic that are handled collectively. The limitation on the number of traffic classes supported by a port may depend on physical characteristics such as the number of message queues available or the capabilities of the communication processors.
Because of the grouping of message priorities into traffic classes, it's not necessary for a communications port to be able to support as many traffic classes as there are priorities. To support PFC and ETS, a communications port only needs to be able to handle three different traffic classes:
- One for PGID 15 high-priority traffic
- One for those classes of traffic that require PFC support, for protocols such as FCoE
- One for traffic that does not require the lossless behavior of PFC, such as TCP/IP
By exchanging the number of traffic classes supported, a station can figure out if the allocation of additional Priority Groups is possible on the peer station.
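One way to picture this check (a simplified model, not the DCBX state machine): count the distinct traffic classes a priority-to-PGID mapping implies and compare it against the number of classes the peer advertises. The mapping values below are hypothetical:

```python
def classes_needed(pgid_by_priority):
    """Count the traffic classes needed to honor a Priority Group setup:
    one class per distinct PGID in 0-7, plus one if any priority uses the
    strict-priority group (PGID 15)."""
    pgids = set(pgid_by_priority.values())
    strict = 15 in pgids
    return len(pgids - {15}) + (1 if strict else 0)

# Priority 3 -> FCoE group (PGID 0), priority 7 -> strict priority (PGID 15),
# everything else -> the LAN group (PGID 1).
mapping = {0: 1, 1: 1, 2: 1, 3: 0, 4: 1, 5: 1, 6: 1, 7: 15}
needed = classes_needed(mapping)
peer_supports = 3  # as advertised in the peer's DCBX TLV
ok = needed <= peer_supports
```

Here three classes suffice, matching the three-class minimum described above (one strict, one lossless, one lossy).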
The third subtype exchanged in DCBX indicates two characteristics of the sender. First, it identifies which of the message priorities should have PFC turned on. For consistency, all of the priorities in a particular Priority Group should either require PFC or not. If those requiring PFC are mixed with those that do not, buffer space will be wasted and traffic may be delayed. The second piece of information in this subtype indicates how many traffic classes in the sender can support PFC traffic. Because the demands of PFC-enabled traffic classes are greater than those of classes that do not require lossless behavior, the number of traffic classes supporting PFC may be less than the total number of traffic classes supported on a port. By combining the information in this subtype with that in the previous subtype, a station can determine the number of PFC-enabled and non-PFC-enabled traffic classes supported by a peer.
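The consistency rule, that a Priority Group should be all-PFC or all-non-PFC, can be checked mechanically. A small sketch (illustrative values, not a DCBX implementation):

```python
def pfc_groups_consistent(pgid_by_priority, pfc_enabled_priorities):
    """Check that within each Priority Group either every priority has PFC
    turned on or none does; mixing the two wastes buffer space and can
    delay traffic."""
    by_group = {}
    for prio, pgid in pgid_by_priority.items():
        by_group.setdefault(pgid, []).append(prio in pfc_enabled_priorities)
    return all(all(flags) or not any(flags) for flags in by_group.values())

# Priority 3 carries FCoE in its own group; priorities 0-2 and 4-6 are LAN.
mapping = {0: 1, 1: 1, 2: 1, 3: 0, 4: 1, 5: 1, 6: 1, 7: 15}
ok_single = pfc_groups_consistent(mapping, {3})     # lossless group is uniform
ok_mixed = pfc_groups_consistent(mapping, {3, 4})   # priority 4 mixes into LAN group
```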
When a station receives a DCBX-extended LLDP message, it examines the values of the parameters for compatibility. The peer must be capable of supporting the required protocols and must have matching configurations for Priority Groups and PGIDs. At first, this may sound like a daunting problem, but most experts agree that just as there are best practices for other protocols, there will be best practices for determining Priority Groups, PGIDs, and PG%s. There will be mechanisms for customizing these values for special situations, but the default configuration values will be those generally agreed upon by the industry. As devices participating in the LLDP process establish which links will be used for lossless Ethernet traffic, a natural boundary will form. Within this boundary, FCoE traffic will be allowed to move between stations and switches. TCP/IP traffic will be allowed to travel within, across, and beyond this boundary. But to minimize the impact of TCP/IP on storage paths, a best practice will be to direct all IP traffic out of the cloud as quickly as possible, toward nodes not within the lossless boundary.
Figure 10. Link bring-up between a local and a remote node: (1) driver initialization at the MAC, (2) speed auto-negotiation across the Ethernet link, (3) the link is declared up, and (4) Data Center Bridging parameter exchange by the upper-layer drivers.
CONGESTION NOTIFICATION
802.1Qau: Congestion Notification (QCN)
An end-to-end congestion management mechanism enables throttling of traffic at the end stations in the network in the event of traffic congestion. When a device is congested, it sends a congestion notification message to the end station, telling it to reduce its transmission rate. End stations discover when congestion eases so that they may resume transmitting at higher rates.
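The reaction point at the ingress can be pictured as a rate limiter that cuts its rate on each notification and recovers when the notifications stop. The sketch below is a toy model in the spirit of 802.1Qau; the constants and recovery rule are illustrative, not the standard's exact algorithm:

```python
class QcnRateLimiter:
    """Toy reaction-point rate limiter, in the spirit of 802.1Qau QCN.

    On each Congestion Notification Message (CNM) the sending rate is cut
    in proportion to the feedback value; during quiet intervals the rate
    recovers toward the rate that preceded the congestion.
    """
    def __init__(self, line_rate_gbps):
        self.rate = line_rate_gbps
        self.target = line_rate_gbps

    def on_cnm(self, feedback):
        """feedback in (0, 1]: severity of the congestion reported."""
        self.target = self.rate           # remember the pre-cut rate
        self.rate *= 1.0 - feedback / 2   # multiplicative decrease

    def on_quiet_interval(self):
        # Recovery: move halfway back toward the pre-congestion rate.
        self.rate = (self.rate + self.target) / 2

rl = QcnRateLimiter(10.0)
rl.on_cnm(1.0)          # severe congestion: 10.0 Gbps drops to 5.0 Gbps
rl.on_quiet_interval()  # congestion eases: rate recovers to 7.5 Gbps
```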
Figure 11 (adapted from an IEEE DCB tutorial) shows NICs and bridges along a path, with PFC applied hop by hop and congestion notification operating end to end. CNM (Congestion Notification Message): a message generated and sent to the ingress end station when a bridge experiences congestion. RL (Rate Limiter): in response to a CNM, the ingress node rate-limits the flows that caused the congestion.
Figure 11. Achieving lossless transport with PFC and CN. It is important to note that QCN is a separate protocol, independent of ETS and PFC. While ETS and PFC are dependent on each other, they do not depend on or require QCN to function or to be implemented in systems.
Because Ethernet was originally a flat topology across a single segment of shared media (see Figure 12), you didn't need to be concerned about multiple paths through the network. Each node was logically connected directly to each of its peers, with no intermediary devices along the way. That meant that the Ethernet protocol could ignore cases in which multiple paths from a source to a destination were available. As a result, counters and headers for management metrics such as hop count and time-out values were unnecessary and were not included in the standard Ethernet frame.
Figure 12. Basic Ethernet topology. As Ethernet deployments became more common, new devices were introduced into the infrastructure to enable larger networks. Analog repeaters and digital bridges began to appear, and as they did, new complexities began to surface. For example, with these devices, you could design a network where there was more than one physical path from any one source to a destination (see Figure 13).
Figure 13. More complexity in the Ethernet topology. The problem was that a network device receiving a frame on a port didn't know if it had seen that frame before. This introduced the possibility of a single frame circulating throughout the network indefinitely (see Figure 14). Left unchecked, a network would soon be saturated with frames that couldn't be removed because they couldn't be identified or limited.
To address this problem, a logical topological restriction in the form of a spanning tree was placed on Ethernet networks. The Spanning Tree Protocol (STP) meant that although there may be many physical paths through the network, at any given time all traffic flows along paths defined by a spanning tree that includes all network devices and nodes (see Figure 15). By restricting traffic to this tree, loops in the logical topology are prevented, at the expense of blocking alternative network paths.
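The idea of trading loops for blocked links can be shown with a small sketch. This uses a plain breadth-first search from a chosen root, not the 802.1D bridge-priority election, so it is an illustration of the concept rather than STP itself:

```python
from collections import deque

def spanning_tree(links, root):
    """Pick a loop-free subset of links by breadth-first search from the
    root; every link not in the tree is blocked, which is how a spanning
    tree prevents frames from circulating forever."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in visited:
                visited.add(peer)
                tree.add(frozenset((node, peer)))
                queue.append(peer)
    blocked = {frozenset(link) for link in links} - tree
    return tree, blocked

# A triangle of switches contains a loop: one of its three links is blocked.
tree, blocked = spanning_tree([("A", "B"), ("B", "C"), ("C", "A")], root="A")
```

The blocked link is perfectly good capacity that simply cannot be used, which is the cost TRILL sets out to eliminate.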
While STP solves the problem of traffic loops, it prevents network capacity from being fully used. Algorithms that calculate this spanning tree may take a lot of time to converge. During that time, the regular flow of traffic must be halted to prevent the type of network saturation described above. Even if multiple simultaneous spanning trees are used for separate VLANs to better distribute the traffic, traffic in any one VLAN will still suffer from the same disadvantage of not being able to use all of the available capacity in the network.
INTRODUCING TRILL
To eliminate the restriction of a single path through the network, the IETF formed a working group to study this problem. The official documentation states the goal of the group this way: "The TRILL WG will design a solution for shortest-path frame routing in multi-hop IEEE 802.1-compliant Ethernet networks with arbitrary topologies, using an existing link-state routing protocol technology." In simpler terms, the group was charged with developing a solution that:
- Uses shortest-path routing
- Works at Layer 2
- Supports multi-hop environments
- Works with an arbitrary topology
- Uses an existing link-state routing protocol
- Remains compatible with IEEE 802.1 Ethernet networks that use STP
The result was a protocol called TRILL (Transparent Interconnection of Lots of Links). Although routing is ordinarily done at Layer 3 of the ISO protocol stack, by making Layer 2 a routing layer, protocols other than IP, such as FCoE, can take advantage of this increased functionality. Multi-hopping allows frames to traverse multiple switches along multiple paths through the network. By working in an arbitrary topology, links that otherwise would have been blocked are usable for traffic. Finally, if the network can use an existing link-state protocol, solution providers can use protocols that have
already been developed, hardened, and optimized. This reduces the amount of work that must be done to deploy TRILL.
Figure 16 depicts a three-tier data center network, with core, aggregation, and access layers connecting down to the servers.
Figure 16. TRILL provides L2 multi-pathing. Just as important is what TRILL doesn't do. Although TRILL can serve as an alternative to STP, it doesn't require that STP be removed from an Ethernet infrastructure. Most networking administrators can't just rip and replace their current deployments simply for the sake of implementing TRILL. So hybrid solutions that use both STP and TRILL are not only possible but will most likely be the norm, at least in the near term. TRILL will also not automatically eliminate the risk of a single point of failure, especially in a hybrid architecture. The goals of TRILL are restricted to those explicitly listed above. Some unrealistic expectations and misrepresentations have been made about this technology, so it's important to keep in mind the relatively narrow range of problems that TRILL can solve. Simply put, TRILL enables only two things:
- Multi-pathing for L2 networks
- Multi-hopping that can benefit FCoE
TRILL Encapsulation
TRILL encapsulation turns Ethernet frames into TRILL frames by adding a TRILL header to the frame. The encapsulated frame begins with an outer MAC header (see Figure 17) in exactly the same format as a legacy Ethernet header. This allows bridges (switches) that are not aware of TRILL to continue forwarding frames according to the rules they've always used. The source address used in the outer header is the address of the RBridge adding the header. The destination address is determined by consulting tables built by the link-state routing protocol. A new EtherType is assigned to TRILL. Note also the HC (hop count) field, a 6-bit field that allows for 64 hops. The HC field is used to prevent the formation of loops on the VLAN and the premature discarding of frames.
Figure 17. TRILL frame format. An outer MAC header (outer MAC DA, 6 octets; outer MAC SA, 6 octets; outer VLAN tag, 4 octets) is followed by the 64-bit TRILL header: the TRILL EtherType (TBA, 2 octets); the V, R, M, OL, and HC fields (2 octets); the egress RBridge nickname (2 octets); and the ingress RBridge nickname (2 octets). Nicknames are assigned automatically. The original frame follows: inner MAC DA (6 octets), inner MAC SA (6 octets), inner VLAN tag (4 octets), type/length (2 octets), variable-length payload, and CRC (4 octets).
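A minimal sketch of the encapsulation and hop-count handling follows. It is illustrative only: the outer MAC header and outer VLAN tag are omitted, the flag bits (V, R, M, OL) are left zero, and the EtherType used is the value later assigned to TRILL (0x22F3), which the handbook's figure still marks as TBA:

```python
import struct

TRILL_ETHERTYPE = 0x22F3  # EtherType later assigned to TRILL

def trill_encapsulate(inner_frame, ingress_nick, egress_nick, hop_count=63):
    """Prepend a TRILL header (EtherType, flags+HC, nicknames) to a frame.
    The 6-bit hop count lives in the low bits of the flags word."""
    flags = hop_count & 0x3F  # V=0, R=0, M=0, OL=0 in this sketch
    header = struct.pack("!HHHH", TRILL_ETHERTYPE, flags, egress_nick, ingress_nick)
    return header + inner_frame

def trill_forward(frame):
    """At each RBridge hop, decrement the hop count; drop the frame at
    zero so a looping frame cannot circulate forever."""
    etype, flags, egress, ingress = struct.unpack_from("!HHHH", frame)
    hops = flags & 0x3F
    if hops == 0:
        return None  # hop count exhausted: discard
    return struct.pack("!HHHH", etype, (flags & ~0x3F) | (hops - 1),
                       egress, ingress) + frame[8:]

frame = trill_encapsulate(b"inner-ethernet-frame", ingress_nick=0x0001,
                          egress_nick=0x0003, hop_count=2)
frame = trill_forward(frame)    # first hop: HC 2 -> 1
frame = trill_forward(frame)    # second hop: HC 1 -> 0
dropped = trill_forward(frame)  # HC 0: frame is discarded
```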
Link-State Protocols
As noted earlier, TRILL will use link-state protocols to form its control plane. The purpose of the control plane is to distribute the VLAN configuration to all the RBridges on the VLAN. Link-state protocols also continuously monitor the VLAN configuration and adjust the configuration database in the event of changes. The control plane also provides the algorithms used to calculate the shortest path between any two RBridges on the VLAN. Considering that TRILL will be used in converged environments where storage and TCP/IP networks are deployed, you can expect that link-state protocols from both worlds will be utilized by TRILL.
Routing Bridges
Routing bridges (RBridges) are a new type of L2 device that implements the TRILL protocol, performs L2 forwarding, and requires little or no configuration. Using the configuration information distributed by the link-state protocol, RBridges discover each other and calculate the shortest path to all other RBridges on the VLAN. The combination of all calculated shortest paths makes up the RBridge routing table. It is important to note that all RBridges maintain a copy of the configuration database, which helps reduce convergence time. When they discover each other, RBridges select a designated RBridge (DRB), which in turn assigns a designated VLAN and selects an appointed forwarder (AF) for the VLAN. Although the DRB can select itself as an AF, there can be only a single AF per VLAN. The AF handles native frames on the VLAN.
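The shortest-path calculation every RBridge runs over its copy of the link-state database is ordinary Dijkstra. A sketch (the topology and costs are invented for illustration):

```python
import heapq

def rbridge_routes(lsdb, source):
    """Dijkstra over a link-state database (node -> {neighbor: cost}).
    Returns the least cost and the first-hop RBridge toward every
    destination, i.e. the source RBridge's routing table."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry
        for peer, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(peer, float("inf")):
                dist[peer] = new_cost
                first_hop[peer] = hop if hop is not None else peer
                heapq.heappush(heap, (new_cost, peer, first_hop[peer]))
    return dist, first_hop

# Five RBridges in a ring, all links cost 1.
lsdb = {
    "RB1": {"RB2": 1, "RB4": 1},
    "RB2": {"RB1": 1, "RB3": 1},
    "RB3": {"RB2": 1, "RB5": 1},
    "RB4": {"RB1": 1, "RB5": 1},
    "RB5": {"RB3": 1, "RB4": 1},
}
dist, first_hop = rbridge_routes(lsdb, "RB1")
```

Because every RBridge holds the same database, every RBridge computes consistent routes without any per-device path configuration.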
Using the original destination address as a key, a list of eligible next-hop RBridges is determined. This list contains the RBridges that could be the next step along any least-cost path to the final destination. If more than one RBridge is in the list, a hash is used to distribute the traffic load while guaranteeing that all traffic in a single stream stays on the same path, avoiding reordering overhead. The RBridge selected in this way is placed in the TRILL header, and the frame is sent on.
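The hashing step can be sketched as follows. Real switches hash in hardware over several header fields; hashing the source and destination MAC addresses here is an illustrative stand-in that preserves the key property, namely that one conversation always lands on one path:

```python
import hashlib

def pick_next_hop(candidates, src_mac, dst_mac):
    """Choose one next-hop RBridge from the equal-cost candidates by
    hashing the flow's addresses, so every frame of a conversation takes
    the same path and arrives in order."""
    key = (src_mac + dst_mac).encode()
    digest = hashlib.sha256(key).digest()
    index = digest[0] % len(candidates)
    return sorted(candidates)[index]  # sort so the choice is deterministic

# Two equal-cost next hops; the MAC addresses are made-up examples.
hops = ["RB2", "RB4"]
first = pick_next_hop(hops, "00:05:1e:aa:bb:01", "00:05:1e:cc:dd:02")
again = pick_next_hop(hops, "00:05:1e:aa:bb:01", "00:05:1e:cc:dd:02")
```

Different flows spread across the candidates, but repeated lookups for the same flow always agree, which is what prevents reordering.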
Figure 18 shows a frame traveling from a host to a target through RBridges RB1 through RB5: (1) the host sends frames; (2) RB1 adds the TRILL header and an outer MAC header; (3) RB3 removes the TRILL header; (4) the target receives the frames.
Figure 18. TRILL adds a new header to the beginning of an Ethernet frame. The outer MAC header added along with the TRILL header (see Figure 18) is in exactly the same format as a legacy Ethernet header. This allows bridges (switches) that are not aware of TRILL to continue forwarding frames according to the rules they've always used. The source address used in the outer header is the address of the RBridge adding the header. The destination address is determined by consulting the tables built by the link-state routing protocol.
When a frame with a TRILL header is received by an RBridge, the RBridge removes the header and examines the original source and destination addresses. It then creates a new TRILL header using the method described above and forwards the frame. The last RBridge receiving a frame prior to the delivery of the frame to either the destination or the local segment that connects to the destination removes the TRILL header and forwards the frame (see Figure 19).
Figure 19. TRILL header added and then removed at the last RBridge.
SUMMARY
TRILL is a new draft standard being created by the IETF. The goal of TRILL is to create an L2 shortest-path routing protocol to replace STP and enable L2 multi-pathing. A more resilient Layer 2 will fulfill the needs of virtualized applications and data migration. It will also enable multi-hop capabilities for FCoE, which will drive the expanded adoption of the new technology in converged network environments.
Because the FCoE entity is internal to the CNA and is not exposed on the PCI bus, FCoE is not visible externally. As a result, the server operating system is not aware of FCoE, but instead views CNAs as adapters with
two identities, or personalities. The server operating system sees the FC and NIC drivers and handles each CNA as if it contained a NIC and an HBA. In other words, the operating system's view of the I/O world does not change with the use of CNAs. This is critical to ensuring a non-disruptive introduction of FCoE into existing data center environments. It means that IT professionals can continue to deploy applications over FCoE without modifications, and can continue to use management tools, which are not affected by the introduction of CNAs. Simply put, CNAs connect servers to FCoE switches. CNAs are responsible for encapsulating FC traffic into FCoE frames and forwarding them to FCoE switches over 10 GbE links as part of the converged traffic.
The accompanying CNA block diagram shows the FCoE and FC functions sitting behind the adapter's PCIe interface.
The current generation of DCB/FCoE switches are first-hop devices, which means that they can't route FCoE traffic to other switches. Current devices receive a stream of converged traffic, inspect it, and then divide the data into two separate streams, one for FC storage traffic and the other for LAN traffic. When the data leaves the switch, it is in the form of either FC frames or Ethernet frames.
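The splitting step amounts to steering frames by EtherType. A sketch of that demultiplexing (simplified; the frame bytes below are fabricated, while 0x8906 for FCoE and 0x8914 for FIP are the assigned EtherType values):

```python
FCOE_ETHERTYPE = 0x8906  # FCoE data frames
FIP_ETHERTYPE = 0x8914   # FCoE Initialization Protocol

def classify_frame(frame):
    """Split converged traffic the way a first-hop DCB/FCoE switch does:
    frames are steered by the EtherType found after the MAC addresses
    (an 802.1Q tag, if present, is skipped first)."""
    offset = 12                      # 6-byte destination + 6-byte source MAC
    ethertype = int.from_bytes(frame[offset:offset + 2], "big")
    if ethertype == 0x8100:          # 802.1Q tag: real EtherType 4 bytes later
        ethertype = int.from_bytes(frame[offset + 4:offset + 6], "big")
    if ethertype in (FCOE_ETHERTYPE, FIP_ETHERTYPE):
        return "storage"             # forwarded toward the FC ports
    return "lan"                     # forwarded as ordinary Ethernet

# A VLAN-tagged FCoE frame and a plain IPv4 frame (truncated dummy payloads).
tagged_fcoe = b"\xff" * 12 + b"\x81\x00" + b"\x60\x64" + b"\x89\x06" + b"\x00" * 8
plain_ip = b"\xff" * 12 + b"\x08\x00" + b"\x00" * 8
```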
Figure 22. High-level block diagram of the Brocade 8000 FCoE switch.
Other cost savings. Data centers will realize significant savings when FCoE is deployed, since they will be able to continue using existing Fibre Channel management tools and not incur retraining costs. Less often cited are the time savings from dealing with simpler configurations and a much less cluttered environment, a result of reduced cabling and cable management. Troubleshooting and diagnostics are easier in such environments, so technicians can identify and correct problems more quickly. The reality is that simpler cabling helps reduce the potential for human error.
As an encapsulation protocol, FCoE performs its functions with some overhead above that of the native FC protocol it encapsulates. In addition, FCoE represents the second attempt (after iSCSI) to converge storage data and LAN traffic over shared Ethernet links. The reality is that data centers with a genuine need for high-performing 8 Gbps FC will question the benefits of sharing a 10 GbE link with LAN traffic. For that reason, it is expected that FCoE will most likely be deployed in environments currently using 4 Gbps FC and 1 GbE links. Like most new technologies, FCoE will first be tested and deployed in the parts of the network in which some risk can be tolerated, before the deployment expands to other areas. It is expected that in the near term, FCoE will find a home in new server deployments in Windows and Linux environments with virtualized tier 3 and some tier 2 applications.
Top of Rack
The Brocade 8000 Switch is a DCB/FCoE switch that delivers server I/O consolidation to data centers. As noted earlier, such devices are basically L2 switches, so the first deployment model for the Brocade 8000 is a top-of-rack (ToR) deployment functioning as an Ethernet switch. In this configuration, the Brocade 8000 performs as a standard Ethernet switch, providing server connectivity and delivering 10 GbE performance, as shown in Figure 24.
The most likely deployment scenario for the Brocade 8000 is as top of rack in server I/O environments. In this configuration, the Brocade 8000 offers 10 GbE server connectivity and 8 Gbps Fibre Channel connectivity to shared SAN storage. Using the configuration shown in Figure 25, data centers can simplify server I/O environments with fewer cables and ports. And IT managers can take advantage of the benefits of convergence provided by the Brocade 8000.
End of Row
The Brocade FCOE10-24 Blade is a DCB/FCoE blade for the Brocade DCX or DCX-4S Backbone. It brings DCB/FCoE capabilities to the backbone platforms and enables end-of-row (EoR) convergence, shown in Figure 25. It uses a built-in FCoE hardware engine to deliver Fibre Channel data to SANs using external FC ports available on other blades on the Brocade DCX. With 24 x 10 GbE ports, the FCOE10-24 also enables high-performance server connectivity.