Design Guide for NSX with Cisco Nexus 9000 and UCS White Paper
Table of Contents
1 Executive Summary
2.1.2 VXLAN and VDS Connectivity with Cisco UCS and Nexus 9000
4 Conclusion
1 Executive Summary
Enterprise data centers are already realizing the tremendous benefits of server and storage virtualization solutions
to consolidate infrastructure, reduce operational complexity, and dynamically scale application infrastructure.
However, the data center network has not kept pace and remains rigid, complex, and closed to innovation—a barrier
to realizing the full potential of virtualization and the software defined data center (SDDC).
VMware NSX network virtualization delivers for networking what VMware has already delivered for compute and
storage. It enables virtual networks to be created, saved, deleted, and restored on demand without requiring any
reconfiguration of the physical network. The result fundamentally transforms the data center network operational
model, reduces network provisioning time from days or weeks to minutes and dramatically simplifies network
operations.
This document provides guidance for networking and virtualization architects interested in deploying VMware NSX
for vSphere for network virtualization with Cisco UCS (Unified Computing System) blade servers and Cisco Nexus
9000 Series switches. It discusses the fundamental building blocks of NSX with VMware ESXi (the enterprise-class
hypervisor), recommended configurations with Cisco UCS, and the connectivity of Cisco UCS to Nexus 9000
switches.
NSX introduces an additional infrastructure VLAN that provides a single bridge domain for VM guest traffic
carried over the physical network.
VXLAN Transport Zone VLAN: During the NSX configuration phase, an additional VMkernel interface is created
for VXLAN traffic. Overall, each host is prepared with four VMkernel networks that are presented to the Cisco UCS
as well as the Nexus 9000 infrastructure. These VLANs are trunked to the Nexus 9000 access-layer switch.
Configuring these four VLANs is a one-time task. This allows the logical networks to be created independently
of the physical network, eliminating the need to define a VLAN every time a new logical segment is added
to accommodate VM growth. The VLAN Switch Virtual Interface (SVI) termination is either at the aggregation
layer or at the access layer, depending on the topology deployed in the Nexus 9000 physical network.
Table 2: VXLAN VLAN for VM Guest Traffic

Traffic Type                  Function                            VLAN ID
VXLAN Transport Zone VLAN     Overlay VXLAN VTEP Connectivity     103
These enhancements to VXLAN simplify the underlay physical network. For additional details about VXLAN,
packet flow for various layer-2 control plane discovery, and connectivity, please refer to the VMware® NSX for
vSphere Network Virtualization Design Guide.
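For readers less familiar with the encapsulation itself, the short Python sketch below illustrates what a VTEP conceptually does with a guest frame before it is carried over the transport VLAN (VLAN 103 above). It is an illustrative model only, not NSX data-plane code; the VNI value and frame contents are assumptions.

import struct

VXLAN_UDP_PORT = 4789  # IANA VXLAN port; the port actually used is deployment-specific

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    flags = 0x08                              # 'I' bit set: the VNI field is valid
    header = struct.pack("!I", flags << 24)   # flags byte followed by 24 reserved bits
    header += struct.pack("!I", vni << 8)     # 24-bit VNI followed by 8 reserved bits
    return header + inner_frame               # outer MAC/IP/UDP headers are added afterwards

# Example: wrap a dummy 60-byte frame for a logical switch with VNI 5001
encapsulated = vxlan_encapsulate(b"\x00" * 60, vni=5001)
print(len(encapsulated))  # 68 bytes: 8-byte VXLAN header plus the original frame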
As one can see from Table 3, selecting LACP or Static EtherChannel as the teaming mode removes the ability to
choose a teaming mode per traffic type (the port groups for management, vMotion, and VXLAN). With LACP or
Static EtherChannel, only one VTEP per host can be configured. The other teaming modes allow the failover or
load-sharing behavior to be chosen per traffic type; the only exception is that LBT (Load Based Teaming) mode
is not supported for the VTEP VMkernel.
The table above also shows the port-configuration mode for Nexus 9000 switches relative to the uplink teaming
mode. Notice that the LACP and Static EtherChannel modes require a VLAN-based vPC (Virtual Port-Channel) and
can only support a single VTEP. LACP mode is also not possible in a Cisco UCS blade server environment
due to the lack of server-side LACP support on the Fabric Interconnect.
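To make the constraint concrete, the following Python sketch models the teaming-mode rules summarized above. It is purely illustrative (not a VMware API), and the traffic-type names are assumptions.

def validate_vds_teaming(per_portgroup_mode):
    """Return a list of design violations for the chosen per-port-group teaming modes."""
    issues = []
    modes = set(per_portgroup_mode.values())
    # LACP / Static EtherChannel apply to the whole VDS: no per-traffic-type choice remains
    if modes & {"LACP", "Static EtherChannel"} and len(modes) > 1:
        issues.append("LACP/Static EtherChannel cannot be mixed with other teaming modes")
    # LBT is not supported for the VTEP VMkernel port group
    if per_portgroup_mode.get("VXLAN VTEP") == "LBT":
        issues.append("LBT is not supported for the VTEP VMkernel port group")
    return issues

print(validate_vds_teaming({"Management": "SRC-ID", "vMotion": "LBT", "VXLAN VTEP": "SRC-ID"}))  # -> []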
Typically in a layer-2 topology, the VLAN ID only has to be unique within the layer-2 domain. In the diagram above,
two distinct layer-2 PODs each have a locally unique VLAN ID. However, the VXLAN transport zone, which defines
the scope of the VXLAN-enabled clusters, spans both PODs. This implies that the VLAN ID for VXLAN has to be the
same in both PODs. In other words, the VLAN designated for VXLAN keeps the same VLAN ID in both PODs but
maps to a different subnet at the aggregation boundary of each POD. This is depicted in the figure above: VLAN ID
103 for VXLAN extends across both PODs, while the subnet it maps to is unique at each aggregation layer. This
multi-POD case is similar to a spine-leaf routed data center design; the only difference is that in a spine-leaf
routed data center the layer-3 demarcation starts at the access layer, which is discussed next.
However, the exception is the VLAN ID configured for the VDS VXLAN transport zone. For the VXLAN VTEP, the
VLAN ID must be the same on every rack/ToR, while the subnet that maps to that VLAN is unique per ToR. The
VLAN IDs for the rest of the traffic types can also remain the same on every rack, which simplifies the per-rack
configuration and makes it a one-time task. As an example, this is depicted in the table above, which can be
repeated for each ToR configuration with a unique subnet identified by the rack ID.
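The hypothetical Python sketch below generates such a per-rack plan: the VTEP VLAN ID stays fixed at 103, while each ToR pair receives its own subnet. The base prefix and /24 sizing are assumptions for illustration.

import ipaddress

VTEP_VLAN_ID = 103                                   # same VLAN ID on every ToR pair
BASE_PREFIX = ipaddress.ip_network("10.0.0.0/16")    # illustrative VTEP transport block

def vtep_plan(num_racks):
    """Yield (rack, vlan_id, subnet, gateway): one /24 per rack, SVI terminated on the ToR pair."""
    for rack, subnet in zip(range(1, num_racks + 1), BASE_PREFIX.subnets(new_prefix=24)):
        gateway = next(subnet.hosts())               # first usable address used as the ToR SVI
        yield rack, VTEP_VLAN_ID, subnet, gateway

for rack, vlan, subnet, gw in vtep_plan(num_racks=4):
    print(f"Rack {rack}: VLAN {vlan} -> {subnet} (SVI {gw})")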
Data Plane East-West      Compute and Edge VDS kernel components – VXLAN forwarding & DLR (Distributed Logical Router)      Compute & Edge Cluster
Data Plane North-South    Edge Service Gateway (ESG)                                                                        Edge Cluster
The VMware® NSX for vSphere Network Virtualization Design Guide recommends building three distinct
vSphere cluster types. The figure below shows an example of mapping the logical components of the cluster
design to physical rack placement.
As shown in the diagram, the edge and management clusters are distributed across separate physical racks and
connect to separate ToR switches. For the management and edge clusters, the resources are shared or split
between two racks so that a single rack failure does not take the cluster down; this also enables scaling.
Note that even in smaller configurations, a single rack can be used to provide connectivity for the edge and
management clusters. The key concept is that the edge cluster configuration is localized to a ToR pair to reduce
the span of layer-2 requirements; this also helps localize the egress routing configuration to a pair of ToR
switches. The localization of edge components also allows flexibility in selecting the appropriate hardware (CPU,
memory and NIC) and features based on network-centric functionalities such as firewall, NetFlow, NAT and
ECMP routing.
In order to provide a recommendation on connecting hosts belonging to different cluster types, it is important to
understand the VDS uplink design options as well as the capabilities supported by the Nexus 9000. These
capabilities are described in section 2.1.2.
The VMware® NSX for vSphere Network Virtualization Design Guide best practices call for separate VDS
instances for the compute and edge clusters. This provides the flexibility of choosing the VDS uplink configuration
mode per cluster type. It is important to note that the guidelines provided below supersede the VMware® NSX for
vSphere Network Virtualization Design Guide in some cases, as these recommendations apply only
to Cisco Nexus 9000 switches.
As of this writing, Nexus 9000 switches do not support routing over vPC. Therefore, the recommendation for
edge clusters is to select either the Explicit Failover Order or the SRC-ID as teaming options for VDS dvUplink.
This will allow the edge cluster to establish a routing peer over a selected dvUplink along with load sharing per
ECMP edge VM to a dvUplink. Please refer to the URL below, or the latest release documentation, for additional
topology and connectivity options with Nexus 9000 switches:
Configuring vPCs
In addition, a non-LACP uplink teaming mode allows the multiple-VTEP configuration recommended with
Cisco UCS blade servers.
As shown in the figure above, each ECMP node peers over its respective external VLAN with exactly one Nexus
router. Each external VLAN is defined on only one ESXi uplink (in the figure above, external VLAN 10 is enabled on
the uplink toward R1, while external VLAN 20 is enabled on the uplink toward R2). This is done so that under normal
circumstances both ESXi uplinks can be concurrently utilized to send and receive north-south traffic, even without
requiring the creation of a port-channel between the ESXi host and the ToR devices.
In addition, with this model a physical failure of an ESXi NIC would correspond to a logical uplink failure for the
NSX edge running inside that host, and the edge would continue sending and receiving traffic leveraging the
second logical uplink (the second physical ESXi NIC interface).
In order to build a resilient design capable of tolerating the complete loss of an edge rack, it is also recommended
to deploy two sets of four edge gateways in two separate edge racks. The table below describes the necessary
configuration with ECMP edges.
Table 6: Edge Cluster VDS Configuration
Port Group VLAN dvUplink 1 dvUplink 2 Load Balancing
VTEPs XXX Active Active SRC_ID
Edge-External-1 YYY Active NA SRC_ID
Edge-External-2 ZZZ NA Active SRC_ID
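The same design can be captured programmatically. The following sketch (not the vSphere API) expresses the Table 6 port groups as simple teaming-policy records; the "NA" entries are modeled here as unused uplinks, which is an interpretation, and the VLAN placeholders are carried over from the table.

EDGE_VDS_PORTGROUPS = {
    "VTEPs":           {"vlan": "XXX", "active": ["dvUplink1", "dvUplink2"], "unused": [],            "policy": "SRC_ID"},
    "Edge-External-1": {"vlan": "YYY", "active": ["dvUplink1"],              "unused": ["dvUplink2"], "policy": "SRC_ID"},
    "Edge-External-2": {"vlan": "ZZZ", "active": ["dvUplink2"],              "unused": ["dvUplink1"], "policy": "SRC_ID"},
}

def uplink_for_external_vlan(portgroup):
    """Return the single uplink carrying a given external peering VLAN."""
    active = EDGE_VDS_PORTGROUPS[portgroup]["active"]
    assert len(active) == 1, "an external peering VLAN should be pinned to exactly one uplink"
    return active[0]

print(uplink_for_external_vlan("Edge-External-1"))  # -> dvUplink1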
NSX builds multicast-free VXLAN based overlay networks. One can extend layer-2 and IP subnets across servers
connected to different ToR Nexus 9000 switches in a layer-3 fabric. This layer-2 adjacency between the VMs can
be established independently of the physical network configuration. New logical networks can be created on
demand via NSX, decoupling the logical virtual network from the physical network topology.
The key benefit of distributed routing is an optimal scale-out routing for east-west traffic between VMs. Each
hypervisor has a kernel module that is capable of a routing lookup and forwarding decision. As shown in Figure 10
above, traffic within a single host can be routed optimally within the host itself—even if the VMs are part of a
different logical switch. The localized forwarding reduces traffic sent to the ToR and can reduce latency, as
packets are switched locally in memory.
Traffic between VMs on different hosts needs to traverse the physical network, where forwarding is based upon the
destination VTEP IP. In Figure 11 below, traffic between two VMs behind two different VTEP IP addresses is sent up
to the UCS fabric interconnects. However, since VTEP IP 10.0.20.10 and VTEP IP 10.0.20.11 are in the same layer-2
domain, the UCS fabric interconnect can forward the traffic without sending it up to the Nexus 9300 switch, reducing
the number of physical hops needed and thereby improving latency, performance, and oversubscription.
In a classic architecture, all traffic would be forwarded to the switch with the SVI configuration; that is not
necessary with the NSX distributed routing capability.
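The Python sketch below models this distributed forwarding decision at a high level. It is not NSX kernel code; the logical switch prefixes and VTEP addresses (reusing those from Figure 11) are illustrative.

import ipaddress

LIFS = {                                     # every host holds the same DLR logical interfaces
    "web-tier": ipaddress.ip_network("172.16.10.0/24"),
    "app-tier": ipaddress.ip_network("172.16.20.0/24"),
}
LOCAL_VTEP = "10.0.20.10"                    # this host's VTEP, as in Figure 11

def forward(dst_vm_ip, dst_host_vtep):
    """Decide where a routed east-west packet goes after the local DLR lookup."""
    dst = ipaddress.ip_address(dst_vm_ip)
    if not any(dst in prefix for prefix in LIFS.values()):
        return "no DLR interface matches: hand off to the NSX Edge (north-south)"
    if dst_host_vtep == LOCAL_VTEP:
        return "destination VM is on this host: route and deliver locally in memory"
    return f"route locally, then VXLAN-encapsulate toward remote VTEP {dst_host_vtep}"

print(forward("172.16.20.25", "10.0.20.11"))  # east-west traffic to a VM on another host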
The distributed router scale-out capability supports multi-tenancy in which multiple distributed logical router
instances can be invoked to provide routing-control plane separation within the shared infrastructure.
NSX provides a scale-out routing architecture with the use of ECMP between the NSX distributed router and the
NSX Edge routing instances as shown in the figure below. The NSX Edges can peer using dynamic routing
protocols (OSPF or BGP) with the physical routers and provide scalable bandwidth. In the case of a Nexus 9000
switch infrastructure, the routing peer could be a ToR Nexus 9300.
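Conceptually, ECMP spreads north-south flows across the edge instances using a per-flow hash, as in the hedged sketch below; the hash function and next-hop addresses are illustrative and do not represent the actual DLR or Nexus hashing algorithm.

import hashlib

EDGE_NEXT_HOPS = ["192.168.100.1", "192.168.100.2", "192.168.100.3", "192.168.100.4"]

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto):
    """Pick one equal-cost next hop from a stable hash of the flow 5-tuple."""
    flow = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    index = int.from_bytes(hashlib.sha256(flow).digest()[:4], "big") % len(EDGE_NEXT_HOPS)
    return EDGE_NEXT_HOPS[index]

# Packets of the same flow always land on the same edge; different flows spread out
print(ecmp_next_hop("172.16.10.11", "198.51.100.7", 40512, 443, "tcp"))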
Layer-2 bridging design considerations are covered in the NSX design guide. Additionally, one can use multicast-
based HW VTEP integration if needed, with additional design considerations.
As shown in the figure above, the designer now has flexibility in building sophisticated policy, since policy is not
tied to the physical topology. The policy can be customized for inter- and intra-layer-2 segments, complete or
partial access, as well as north-south rule sets that can be applied directly at the VM level, with the edge firewall
remaining an option for the inter-domain security boundary.
Micro-segmentation, as shown in the figure above, allows creating a PCI zone within a shared segment, enables
sophisticated security policies for desktops in a VDI environment, and eliminates the scaling limitations of
centralized ACL management.
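As an illustration of topology-independent policy, the sketch below models distributed-firewall-style rules keyed on security-group membership rather than on VLANs or subnets; the group names, rules, and evaluation logic are assumptions for the example, not the NSX DFW API.

SECURITY_GROUPS = {
    "pci-zone": {"vm-pci-01", "vm-pci-02"},
    "desktops": {"vdi-101", "vdi-102", "vdi-103"},
}
RULES = [  # evaluated top-down; groups, not IP subnets, define the scope
    {"src": "desktops", "dst": "pci-zone", "service": "any",     "action": "deny"},
    {"src": "pci-zone", "dst": "pci-zone", "service": "tcp/443", "action": "allow"},
]

def evaluate(src_vm, dst_vm, service):
    """Return the action of the first matching rule; default deny between zones."""
    for rule in RULES:
        if (src_vm in SECURITY_GROUPS[rule["src"]]
                and dst_vm in SECURITY_GROUPS[rule["dst"]]
                and rule["service"] in ("any", service)):
            return rule["action"]
    return "deny"

print(evaluate("vdi-101", "vm-pci-01", "tcp/443"))  # -> deny, even on a shared segment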
The figure above shows the power of a software-based load balancer, in which multiple instances of the load
balancer serve multiple applications or segments. Each instance of the load balancer is an edge appliance that
can be dynamically defined via an API as needed and deployed in a high-availability mode. Alternatively, the load
balancer can be deployed in in-line mode, in which it can serve the entire logical domain. The in-line load balancer
can scale by enabling a multi-tier edge per application, such that each application is a dedicated domain: the
first-tier edge is the gateway for the application, while the second-tier edge can be an ECMP gateway providing
scalable north-south bandwidth.
As one can observe from the figure above, the first application block on the left uses a one-arm load balancer
with distributed logical routing. The center and right application blocks use an in-line load balancer with routed
or NAT capability, respectively. The second-tier edge is enabled in ECMP mode to allow the application to scale
on demand from 10 Gbps to 80 Gbps and beyond (for example, eight ECMP edges at roughly 10 Gbps each).
4 Conclusion
NSX, deployed on top of Nexus 9000 switches and Cisco UCS blade server infrastructure, enables best-of-breed
design with flexibility and ease of deployment for a full stack of virtualized network services. The programmatic
capability of software-based services opens the door for self-service IT, dynamic orchestration of workloads, and
strong security policies with connectivity to the hybrid cloud.