
Showing posts with label virtualization.

Monday, March 20, 2017

Nutanix

Maximum Performance from Acropolis Hypervisor and Open vSwitch describes the network architecture within a Nutanix converged infrastructure appliance - see diagram above. This article explores how the Host sFlow agent can be deployed to enable sFlow instrumentation in Open vSwitch (OVS) and deliver streaming network and system telemetry from nodes in a Nutanix cluster.
This article is based on a single hardware node running Nutanix Community Edition (CE), built following the instructions in Part I: How to setup a three-node NUC Nutanix CE cluster. If you don't have hardware readily available, the article, 6 Nested Virtualization Resources To Get You Started With Community Edition, describes how to run Nutanix CE as a virtual machine.
The sFlow standard is widely supported by network equipment vendors, which, combined with sFlow from each Nutanix appliance, delivers end-to-end visibility across the Nutanix cluster. The following screen captures from the free sFlowTrend tool are representative examples of the data available from the Nutanix appliance.
The Network > Top N chart displays the top flows traversing OVS. In this case an HTTP connection is responsible for most of the traffic. Inter-VM and external traffic flows traverse OVS and are efficiently monitored by the embedded sFlow instrumentation.
The Hosts > CPU utilization chart shows an increase in CPU utilization due to the increased traffic.
The Hosts > Disk IO chart shows the write operations associated with the connection.

Installing Host sFlow agent on Nutanix appliance

The following steps install Host sFlow on a Nutanix device:

First log into the Nutanix host as root.
Update June 19, 2019: When you log in as root you will see a warning that installing software on the hypervisor is not supported. See comment below.
Next, find the latest version of the CentOS 7 RPM on sFlow.net and use the following commands to download and install the software:
wget https://github.com/sflow/host-sflow/releases/download/v2.0.8-1/hsflowd-centos7-2.0.8-1.x86_64.rpm
rpm -ivh hsflowd-centos7-2.0.8-1.x86_64.rpm
rm hsflowd-centos7-2.0.8-1.x86_64.rpm
Edit the /etc/hsflowd.conf file to direct sFlow telemetry to collector 10.0.0.50, enable KVM monitoring (virtual machine stats), and push sFlow configuration to OVS (network stats):
sflow {
  ...
  # collectors:
  collector { ip=10.0.0.50 udpport=6343 }
  ...
  # Open vSwitch sFlow configuration:
  ovs { }
  # KVM (libvirt) hypervisor and VM monitoring:
  kvm { }
  ...
}
Now start the Host sFlow daemon:
systemctl enable hsflowd.service
systemctl start hsflowd.service
Data will immediately start to appear in sFlowTrend.
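To verify that telemetry is arriving, sflowtool can be run on the collector (10.0.0.50 in this example) to print the decoded sFlow records; this is a minimal check, assuming sflowtool has been installed on the collector host:
sflowtool -p 6343
Seeing decoded counter and flow samples from the Nutanix node's agent address confirms that the export is working.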

Saturday, March 12, 2016

Microservices

Figure 1: Visibility and the software defined data center
In the land of microservices, the network is the king(maker), by Sudip Chakrabarti of Lightspeed Venture Partners, makes the case that visibility into network traffic is the key to monitoring, managing and securing applications that are composed of large numbers of communicating services running in virtual machines or containers.
"While I genuinely believe that the network will play an immensely strategic role in the microservices world, inspecting and storing billions of API calls on a daily basis will require significant computing and storage resources. In addition, deep packet inspection could be challenging at line rates; so, sampling, at the expense of full visibility, might be an alternative. Finally, network traffic analysis must be combined with service-level telemetry data (that we already collect today) in order to get a comprehensive and in-depth picture of the distributed application."
Sampling isn't just an alternative; sampling is the key to making large scale microservice visibility a reality. Shrink ray describes how sampling acts as a scaling function, reducing the task of monitoring large scale microservice infrastructure from an intractable measurement and big data problem to a lightweight, real-time, data center wide visibility solution for monitoring, managing, optimizing and securing the infrastructure.
Figure 2: sFlow Host Structures
Industry standard sFlow is the multi-vendor method for distributed sampling of network traffic. The sFlow standard is model based - models of entities such as interfaces, switches, routers, forwarding state, hosts, virtual machines, messages, etc. are used to define standard measurements that describe their operation. Standardized measurements embedded within the infrastructure ensure consistent reporting that is independent of the specific vendors and application stacks deployed in the data center. Push vs Pull describes how sFlow's push based streaming telemetry addresses the challenge of monitoring large scale cloud environments where services and hosts are constantly being added, removed, started and stopped. In addition, sFlow Host Structures describes how the data model allows telemetry streams from independent sources in network, server and application entities to be combined at the sFlow receiver to provide end to end visibility into the microservice interactions and the compute and networking services on which they depend.

The challenge in delivering network visibility to microservice management tools is not technical - the solution is fully deployable today:
  • Applications - e.g. Apache, NGINX, Tomcat, HAproxy, ADC (F5, A10, ..), Memcache, ...
  • Virtual Servers - e.g. Xen, Hyper-V, KVM, Docker, JVM, ...
  • Virtual Network - e.g. Open vSwitch, Linux Bridge, macvlan, ...
  • Servers - e.g. Linux, Windows, FreeBSD, Solaris, AIX
  • Network - e.g. Cisco Nexus 9k/3k, Arista, Juniper QFX/EX, Dell, HPE, Brocade, Cumulus, Big Switch, Pica8, Quanta, ... – visit sFlow.org for a complete list
Network, system and application teams working together can enable sFlow instrumentation that is already embedded throughout the infrastructure to achieve comprehensive visibility into microservice interactions.
Incorporating sFlow analytics into the microservices architecture is straightforward. The sFlow-RT analytics engine processes the raw telemetry streams, combines data using the data model, and delivers visibility as a REST based microservice that is easily consumed by new and existing cloud based or locally hosted orchestration, operations, and security tools.
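As a rough sketch of how a tool might consume this microservice, the following curl commands define a flow and then query the most active flows; the flow name (pair), the sFlow-RT host (localhost) and the default REST port (8008) are illustrative values:
curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource,ipdestination","value":"bytes"}' http://localhost:8008/flow/pair/json
curl http://localhost:8008/activeflows/ALL/pair/json
The first request programs a flow measurement across all telemetry sources and the second retrieves the current top flows as JSON that can be consumed by orchestration, operations or security tools.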

Saturday, February 27, 2016

Open vSwitch version 2.5 released

The recent Open vSwitch version 2.5 release includes significant network virtualization enhancements:
   - sFlow agent now reports tunnel and MPLS structures.
   ...
   - Add experimental version of OVN.  OVN, the Open Virtual Network, is a
     system to support virtual network abstraction.  OVN complements the
     existing capabilities of OVS to add native support for virtual network
     abstractions, such as virtual L2 and L3 overlays and security groups.
The sFlow Tunnel Structures specification enhances visibility into network virtualization by capturing the encapsulation / decapsulation actions performed by tunnel endpoints. In many network virtualization implementations, VXLAN, GRE and Geneve tunnels terminate in Open vSwitch, so the new feature has broad application.
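If the Host sFlow agent isn't being used to push settings, sFlow monitoring can also be enabled directly on an Open vSwitch bridge using ovs-vsctl; the following command is a sketch with illustrative values (bridge br0, agent interface eth0, collector 10.0.0.50):
ovs-vsctl -- --id=@sflow create sflow agent=eth0 target="\"10.0.0.50:6343\"" header=128 sampling=64 polling=20 -- set bridge br0 sflow=@sflow
With sampling enabled, the tunnel structures added in this release are exported along with the packet samples from the switch.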

The second related feature is the inclusion of the Open Virtual Network (OVN), providing a simple method of building virtual networks for OpenStack and Docker.

The following articles provide additional background:

Thursday, January 21, 2016

Podcast with Nick Buraglio and Brent Salisbury

"Have you seen sFlow options in your router configuration or flow collector? Are you looking for alternatives to SNMP or NetFlow? Have you been curious about the instrumentation of your new white box or virtual switch? Yes? Then you will probably enjoy learning more about sFlow!"

Non-Blocking #1: SFlow With Peter Phaal Of InMon And SFlow.Org is a discussion between Brent Salisbury (networkstatic.net), Nick Buraglio (forwardingplane.net), and Peter Phaal (blog.sflow.com).

Web sites and tools mentioned in the podcast:
  1. sFlow.org
  2. Devices that support sFlow
  3. Software to analyze sFlow
  4. sFlow.org mailing list
  5. sFlow structures
  6. blog.sflow.com (incorrectly referenced as blog.sflow.org in the podcast)
  7. Host sFlow
  8. sflowtool

The podcast touches on a number of topics that have been explored in greater detail on this blog. The topics are listed in roughly the order they are mentioned in the podcast:
  1. Widespread support for sFlow among switch vendors
  2. Disaggregated flow cache
  3. ULOG
  4. Push vs Pull
  5. sFlow vs SNMP for interface counters
  6. Broadcom ASIC table utilization metrics, DevOps, and SDN
  7. Broadcom BroadView Instrumentation
  8. Rapidly detecting large flows, sFlow vs. NetFlow/IPFIX
  9. SDN and large flows
  10. Probes
  11. Packet headers
  12. Network virtualization visibility demo
  13. History of sFlow
  14. Standards
  15. Open vSwitch performance monitoring
  16. Wireless
  17. Prescriptive vs descriptive standards (sFlow / IPFIX)
  18. RMON (4 groups)
  19. Observability
  20. Host sFlow distributed agent
  21. Host sFlow data model
  22. Multi-tenant traffic in virtualized network environments
  23. Workload placement
  24. SDN router using merchant silicon top of rack switch
  25. White box Internet router PoC
  26. Active Route Manager
  27. Leaf and spine traffic engineering using segment routing and SDN
  28. CORD: Open-source spine-leaf Fabric (demo from 2015 Open Networking Summit)
  29. sflowtool
  30. sflowtool for packet capture
  31. sflowtool with Wireshark

Friday, November 20, 2015

Open vSwitch 2015 Fall Conference

Open vSwitch is an open source software virtual switch that is popular in cloud environments such as OpenStack. Open vSwitch is a standard Linux component that forms the basis of a number of commercial and open source solutions for network virtualization, tenant isolation, and network function virtualization (NFV) - implementing distributed virtual firewalls and routers.

The recent Open vSwitch 2015 Fall Conference agenda included a wide variety of speakers addressing a range of topics, including: Open Virtual Network (OVN), containers, service chaining, and network function virtualization (NFV).

The video above is a recording of the following sFlow related talk from the conference:
New OVS instrumentation features aimed at real-time monitoring of virtual networks (Peter Phaal, InMon)
The talk will describe the recently added packet-sampling mechanism that returns the full list of OVS actions from the kernel. A demonstration will show how the OVS sFlow agent uses this mechanism to provide real-time tunnel visibility. The motivation for this visibility will be discussed, using examples such as end-to-end troubleshooting across physical and virtual networks, and tuning network packet paths by influencing workload placement in a VM/Container environment.
This talk is a follow up to an Open vSwitch 2014 Fall Conference talk on the role of monitoring in building feedback control systems.

Slides and videos for all the conference talks are available on the Open vSwitch web site.

Monday, December 1, 2014

Open vSwitch 2014 Fall Conference


Open vSwitch is an open source software virtual switch that is popular in cloud environments such as OpenStack. Open vSwitch is a standard Linux component that forms the basis of a number of commercial and open source solutions for network virtualization, tenant isolation, and network function virtualization (NFV) - implementing distributed virtual firewalls and routers.

The recent Open vSwitch 2014 Fall Conference agenda included a wide variety of speakers addressing a range of topics, including: large scale operation experiences at Rackspace, implementing stateful firewalls, Docker networking, and acceleration technologies (Intel DPDK and Netmap/VALE).

The video above is a recording of the following sFlow related talk from the conference:
Traffic visibility and control with sFlow (Peter Phaal, InMon)
sFlow instrumentation has been included in Open vSwitch since version 0.99.1 (released 25 Jan 2010). This talk will introduce the sFlow architecture and discuss how it differs from NetFlow/IPFIX, particularly in regards to delivering real-time flow analytics to an SDN controller. The talk will demonstrate that sFlow measurements from Open vSwitch are identical to sFlow measurements made in hardware on bare metal switches, providing unified, end-to-end, measurement across physical and virtual networks. Finally, Open vSwitch / Mininet will be used to demonstrate Elephant flow detection and marking using a combination of sFlow and OpenFlow.
Slides and videos for all the conference talks will soon be available on the Open vSwitch web site.

Tuesday, November 4, 2014

SDN fabric controllers

Credit: sFlow.com
There is an ongoing debate in the software defined networking community about the functional split between a software edge and the physical core. Brad Hedlund argues the case in On choosing VMware NSX or Cisco ACI that a software-only solution maximizes flexibility and creates fluid resource pools, advocating a network overlay architecture that is entirely software based and completely independent of the underlying physical network. On the other hand, Ivan Pepelnjak argues in Overlay-to-underlay network interactions: document your hidden assumptions that the physical core cannot be ignored and, when you get past the marketing hype, even the proponents of network virtualization acknowledge the importance of the physical network in delivering edge services.

Despite differences, the advantages of a software based network edge are compelling and there is emerging consensus behind this architecture, with a large number of solutions available, including: Hadoop, Mesos, OpenStack, VMware NSX, Juniper OpenContrail, Midokura Midonet, Nuage Networks Virtual Services Platform, CPLANE Dynamic Virtual Networks and PLUMgrid Open Networking Suite.

In addition, the move to a software based network edge is leading to the adoption of configuration management and deployment tools from the DevOps community such as Puppet, Chef, Ansible, CFEngine, and Salt. As network switches become more open, these same tools are increasingly being used to manage switch configurations, reducing operational complexity and increasing agility by coordinating network, server, and application configurations.

The following articles from network virtualization proponents touch on the need for visibility and performance from the physical core:
While acknowledging the dependency on the underlying physical fabric, the articles don't offer practical solutions to deliver comprehensive visibility and automated management of the physical network to support the needs of a software defined edge.

In this evolving environment, how does software defined networking apply to the physical core and deliver the visibility and control needed to support the emerging software edge?
Credit: Cisco ACI
Cisco's Application Centric Infrastructure (ACI) is one approach. The monolithic Application Policy Infrastructure Controller (APIC) uses Cisco's OpFlex protocol to orchestrate networking, storage, compute and application services.

The recent announcement of Switch Fabric Accelerator (SFA) offers a modular alternative to Cisco ACI. The controller leverages open APIs to monitor and control network devices, and works with existing edge controllers and configuration management tools to deliver the visibility and control of physical network resources needed to support current and emerging edge services.

The following comparison contrasts the two approaches:

  • Switch vendors – Cisco ACI: Cisco only (Nexus 9K). InMon SFA: inexpensive commodity switches from multiple vendors, including Alcatel-Lucent Enterprise, Arista, Brocade, Cisco Nexus 3K, Cumulus, Dell, Edge-Core, Extreme, Huawei, IBM, HP, Juniper, Mellanox, NEC, Pica8, Pluribus, Quanta and ZTE.
  • Switch hardware – Cisco ACI: custom Application Leaf Engine (ALE) chip plus merchant silicon ASIC. InMon SFA: merchant silicon ASICs from Broadcom, Intel or Marvell.
  • Software vSwitch – Cisco ACI: Cisco Application Virtual Switch managed by Cisco APIC. InMon SFA: agnostic; choose the vSwitch that maximizes edge functionality, managed by the edge controller.
  • Visibility – InMon SFA: analytics based on industry standard sFlow measurement.
  • Boost throughput – Cisco ACI: proprietary ALE chip and proprietary VxLAN extension. InMon SFA: controls based on industry standard sFlow measurement and hybrid control API.
  • Reduce latency – Cisco ACI: proprietary ALE chip and proprietary VxLAN extension. InMon SFA: controls based on DSCP/QoS, industry standard measurement and hybrid control API.
  • Limit impact of DDoS attacks – InMon SFA: controls based on industry standard sFlow measurements and hybrid control API.
A loosely federated approach allows customers to benefit from a number of important trends: inexpensive bare metal / white box switches, rich ecosystem of edge networking software, network function virtualization, and well established DevOps orchestration tools. On the other hand, tight integration limits choice and locks customers into Cisco's hardware and ecosystem of partners, increasing cost without delivering clear benefits.

Sunday, September 22, 2013

Wile E. Coyote

One of the classic moments in a Road Runner cartoon is Wile E. Coyote pursuing the Road Runner into a cloud of dust. Wile E. Coyote starts to suspect that there is something wrong, but keeps running until the moment of realization that he is no longer on the road, but is instead suspended in mid-air over a chasm.

In the cartoon, the dust cloud allows Wile E. Coyote to temporarily defy the laws of physics by hiding the underlying physical topography. The Road Runner is under no such illusion - by leading, the Road Runner is able to see the road ahead and stay on firm ground.

Example of an SDN solution with tunnels
Current network virtualization architectures are built on a similar cartoon reality - hiding the network under a cloud (using an overlay network of tunnels) and asserting that applications will somehow be insulated from the physical network topology and communication devices.

The network virtualization software used to establish and manage the overlay is a form of distributed computing system that delivers network connectivity as a service. Vendors of network virtualization software who assert that their solution is "independent of underlying hardware" are making flawed assumptions about networking that are common to distributed computing systems and are collectively known as the Fallacies of Distributed Computing:
  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous
This article isn't intended to dismiss the value of the network virtualization abstraction. Virtualizing networking greatly increases operational flexibility. In addition, the move of complex functionality from the network core to edge hardware and virtual switches simplifies configuration and deployment of network functions (e.g. load balancing, firewalls, routing etc.). However, in order to realize the virtual network abstraction the orchestration system needs to be aware of the physical resources on which the service depends. The limitations of ignoring physical networking are demonstrated in the article, Multi-tenant performance isolation, which provides a real-life example of the type of service failure that impacts the entire data center and is difficult to address with current network virtualization architectures.

To be effective, virtualization architectures need to be less like Wile E. Coyote, blindly running into trouble, and more like the Road Runner, fully aware of the road ahead, safely navigating around obstacles and using resources to maximum advantage. In much the same way that the hypervisor takes responsibility for managing limited physical resources like memory, CPU cycles and I/O bandwidth in order to deliver compute virtualization, the network virtualization system needs to be aware of the physical networking resources in order to integrate them into the virtualization stack. The article, NUMA, draws the parallel between how operating systems optimize performance by being aware of the location of resources and how cloud orchestration systems need to be similarly location aware.

One of the main reasons for the popularity of current overlay approaches to network virtualization has nothing to do with technology. The organizational silos that separate networking, compute and application operational teams in most enterprises make it difficult to deploy integrated solutions. Given the organizational challenges, it is easy to see the appeal to vendors of creating overlay based products that bypass the network silo and deliver operational flexibility to the virtualization team - see Network virtualization, management silos and missed opportunities. However, as network virtualization reaches the mainstream and software defined networking matures, expect to see enterprises integrate their functional teams and the emergence of network virtualization solutions that address current limitations. Multi-tenant traffic in virtualized network environments examines the architectural problems with current cloud architectures and describes the benefits of taking a holistic, visibility driven, approach to coordinating network, compute, storage and application resources.

Monday, August 26, 2013

NSX network gateway services

Figure 1: VMware NSX network gateway services partners
VMware recently released the list of Network Gateway Services (top of rack switch) partners. All but one of these vendors supports the sFlow standard for network visibility across their full range of data center switches (Arista Networks, Brocade Networks, Dell Systems, HP and Juniper Networks). The remaining vendor, Cumulus Networks, has developed a version of Linux that runs on merchant silicon based hardware platforms. Merchant silicon switch ASICs include hardware support for sFlow and it is likely that future versions of Cumulus Linux will expose this capability.
Figure 2: Network gateway services / VxLAN tunnel endpoint (VTEP)
Figure 2 from Network Virtualization Gets Physical shows the role that top of rack switches play in virtualizing physical workloads (e.g. servers, load balancers, firewalls, etc.). Essentially, the physical top of rack switch provides the same services for the physical devices as Open vSwitch in the hypervisor provides for virtual machines. The OVSDB protocol, described in the Internet Draft (ID) The Open vSwitch Database Management Protocol, allows the NSX controller to configure physical and virtual switches to set up the VxLAN tunnels used to overlay the virtual networks over the underlying physical network.

The Open vSwitch also supports the sFlow standard, providing a common monitoring solution for virtual switches and top of rack switches. In addition, core switches and routers from the listed partner vendors (and many other switch vendors) also implement sFlow, offering complete end to end visibility into traffic flowing on the virtualized and physical networks.

The packet header export mechanism in sFlow is uniquely suited to monitoring tunneled traffic, see Tunnels, exposing inner and outer addresses and allowing monitoring tools to trace virtualized traffic as it flows over the physical fabric, see Down the rabbit hole.

In addition, F5 Networks is listed as a partner in the Application Delivery category. F5 supports sFlow on their BIG-IP platform, see F5 BIG-IP LTM and TMOS 11.4.0, providing visibility into application performance (including response times, URLs, status etc.) and linking the front-end performance seen by clients accessing virtual IP addresses (VIPs) with the performance of individual back-end servers.

Embedding visibility in all the elements of the data center provides comprehensive, cost effective, visibility into data center resources. Visibility is critical in reducing operational complexity, improving performance, decreasing the time to identify the root cause of performance problems, and isolating performance between virtual networks, see Multi-tenant performance isolation.

Monday, July 22, 2013

Top of rack network virtualization


Network virtualization (credit Brad Hedlund)
Support for tunneling protocols (NVGRE, VxLAN etc.) and OpenFlow in top of rack switches allows physical hosts to participate in the virtualized network that connects virtual machines, see Network Virtualization: a next generation modular platform for the data center virtual network. In this architecture, the top of rack switch replicates the functionality of the Open vSwitch (OVS) instances running on the hypervisors, allowing the controllers to flexibly integrate physical and virtual servers within the virtualized network.

Support for sFlow and OpenFlow in the top of rack, virtual switches, and within the network fabric, provides the unified, data center wide, visibility and control needed to optimize virtual machine placement, improve network performance, isolate performance between tenants and defend against DDoS attacks.

Monday, April 22, 2013

Multi-tenant traffic in virtualized network environments

Figure 1: Network virtualization (credit Brad Hedlund)
Network Virtualization: a next generation modular platform for the data center virtual network describes the basic concepts of network virtualization. Figure 1 shows the architectural elements of the solution which involves creating tunnels to encapsulate traffic between hypervisors. Tunneling allows the controller to create virtual networks between virtual machines that are independent of the underlying physical network (Any Network in the diagram).
Figure 2: Physical and virtual packet paths
Figure 2 shows a virtual network on the upper layer and maps the paths onto a physical network below. The network virtualization architecture is not aware of the topology of the underlying physical network and so the physical location of virtual machines and resulting packet paths are unlikely to bear any relationship to their logical relationships, resulting in an inefficient "spaghetti" of traffic flows. When a network manager observes traffic on the physical network, the traffic between hypervisors, top of rack switches, or virtual machine to virtual machine will appear to have very little structure.
Figure 3: Apparent virtual network traffic matrix
Figure 3 shows a traffic matrix in which the probability of any virtual machine talking to any other virtual machine is uniform. A network designed to carry this flat traffic matrix must itself be topologically flat, i.e. provide equal bandwidth between all hosts.
Figure 4: Relative cost of different topologies (from Flyways To De-Congest Data Networks)
Figure 4 shows that eliminating over-subscription to create a flat network is expensive, ranging from 2 to 5 times the cost of a conventional network design. Applying this same strategy to the road system would be the equivalent of connecting every town and city with an 8-lane freeway, no matter how small or remote the town. In practice, traffic studies guide development and roads are built where they are needed to satisfy demand. A similar, measurement-based, approach can be applied to network design.

In fact, the traffic matrix isn't random; it just appears random because the virtual machines have been randomly scattered around the data center by the network virtualization layer. Consider an important use case for network virtualization - multi-tenant isolation. Virtual networks are created for each tenant and configured to isolate and protect tenants from each other in the public cloud. Virtual machines assigned to each tenant are free to communicate among themselves, but are prevented from communicating with other tenants in the data center.
Figure 5: Traffic matrix within and between tenants
Figure 5 shows the apparently random traffic matrix shown in Figure 3, but this time the virtual machines have been grouped by tenant and the tenants have been sorted from largest to smallest. The resulting traffic matrix has some interesting features:
  1. The largest tenant occupies a small fraction of the total area in the traffic matrix.
  2. Tenant size rapidly decreases with most tenants being much smaller than the largest few.
  3. The traffic matrix is extremely sparse.
Even this picture is misleading, because if you drill down to look at a single tenant, their traffic matrix is likely to be equally sparse.
Figure 6: Traffic from large map / reduce cluster
Figure 6 shows the traffic matrix for a common large scale workload that a tenant might run in the cloud - map / reduce (Hadoop) - and the paper, Traffic Patterns and Affinities, discusses the sparseness and structure of this traffic matrix in some detail.
Note: There is a striking similarity between the traffic matrices in figures 5 and 6. The reason for the strong diagonal in the Hadoop traffic matrix is that the Hadoop scheduler is topologically aware, assigning compute tasks to nodes that are close to the storage they are going to operate on, and orchestrating storage replication in order to minimise non-local transfers. However, when this workload is run over a virtualized network, the virtual machines are scattered, turning this highly localized and efficient traffic pattern into randomly distributed traffic.
Apart from Hadoop, how else might a large tenant use the network? It's worth focusing on large tenants since their workloads are likely to be the hardest to accommodate. Netflix is one of the largest and most sophisticated tenants in the Amazon Elastic Compute Cloud (EC2) and the presentation, Dynamically Scaling Netflix in the Cloud, provides some interesting insights into their use of cloud resources.
Figure 7: Netflix elastic load balancing pools
Figure 7 shows how Netflix distributes copies of its service across availability zones. Each service instance, A, B or C, is implemented by a scale out pool of virtual machines (EC2 instances). Note also the communication patterns between service pools, resulting in a sparse, structured traffic matrix.
Figure 8: Elastic load balancing
Figure 8 shows how each service within an availability zone is dynamically scaled based on measured demand. As demand increases, additional virtual machines are added to the pool. When demand decreases, virtual machines are released from the pool.
Figure 9: Variation in number of Netflix instances over a 24 hour period
Figure 9 shows how the number of virtual machines in each pool varies over the course of a day as a result of the elastic load balancing. Looking at the graph, one can see that a significant fraction of the virtual machines in each service pool is recycled each day.

Elastic load balancing is a service provided by the underlying infrastructure, so the service provider is aware of the pools and the members within each pool. Since it's in the nature of a load balancing pool that each instance has a similar traffic pattern to its peers, observing the communication patterns of active pool members would allow a topology aware orchestration controller to select poorly placed VMs when making a removal decision and add new VMs in locations that are close to their peers.
Note: Netflix maintains a base number of reserved instances (reserved instances are the least expensive option, provided you can keep them busy) and uses this "free" capacity for analytics tasks (Hadoop) during off peak periods. Exposing basic locality information to tenants would allow them to better configure topology aware workloads like Hadoop, delivering improved performance and reducing traffic on the shared physical network.
Multi-tenancy is just one application of network virtualization. However, the general concept of creating multiple virtual networks implies constraints on communication patterns, and a location aware virtual network controller will be able to reduce network loads, improve application performance, and increase scalability by placing nodes that communicate together topologically close to each other.

There are challenges dealing with large tenants since they may have large groups of machines that need to be provided with high bandwidth communication. Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers describes some of the limitations of fixed configuration networks and explains how optical networking can be used to flexibly allocate bandwidth where it is needed.
Figure 10: Demonstrating AWESOME in the Pursuit of the Optical Data Center
Figure 10, from the article Demonstrating AWESOME in the Pursuit of the Optical Data Center, shows a joint Plexxi and Calient solution that orchestrates connectivity based on what Plexxi terms network affinities. This technology can be used to "rewire" the network to create tailored pods that efficiently accommodate large tenants. The paper, PAST: Scalable Ethernet for Data Centers, describes how software defined networking can be used to exploit the capabilities of merchant silicon to deliver bandwidth where it is needed.

However flexible the network, coordinated management of storage, virtual machine and networking resources is required to fully realize the flexibility and efficiency promised by cloud data centers. The paper, Joint VM Placement and Routing for Data Center Traffic Engineering, shows that jointly optimizing network and server resources can yield significant benefits.
Note: Vint Cerf recently revealed that Google has re-engineered its data center networks to use OpenFlow based software defined networks, possibly bringing networking under the coordinated control of their data center resource management system. In addition, one of the authors of the Helios paper, Amin Vahdat, is a distinguished engineer at Google and has described Google's use of optical networking and OpenFlow in the context of WAN traffic engineering; it would be surprising if Google weren't applying similar techniques within their data centers.
Comprehensive measurement is an essential, but often overlooked, component of an adaptive architecture. The controller cannot optimally place workloads if the traffic matrix, link utilizations, and server loads are not known. The widely supported sFlow standard addresses the requirement for pervasive visibility by embedding instrumentation within physical and virtual switches, and in the servers and applications making use of the network to provide the integrated view of performance needed for unified control.

Finally, there are significant challenges to realizing revolutionary improvements in data center flexibility and scalability, many of which aren't technical. Network virtualization, management silos and missed opportunities discusses how inflexible human organizational structures are being reflected in the data center architectures proposed by industry consortia. The article talks about OpenStack, but the recently formed OpenDaylight consortium seems to have similar issues, freezing in place existing architectures that offer incremental benefits, rather than providing the flexibility needed for radical innovation and improvement.

Saturday, April 20, 2013

Merchant silicon competition

Figure 1: Open Network Platform Switch Reference Design (see Intel Product Brief)
Rose Schooler's keynote at the recent Open Networking Summit described Intel's new reference switch platform. Intel merchant silicon addresses network virtualization and SDN use cases through support for open standards, including: OpenFlow,  NVGRE, VxLAN and sFlow.
Figure 2: Top of rack using Broadcom merchant silicon (from Merchant silicon)
Intel appears to be targeting Broadcom's position as merchant silicon provider in the data center switch market. Just as competition between Intel, AMD and ARM has spurred innovation, increased choice, and driven down CPU prices, competition between merchant silicon vendors promises similar benefits.

In the compute space, the freedom to choose operating systems (Windows, Linux, Solaris etc.) increases competition among hardware vendors and between operating system vendors. Choices in switch operating system are starting to appear (PicOS and the open source Switch Light project), opening the door to disruptive change in the networking market that is likely to mirror the transition from proprietary minicomputers to commodity x86 servers that occurred in the 1980s.

Thursday, January 31, 2013

Down the rabbit hole

The article, Tunnels, describes the use of tunneling protocols such as GRE, NVGRE and VXLAN to create virtual networks in cloud environments. Tunneling is also an important tool in addressing challenges posed by IPv6 migration. However, while tunnels are an effective way to virtualize networking, they pose difficult challenges for application development and operations (DevOps) teams trying to optimize network performance and for network administrators who no longer have visibility into the applications running over the physical infrastructure.

This article uses sFlow-RT to demonstrate how sFlow monitoring, built into the physical and virtual network infrastructure, can be used to provide comprehensive visibility into tunneled traffic for application, operations and networking teams.

Note: The sFlow-RT analytics module is primarily intended to be used in automated performance aware software defined networking applications. However, it also provides a rudimentary web based user interface that can be used to demonstrate the visibility into tunneled traffic offered by the sFlow standard.

Application performance

One of the reasons that tunnels are popular for network virtualization is that they provide a useful abstraction that hides the underlying physical network topology. However, while this abstraction offers significant operational flexibility, lack of visibility into the physical network can result in poorly placed workloads, inefficient use of resources, and consequent performance problems (see NUMA).

In this example, consider the problem faced by a system manager troubleshooting poor throughput between two virtual machines: 10.0.201.1 and 10.0.201.2.
Figure 1: Tracing a tunneled flow
Figure 1 shows the Flows table with the following flow definition:
  1. Name: trace
  2. Keys: ipsource,ipdestination,ipprotocol
  3. Value: frames
  4. Filter: ipsource.1=10.0.201.1&ipdestination.1=10.0.201.2
These settings define a new flow definition called trace that is looking for traffic in which the inner (tenant) addresses are 10.0.201.1 and 10.0.201.2 and asks for information on the outer IP addresses.

Note: ipsource.1 has a suffix of 1, indicating a reference to the inner address. It is possible to have nested tunnels such that the inner, inner ipsource address would be indicated as ipsource.2 etc.
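The same flow definition can also be created programmatically; the following curl command is a sketch of the equivalent REST request, assuming sFlow-RT is listening on its default port 8008 on localhost:
curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource,ipdestination,ipprotocol","value":"frames","filter":"ipsource.1=10.0.201.1&ipdestination.1=10.0.201.2"}' http://localhost:8008/flow/trace/json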

Figure 2: Outer addresses of a tunneled flow
Clicking on the flow in the Flows table brings up the chart shown in Figure 2. The chart shows a flow of approximately 15K packets per second and identifies the outer ipsource, ipdestination and ipprotocol as 10.0.0.151, 10.0.0.152 and 47 respectively.

Note: The IP protocol of 47 indicates that this is a GRE tunnel.
Figure 3: All data sources observing a flow
The sFlow-RT module has a REST/HTTP API and editing the URL modifies the query to reveal additional information. Figure 3 shows the effect of changing the query from metric to dump. The dump output shows each switch (Agent) and port (Data Source) that saw the traffic. In this case the traffic was seen traversing 2 virtual switches 10.0.0.28 and 10.0.0.20, and a physical switch 10.0.0.253.

Given the switch and port information, follow up queries could be constructed to look at utilizations, errors and discards on the links to see if there are network problems affecting the traffic.
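For example, the two query forms might look like the following, again assuming the default localhost:8008 endpoint:
curl http://localhost:8008/metric/ALL/trace/json
curl http://localhost:8008/dump/ALL/trace/json
The metric form returns the value of the trace flow, while the dump form lists each agent and data source that observed it.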

Network performance

Tunnels hide the applications using the network from network managers, making it difficult to manage capacity, assess the impact of network performance problems and maintain security.

Consider the same example, but this time from a network manager's perspective, having identified a large flow from address 10.0.0.151 to 10.0.0.152.
Figure 4: Looking into a tunnel
Figure 4 shows the Flows table with the following definition:
  1. Name: inside
  2. Keys: ipsource.1,ipdestination.1,stack
  3. Value: frames
  4. Filter: ipsource=10.0.0.151&ipdestination=10.0.0.152
These settings define a new flow called inside that is looking for traffic in which the outer addresses are 10.0.0.151 and 10.0.0.152 and asks for information on the inner (tenant) addresses.
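As before, this flow could equally be defined through the REST API (a sketch, with the default localhost:8008 endpoint assumed):
curl -H "Content-Type:application/json" -X PUT --data '{"keys":"ipsource.1,ipdestination.1,stack","value":"frames","filter":"ipsource=10.0.0.151&ipdestination=10.0.0.152"}' http://localhost:8008/flow/inside/json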
Figure 5: Inner addresses in a tunneled flow
Again, clicking on the entry in the Flows table brings up the chart shown in Figure 5. The chart shows a flow of 15K packets per second and identifies the inner ipsource.1, ipdestination.1 and stack as 10.0.201.1, 10.0.201.2 and eth.ip.gre.ip.tcp respectively.

Given the inner IP addresses and stack, follow up queries can identify the TCP port, server names, application names, CPU loads etc. needed to understand the application demand driving traffic and determine possible actions (moving a virtual machine for example).

Automation

This was a trivial example; in practice, tunneled topologies are more complex and cloud data centers are far too large to be managed using manual processes like the one demonstrated here. sFlow-RT provides visibility into large, complex, multi-layered environments, including: QinQ, TRILL, VXLAN, NVGRE and 6over4. Programmatic access to performance data through sFlow-RT's REST API allows cloud orchestration and software defined networking (SDN) controllers to incorporate real-time network, server and application visibility to automatically load balance and optimize workloads.

Monday, May 7, 2012

Tunnels

Figure 1: Network virtualization using tunnels
Layer 3/4 tunnels (GRE, VxLAN, STT, CAPWAP, NVGRE etc.) can be used to virtualize network services so that communication between virtual machines can be provisioned and controlled without dependencies on the underlying network.

Figure 1 shows the basic elements of a typical virtual machine networking stack (VMware, Hyper-V, XenServer, Xen, KVM etc.). Each virtual machine is connected to a software virtual switch using virtual network adapters. The virtual switch delivers packets based on destination MAC address (just like a physical switch). For example, when VM1 sends a packet to VM2, it will create an IP packet with source address vIP 1 and destination address vIP 2. The network stack on VM1 will create an Ethernet frame with source address vMAC 1 and destination address vMAC 2 with the IP packet as the frame payload. The virtual switch on Server 1 receives the Ethernet frame, examines the destination MAC address, and delivers the frame to the virtual network adapter corresponding to vMAC 2, which delivers the frame to VM2. The challenge is ensuring that virtual machines on different servers can communicate while minimizing dependencies on the underlying physical network.

Setting up a L3/4 tunnel between Server 1 and Server 2 (with tunnel endpoint addresses IP 1 and IP 2 respectively) limits the dependency on the physical infrastructure to IP connectivity between the two servers. For example, when VM1 sends a packet to VM3, the virtual switch on Server 1 recognizes that the packet is destined to a VM on Server 2 and sends the packet through the tunnel. On entering the tunnel, the original Ethernet frame from VM1 is encapsulated and the resulting IP packet is sent to Server 2. When Server 2 receives the packet, it extracts the original Ethernet frame, hands it to the virtual switch, which delivers the frame to VM3.
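As a concrete illustration of the encapsulation step, a GRE tunnel port like the one between Server 1 and Server 2 could be created on an Open vSwitch bridge as follows; the bridge name br0 and port name gre0 are placeholders and <IP 2> stands for Server 2's tunnel endpoint address:
# on Server 1: traffic forwarded to gre0 is encapsulated and sent to IP 2
ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=<IP 2>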

Network virtualization is particularly important in cloud environments where tenants need to be isolated from each other, but still share the same physical infrastructure.

Figure 2: Nicira's Distributed Virtual Network Infrastructure (DVNI)
Nicira's Distributed Virtual Network Infrastructure (DVNI) architecture, shown in Figure 2, is a good example of network virtualization using a Tunnel Mesh to connect virtual switches and overlay multiple Virtual Networks on the shared Physical Fabric. The Controller Cluster manages the virtual switches, setting up tunnels and controlling forwarding behavior.

Note: The Controller Cluster uses the OpenFlow protocol to configure virtual switches, making this an example of Software Defined Networking (SDN).

The importance of visibility in managing virtualized environments has been a constant theme on this blog, see Network visibility in the data center, System boundary and NUMA. The question is, how do you maintain visibility when tunneling is used for network virtualization?

The remainder of this article describes how the widely supported sFlow standard provides detailed visibility into tunneled traffic. You might be surprised to know that every sFlow enabled switch produced in the last 10 years is fully capable of reporting on L3/4 tunneled traffic, in spite of the fact that there is no mention of VxLAN, GRE, etc. in any of the sFlow standard documents.

The key to sFlow's adaptability is that switches export packet headers, leaving it to sFlow analysis software to decode the packet headers and report on traffic, see Choosing an sFlow analyzer.  The templates needed to extract tunnel information from packet headers are described in the Internet Drafts and RFCs that define the various tunneling protocols. For example, the following packet diagram from the internet draft VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks describes the fields present in a VxLAN packet header:

            0                   1                   2                   3
            0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     
        Outer Ethernet Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |             Outer Destination MAC Address                     |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           | Outer Destination MAC Address | Outer Source MAC Address      |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                Outer Source MAC Address                       |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       Optional Ethertype = C-Tag 802.1Q   | Outer.VLAN Tag Information    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           | Ethertype 0x0800              |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        Outer IP Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |Version|  IHL  |Type of Service|          Total Length         |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |         Identification        |Flags|      Fragment Offset    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |  Time to Live |    Protocol   |         Header Checksum       |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                       Outer Source Address                    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                   Outer Destination Address                   |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         Outer UDP Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |       Source Port = xxxx      |       Dest Port = VXLAN Port  |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |           UDP Length          |        UDP Checksum           |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         VXLAN Header:
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |R|R|R|R|I|R|R|R|            Reserved                           |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |                VXLAN Network Identifier (VNI) |   Reserved    |
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             0                   1                   2                   3
             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     
      Inner Ethernet Header:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Inner Destination MAC Address                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Inner Destination MAC Address | Inner Source MAC Address      |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |                Inner Source MAC Address                       |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Optional Ethertype = C-Tag [802.1Q]    | Inner.VLAN Tag Information    |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Payload:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Ethertype of Original Payload |                               |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
            |                                  Original Ethernet Payload    |
            |                                                               |
            | (Note that the original Ethernet Frame's FCS is not included) |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          Frame Check Sequence:
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Examining the template, the VxLAN header captured by sFlow includes the following information:
  • outer destination MAC
  • outer source MAC
  • outer VLAN
  • outer source IP
  • outer destination IP
  • VXLAN Network Identifier
  • inner destination MAC
  • inner source MAC
  • inner VLAN
  • inner Ethertype
  • original Ethernet Payload (providing inner source, destination IP etc.)
This level of detailed visibility allows network managers to see both the outer tunnel information and the inner VM to VM traffic details.

Generic Routing Encapsulation (GRE) is similar to VxLAN, but the encapsulated Ethernet packet is transported directly over IP, rather than UDP. The packet header is described in RFC 2784: Generic Routing Encapsulation (GRE).

Note: Visibility into tunnels is challenging if you are using NetFlow/IPFIX as your traffic monitoring protocol since you are dependent on the switch vendor for the hardware and firmware to decode, analyze and export details of the tunneled traffic, see Software defined networking. Maintaining visibility with traditional flow monitoring technologies is especially difficult in rapidly changing areas like network virtualization where a new tunneling protocol is proposed every few months.

Network visibility gets even more challenging when you throw fabric technologies like 802.1aq and TRILL into the mix. The following packet diagram from RFC 6325: Routing Bridges (RBridges): Base Protocol Specification shows the format of a TRILL header:

   Flow:
     +-----+  +-------+   +-------+       +-------+   +-------+  +----+
     | ESa +--+  RB1  +---+  RB3  +-------+  RB4  +---+  RB2  +--+ESb |
     +-----+  |ingress|   |transit|   ^   |transit|   |egress |  +----+
              +-------+   +-------+   |   +-------+   +-------+
                                      |
   Outer Ethernet Header:             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Outer Destination MAC Address  (RB4)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Outer Destination MAC Address | Outer Source MAC Address      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                Outer Source MAC Address  (RB3)                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ethertype = C-Tag [802.1Q-2005]| Outer.VLAN Tag Information    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   TRILL Header:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ethertype = TRILL             | V | R |M|Op-Length| Hop Count |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Egress (RB2) Nickname         | Ingress (RB1) Nickname        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Inner Ethernet Header:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Inner Destination MAC Address  (ESb)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Inner Destination MAC Address | Inner Source MAC Address      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                  Inner Source MAC Address  (ESa)              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Ethertype = C-Tag [802.1Q-2005]| Inner.VLAN Tag Information    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Payload:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Ethertype of Original Payload |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
      |                                  Original Ethernet Payload    |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   Frame Check Sequence:
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               New FCS (Frame Check Sequence)                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this case, the L3/4 tunneled packet sent by a virtual switch enters the top of rack switch (RB1 ingress), where it is encapsulated in an outer Ethernet header before being sent across the switch fabric to the destination top of rack switch (RB2 egress), where it is decapsulated and delivered to the destination virtual switch.

With sFlow reporting packet headers from intermediate switches, you get visibility into both sets of outer MAC addresses (TRILL and VxLAN), as well as the inner MAC addresses associated with the VMs and TCP/IP flows between the VMs.

Note: Wireshark can decode almost every protocol you are likely to see in your network and is a great troubleshooting tool to use with sFlow, see Wireshark for details.

Finally, sFlow provides the scalability needed to maintain full, end-to-end, visibility in virtual network environments; delivering multi-layer visibility into the physical fabric as well as monitoring inter-VM traffic, in top of rack, intermediate and virtual switches. The sFlow standard is comprehensive, extending beyond network monitoring to provide unified, cloud-scale, visibility that links network, system and application performance in a single integrated system.