Troubleshooting Cisco Nexus Switches and NX-OS
Vinit Jain, Brad Edgeworth, and Richard Furr
Cisco Press
800 East 96th Street
Indianapolis, Indiana 46240 USA
Copyright © 2018 Cisco Systems, Inc.
Published by:
Cisco Press
800 East 96th Street
Indianapolis, IN 46240 USA
All rights reserved. No part of this book may be reproduced or transmitted
in any form or by any means, electronic or mechanical, including
photocopying, recording, or by any information storage and retrieval
system, without written permission from the publisher, except for the
inclusion of brief quotations in a review.
01 18
Library of Congress Control Number: 2018931070
ISBN-13: 978-1-58714-505-6
ISBN-10: 1-58714-505-7
Trademark Acknowledgments
All terms mentioned in this book that are known to be trademarks or
service marks have been appropriately capitalized. Cisco Press or Cisco
Systems, Inc., cannot attest to the accuracy of this information. Use of a
term in this book should not be regarded as affecting the validity of any
trademark or service mark.
Special Sales
For information about buying this title in bulk quantities, or for special
sales opportunities (which may include electronic versions; custom cover
designs; and content particular to your business, training goals, marketing
focus, or branding interests), please contact our corporate sales department
at corpsales@pearsoned.com or (800) 382-3419.
For government sales inquiries, please contact
governmentsales@pearsoned.com.
For questions about sales outside the U.S., please contact
intlcs@pearson.com.
Feedback Information
At Cisco Press, our goal is to create in-depth technical books of the highest
quality and value. Each book is crafted with care and precision, undergoing
rigorous development that involves the unique expertise of members from
the professional technical community.
Readers’ feedback is a natural continuation of this process. If you have any
comments regarding how we could improve the quality of this book, or
otherwise alter it to better suit your needs, you can contact us through
email at feedback@ciscopress.com. Please make sure to include the book
title and ISBN in your message.
We greatly appreciate your assistance.
Editor-in-Chief: Mark Taub
Alliances Manager, Cisco Press: Arezou Gol
Product Line Manager: Brett Bartow
Managing Editor: Sandra Schroeder
Development Editor: Marianne Bartow
Senior Project Editor: Tonya Simpson
Copy Editors: Barbara Hacha, Krista Hansing
Technical Editors: Ramiro Garza Rios, Matt Esau
Editorial Assistant: Vanessa Evans
Cover Designer: Chuti Prasertsith
Composition: codemantra
Indexer: Cheryl Lenser
Proofreader: Jeanine Furino
About the Authors
Vinit Jain, CCIE No. 22854 (R&S, SP, Security & DC), is a technical
leader with the Cisco Technical Assistance Center (TAC) providing
escalation support in the areas of routing and data center technologies.
Vinit is a regular speaker at networking forums worldwide, including Cisco
Live events. Prior to joining Cisco, Vinit worked as a CCIE
trainer and a network consultant. In addition to his CCIEs, Vinit holds
multiple certifications on programming and databases. Vinit graduated
from Delhi University with a degree in Mathematics and earned his Master's
in Information Technology from Kuvempu University in India. Vinit can be
found on Twitter as @VinuGenie.
Brad Edgeworth, CCIE No. 31574 (R&S & SP), is a systems engineer at
Cisco Systems. Brad is a distinguished speaker at Cisco Live, where he has
presented on various topics. Before joining Cisco, Brad worked as a
network architect and consultant for various Fortune 500 companies.
Brad’s expertise is based on enterprise and service provider environments
with an emphasis on architectural and operational simplicity. Brad holds a
Bachelor of Arts degree in Computer Systems Management from St.
Edward’s University in Austin, Texas. Brad can be found on Twitter as
@BradEdgeworth.
Richard Furr, CCIE No. 9173 (R&S & SP), is a technical leader with the
Cisco Technical Assistance Center (TAC), supporting customers and TAC
teams around the world. For the past 17 years, Richard has worked for the
Cisco TAC and High Touch Technical Support (HTTS) organizations,
supporting service provider, enterprise, and data center environments.
Richard specializes in resolving complex problems found with routing
protocols, MPLS, multicast, and network overlay technologies.
About the Technical Reviewers
Ramiro Garza Rios, CCIE No. 15469 (R&S, SP, and Security), is a
solutions integration architect with Cisco Advanced Services, where he
plans, designs, implements, and optimizes IP NGN service provider
networks. Before joining Cisco in 2005, he was a network consulting and
presales engineer for a Cisco Gold Partner in Mexico, where he planned,
designed, and implemented both enterprise and service provider networks.
Matt Esau, CCIE No. 18586 (R&S), is a graduate of the University of
North Carolina at Chapel Hill. He currently resides in Ohio with his wife
and two children, ages three and one. Matt is a Distinguished Speaker at
Cisco Live. He started with Cisco in 2002 and has spent 15 years working
closely with customers on troubleshooting issues and product usability.
For the past eight years, he has worked in the Data Center space, with a
focus on Nexus platforms and technologies.
Dedications
This book is dedicated to three important women in my life: my mother;
my wife, Khushboo; and Sonal. Mom, thanks for being a friend and a
teacher in different phases of my life. You have given me the courage to
stand up and fight every challenge that comes my way in life. Khushboo, I
want to thank you for being so patient with my madness and craziness. I
couldn’t have completed this book or any other project without your
support, and I cannot express in words how much it all means to me. This
book is a small token of love, gratitude and appreciation for you. Sonal,
thank you for being the driver behind my craziness. You have inspired me
to reach new heights by setting new targets every time we met. This book
is a small token of my love and gratitude for all that you have done for me.
I would further like to dedicate this book to my dad and my brother for
believing in me and standing behind me as a wall whenever I faced
challenges in life. I couldn’t be where I am today without your invincible
support.
—Vinit Jain
This book is dedicated to David Kyle. Thank you for taking a chance on
me. You will always be more than a former boss. You mentored me with
the right attitude and foundational skills early in my career.
In addition to stress testing the network with Quake, you let me start my
path with networking under you. Look where I am now!
—Brad Edgeworth
This book is dedicated to my loving wife, Sandra, and my daughter,
Calianna. You are my inspiration. Your love and support drive me to
succeed each and every day. Thank you for providing the motivation for
me to push myself further than I thought possible. Calianna, you are only
two years old now. When you are old enough to read this, you will have
long forgotten about all the late nights daddy spent working on this
project. When you hold this book, I want you to remember that anything is
possible through dedication and hard work.
I would like to further dedicate this book to my mother and father. Mom,
thanks for always encouraging me, and for teaching me that I can do
anything I put my mind to. Dad, thank you for always supporting me, and
teaching me how to be dedicated and work hard. Both of you have given
me your best.
—Richard Furr
Acknowledgments
Vinit Jain:
Brad and Richard: Thank you for being part of this yearlong journey. This
project wouldn’t have been possible without your support. It was a great
team effort, and it was a pleasure working with both of you.
I would like to thank our technical editors, Ramiro and Matt, for your in-
depth verification of the content and insightful input to make this project a
successful one.
I couldn’t have completed the milestone without the support from my
managers, Chip Little and Mike Stallings. Thank you for enabling us with
so many resources, as well as being flexible and making an environment
that is full of opportunities.
I would like to thank David Jansen, Lukas Krattiger, Vinayak Sudame,
Shridhar Dhodapkar, and Ryan McKenna for your valuable input during
the course of this book.
Most importantly, I would like to thank Brett Bartow and Marianne
Bartow for their wonderful support on this project. This project wouldn’t
have been possible without your support.
Brad Edgeworth:
Vinit, thanks again for asking me to co-write another book with you.
Richard, thanks again for your insight. I’ve always enjoyed our late-night
conference calls.
Ramiro and Matt, thank you for hiding all my mistakes, or at least
pointing them out before they made it to print!
This is the part of the book that you look at to see if you have been
recognized. Well, many people have provided feedback, suggestions, and
support to make this a great book. Thanks to all who have helped in the
process, especially Brett Bartow, Marianne Bartow, Jay Franklin,
Katherine McNamara, Dustin Schuemann, Craig Smith, and my managers.
P.S. Teagan, this book does not contain dragons or princesses, but the next
one might!
Richard Furr:
I’d like to thank my coauthors, Vinit Jain and Brad Edgeworth, for the
opportunity to work on this project together. It has been equally
challenging and rewarding on many levels.
Brad, thank you for all the guidance and your ruthless red pen on my first
chapter. You showed me how to turn words and sentences into a book.
Vinit, your drive and ambition are contagious. I look forward to working
with both of you again in the future.
I would also like to thank our technical editors, Matt Esau and Ramiro
Garza Rios, for their expertise and guidance. This book would not be
possible without your contributions.
I could not have completed this project without the support and
encouragement of my manager, Mike Stallings. Mike, thank you for
allowing me to be creative and pursue projects like this one. You create the
environment for us to be our best.
Contents at a Glance
Foreword
Introduction
Note
This book covers multiple Nexus switch platforms (5000, 7000, 9000,
and so on). A generic NX-OS icon is used along with a naming syntax to
differentiate devices. Platform-specific topics use a platform-
specific icon and the major platform number in the system name.
Foreword
The data center is at the core of all companies in the digital age. It
processes bits and bytes of data that represent products and services to its
customers. The data storage and processing capabilities of a modern
business have become synonymous with the ability to generate revenue.
Companies in all business sectors are storing and processing more
information digitally every year, regardless of their vertical affiliation
(construction, medical, entertainment, and so on). This means that the
network must be designed for speed, capacity, and flexibility.
The Nexus platform was built with speed and bandwidth capacity in mind.
When the Nexus 7000 launched in 2008, it provided high-density 10
Gigabit interfaces at a low per-port cost. In addition, the Nexus switch
operating system, NX-OS, brought forth evolutionary technologies like
virtual port channels (vPC) that increased available bandwidth and
redundancy while overcoming the inefficiencies of Spanning-Tree
Protocol (STP). NX-OS introduced technologies such as Overlay Transport
Virtualization (OTV), which revolutionized the design of the data center
network by enabling host mobility between sites and allowing full data
center redundancy. Today, the Nexus platform continues to evolve by
supporting 25/40/100 Gigabit interfaces in a high-density compact form
factor, and brings other innovative technologies such as VXLAN and
Application Centric Infrastructure (ACI) to the market.
NX-OS was built with the mindset of operational simplicity and includes
additional tools and capabilities that improve the operational efficiency of
the network. Today, websites and applications are expected to be available
24 hours a day, 7 days a week, and 365 days a year. Downtime in the data
center directly translates to a financial impact. The move toward
digitization and the potential impact the network has on a business make
it more important than ever for network engineers to attain the skills to
troubleshoot data center network environments efficiently.
As the leader of Cisco’s technical services for more than 25 years, I have
the benefit of working with the best network professionals in the industry.
This book is written by Brad, Richard, and Vinit: “Network Rock Stars,”
who have been in my organization for years supporting multiple Cisco
customers. This book provides a complete reference for troubleshooting
Nexus switches and the NX-OS operating system. The methodologies
taught in this book are the same methods used by Cisco’s technical
services to solve a variety of complex network problems.
Joseph Pinto
SVP, Technical Services, Cisco, San Jose
Introduction
The Nexus operating system (NX-OS) contains a modular software
architecture that primarily targets high-speed/high-density network
environments like data centers. NX-OS provides virtualization, high
availability, scalability, and upgradeability features for Nexus switches.
In particular, NX-OS is expected to provide a measure of resilience
during software and hardware upgrades (failover, online insertion and
removal [OIR]), with neither operation interrupting nonstop forwarding.
NX-OS is required to scale to very large multichassis systems and still
operate with the same expectations of resilience in the face of outages
of various kinds. The NX-
OS feature set includes a variety of features and protocols that have
revolutionized data center designs with virtual port channels (vPC),
Overlay Transport Virtualization (OTV), and now virtual extensible LAN
(VXLAN).
The Nexus 7000 switch debuted in 2008, providing more than 512 10 Gbps
ports. Over the years, Cisco has released other Nexus switch families that
include the Nexus 5000, Nexus 2000, Nexus 9000, and virtual Nexus 1000.
NX-OS has grown in features, allowing Nexus switch deployments in
enterprise routing and switching roles.
This book is a single source for mastering techniques to troubleshoot the
various features and issues encountered on Nexus platforms running the
NX-OS operating system. Bringing together content previously spread
across multiple sources and Cisco Press titles, it provides updated,
architecture-level information on how various features function on Nexus
platforms and how to leverage the capabilities of NX-OS to troubleshoot
them.
Additional Reading
The authors have tried to keep the size of the book manageable while
providing the necessary information for the topics involved.
Some readers may require additional reference material and may find the
following books a great supplementary resource for the topics in this book.
Fuller, Ron, David Jansen, and Matthew McPherson. NX-OS and
Cisco Nexus Switching. Indianapolis: Cisco Press, 2013.
Edgeworth, Brad, Aaron Foss, and Ramiro Garza Rios. IP Routing on
Cisco IOS, IOS XE, and IOS XR. Indianapolis: Cisco Press, 2014.
Krattiger, Lukas, Shyam Kapadia, and David Jansen. Building Data
Centers with VXLAN BGP EVPN. Indianapolis: Cisco Press, 2017.
Part I
Introduction to
Troubleshooting Nexus
Switches
Chapter 1
Introduction to Nexus
Operating System (NX-OS)
At the time of its release in 2008, the Nexus operating system (NX-OS) and
the Nexus 7000 platform provided a substantial leap forward in terms of
resiliency, extensibility, virtualization, and system architecture compared to
other switching products of the time. Wasteful excess capacity in bare
metal server resources had already given way to the efficiency of virtual
machines, and now that wave was beginning to wash over the network as
well. Networks were evolving from traditional three-tier designs (access layer,
distribution layer, core layer) to designs that required additional capacity,
scale, and availability. It was no longer acceptable to have links sitting idle
due to Spanning Tree Protocol blocking while that capacity could be
utilized to increase the availability of the network.
As network topologies evolved, so did the market’s expectation of the
network infrastructure devices that connected their hosts and network
segments. Network operators were looking for platforms that were more
resilient to failures, offered increased switching capacity, and allowed for
additional network virtualization in their designs to better utilize physical
hardware resources. Better efficiency was also needed in terms of reduced
power consumption and cooling requirements as data centers grew larger
with increased scale.
The Nexus 7000 series was the first platform in Cisco’s Nexus line of
switches created to meet the needs of this changing data center market.
NX-OS combines the functionality of Layer 2 switching, Layer 3 routing,
and SAN switching into a single operating system. From the initial release,
the operating system has continued to evolve, and the portfolio of Nexus
switching products has expanded to include several series of switches that
address the needs of a modern network. Throughout this expansion, the
following four fundamental pillars of NX-OS have remained unchanged:
Resiliency
Virtualization
Efficiency
Extensibility
This chapter introduces the different types of Nexus platforms along with
their placement into the modern network architecture, and the major
functional components of NX-OS. In addition, some of the advanced
serviceability and usability enhancements are introduced to prepare you for
the troubleshooting chapters that follow. This enables you to dive into each
of the troubleshooting chapters with a firm understanding of NX-OS and
Nexus switching to build upon.
Note
All Nexus 3000 series, with the exception of the Nexus 3500 series,
run the same NX-OS software release as the Nexus 9000 series
switches.
Note
The Nexus 9000 series operates in standalone NX-OS mode or in
application-centric infrastructure (ACI) mode, depending on what
software and license is installed. This book covers only Nexus
standalone configurations and troubleshooting.
NX-OS Architecture
Since its inception, the four fundamental pillars of NX-OS have been
resiliency, virtualization, efficiency, and extensibility. The designers also
wanted to provide a user interface that had an IOS-like look and feel so that
customers migrating to NX-OS from legacy products feel comfortable
deploying and operating them. The greatest improvements to the core
operating system over IOS were in the following areas:
Process scheduling
Memory management
Process isolation
Management of feature processes
In NX-OS, feature processes are not started until they are configured by the
user. This saves system resources and allows for greater scalability and
efficiency. The features use their own memory and system resources, which
adds stability to the operating system. Although similar in look and feel,
under the hood, the NX-OS operating system has improved in many areas
over Cisco’s IOS operating system.
The NX-OS modular architecture is depicted in Figure 1-3.
Figure 1-3 NX-OS Modular Architecture
Note
The next section covers some of the fundamental NX-OS components
that are of the most interest. Additional NX-OS services and
components are explained in the context of specific examples
throughout the remainder of this book.
The Kernel
The primary responsibility of the kernel is to manage the resources of the
system and interface with the system hardware components. The NX-OS
operating system uses a Linux kernel to provide key benefits, such as
support for symmetric multiprocessing (SMP) and preemptive
multitasking. Multithreaded processes can be scheduled and distributed
across multiple processors for improved scalability. Each component
process of the OS was designed to be modular, self-contained, and memory
protected from other component processes. This approach results in a
highly resilient system where process faults are isolated and therefore
easier to recover from when failure occurs. This self-contained, self-
healing approach means that recovery from such a condition is possible
with no or minimal interruption because individual processes are restarted
and the system self-heals without requiring a reload.
Note
Historically, access to the Linux portion of NX-OS required the
installation of a "debug plugin" by Cisco support personnel. However,
on some platforms NX-OS now offers a bash-shell feature that allows
users to access the underlying Linux environment directly.
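The following minimal sketch shows how the bash-shell feature is enabled and entered on a platform that supports it; the shell prompt and output are illustrative:

NX-1# configure terminal
NX-1(config)# feature bash-shell
NX-1(config)# end
NX-1# run bash
bash-4.2$ whoami
admin
bash-4.2$ exit
NX-1#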
Additional details about a service, such as its current state, how many
times it has restarted, and how many times it has crashed, are viewed by
using the UUID obtained in the output of the previous command. The syntax
for the command is show system internal sysmgr service uuid uuid, as
demonstrated in Example 1-2.
Note
If a service has crashed, the process name, PID, and date/time of the
event are found in the output of show cores.
In Example 1-4, the show system internal mts sup sap sap-id
[description | uuid | stats] command is used to obtain details about a
particular SAP. To examine a particular SAP, first confirm that the service
name and UUID match the values from the show system internal sysmgr
services all command. This is a sanity check to ensure the correct SAP is
being investigated. The output of show system internal mts sup sap sap-id
[description] should match the service name, and the output of show
system internal mts sup sap sap-id [UUID] should match the UUID in the
sysmgr output. Next, examine the MTS statistics for the SAP. This output
is useful for determining the maximum size the MTS queue has reached (the
high-water mark) and for examining the number of messages this service
has exchanged. If the max_q_size ever reached is equal to the
hard_q_limit, it is possible that MTS has dropped messages for that
service.
Example 1-4 Examining the MTS Queue for a SAP
Note
In the output of Example 1-4, the UUID is displayed as a decimal
value, whereas in the output from the system manager it is given as
hexadecimal. NX-OS has a built-in utility to do the conversion using
the hex value or dec value command.
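For instance, converting between the two bases with these utilities might look like the following sketch (the values shown are illustrative):

NX-1# dec 0x18A
394
NX-1# hex 394
0x18a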
The NX-OS MTS service is covered in more detail in Chapter 3,
“Troubleshooting Nexus Platform Issues,” along with additional
troubleshooting examples.
Example 1-5 Verify the Size and Location of PSS in the Flash File System
NX-1# show system internal flash
Mount-on                      1K-blocks      Used  Available  Use%  Filesystem
/                                409600     65624     343976    17  /dev/root
/proc                                 0         0          0     0  proc
/sys                                  0         0          0     0  none
/isan                           1572864    679068     893796    44  none
/var                              51200       488      50712     1  none
/etc                               5120      1856       3264    37  none
/nxos/tmp                        102400      2496      99904     3  none
/var/log                          51200      1032      50168     3  none
/var/home                          5120        36       5084     1  none
/var/tmp                         307200       744     306456     1  none
/var/sysmgr                     3670016       664    3669352     1  none
/var/sysmgr/ftp                  819200    219536     599664    27  none
/var/sysmgr/srv_logs             102400         0     102400     0  none
/var/sysmgr/ftp/debug_logs        10240         0      10240     0  none
/dev/shm                        3145728    964468    2181260    31  none
/volatile                        512000         0     512000     0  none
/debug                             5120        32       5088     1  none
/dev/mqueue                           0         0          0     0  none
/debugfs                              0         0          0     0  nodev
/mnt/plog                        242342      5908     223921     3  /dev/sdc1
/mnt/fwimg                       121171      4127     110788     4  /dev/sdc3
/mnt/cfg/0                        75917      5580      66417     8  /dev/md5
/mnt/cfg/1                        75415      5580      65941     8  /dev/md6
/bootflash                      1773912   1046944     636856    63  /dev/md3
/cgroup                               0         0          0     0  vdccontrol
/var/sysmgr/startup-cfg          409600     15276     394324     4  none
/dev/pts                              0         0          0     0  devpts
/mnt/pss                          38172      9391      26810    26  /dev/md4
/usbslot1                       7817248   5750464    2066784    74  /dev/sdb1
/fwimg_tmp                       131072       508     130564     1  tmpfs
Feature Manager
Features in NX-OS are enabled on-demand and only consume system
resources such as memory, CPU time, MTS queues, and PSS when they
have been enabled. If a feature is in use and is then later shut down by the
operator, the resources associated with that feature are freed and reclaimed
by the system. The task of enabling or disabling features is handled by the
NX-OS infrastructure component known as the feature manager. The
feature manager is also responsible for maintaining and tracking the
operational state of all features in the system.
To better understand the role of the feature manager and its interaction with
other services, let’s review a specific example. An operator wants to enable
BGP on a particular Nexus switch. Because services in NX-OS are not
started until they are enabled, the user must first enter the feature bgp
command in configuration mode. The feature manager acts on this request
by ensuring the proper license is in place for the feature and then
sends a message to the system manager to start the service. When
the BGP service is started, it binds to an MTS SAP, creates its PSS entries
to store run-time state, and then informs the system manager. The BGP
service then registers itself with the feature manager where the operational
state is changed to enabled.
When a feature is disabled by a user, a similar set of events occurs in
reverse order. The feature manager asks the service to disable itself. The
feature empties its MTS buffers and destroys its PSS data and then
communicates with the system manager and feature manager, which sets
the operational state to disabled.
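The following sketch illustrates the enablement sequence described previously; the show feature output columns are representative rather than verbatim:

NX-1# configure terminal
NX-1(config)# feature bgp
NX-1(config)# end
NX-1# show feature | include bgp
bgp                    1          enabled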
It is important to note that some services have dependencies on other
services. If a service is started and its dependencies are not satisfied,
additional services are started so the feature operates correctly. An example
of this is the BGP feature that depends on the route policy manager (RPM).
The most important concept to understand from this is that services
implement one or multiple features and dependencies exist. Except for the
fact that a user must enable features, the rest of this is transparent to the
user, and NX-OS takes care of the dependencies automatically.
Certain complex features require the user to specifically install a feature
set before the associated feature is enabled. MPLS, FEX, and FabricPath
are a few examples. To enable these features, the user must first install
the feature set with the install feature-set [feature] command. The feature
set is then enabled with the feature-set [feature] command.
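Using FEX as an example, a minimal sketch of this two-step process follows:

NX-1# configure terminal
NX-1(config)# install feature-set fex
NX-1(config)# feature-set fex
NX-1(config)# end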
Note
The license manager tracks all the feature licenses on the system.
When a license expires, the license manager notifies the feature
manager to shut down the feature.
In Example 1-6, the current state of a feature is verified using the show
system internal feature-mgr feature state command. The output is
provided in a table format that lists the feature name, along with its UUID,
state, and reason for the current state. In Example 1-6, several features
have been enabled successfully by the feature manager, including two
instances of EIGRP. The output also displays instances of a feature that
have not yet been enabled, such as EIGRP instances 3 through 16.
Example 1-6 Checking the Feature Manager State for a Feature
Although problems with feature manager are not common, NX-OS does
provide a way to verify whether errors have occurred using the command-
line interface (CLI). Although no error codes are present in this output,
Example 1-7 shows how to obtain an error code for a specific feature if it
existed, using the show system internal feature-mgr feature action
command.
Note
NX-OS maintains a running log of events for many features and
services referred to as event history logs, which are discussed later in
this chapter and referenced throughout this book. Feature manager
provides two event history logs (errors and messages) that provide
additional detail for troubleshooting purposes. The output is obtained
using the show system internal feature-mgr event-history [msgs |
errors] command.
NX-OS Line Card Microcode
Distributed line cards run a microcode version of the NX-OS operating
system, as depicted in Figure 1-4. The modular architecture of NX-OS
allows the foundational concepts and components of the software to be
applied consistently to the line card as well as the system overall.
During system boot, or when a card is inserted into the chassis, the
supervisor decides whether to power on the card by checking the card type
and verifying that the required power, software, and hardware resources
are in place for the card to operate correctly. If so, the line card
powers on and
executes its Basic Input/Output System (BIOS), power-on self-tests, and
starts its system manager. Next, all the line card services are started that are
required for normal operation. Communication and messaging channels are
established to the supervisor that allow the supervisor to push the
configuration and line card software upgrades as needed. Additional
services are started for local handling of exception logging, management of
environmental sensors, the card LEDs, health monitoring, and so on. After
the critical system services are started, the individual ASICs are started,
which allow the card to forward traffic.
In the operational state, packets are forwarded and communications occur as
needed with the supervisor to update counters, statistics, and environmental
data. The line card has local storage for PSS as well as for On-Board
Failure Logging (OBFL). The OBFL data is stored in nonvolatile memory
so that it can survive reloads and is an excellent source of data for
troubleshooting problems specific to the line card. Information such as
exception history, card boot history, environmental history, and much more
is stored in the OBFL storage.
For day-to-day operations, there is typically no need to enter the line card
CLI. The NX-OS operating system and distributed platforms are designed
to be configured and managed from the supervisor module. There are some
instances where direct access to the CLI of a line card is required.
Typically, these scenarios also involve working with Cisco TAC to collect
data and troubleshoot the various line card subsystems. In Example 1-8, the
line card CLI is entered from the supervisor module using the attach
module command. Notice that the prompt changes to indicate which
module the user is currently connected to. After the user has entered the
line card CLI, the show hardware internal dev-port-map command is
issued, which displays the mapping of front panel ports to the various
ASICs of the card on this Nexus 7000 M2 series card.
Example 1-8 Use of the attach module CLI from the Supervisor
NX-1# attach module 10
Attaching to module 10 ...
To exit type 'exit', to abort type '$.'
module-10# show hardware internal dev-port-map
--------------------------------------------------------------
CARD_TYPE:  24 port 10G
>Front Panel ports:24
--------------------------------------------------------------
 Device name              Dev role                Abbr    num_inst:
--------------------------------------------------------------
> Skytrain                DEV_QUEUEING            QUEUE   4
> Valkyrie                DEV_REWRITE             RWR_0   4
> Eureka                  DEV_LAYER_2_LOOKUP      L2LKP   2
> Lamira                  DEV_LAYER_3_LOOKUP      L3LKP   2
> Garuda                  DEV_ETHERNET_MAC        MAC_0   2
> EDC                     DEV_PHY                 PHYS    6
> Sacramento Xbar ASIC    DEV_SWITCH_FABRIC       SWICHF  1
+-----------------------------------------------------------------------+
+----------------+++FRONT PANEL PORT TO ASIC INSTANCE MAP+++------------+
+-----------------------------------------------------------------------+
FP port | PHYS | SECUR | MAC_0 | RWR_0 | L2LKP | L3LKP | QUEUE |SWICHF
   1       0      0       0      0,1      0       0      0,1     0
   2       0      0       0      0,1      0       0      0,1     0
   3       0      0       0      0,1      0       0      0,1     0
   4       0      0       0      0,1      0       0      0,1     0
   5       1      0       0      0,1      0       0      0,1     0
   6       1      0       0      0,1      0       0      0,1     0
   7       1      0       0      0,1      0       0      0,1     0
   8       1      0       0      0,1      0       0      0,1     0
   9       2      0       0      0,1      0       0      0,1     0
  10       2      0       0      0,1      0       0      0,1     0
  11       2      0       0      0,1      0       0      0,1     0
  12       2      0       0      0,1      0       0      0,1     0
  13       3      1       1      2,3      1       1      2,3     0
  14       3      1       1      2,3      1       1      2,3     0
  15       3      1       1      2,3      1       1      2,3     0
  16       3      1       1      2,3      1       1      2,3     0
  17       4      1       1      2,3      1       1      2,3     0
  18       4      1       1      2,3      1       1      2,3     0
  19       4      1       1      2,3      1       1      2,3     0
  20       4      1       1      2,3      1       1      2,3     0
  21       5      1       1      2,3      1       1      2,3     0
  22       5      1       1      2,3      1       1      2,3     0
  23       5      1       1      2,3      1       1      2,3     0
  24       5      1       1      2,3      1       1      2,3     0
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
Note
A common reason to access a line card’s CLI is to run embedded logic
analyzer module (ELAM) packet captures on the local forwarding
engine. ELAM is a tool used to troubleshoot data plane forwarding and
hardware forwarding table programming problems. ELAM capture is
outside the scope of this book.
File Systems
The file system is a vital component of any operating system, and NX-OS
is no exception. The file system contains the directories and files needed by
the operating system to boot, log events, and store data generated by the
user, such as support files, debug outputs, and scripts. It is also used to
store the configuration and any data that services store in nonvolatile PSS,
which aids in system recovery after a failure.
Working with the NX-OS file system is similar to working with files in
Cisco’s IOS, with some improvements. Files and directories are created and
deleted from bootflash: or the external USB memory referred to as slot0:.
Archive files can be created to compress large files, such as show tech
outputs, to save space. Table 1-1 provides a list of file system commands
that are needed to
manage and troubleshoot an NX-OS switch.
Table 1-1 File System Commands
Command Purpose
Note
The gzip and tar options are useful when working with data collected
during troubleshooting. Multiple files are combined into an archive
and compressed for easy export to a central server for analysis.
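As a brief sketch of that workflow (file names and server address are hypothetical), a collected show tech file is compressed and copied off the switch for analysis; tar options vary by platform, so only gzip is shown here:

NX-1# show tech-support details > bootflash:showtech.nx1.txt
NX-1# gzip bootflash:showtech.nx1.txt
NX-1# copy bootflash:showtech.nx1.txt.gz scp://admin@192.0.2.10/showtech.nx1.txt.gz vrf management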
The dir bootflash: command provides the list of files and subdirectories
on the currently active supervisor. For platforms with redundant
supervisors, directories of the standby supervisor are accessed as
demonstrated in Example 1-10 by appending //sup-standby/ to the directory
path.
Example 1-10 Listing the Files on the Standby Supervisor
Note
The output in Example 1-11 is from a distributed platform; however,
OBFL data is available for nondistributed platforms as well. The items
enabled depend on the platform. Configure the OBFL options using
the hw-module logging onboard configuration command with various
subcommand options. There is typically no reason to disable OBFL.
Logflash
Logflash is a persistent storage location used to store system logs, syslog
messages, debug output, and core files. On some Nexus platforms the
logflash is an external compact flash or USB device that may not have been
installed, or was removed at some point. The system prints a periodic
message indicating that the logflash is missing to alert the operator to
this condition so that it can be corrected. It is recommended to have the
logflash mounted and available for use by the system so that any
operational data is stored there. In the event of a problem, the
persistent nature of logflash means that this data is available for
analysis. Example 1-12 uses the show system internal flash command to
verify that logflash: is mounted and how much free space is available.
Example 1-12 Verifying the State and Available Space for the logflash:
The contents of the logflash directory are examined using the dir logflash:
command, as shown in Example 1-13.
Example 1-14 demonstrates using the show file command to print the
contents of a file in logflash:.
Note
The Nexus 3000 and Nexus 9000 series platforms now share a
common platform-dependent software base, and the image name
begins with nxos; for example, nxos.7.0.3.I6.1.bin.
Note
Not every NX-OS system upgrade requires an EPLD upgrade. The
procedure for installing NX-OS software and EPLD images is
documented with examples for each Nexus platform on
www.cisco.com. Refer to the Software Upgrade and Installation
Guides for more details.
Note
An SMU is valid only for the image it was created for. If the NX-OS
software is upgraded to another release, the SMU is deactivated. It is
critical to ensure any applicable software defects are fixed in the new
version of software before performing an upgrade.
The SMU files are packaged as a binary and a README.txt that details the
associated bugs that are addressed by the SMU. The naming convention of
the SMU file is platform-package_type.release_version.Bug_ID.file_type.
For example, n7700-s2-dk9.7.3.1.D1.1.CSCvc44582.bin. The general
procedure for installing a SMU follows:
Step 1. Copy the package file or files to a local storage device or file
server.
Step 2. Add the package or packages on the device using the install add
command.
Step 3. Activate the package or packages on the device using the install
activate command.
Step 4. Commit the current set of packages using the install commit
command. However, in case of the reload or ISSU SMU, commit
the packages after the reload or ISSU.
Step 5. (Optional) Deactivate and remove the package, when desired.
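A condensed sketch of this procedure using the SMU file named earlier follows; prompts, verification output, and the file server address are hypothetical:

NX-1# copy scp://admin@192.0.2.10/n7700-s2-dk9.7.3.1.D1.1.CSCvc44582.bin bootflash: vrf management
NX-1# install add bootflash:n7700-s2-dk9.7.3.1.D1.1.CSCvc44582.bin
NX-1# install activate n7700-s2-dk9.7.3.1.D1.1.CSCvc44582.bin
NX-1# install commit n7700-s2-dk9.7.3.1.D1.1.CSCvc44582.bin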
Note
Before attempting the installation of an SMU, please review the
detailed examples on www.cisco.com for the platform.
Licensing
NX-OS requires that the operator obtain and install appropriate license files
for the features being enabled. Typically, Nexus platforms support a base
feature set with no additional license requirements. This includes most
Layer 2 functionality and generally some form of Layer 3 routing support.
To enable advanced features, such as MPLS, OTV, FabricPath, FCoE,
advanced routing, or VXLAN, additional licenses may need to be installed
depending on the platform. In addition to feature licenses, several Nexus
platforms also offer licenses to provide additional hardware capabilities.
For example, SCALABLE_SERVICES_PKG on the Nexus 7000 series
enables XL-capable I/O modules to operate in XL mode and take full
advantage of their larger table sizes. Another example is the port upgrade
licenses available for some Nexus 3000 platforms.
License enforcement is built into the NX-OS operating system by the
feature manager, which disables services if the appropriate licenses are not
present. If a specific feature is not configurable, the most likely culprit is a
missing license. Cisco does allow for feature testing without a license by
configuring the license grace-period in global configuration, which allows
features to function for up to 120 days without the license installed. This
does not cover all feature licenses on all platforms, however. Most notably
the Nexus 9000 and Nexus 3000 do not support license grace-period.
License files are downloaded from www.cisco.com. To obtain a license file
you need the serial number that is found with the show license host-id
command. Next, use the product authorization key (PAK) from your
software license claim to retrieve the license file and copy it to your
switch. Installation of a license is a nondisruptive task and is accomplished
with the install license command. For platforms that support virtual device
contexts (VDC) the license is installed and managed on the default VDC
and applies for all VDCs present on the chassis. The license installation is
verified with the show license command.
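A minimal sketch of this workflow follows; the host ID, server address, and license file name are hypothetical:

NX-1# show license host-id
License hostid: VDH=FOX0646S017
NX-1# copy scp://admin@192.0.2.10/n7k_lan_enterprise.lic bootflash: vrf management
NX-1# install license bootflash:n7k_lan_enterprise.lic
Installing license ..done
NX-1# show license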
NX-OS High-Availability Infrastructure
The system manager, MTS, and PSS infrastructure components that were
described previously in this chapter provide NX-OS with the core of its
high-availability infrastructure. This high-availability infrastructure
enables NX-OS to seamlessly recover from most failure scenarios, such as
a supervisor switchover or a process restart.
NX-OS is capable of restarting a service to recover and resume normal
operation while minimizing impact to the data plane traffic being
forwarded. This process restart event is either stateful or stateless and
occurs when initiated by the user, or automatically when the system
manager identifies a process failure.
In the event of a stateless restart, all the run-time data structures associated
with the failed process are lost, and the system manager quickly spawns a
new process to replace the one that failed. A stateful restart means that a
portion of the run-time data is used to recover and seamlessly resume
functioning where the previous process left off after a process failure or
restart. Stateful restart is possible because the service updates its state in
PSS while active and then recovers the important run-time data structures
from PSS after a failure. Persistent MTS messages left in the process queue
are picked up by the restarted service to allow a seamless recovery. The
capability to resume processing persistent messages in the MTS queue
means the service restart is transparent to other services that were
communicating with the failed process.
NX-OS provides the infrastructure to the individual processes so that they
can choose the type of recovery mechanism to implement. In some cases, a
stateful recovery does not make sense, because a recovery mechanism is
built in to the higher layers of a protocol. Consider a routing protocol
process, such as OSPF or BGP, that has a protocol level graceful restart or
nonstop forwarding implementation. For those protocols, it does not make
sense to checkpoint the routing updates into the PSS infrastructure because
they are recovered by the protocol.
Note
The reason for a reset is reviewed in the output of show system reset-
reason. Process crash or restart details are viewed with the show
processes log pid and show cores commands.
Supervisor Redundancy
Nexus platforms with redundant supervisor modules operate in an
Active/Standby redundancy mode. This means that only one of the
supervisors is active at a time, and the standby is ready and waiting to take
over when a fatal failure of the active occurs. Active/Standby supervisor
redundancy provides a fully redundant control plane for the device and
allows for stateful switchover (SSO) and in-service software upgrades
(ISSU). The current redundancy state and which supervisor is active is
viewed in the output of show module, as well as the output of show system
redundancy status, as shown in Example 1-15.
HA info:
slotid = 5 supid = 0
cardstate = SYSMGR_CARDSTATE_ACTIVE .
cardstate = SYSMGR_CARDSTATE_ACTIVE (hot switchover is configured
enabled).
Configured to use the real platform manager.
Configured to use the real redundancy driver.
Redundancy register: this_sup = RDN_ST_AC, other_sup = RDN_ST_SB.
EOBC device name: veobc.
Remote addresses: MTS - 0x00000601/3 IP - 127.1.1.6
MSYNC done.
Remote MSYNC not done.
Module online notification received.
Local super-state is: SYSMGR_SUPERSTATE_STABLE
Standby super-state is: SYSMGR_SUPERSTATE_STABLE
Swover Reason : SYSMGR_UNKNOWN_SWOVER
Total number of Switchovers: 0
Swover threshold settings: 5 switchovers within 4800 seconds
Switchovers within threshold interval: 0
Last switchover time: 0 seconds after system start time
Cumulative time between last 0 switchovers: 0
Start done received for 1 plugins, Total number of plugins = 1
Statistics:
Message count: 0
Total latency: 0 Max latency: 0
Total exec: 0 Max exec: 0
The sysmgr output confirms that the superstate is stable for both
supervisors, which indicates there is no problem currently. If there were a
problem, the superstate would display as unstable. The superstate on the standby
supervisor is verified by attaching to the standby supervisor module, as
shown in Example 1-18.
Debugging info:
HA info:
slotid = 6 supid = 0
cardstate = SYSMGR_CARDSTATE_STANDBY .
cardstate = SYSMGR_CARDSTATE_STANDBY (hot switchover is configured
enabled).
Configured to use the real platform manager.
Configured to use the real redundancy driver.
Redundancy register: this_sup = RDN_ST_SB, other_sup = RDN_ST_AC.
EOBC device name: veobc.
Remote addresses: MTS - 0x00000501/3 IP - 127.1.1.5
MSYNC done.
Remote MSYNC done.
Module online notification received.
Local super-state is: SYSMGR_SUPERSTATE_STABLE
Standby super-state is: SYSMGR_SUPERSTATE_STABLE
Swover Reason : SYSMGR_UNKNOWN_SWOVER
Total number of Switchovers: 0
Swover threshold settings: 5 switchovers within 4800 seconds
Switchovers within threshold interval: 0
Last switchover time: 0 seconds after system start time
Cumulative time between last 0 switchovers: 0
Start done received for 1 plugins, Total number of plugins = 1
Statistics:
Message count: 0
Total latency: 0 Max latency: 0
Total exec: 0 Max exec: 0
The superstate is stable, and the redundancy register indicates that this
supervisor is in redundancy state standby (RDN_ST_SB). Verify there are no
services pending synchronization on the standby, as shown in Example 1-
19.
If a service that was pending synchronization was found in this output, the
next step in the investigation is to verify the MTS queues for that particular
service. An example of verifying the MTS queues for a service was
demonstrated earlier in this chapter and is also shown in Chapter 3. If the
MTS queue had messages pending for the service, further investigation into
why those messages are pending is the next step in solving the problem.
Network or device instability could be causing frequent MTS updates to the
service that is preventing the synchronization from completing.
ISSU
NX-OS allows for in-service software upgrade (ISSU) as a high-availability
feature. ISSU makes use of the NX-OS stateful switchover (SSO) capability
with redundant supervisors and allows the system software to be updated
without an impact to data traffic. During an ISSU, all components of the
chassis are upgraded.
ISSU is initiated using the install all command, which performs the
following steps to upgrade the system.
Step 1. Determines whether the upgrade is disruptive and asks whether to
continue
Step 2. Ensures that enough space is available in the standby bootflash
Step 3. Copies the kickstart and system images to the standby supervisor
module
Step 4. Sets the KICKSTART and SYSTEM boot variables
Step 5. Reloads the standby supervisor module with the new Cisco NX-OS
software
Step 6. Reloads the active supervisor module with the new Cisco NX-OS
software, which causes a switchover to the newly upgraded standby
supervisor module
Step 7. Upgrades the line cards
Step 8. Upgrades the Connectivity Management Processor (CMP) on both
supervisors (Supervisor 1 on Nexus 7000 only)
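On a Nexus 7000 with redundant supervisors, the upgrade is initiated as in the following sketch (the image file names are hypothetical); the system then performs its compatibility checks and prompts for confirmation before continuing:

NX-1# install all kickstart bootflash:n7700-s2-kickstart.7.3.2.D1.1.bin system bootflash:n7700-s2-dk9.7.3.2.D1.1.bin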
For platforms that do not have a redundant supervisor, such as the Nexus
5000 series, a different method is used to achieve ISSU. The control plane
becomes inactive while the data plane continues to forward packets. This
allows the supervisor CPU to reset without causing a traffic disruption and
load the new NX-OS software version. After the CPU is booted on the new
software release, the control plane is restored from the previous
configuration and run-time state. The switch then synchronizes the control
plane state to the data plane.
Nexus 9000 and Nexus 3000 platforms introduced an enhanced ISSU
feature beginning in release 7.0(3)I5(1). Normally the NX-OS software
runs directly on the hardware. However, with enhanced ISSU, the NX-OS
software runs inside of a separate Linux container (LXC) for the supervisor
and line cards. During enhanced ISSU, a third container is created to act as
the standby supervisor so that the primary supervisor and line cards are
upgraded without disruption to data traffic. This feature is enabled with the
boot mode lxc configuration command on supported platforms.
Note
ISSU has restrictions on some platforms, and ISSU may not be
supported between certain releases of NX-OS. Please reference the
documentation on www.cisco.com to ensure ISSU is supported before
attempting an upgrade with this method.
NX-OS Virtualization Features
As a modern data center class operating system, NX-OS and the Nexus
switch platforms must provide support for virtualization of hardware and
software resources to meet the demands of today’s network architectures.
These features are introduced in this section and are explained in additional
detail throughout this book.
With appropriate licenses, the Supervisor 1 and Supervisor 2 allow for four
VDCs plus an admin VDC. The Supervisor 2E allows for eight VDCs plus
an admin VDC. The admin VDC does not handle any data plane traffic and
serves only switch management functions. In the context of operating or
troubleshooting in a VDC environment, note that the following tasks can be
performed only from the default VDC (a brief configuration sketch follows
the list):
1. In-service software upgrade/downgrade (ISSU/ISSD)
2. Erasable programmable logic devices (EPLD) upgrades
3. Control-plane policing (CoPP) configuration
4. Licensing operations
5. VDC configuration, including creation, suspension, deletion, and
resource allocation
6. Systemwide QoS policy and port channel load-balancing configuration
7. Generic online diagnostics (GOLD) configuration
8. Ethanalyzer captures
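As an illustration of item 5 in the preceding list, the following sketch creates a VDC from the default VDC and allocates interfaces to it; the VDC name and port range are hypothetical:

NX-1# configure terminal
NX-1(config)# vdc Prod
NX-1(config-vdc)# allocate interface Ethernet1/1-8
NX-1(config-vdc)# end
NX-1# switchto vdc Prod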
Although VDCs allow additional versatility, some restrictions exist. For
instance, all VDCs run on the same NX-OS version and kernel. Restrictions
also exist on which I/O modules can be in the same VDC, and which ports
of a line card can be allocated to a VDC based on the hardware application-
specific integrated circuit (ASIC) architecture of the I/O module and
forwarding engine. Before attempting to create VDCs, check the
documentation for the specific supervisor and I/O modules that are
installed in the switch so that any limitations are dealt with in the design
and planning phase.
Note
At the time of this writing, multiple VDCs are supported only on the
Nexus 7000 series.
Note
Support for MPLS VPN is dependent upon the capabilities of the
platform and the installed feature licenses.
In Figure 1-9, the vPC pair is using vPC Port-channel 10 and vPC Port-
channel 20 to connect with two access switches. A third access switch to
the right of NX-2 is not connected in vPC mode. This non-vPC enabled
interface is known as an orphan port in vPC terminology. The vPC
terms in Figure 1-9 are defined as follows:
vPC Peer-link is configured to carry all user-defined VLANs. It is
also used to forward BPDUs, HSRP hellos, and CFS (Cisco Fabric
Services) protocol packets between vPC peer switches. It should be a
port channel with member links on different modules for redundancy
purposes.
vPC Peer Keepalive link should be a separate path from the peer-link.
It does not have to be a point-to-point link and can traverse a routed
infrastructure. The peer keepalive link is used to ensure liveness of the
vPC peer switch.
Orphan Port is a non-vPC port connected to a device in the vPC
topology.
vPC Port (vPC member ports) are ports assigned to a vPC Port-
channel group. The ports of the vPC are split between the vPC peers.
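A minimal configuration sketch tying these terms together follows; the domain number, port channels, and keepalive addresses are hypothetical, and the same configuration is mirrored on the vPC peer:

NX-1(config)# feature vpc
NX-1(config)# vpc domain 10
NX-1(config-vpc-domain)# peer-keepalive destination 192.0.2.2 source 192.0.2.1 vrf management
NX-1(config-vpc-domain)# exit
NX-1(config)# interface port-channel 1
NX-1(config-if)# vpc peer-link
NX-1(config-if)# exit
NX-1(config)# interface port-channel 10
NX-1(config-if)# vpc 10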
Because all links are up and forwarding in vPC, the decision of which vPC
member-link interface to forward a packet on is made by the port-channel
load-balance hash of the device sending the packet. The sending switch
looks at the frame source and destination addresses of a traffic flow and
feeds these details into an algorithm. The algorithm performs a hash
function and returns a selected member port of the port-channel for the
frame to exit on. This allows all member link interfaces of the port-channel
to share the load of traffic.
Historically, routing protocol adjacencies had limitations when configured
to operate over a vPC topology. Recent versions of NX-OS have made
changes that allow dynamic unicast routing protocols to operate over vPC
without the limitations that existed previously. Check the vPC
configuration guide of the platform to ensure support exists before
attempting this configuration.
With each vPC peer making independent frame forwarding decisions, there
is a need for state synchronization between the peers. CFS is the protocol
that is used to synchronize Layer 2 (L2) state. It operates over the peer-link
and is enabled automatically. Its role is to ensure compatibility of the vPC
member ports between vPC peers. It is also used to synchronize MAC
address tables and IGMP snooping state between vPC peers so that
table entries exist on both peers. Layer 3 (L3) forwarding tables and
protocol state are independent on each vPC peer.
Note
vPC is introduced here as a virtualization concept and is covered in
detail later in this book.
The parsing utilities | count and | wc provide a way of counting the
number of lines or words in the output of the show command being
executed. Example 1-21 shows how to use this in situations where a
simple count provides verification of the current state; for example,
comparing the number of lines in show ip ospf neighbor before and after a
configuration change.
In some cases it is desirable to obtain only the last few lines of a
command output instead of paging through the output or using an
include/exclude utility option. The | last count utility displays the
last count entries and is useful when parsing the accounting log, system
log buffer, or event-history logs. Example 1-23 shows how to print only
the last line in the log buffer.
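Hedged sketches of both utilities follow; the neighbor count and log message shown are illustrative:

NX-1# show ip ospf neighbor | count
4
NX-1# show logging logfile | last 1
2018 Jan 15 10:42:01 NX-1 %VSHD-5-VSHD_SYSLOG_CONFIG_I: Configured from vty by admin on 192.0.2.10@pts/0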
The egrep and grep utilities are extremely useful for removing clutter in a
command output to show only the character pattern of interest. Example 1-
25 demonstrates a common use for egrep, which is to review an event
history log and look for a specific event. In this example, egrep is used to
find each time OSPF has run its shortest path first (SPF) algorithm. The
prev 1 option is used so that the line previous to the pattern match is
printed, which indicates a full or partial SPF run. The next option is used to
get lines after the egrep pattern match.
Egrep has several other useful options that you should become familiar
with: count, which returns a count of the number of matches; invert-match,
which prints only lines that do not match the pattern; and line-number,
which prefixes each matching line with its line number.
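The following sketch demonstrates two of these options; the patterns and output are illustrative:

NX-1# show running-config | egrep count "router"
3
NX-1# show running-config | egrep line-number "feature ospf"
12:feature ospf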
Table 1-2 provides information on additional command utilities available
in NX-OS.
Table 1-2 Additional Command Utilities
Utility   Purpose
no-more   Command output pages without the need to press Return or Space
          when the terminal length is reached.
json      Command output is printed as JSON. This is useful when data will
          be collected and then consumed by a script or software
          application because the output is structured data.
xml       Prints the output in XML format.
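As a sketch, structured output for scripting might look like the following on platforms that support it; the exact field names vary by command and release:

NX-1# show clock | json
{"simple_time": "10:42:01.849 UTC Mon Jan 15 2018", "time_source": "NTP"}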
Example 1-26 demonstrates show cli list [string], which returns all CLI
commands that match the given string input. This saves time over using the
“?” to figure out which commands exist.
Complementary to the previous example, the syntax for the identified CLI
commands is available with the show cli syntax [string] command, as
shown in Example 1-27.
Cisco IOS requires that you prepend any user or exec level command with
do while in the configuration mode. NX-OS eliminates the need for the do
command because it allows the execution of exec level commands from
within the configuration mode.
Technical Support Files
The concept of a show tech-support CLI command is likely familiar to
anyone who has worked on Cisco router or switch platforms. The general
idea of a show tech support file is to capture common outputs related to a
problem for offline analysis. NX-OS offers a very useful show tech-support
hierarchy. At the top level is show tech-support details, which obtains and
aggregates many commonly needed feature show tech files, event histories,
and internal data structures with a single CLI command. Another useful
command is tac-pac, which collects the show tech-support details output and
automatically stores it as a compressed file in bootflash:.
Each feature enabled in NX-OS has the capability to provide a tech-support
file that obtains the most useful information about that specific feature.
The show tech-support [feature] command obtains the feature configuration, show
commands, data structures, and event histories needed for offline analysis
of a problem with a specific feature. Be aware of feature dependencies
when performing data collection so that all relevant information about the
problem is gathered.
For example, a unicast routing problem with OSPF as the routing protocol
requires you to collect show tech-support ospf, but for a complete analysis
the output of show tech-support routing ip unicast is also needed to get
Unicast Routing Information Base (URIB) events. Feature dependency and
what to collect is determined on a case-by-case basis, depending on the
problem under investigation. Many feature show tech-support outputs do
not include a full show running-config.
It is always a good idea to collect the full show running-config along with
any specific feature show techs that are needed. Example 1-29 shows the
collection of show tech-support commands and the running-configuration
to investigate a unicast routing problem with OSPF.
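A sketch of that collection, redirecting each output to bootflash: for later export, follows; the file names are hypothetical:

NX-1# show tech-support ospf > bootflash:tech-ospf.txt
NX-1# show tech-support routing ip unicast > bootflash:tech-urib.txt
NX-1# show running-config > bootflash:running-config.txt
NX-1# tac-pac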
Note
In addition to the show tech-support [feature], NX-OS also provides
show running-config [feature], which prints only the running-
configuration of the given feature.
Accounting Log
NX-OS keeps a history of all configuration changes made to the device in
the accounting log. This is a useful piece of information to determine what
has changed in a switch, and by whom. Typically, problems are investigated
based on the time when they started to occur, and the accounting log can
answer the question, What has changed? An example of reviewing the
accounting log is shown in Example 1-30. Because only the last few lines
are of interest, the start-seqnum option is used to jump to the end of the
list.
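A minimal sketch of jumping toward the tail of the accounting log (the sequence number shown is hypothetical and depends on the current size of the log):
NX-1# show accounting log start-seqnum 750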
Note
The terminal log-all configuration command enables the logging of
show commands in the accounting log.
Feature Event-History
One very useful serviceability feature of NX-OS is that it keeps circular
event-history buffers for each configured feature. The event-history is
similar in many ways to an always-on debug for the feature that does not
have any negative CPU impact on the switch. The granularity of events
stored in the event-history depends on the individual feature, but many are
equivalent to the output that is obtained with debugging. In many cases, the
event-history contains enough data to determine what sequence of events
has occurred for the feature without the need for additional debug logs,
which makes them a great troubleshooting resource.
Event-history buffers are circular, which means that the possibility exists
for events to be overwritten by the time a problem condition is recognized,
leaving no event history evidence to investigate. For some features, the
event-history size is configurable as [small | medium | large]. If a problem
with a particular feature is occurring regularly, increase the event-history
size to improve the chance of catching the problem sequence in the buffer.
Most feature event-histories are viewed with the show {feature} internal
event-history command, as shown in Example 1-31.
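For example, ARP events are reviewed with the following command:
NX-1# show ip arp internal event-history event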
Note
Troubleshooting scenarios may require a periodic dump of the feature
tech support file. This is done with Embedded Event Manager (EEM),
or another method of scripting the data collection. bloggerd is another
tool for such scenarios, but it is recommended for use only under
guidance from Cisco Technical Assistance Center (TAC).
!!
no router ospfv3 1
NX-1# rollback running-config checkpoint known_good
Note: Applying config parallelly may fail Rollback verification
Collecting Running-Config
#Generating Rollback Patch
Executing Rollback Patch
Generating Running-config for verification
Generating Patch for verification
Verification is Successful.
In Example 1-33, an OSPFv3 process was deleted after creating the initial
configuration checkpoint. The difference between the checkpoint and the
running configuration was highlighted with the show diff rollback-patch
checkpoint command. The configuration change was then rolled back
successfully and the OSPFv3 process was restored.
Consistency Checkers
Consistency checkers are an example of how NX-OS platforms are
improving serviceability in each release. Certain software bugs or race
conditions may result in a mismatch of state between the control plane,
data plane, or forwarding ASICs. Finding these problems is nontrivial and
usually requires in-depth knowledge of the platform. Consistency checkers
were introduced to deal with these situations, and they are the result of
feedback from TAC and customers. Example 1-34 shows the usage of the
forwarding consistency checker on a Nexus 3172 platform.
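On many platforms, the forwarding consistency checker is invoked and reviewed along these lines (a hedged sketch; the exact keywords differ between platforms and releases, so verify the platform documentation):
NX-1# test forwarding inconsistency
NX-1# show forwarding inconsistency

Python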
NX-OS provides access to the python interpreter from the exec mode CLI
by using the python command, as shown in Example 1-36.
NX-1# python
Python 2.7.5 (default, Jun 3 2016, 03:57:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>> print "hello world!"
hello world!
>>> quit()
Note
Chapter 15, “Programmability and Automation,” covers the
programming and automation capabilities of NX-OS in more detail.
Bash Shell
A recent addition to NX-OS is the bash shell feature. After feature bash-
shell is enabled, the user can enter the Linux bash shell of the NX-OS
operating system. Example 1-37 shows how to access the bash shell from
the exec prompt.
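A hedged illustration of such a session (the shell prompt and command output vary by release):
NX-1(config)# feature bash-shell
NX-1# run bash
bash-4.2$ whoami
admin
bash-4.2$ exit
NX-1#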
To gain access to the bash shell, your user account must be associated with the
dev-ops or network-admin role. Bash commands can also be run from the
exec mode CLI of the switch with the run bash [command] option. This
command takes the command argument specified, runs it in the bash shell,
and returns the output. Python scripts can run from within the bash shell as
well. Use caution when managing the device from the bash shell.
Summary
NX-OS is a powerful, feature-rich network operating system that is
deployed in thousands of networks around the world. For the past 10 years,
it has remained under constant development to meet the needs of those
networks as they evolve over time. The modular architecture allows for
rapid feature development and source code sharing between the different
Nexus switching platforms. NX-OS and Nexus switches are designed for
resilience of both the hardware and software to allow for uninterrupted
service even in the event of a component failure.
The important management and operational features of NX-OS were
introduced in this chapter. That foundational knowledge enables you to
apply the troubleshooting techniques from the remaining chapters to your
own network environment.
References
Fuller, Ron. “Virtual Device Context (VDC) Design and Implementation”
(presented at Cisco Live, San Francisco 2014).
Fuller, Ron. “Cisco NX-OS Software Architecture” (presented at Cisco
Live, Orlando 2013).
Esau, Matthew and Souvik Ghosh. “Advanced Troubleshooting Cisco 7000
Series” (presented at Cisco Live, Las Vegas 2017).
Fuller, Ron, David Jansen, and Matthew McPherson. NX-OS and Cisco
Nexus Switching. Indianapolis: Cisco Press, 2013.
Cisco. Cisco Nexus Platform Product Data Sheets, www.cisco.com.
Cisco. Cisco Nexus Platform Configuration Guides, www.cisco.com.
Cisco. Cisco Nexus Platform Software Upgrade and Installation Guides,
www.cisco.com.
Chapter 2
NX-OS Troubleshooting
Tools
Note
These features can vary on each Nexus platform, based on the
hardware support. The number of active sessions and the source and
destination interfaces per session vary on different Nexus platforms.
Be sure to verify relevant Cisco documentation before configuring a
SPAN session on any Nexus switch.
To enable a port to forward the spanned traffic to the capture PC, the
destination interface is enabled for monitoring with the interface
parameter command switchport monitor. The destination ports are either
an Ethernet or Port-Channel interface configured in access or trunk mode.
The SPAN session is configured using the command monitor session
session-number, under which the source interface is specified with the
command source interface interface-id [rx|tx|both]. The rx option is used
to capture the ingress (incoming) traffic, whereas the tx option is used to
capture the egress (outgoing) traffic. By default, the option is set to both,
which captures both ingress and egress traffic on the configured source
interface. The destination interface is specified with the command
destination interface interface-id. By default, the monitor session is in
shutdown state and must be manually un-shut for the SPAN session to
function.
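Putting these commands together, a minimal SPAN session might look like the following (the interface numbers are illustrative):
NX-1(config)# interface Ethernet1/3
NX-1(config-if)# switchport
NX-1(config-if)# switchport monitor
NX-1(config)# monitor session 1
NX-1(config-monitor)# source interface Ethernet1/1 both
NX-1(config-monitor)# destination interface Ethernet1/3
NX-1(config-monitor)# no shut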
Note
The SPAN features can vary across different Nexus platforms. For
instance, features such as SPAN-on-Drop and SPAN-on-Latency are
supported on Nexus 5000 and Nexus 6000 series but not on Nexus
7000 series. Refer to the platform documentation for more about the
feature support.
Note
On FCoE ports, the SPAN destination interface is configured with the
command switchport mode SD, which is similar to the command
switchport monitor.
Example 2-2 displays the status of the monitor session. In this example,
the rx, tx, and both fields are populated for interface Eth4/1 and Eth4/2,
but the interface Eth5/1 is listed only for the rx direction. There is also an
option to filter VLANs under the monitor session using the filter vlan
vlan-id command.
Note
ACL filtering varies on different Nexus platforms. Refer to the CCO
documentation for ACL filtering support on respective Nexus
platforms.
Note
Nexus platforms do not support Remote SPAN (RSPAN).
For the ERSPAN source session to come up, the destination IP should be
present in the routing table. The ERSPAN session status is verified using
the command show monitor session session-id. Example 2-5
demonstrates the verification of both the source and destination ERSPAN
sessions.
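A hedged sketch of an ERSPAN source session (the addresses and ERSPAN ID are illustrative; the origin IP address must be configured globally):
NX-1(config)# monitor erspan origin ip-address 10.1.1.1 global
NX-1(config)# monitor session 10 type erspan-source
NX-1(config-erspan-src)# erspan-id 100
NX-1(config-erspan-src)# vrf default
NX-1(config-erspan-src)# destination ip 10.100.1.1
NX-1(config-erspan-src)# source interface Ethernet1/1 both
NX-1(config-erspan-src)# no shut
NX-1# show monitor session 10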
Note
Refer to the Cisco documentation before configuring ERSPAN on any
Nexus switch, to verify any platform limitations.
SPAN on Latency and Drop
Both SPAN and ERSPAN provide the capability to apply filters to SPAN-
specific traffic based on protocol and IP addressing. Often users or
applications report high latency or experience traffic drops between the
source and destination, making it hard to figure out where the drop is
happening. In such instances, gaining visibility of traffic that is impacting
users is always helpful during troubleshooting and can both minimize the
service impact and speed up the troubleshooting process.
NX-OS provides the capability to span the traffic based on the specified
latency thresholds or based on drops noticed in the path. These capabilities
are available for both SPAN and ERSPAN.
SPAN-on-Latency
The SPAN-on-Latency (SOL) feature works a bit differently than the
regular SPAN session. In SOL, the source port is the egress port on which
latency is monitored. The destination port is still the port where the
network analyzer is connected on the switch. The latency threshold is
defined on the interface that is being monitored using the command
packets latency threshold threshold-value. When packets cross or exceed the specified threshold, the SPAN session is triggered and captures the packets. If the configured threshold value is not a multiple of 8, it is truncated to the nearest multiple of 8.
Example 2-6 illustrates the SOL configuration, in which packets are
sniffed only at the egressing interface Eth1/1 and Eth1/2 for flows that
have latency greater than 1 μs (microsecond). The packet latency threshold is configured per port on 40G interfaces, but 4x10G interfaces share the same configuration. For this reason, Example 2-6 displays the log message that interfaces Eth1/1 to Eth1/4 are configured with a latency threshold of 1000 ns.
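A hedged sketch of such a session, assuming the span-on-latency session type keyword (the interface numbers are illustrative):
NX-1(config)# monitor session 20 type span-on-latency
NX-1(config-monitor)# source interface Ethernet1/1-2
NX-1(config-monitor)# destination interface Ethernet1/10
NX-1(config-monitor)# no shut
NX-1(config)# interface Ethernet1/1
NX-1(config-if)# packets latency threshold 1000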
SPAN-on-Drop
SPAN-on-Drop is a new feature that enables the spanning of packets that
were dropped because of unavailable buffer or queue space upon ingress.
This feature provides the capability to span packets that would otherwise
be dropped because the copy of the spanned traffic is transferred to a
specific destination port. A SPAN-on-Drop session is configured by
specifying the type as span-on-drop in the monitor session configuration.
Example 2-7 demonstrates the SPAN-on-Drop monitor session
configuration. The source interface Eth1/1 specified in the configuration is
the interface where congestion is present.
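A hedged sketch of such a session, using the span-on-drop session type described above (the interface numbers are illustrative):
NX-1(config)# monitor session 30 type span-on-drop
NX-1(config-monitor)# source interface Ethernet1/1
NX-1(config-monitor)# destination interface Ethernet1/10
NX-1(config-monitor)# no shut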
Note
The SPAN-on-Drop feature captures only drops in unicast flows that
result from buffer congestion.
Unlike other SPAN features, SPAN-on-Drop does not have any ternary
content addressable memory (TCAM) programming involved.
Programming for the source side is in the buffer or queue space.
Additionally, only one instance of SPAN-on-Drop can be enabled on the
switch; enabling a second instance brings down the session with the syslog
message “No hardware resource error.” If the SPAN-on-Drop session is up
but no packets are spanned, it is vital to verify that the drop is happening
in the unicast flow. This is verified by using the command show platform
software qd info interface interface-id and checking that the counter
IG_RX_SPAN_ON_DROP is incrementing and is nonzero. Example 2-8
shows the output for the counter IG_RX_SPAN_ON_DROP, confirming
that no drops are occurring in the unicast flows.
Note
At the time of writing, SOL and SPAN-on-Drop are supported only on
Nexus 5600 and Nexus 6000 series switches.
Nexus Platform Tools
Nexus switches are among the most powerful data center switches in the
industry. This is partly because of the CPU and memory available in the
switch, but also because of the wide range of integrated tools that the NX-
OS offers. These tools provide the capability to capture packets at
different ASIC levels within the switch and help verify both hardware
programming and the action taken by the hardware or the software on the
packet under investigation. Some of these tools include the following:
Ethanalyzer
Embedded Logic Analyzer Module (ELAM)
Packet Tracer
These tools are capable of performing packet capture for the traffic
destined for the CPU or transit hardware-switched traffic. They are helpful
in understanding the stages the packet goes through in a switch, which
helps narrow down the issue very quickly. The main benefit of these
features is that they do not require time to set up an external sniffing
device.
Note
The ELAM capture is supported on all Nexus switches, but because it
requires deeper understanding of the ASICs and the configuration
differs among Nexus platforms, it is outside the scope of this book.
Additionally, ELAM is best performed under the supervision of a
Cisco Technical Assistance Center (TAC) engineer. ELAM also is not
supported on N5000 or N5500 switches.
Ethanalyzer
Ethanalyzer is an NX-OS implementation of TShark, a terminal version of
Wireshark. TShark uses the libpcap library, which gives Ethanalyzer the
capability to capture and decode packets. It can capture inband and
management traffic on all Nexus platforms. Ethanalyzer provides the users
with the following capabilities:
Capture packets sent and received by the switch Supervisor CPU
Define the number of packets to be captured
Define the length of the packets to be captured
Display packets with very detailed protocol information or a one-line
summary
Open and save captured packet data
Filter packets to be captured on many criteria (capture filter)
Filter packets to be displayed on many criteria (display filter)
Decode the internal header of control packets
Avoid the requirement of using an external sniffing device to capture
the traffic
Ethanalyzer does not allow hardware-switched traffic to be captured
between data ports of the switch. For this type of packet capture, SPAN or
ELAM is used. When an interface ACL contains ACEs with the log option, the matching hardware-switched flows get punted to the CPU and thus can be captured using Ethanalyzer. However, this should not be attempted in production: packets could be dropped as a result of CoPP policies, or the excessive traffic punted to the CPU could impact other services on the device.
Ethanalyzer is configured in three simple steps:
Step 1. Define capture interface.
Step 2. Define Filters: Set the capture filter or display filter.
Step 3. Define the stop criteria.
There are three kinds of capture interfaces:
Mgmt: Captures traffic on the Mgmt0 interface of the switch
Inbound-hi: Captures high-priority control packets on the inband,
such as Spanning Tree Protocol (STP), Link Aggregation Control
Protocol (LACP), Cisco Discovery Protocol (CDP), Data Center
Bridging Exchange (DCBX), Fibre Channel, and Fibre Channel over
Ethernet (FCoE)
Inbound-low: Captures low-priority control packets on the inband,
such as Internet Group Management Protocol (IGMP), Transmission
Control Protocol (TCP), User Datagram Protocol (UDP), Internet
Protocol (IP), and Address Resolution Protocol (ARP) traffic
The next step is to set the filters. With a working knowledge of Wireshark,
configuring filters for Ethanalyzer is fairly simple. Two kinds of filters
can be set up for configuring Ethanalyzer: capture filter and display filter.
As the name suggests, when a capture filter is set, only frames that match
the filter are captured. The display filter is used to display the packets that
match the filter from the captured set of packets. That means Ethanalyzer
captures other frames that do not match the display filter but are not
displayed in the output. By default, Ethanalyzer supports capturing up to
10 frames and then stops automatically. This value is changed by setting
the limit-captured-frames option, where 0 means no limit.
Note
All in-band Ethernet ports that send or receive data to the switch
supervisor are captured with the inbound-hi or inbound-low option.
However, display or capture filtering can be applied.
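Two hedged invocations that tie these pieces together (the BGP capture filter and the file name are illustrative):
NX-1# ethanalyzer local interface inbound-low capture-filter "tcp port 179" limit-captured-frames 0
NX-1# ethanalyzer local interface mgmt display-filter "icmp" write bootflash:icmp.pcap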
Common capture filter and display filter syntax includes the following:

Filter Type     Capture Filter                      Display Filter
Packet length   less length
                greater length
Layer 4         udp port 53                         udp.port==53
                udp dst port 53                     tcp.port==53
                udp src port 53
                tcp port 179
                tcp portrange 2000-2100
FabricPath      proto 0x8903                        Dest HMAC/MC destination:
                                                    cfp.d_hmac==mac
                                                    cfp.d_hmac_mc==mac
                                                    EID/FTAG/IG bit:
                                                    cfp.eid==
                                                    cfp.ftag==
                                                    cfp.ig==
ICMP types      icmp-echoreply, icmp-unreach,
                icmp-sourcequench, icmp-redirect,
                icmp-echo, icmp-routeradvert,
                icmp-routersolicit, icmp-timxceed,
                icmp-paramprob, icmp-tstamp,
                icmp-tstampreply, icmp-ireq,
                icmp-ireqreply, icmp-maskreq,
                icmp-maskreply
Capturing on inband
2017-05-21 21:34:42.821141 10.162.223.34 -> 10.162.223.33 BGP
KEEPALIVE Message
2017-05-21 21:34:42.932217 10.162.223.33 -> 10.162.223.34 TCP bgp
> 14779 [ACK]
Seq=1 Ack=20 Win=17520 Len=0
2017-05-21 21:34:43.613048 10.162.223.33 -> 10.162.223.34 BGP
KEEPALIVE Message
2017-05-21 21:34:43.814804 10.162.223.34 -> 10.162.223.33 TCP
14779 > bgp [ACK]
Seq=20 Ack=20 Win=15339 Len=0
2017-05-21 21:34:46.005039 10.1.12.2 -> 224.0.0.5 OSPF
Hello Packet
2017-05-21 21:34:46.919884 10.162.223.34 -> 10.162.223.33 BGP
KEEPALIVE Message
2017-05-21 21:34:47.032215 10.162.223.33 -> 10.162.223.34 TCP bgp
> 14779 [ACK]
Seq=20 Ack=39 Win=17520 Len=0
! Output omitted for brevity
The saved .pcap file can also be transferred to a remote server via File
Transfer Protocol (FTP), Trivial File Transfer Protocol (TFTP), Secure
Copy Protocol (SCP), Secure FTP (SFTP), and Universal Serial Bus
(USB), after which it can be easily analyzed using a packet analyzer tool
such as Wireshark.
Note
If multiple VDCs exist on the Nexus 7000, Ethanalyzer runs only in the admin or default VDC. In addition, starting with Release 7.2 on the Nexus 7000, an option is available to filter captures on a per-VDC basis.
Packet Tracer
During troubleshooting, it is often difficult to understand what action the system is taking on a particular packet or flow. The packet tracer utility addresses such cases. Introduced in NX-OS Version 7.0(3)I2(2a) on the Nexus 9000 switch, it is used when intermittent or complete packet loss is observed.
Note
At the time of writing, the packet tracer utility is supported only on
the line cards or fabric modules that come with Broadcom Trident II
ASICs. More details about the Cisco Nexus 9000 ASICs can be found
at http://www.cisco.com.
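The utility is driven from the exec prompt with the test packet-tracer command set. A hedged sketch follows (keyword spelling, such as src_ip versus src-ip, varies by release); the filter here matches the output shown next:
N9000-1# test packet-tracer src_ip 192.168.2.2 dst_ip 192.168.1.1 protocol 1
N9000-1# test packet-tracer start
N9000-1# test packet-tracer show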
Packet-tracer stats
---------------------
Module 1:
Filter 1 installed: src-ip 192.168.2.2 dst-ip 192.168.1.1
protocol 1
ASIC instance 0:
Entry 0: id = 9473, count = 120, active, fp,
Entry 1: id = 9474, count = 0, active, hg,
Filter 2 uninstalled:
Filter 3 uninstalled:
Filter 4 uninstalled:
Filter 5 uninstalled:
! Second iteration of the Output
N9000-1# test packet-tracer show
Packet-tracer stats
---------------------
Module 1:
Filter 1 installed: src-ip 192.168.2.2 dst-ip 192.168.1.1
protocol 1
ASIC instance 0:
Entry 0: id = 9473, count = 181, active, fp,
Entry 1: id = 9474, count = 0, active, hg,
Filter 2 uninstalled:
Filter 3 uninstalled:
Filter 4 uninstalled:
Filter 5 uninstalled:
! Stopping the Packet-Tracer
N9000-1# test packet-tracer stop
NetFlow Configuration
These questions help users make the right choice of applying a Layer 3 or
Layer 2 NetFlow configuration. Configuring NetFlow on a Nexus switch
consists of the following steps:
Step 1. Enable the NetFlow feature.
Step 2. Define a flow record by specifying key and nonkey fields of
interest.
Step 3. Define one or many flow exporters by specifying export format,
protocol, destination, and other parameters.
Step 4. Define a flow monitor based on the previous flow record and flow
exporter(s).
Step 5. Apply the flow monitor to an interface with a sampling method
specified.
Enable NetFlow Feature
On NX-OS, the NetFlow feature is enabled using the command feature
netflow. When the feature is enabled, the entire NetFlow-related CLI
becomes available to the user.
Note
NetFlow consumes hardware resources such as TCAM and CPU.
Thus, understanding the resource utilization on the device is
recommended before enabling NetFlow.
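A consolidated sketch of the five configuration steps as they might look on a Nexus 7000 (the record, exporter, and monitor names are illustrative; the show hardware flow ip command at the end produces output like the following):
NX-1(config)# feature netflow
NX-1(config)# flow record FL_RECORD
NX-1(config-flow-record)# match ipv4 source address
NX-1(config-flow-record)# match ipv4 destination address
NX-1(config-flow-record)# collect counter packets
NX-1(config)# flow exporter FL_EXPORTER
NX-1(config-flow-exporter)# destination 192.168.100.100
NX-1(config-flow-exporter)# version 9
NX-1(config)# flow monitor FL_MON
NX-1(config-flow-monitor)# record FL_RECORD
NX-1(config-flow-monitor)# exporter FL_EXPORTER
NX-1(config)# interface Eth3/31-32
NX-1(config-if)# ip flow monitor FL_MON input

NX-1# show hardware flow ip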
D  IF    SrcAddr          DstAddr          L4 Info          PktCnt      TCPFlags
-+-----+----------------+----------------+----------------+-----------+--------
I  3/31  010.012.001.002  224.000.000.005  089:00000:00000  0000000159  ......
I  3/32  010.013.001.003  224.000.000.005  089:00000:00000  0000000128  ......
I  3/32  003.003.003.003  002.002.002.002  001:00000:00000  0000000100  ......
I  3/31  002.002.002.002  003.003.003.003  001:00000:00000  0000000100  ......
O  3/31  003.003.003.003  002.002.002.002  001:00000:00000  0000000100  ......
O  3/32  002.002.002.002  003.003.003.003  001:00000:00000  0000000100  ......
The statistics in Example 2-15 are collected on the N7K platform, which supports hardware-based flows. However, not all Nexus platforms support hardware-based flow matching; Nexus switches such as the Nexus 6000 do not. On those platforms, software-based flow matching must be performed. Because this can be resource intensive and can impact performance, such platforms support only Sampled NetFlow (see the following section).
Note
Nexus 5600 and Nexus 6000 support only ingress NetFlow applied to
the interface; Nexus 7000 supports both ingress and egress NetFlow
statistics collection.
NetFlow Sampling
NetFlow supports sampling on the data points to reduce the amount of data
collected. This implementation of NetFlow is called Sampled NetFlow
(SNF). SNF supports M:N packet sampling, where only M packets are
sampled out of N packets.
A sampler is configured using the command sampler name. Under the sampler configuration, the sampler mode is defined using the subcommand mode sample-number out-of packet-number, where sample-number ranges from 1 to 64 and packet-number ranges from 1 to 65536. After the sampler is defined, it is used in conjunction with the flow monitor configuration under the interface, as shown in Example 2-16.
sampler NF-SAMPLER1
  mode 1 out-of 1000
!
interface Eth3/31-32
  ip flow monitor FL_MON input sampler NF-SAMPLER1
Users can also define the active and inactive timer for the flows using the
command flow timeout [active | inactive] time-in-seconds.
Starting with NX-OS Version 7.3(0)D1(1), NetFlow is also supported on
the control plane policing (CoPP) interface. NetFlow on the CoPP
interface enables users to monitor and collect statistics of different
packets that are destined for the supervisor module on the switch. NX-OS
allows an IPv4 flow monitor and a sampler to be attached to the control
plane interface in the output direction. Example 2-17 demonstrates the
NetFlow configuration under CoPP interface and the relevant NetFlow
statistics on the Nexus 7000 platform.
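A hedged sketch of attaching an existing monitor and sampler under the control-plane interface (the monitor and sampler names reuse the earlier examples; exact syntax varies by release):
NX-1(config)# control-plane
NX-1(config-cp)# ip flow monitor FL_MON output sampler NF-SAMPLER1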
Note
In case of any problems with NetFlow, collect the output of the command show tech-support netflow during the problematic state.
sFlow
Defined in RFC 3176, sFlow is a technology for monitoring traffic using
sampling mechanisms that are implemented as part of an sFlow agent in
data networks that contain switches and routers. The sFlow agent is a new
software feature for the Nexus 9000 and Nexus 3000 platforms. The sFlow
agent on these platforms collects the sampled packet from both ingress
and egress ports and forwards it to the central collector, known as the
sFlow Analyzer. The sFlow agent can periodically sample or poll the
counters associated with a data source of the sampled packets.
When sFlow is enabled on an interface, it is enabled for both ingress and
egress directions. sFlow can be configured only for Ethernet and port-
channel interfaces. sFlow is enabled by configuring the command feature
sflow. Various parameters can be defined as part of the configuration (see
Table 2-2).
Table 2-2 sFlow Parameters

sFlow Configuration Parameter   Description
sflow sampling-rate rate        The sampling rate for packets. The default is 4096. A value of 0 implies that sampling is disabled.
sflow agent-ip ip-address       The address of the sFlow agent. This is a local and valid IP address on the switch.
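A hedged configuration sketch combining these parameters with a collector and a data source (the collector and data-source commands are assumptions not listed in Table 2-2; the addresses are illustrative):
NX-1(config)# feature sflow
NX-1(config)# sflow sampling-rate 1000
NX-1(config)# sflow agent-ip 192.168.1.10
NX-1(config)# sflow collector-ip 192.168.100.50 vrf default
NX-1(config)# sflow data-source interface ethernet 1/1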
To verify the configuration, use the command show sflow. This command
output displays all the information that is configured for the sFlow (see
Example 2-19).
When sFlow is configured, the sFlow agent starts collecting statistics. Although the actual flows are viewed on the sFlow collector tools, you can still see sFlow statistics on the switch using the command show sflow statistics, and view internal information about sFlow using the command show system internal sflow info. Example 2-20 displays the sFlow statistics. Notice that although the total packet count is high, the number of sampled packets is very low, because the configuration defines one sample per 1000 packets. The system internal command for sFlow also displays the resource utilization and the present state of sFlow.
Note
In case of any problems with sFlow, collect the output of the command show tech-support sflow during the problematic state.
After NTP has synchronized, the time is verified using the show clock command.
NX-OS also has a built-in proprietary feature known as Cisco Fabric
Services (CFS) that can be used to distribute data and configuration
changes to all Nexus devices. CFS distributes all local NTP configuration
across all the Nexus devices in the network. It applies a network-wide lock
for NTP when the NTP configuration is started. When the configuration
changes are made, users can discard or commit the changes, and the
committed configuration replicates across all Nexus devices. The CFS for
NTP is enabled using the command ntp distribute. The configuration is
committed to all the Nexus devices by using the ntp commit command
and is aborted using the ntp abort command. When either command is
executed, CFS releases the lock on NTP across network devices. To check
that the fabric distribution is enabled for NTP, use the command show ntp
status.
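A minimal sketch of CFS-distributed NTP configuration (the server address is illustrative):
NX-1(config)# ntp distribute
NX-1(config)# ntp server 192.168.1.254
NX-1(config)# ntp commit
NX-1# show ntp status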
NX-OS also provides a CLI to verify the statistics of the NTP packets.
Users can view input-output statistics for NTP packets, local counters maintained by NTP, memory-related NTP counters (useful when investigating a suspected memory leak in the NTP process), and per-peer NTP statistics.
statistics can be viewed from the CLI itself. To view these statistics, use
the command show ntp statistics [io | local | memory | peer ipaddr ip-
address]. Example 2-23 displays the IO and local statistics for NTP
packets. If bad NTP packets or bad authentication requests are received,
those counters are viewed under local statistics.
Name : __pfm_fanabsent_any_singlefan
Description : Shutdown if any fanabsent for 5 minute(s)
Overridable : Yes
Name : __pfm_fanbad_any_singlefan
Description : Syslog when fan goes bad
Overridable : Yes
Name : __pfm_power_over_budget
Description : Syslog warning for insufficient power
overbudget
Overridable : Yes
Name : __pfm_tempev_major
Description : TempSensor Major Threshold. Action: Shutdown
Overridable : Yes
Name : __pfm_tempev_minor
Description : TempSensor Minor Threshold. Action: Syslog.
Overridable : Yes
NX-1# show event manager policy-state __lcm_module_failure
Policy __lcm_module_failure
Cfg count : 3
Hash Count Policy will trigger if
----------------------------------------------------------------
default 0 3 more event(s) occur
Similarly, a Python script can be referenced in the EEM script. The Python
script is also saved in the bootflash with the .py extension. Example 2-27
illustrates a Python script and its reference in the EEM script. In this
example, the EEM script is triggered when the traffic on the interface
exceeds the configured storm-control threshold. In such an event, the
triggered Python script collects multiple commands.
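A hedged sketch of such an applet (the applet name, script path, and action syntax are assumptions; the Python script itself must already exist in bootflash:):
NX-1(config)# event manager applet STORM_COLLECT
NX-1(config-applet)# event storm-control
NX-1(config-applet)# action 1.0 cli python bootflash:storm_collect.py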
Logging
Network issues are hard to troubleshoot and investigate if the device
contains no information. For instance, if an OSPF adjacency goes down
and no correlating alert exists, determining when the problem happened
and what caused the problem is difficult. For these reasons, logging is
important. All Cisco routers and switches support logging functionality.
Logging capabilities are also available for specific features and protocols.
For example, logging can be enabled for BGP session state changes or
OSPF adjacency state changes.
Table 2-3 lists the various logging levels that can be configured.
Table 2-3 Logging Levels
Level Number Level Name
0 Emergency
1 Alert
2 Critical
3 Errors
4 Warnings
5 Notifications
6 Informational
7 Debugging
When a higher level is set, all the lower logging levels are enabled by default. If the logging level is set to 5 (Notifications), for example, all events falling in the range from 0 to 5 (Emergency to Notifications) are logged. For troubleshooting purposes, setting the logging level to 7 (Debugging) is good practice.
Multiple logging options are available on Cisco devices:
Console logging
Buffered logging
Logging to syslog server
Console logging is important when the device is experiencing crashes or a
high CPU condition and access to the terminal session via Telnet or Secure
Shell (SSH) is not available. However, having console logging enabled
when running debugs is not a good practice because some debug outputs
are chatty and can flood the device console. As a best practice, console
logging should always be disabled when running debugs. Example 2-28
illustrates how to enable console logging on Nexus platforms.
Example 2-28 Configuring Console Logging
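A minimal sketch of enabling console logging at the informational level (the severity value and the show output are illustrative):
NX-1(config)# logging console 6
NX-1# show logging console
Logging console:                enabled (Severity: informational)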
NX-OS not only provides robust logging, but its logs are also persistent across reloads. All the buffered logging is present in the /var/log/external/
directory. To view the internal directories, use the command show system
internal flash. This command lists all the internal directories that are part
of the flash along with their utilization. The buffered log messages are
viewed using the command show logging log.
Example 2-29 displays the directories present in the flash and the contents
of the /var/log/external/ directory. If the show logging log command does
not display output or the logging gets stopped, check the /var/log/
directory to ensure that space is available for that directory.
The logging level is also defined for various NX-OS components so that
the user can control logging for chatty components or disable certain
logging messages for less chatty or less important components. This is
achieved by setting the logging level of the component using the command
logging level component-name level. Example 2-30 demonstrates setting
the logging level of the ARP and Ethpm components to 3 to reduce
unwanted log messages.
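The commands in Example 2-30 follow this pattern:
NX-1(config)# logging level arp 3
NX-1(config)# logging level ethpm 3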
Debug Logfiles
NX-OS provides the user with an option to redirect debug output to a file.
This is useful when running debugs and segregating debug outputs from
regular log messages. Use the debug logfile file-name size size command.
Example 2-32 demonstrates using the debug logfile command to capture
debugs in a logfile. In this example, a debug logfile named bgp_dbg is
created with a size of 10000 bytes. The size of the logfile ranges from
4096 bytes to 4194304 bytes. All the debugs that are enabled are logged
under the logfile. To filter the debug output further to capture more precise
debug output, use the debug-filter option. In the following example, a
BGP update debug is enabled and the update debug logs are filtered for
neighbor 10.12.1.2 in a VRF context VPN_A.
The NX-OS software creates the logfile in the log: file system root
directory, so all the created logfiles are viewed using dir log:. After the
debug logfile is created, the respective debugs are enabled and all the
debug outputs are redirected to the debug logfile. To view the contents of
the logfile, use the show debug logfile file-name command.
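A condensed sketch of this workflow (the logfile name and the debug chosen are illustrative):
NX-1# debug logfile bgp_dbg size 10000
NX-1# debug bgp updates
NX-1# dir log:
NX-1# show debug logfile bgp_dbg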
Accounting Log
During troubleshooting, it is important to identify the trigger of the
problem, which could be a normal show command or a configuration
change. For such issues, examining all the configuration and show
commands during the time of the problem provides vital information.
NX-OS logs all this information into the accounting logfile, which is
readily available to the users. Using the command show accounting log,
users capture all the commands executed and configured on the system,
along with the time stamp and user information. The accounting logs are
persistent across reloads. By default, the accounting logs capture only the
configuration commands. To allow the capture of show commands along
with configuration commands, configure the command terminal log-all.
Example 2-33 displays the output of the accounting log, highlighting the
various configuration changes made on the device.
Note
The accounting logs and show logging logfiles are both stored on
logflash and are accessible across reloads.
Event-History
NX-OS provides continuous logging for all events that occur in the system
for both hardware and software components as event-history logs. The
event-history logs are VDC local and are maintained on a per-component
basis. These logs reduce the need for running debugs in a live production
environment and are useful for investigating a service outage even after
the services are restored. The event-history logs are captured in the
background for each component and do not have any impact on CPU
utilization to perform this task.
The event-history log size is configurable to three sizes:
Large
Medium
Small
The event-history logs are viewed from the CLI of each component. For
instance, the event-history is viewed for all ARP events using the
command show ip arp internal event-history event. Example 2-34
displays the event-history logs for ARP and shows how to modify the
event-history size. The event-history logs can be disabled by using the disabled keyword while defining the size of the event-history. Disabling event-history is not recommended, however, because it reduces the chances of root-causing a problem and understanding the sequence of events that occurred.
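A hedged sketch of resizing the ARP event-history (assuming the ARP component follows the common size keyword pattern; exact syntax varies by component and release):
NX-1(config)# ip arp event-history event size large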
References
RFC 3176, InMon Corporation’s sFlow: A Method for Monitoring Traffic
in Switched and Routed Networks. P. Phaal, S. Panchen, and N. McKee.
IETF, https://www.ietf.org/rfc/rfc3176.txt, September 2001.
BRKARC-2011, Overview of Packet Capturing Tools, Cisco Live.
Cisco, sFlow Configuration Guide, http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus3000/sw/system_mgmt/503_U4_1/b_3k_System_Mgmt_Config_503_u4_1/b_3k_System_Mgmt_Config_503_u4_1_chapter_010010.html.
Chapter 3
Troubleshooting Nexus
Platform Issues
Mod Sw Hw
--- --------------- ------
1 8.0(1) 0.403
2 8.0(1) 1.0
6 8.0(1) 1.2
Xbar Sw Hw
--- --------------- ------
1 NA 2.0
2 NA 3.0
3 NA 2.0
4 NA 2.0
5 NA 2.0
Nexus 3500
N3K1# show module
Mod  Ports  Module-Type                          Model                 Status
---  -----  -----------------------------------  --------------------  -----------
1    48     48x10GE Supervisor                   N3K-C3548P-10G-SUP    active *
Note
A fabric module is not required for all Nexus 7000 chassis types. The
Nexus 7004 chassis has no fabric module, for example. However,
higher slot chassis types do require fabric modules for the Nexus
7000 switch to function successfully.
Note
Nexus I/O module compatibility matrix CCO documentation is available at http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/nexus7000/sw/matrix/technical/reference/Module_Comparison_Matrix.pdf.
The referenced CCO documentation also lists the compatibility of the
FEX modules with different line cards.
Bootup Diagnostics
Bootup diagnostics detect hardware faults such as soldering errors, loose
connections, and faulty modules. These tests are run when the system boots
up and before the hardware is brought online. Table 3-1 shows some of the
bootup diagnostic tests.
Table 3-1 Nexus Bootup Diagnostic Tests

Test Name                      Description                              Attributes     Hardware
ASIC Register Test             Tests access to all the registers        Disruptive     SUP and line card
                               in the ASIC
ASIC Memory Test               Tests access to all the memory           Disruptive     SUP and line card
                               in the ASICs
EOBC Port Loopback             Tests the loopback of the Ethernet       Disruptive     SUP and line card
                               out-of-band channel (EOBC)
Port Loopback Test             Tests the port in internal loopback      Disruptive     Line card
                               and checks the forwarding path by
                               sending and receiving data on the
                               same port
Boot Read-Only Memory          Tests the integrity of the primary       Nondisruptive  SUP
(ROM) Test                     and secondary boot devices on the
                               SUP card
Universal Serial Bus (USB)     Verifies the USB controller              Nondisruptive  SUP
                               initialization on the SUP card
Management Port Loopback       Tests the loopback of the                Disruptive     SUP
Test                           management port on the SUP card
OBFL                           Tests the integrity of the onboard       Nondisruptive  SUP and line card
                               failure logging (OBFL) flash
Federal Information            Verifies the security device on          Disruptive     Line card
Processing Standards (FIPS)    the module
Note
The FIPS test is not supported on the F1 series modules on Nexus
7000.
Runtime Diagnostics
The runtime diagnostics are run when the system is in running state (that
is, on a live node). These tests help detect runtime hardware errors such as
memory errors, resource exhaustion, and hardware faults/degradation. The
runtime diagnostics are further classified into two categories:
Health-monitoring diagnostics
On-demand diagnostics
Health-monitoring (HM) tests are nondisruptive and run in the background
on each module. The main aim of these tests is to ensure that the hardware
and software components are healthy while the switch is running network
traffic. Some specific HM tests, marked as HM-always, start by default
when the module goes online. Users can easily enable and disable all HM
tests except HM-always tests on any module via the configuration
command-line interface (CLI). Additionally, users can change the interval
of all HM tests except the fixed-interval tests marked as HM-fixed. Table
3-2 lists the HM tests available across SUP and line card modules.
Table 3-2 Nexus Health-Monitoring Diagnostic Tests

Test Name                      Description                              Attributes     Hardware
ASIC Scratch Register Test     Tests the access to a scratch pad        Nondisruptive  SUP and line card
                               register of the ASICs                                   (all ASICs that support
                                                                                       a scratch pad register)
RTC Test                       Verifies that the real-time clock        Nondisruptive  SUP
                               (RTC) on the Supervisor is ticking
Nonvolatile Random Access      Tests the sanity of NVRAM blocks         Nondisruptive  SUP
Memory (NVRAM) Sanity Test     on the SUP modules
Port Loopback Test             Tries to loop back a packet              Nondisruptive  Line card (all front-panel
                               periodically to check the forwarding                    ports on the switch)
                               path without disrupting port traffic
Rewrite Engine Loopback Test   Tests the integrity of loopback for      Nondisruptive  Line card
                               all ports to the Rewrite Engine ASIC
                               on the module
Primary Boot ROM Test          Tests the integrity of the primary       Nondisruptive  SUP and line card
                               boot devices on the card
Secondary Boot ROM Test        Tests the integrity of the secondary     Nondisruptive  SUP and line card
                               boot devices on the card
CompactFlash                   Verifies the access to internal          Nondisruptive  SUP
                               CompactFlash on the SUP card
External CompactFlash          Verifies the access to external          Nondisruptive  SUP
                               CompactFlash on the SUP card
Power Management Bus Test      Tests the standby power management       Nondisruptive  SUP
                               control bus on the SUP card
Spine Control Bus Test         Tests and verifies the availability      Nondisruptive  SUP
                               of the standby spine module control
                               bus
Standby Fabric Loopback Test   Tests the packet path between the        Nondisruptive  SUP
                               standby SUP and fabric
Status Bus (Two Wire) Test     Checks the two wire interfaces that      Nondisruptive  SUP
                               connect the various modules
                               (including fabric cards) to the SUP
                               module
The interval for HM tests is set using the global configuration command
diagnostic monitor interval module slot test [name | test-id | all] hour
hour min minutes second sec. Note that the name of the test is case
sensitive. To enable or disable an HM test, use the global configuration
command [no] diagnostic monitor module slot test [name | test-id | all].
Use the command show diagnostic content module [slot | all] to display
the information about the diagnostics and their attributes on a given line
card. Example 3-2 illustrates how to view the diagnostics information on a
line card on a Nexus 7000 switch and how to disable an HM test. The line
card in the output of Example 3-2 is the SUP card, so the test names listed
are relevant only for the SUP card, not the line card. For example, with the
ExternalCompactFlash test, notice that the attribute in the first output is
set to A, which indicates that the test is Active. When the test is disabled
from the configuration mode, the output displays the attribute as I,
indicating that the test is Inactive.
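A sketch of the commands involved, using the test and module from Example 3-2 (test names are case sensitive):
N7K1(config)# diagnostic monitor interval module 1 test ASICRegisterCheck hour 0 min 30 second 0
N7K1(config)# no diagnostic monitor module 1 test ExternalCompactFlash
N7K1# show diagnostic result module 1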
1) ASICRegisterCheck-------------> .
2) USB---------------------------> .
3) NVRAM-------------------------> .
4) RealTimeClock-----------------> .
5) PrimaryBootROM----------------> .
6) SecondaryBootROM--------------> .
7) CompactFlash------------------> .
8) ExternalCompactFlash----------> U
9) PwrMgmtBus--------------------> .
10) SpineControlBus---------------> .
11) SystemMgmtBus-----------------> .
12) StatusBus---------------------> .
13) PCIeBus-----------------------> .
14) StandbyFabricLoopback---------> U
15) ManagementPortLoopback--------> .
16) EOBCPortLoopback--------------> .
17) OBFL--------------------------> .
N7K1# show diagnostic result module 1 detail
Current bootup diagnostic level: complete
Module 1: Supervisor Module-2 (Active)
_________________________________________________________
_____________
1) ASICRegisterCheck .
2) USB .
On-demand diagnostics have a different focus. Some tests are not required
to be run periodically, but they might be run in response to certain events
(such as faults) or in an anticipation of an event (such as exceeded
resources). Such on-demand tests are useful in localizing faults and
applying fault-containment solutions.
Both disruptive and nondisruptive on-demand diagnostic tests are run
from the CLI. An on-demand test is executed using the command diagnostic
start module slot test [test-id | name | all | non-disruptive] [port port-
number | all]. The test-id variable is the number of tests supported on a
given module. The test is also run on a port basis (depending on the kind
of test) by specifying the optional keyword port. The command
diagnostic stop module slot test [test-id | name | all] is used to stop an
on-demand test. The on-demand tests default to single execution, but the
number of iterations can be increased using the command diagnostic
ondemand iteration number, where number specifies the number of
iterations. Be careful when running disruptive on-demand diagnostic tests
within production traffic.
Example 3-4 demonstrates an on-demand PortLoopback test on a Nexus
7000 switch module.
Example 3-4 On-Demand Diagnostic Test
N7K1# diagnostic ondemand iteration 3
N7K1# diagnostic start module 6 test PortLoopback
N7K1# show diagnostic status module 6
<BU>-Bootup Diagnostics, <HM>-Health Monitoring
Diagnostics
<OD>-OnDemand Diagnostics, <SCH>-Scheduled
Diagnostics
==============================================
Card:(6) 1/10 Gbps Ethernet Module
==============================================
Current running test Run by
PortLoopback OD
Currently Enqueued Test Run by
PortLoopback OD (Remaining Iteration: 2)
N7K1# show diagnostic result module 6 test PortLoopback detail
Current bootup diagnostic level: complete
Module 6: 1/10 Gbps Ethernet Module
_________________________________________________________
_____________
6) PortLoopback:
Port 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
-----------------------------------------------------
U U U U U U U U U U U U U . . .
Port 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
-----------------------------------------------------
U U . . U U U U U U U U U U U U
Port 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
-----------------------------------------------------
U U U U U U U U U U U U U U U U
Note
Diagnostic tests are also run in offline mode. Use the command
hardware module slot offline to put the module in offline mode, and
then use the command diagnostic start module slot test [test-id |
name | all] offline to execute the diagnostic test with the offline
attribute.
GOLD Test and EEM Support
The diagnostic tests help identify hardware problems on SUP as well as
line cards, but corrective actions also need to be taken whenever those
problems are encountered. NX-OS provides such a capability by
integrating GOLD tests with the Embedded Event Manager (EEM), which
takes corrective actions in case diagnostic tests fail. One of the most
common use cases for GOLD tests is conducting burn-in testing or staging
new equipment before placing the device into a production environment.
Burn-in testing is similar to load testing: The device is typically under
some load, with investigation into resource utilization, including memory,
CPU, and buffers over time. This helps prevent any major outages that
result from hardware issues before the device starts processing production
traffic.
NX-OS supports corrective actions for the following HM tests:
RewriteEngineLoopback
StandbyFabricLoopback
Internal PortLoopback
SnakeLoopback
On the Supervisor module, if the StandbyFabricLoopback test fails, the
system reloads the standby supervisor card. If the standby supervisor card
does not come back up online in three retries, the standby supervisor card
is powered off. After the reload of the standby supervisor card, the HM
diagnostics start by default. The corrective actions are disabled by default
and are enabled by configuring the command diagnostic eem action
conservative.
Note
The command diagnostic eem action conservative is not
configurable on a per-test basis; it applies to all four of the previously
mentioned GOLD tests.
Nexus Device Health Checks
In any network environment, the network administrators and operators are
required to perform regular device health checks to ensure stability in the
network and to capture issues before they cause major network impacts.
Health checks are performed either manually or by using automation tools.
The command line might vary among Nexus platforms, but a few common
points are verified at regular intervals:
Module state and diagnostics
Hardware and process crashes and resets
Packet drops
Interface errors and drops
The previous section covered module state and diagnostics. This section
focuses on commands used across different Nexus platforms to perform
health checks.
core://<module-number>/<process-id>/<instance-number>
For instance, in Example 3-6, the location for the core files is
core://6/4298/1. If the Nexus 7000 switch rebooted or a switchover
occurred, the core files would be located in the logflash://[sup-1 | sup-
2]/core directory. On other Nexus platforms, such as Nexus 5000, 4000, or
3000, the core files would be located in the volatile: file system instead of
the logflash: file system; thus, they can be lost if the device reloads. In newer software versions for platforms that store core files in the volatile: file system, the capability was added to write the core files to bootflash: or to a remote location when they occur.
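A hedged sketch of locating and exporting a core file (the core:// path reuses the location from Example 3-6; the destination file name is illustrative):
NX-1# show cores vdc-all
NX-1# copy core://6/4298/1 bootflash:ospfd_core.gz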
If a process crashed but no core files were generated for the crash, a stack
trace might have been generated for the process. But if neither a core file
nor a stack trace exists for the crashed service, use the command show
processes log vdc-all to identify which processes were impacted. Such
crashed processes usually are marked with the N flag. Using the process
ID (PID) values from the previous command and using the command show
processes log pid pid can identify the reason the service went down. The
command output displays the reason the process failed in the Death reason
field. Example 3-7 displays using the show processes log and show processes log pid commands to identify crashes on the Nexus platform.
PID: 5656
Exit code: signal 6 (core dumped)
cgroup: 1:devices,memory,cpuacct,cpu:/1
CWD: /var/sysmgr/work
RLIMIT_AS: 1936268083
! Output omitted for brevity
For quick verification of the last reset reason, use the show system reset-
reason command. Additional commands to capture and identify the reset
reason when core files were not generated follow:
show system exception-info
show module internal exceptionlog module slot
show logging onboard [module slot]
show processes log details
Packet Loss
Packet loss is a complex issue to troubleshoot in any environment. Packet loss happens for multiple reasons:
Bad hardware
Drops on a platform
A routing or switching issue
The packet drops that result from routing and switching issues can be
fixed by rectifying the configuration. Bad hardware, on the other hand,
impacts traffic on part of a port, an entire port, or the whole line card. Nexus
platforms provide various counters that can be viewed to determine the
reason for packet loss on the device (see the following sections).
To view just the various counters on the interfaces, use the command show
interface counters errors. The counters errors option is also used with
the specific show interface interface-number command. Example 3-9
displays the error counters for the interface. If any counter is increasing,
the interface needs further troubleshooting, based on the kind of errors
received. The error can point to Layer 1 issues, a bad port issue, or even
buffer issues. Some counters indicated in the output are not errors, but
instead indicate a different problem: The Giants counter, for instance,
indicates that packets are being received with a size larger than the MTU configured on the interface.
--------------------------------------------------------------------------------
Port     Align-Err    FCS-Err     Xmit-Err     Rcv-Err    UnderSize   OutDiscards
--------------------------------------------------------------------------------
Eth2/1   0            0           0            0          0           0
--------------------------------------------------------------------------------
Port     Single-Col   Multi-Col   Late-Col     Exces-Col  Carri-Sen   Runts
--------------------------------------------------------------------------------
Eth2/1   0            0           0            0          0           0
--------------------------------------------------------------------------------
Port     Giants       SQETest-Err Deferred-Tx  IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth2/1   0            --          --           0           0           0
To view the details of the hardware interface resources and utilization, use
the command show hardware capacity interface. This command displays
not only buffer information but also any drops in both the ingress and
egress directions on multiple ports across each line card. The output varies
a bit among Nexus platforms, such as between the Nexus 7000 and the
Nexus 9000, but this command is useful for identifying interfaces with the
highest drops on the switch. Example 3-10 displays the hardware interface
resources on the Nexus 7000 switch.
Interface drops:
Module Total drops Highest drop ports
3 Tx: 0 -
3 Rx: 101850 Ethernet3/37
4 Tx: 0 -
4 Rx: 64928 Ethernet4/4
Transmit queues
----------------------------------------
Queue 1p7q4t-out-q-default
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 0
Queue 1p7q4t-out-q2
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 0
Queue 1p7q4t-out-q3
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 0
Queue 1p7q4t-out-q4
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 81653
Queue 1p7q4t-out-q5
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 35096
Queue 1p7q4t-out-q6
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 245191
Queue 1p7q4t-out-q7
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 657759
Queue 1p7q4t-out-pq1
Total bytes 0
Total packets 0
Current depth in bytes 0
Min pg drops 0
No desc drops 0
WRED drops 0
Taildrop drops 0
Platform-Specific Drops
Nexus platforms provide in-depth information on various platform-level
counters to identify problems with hardware and software components. If
packet loss is noticed on a particular interface or line card, the platform-
level commands provide information on what is causing the packets to be
dropped. For instance, on the Nexus 7000 switch, the command show
hardware internal statistics [module slot | module-all] pktflow
dropped is used to identify the reason for packet drops. This command
details the information per line card module and packet drops across all
interfaces on the line card. Example 3-12 displays the packet drops across
various ports on the line card in slot 3. The command output displays
packet drops resulting from bad packet length, error packets from Media
Access Control (MAC), a bad cyclic redundancy check (CRC), and so on.
Using the diff keyword with the command helps identify which drop counters are increasing on particular interfaces, and for which reasons, to focus further troubleshooting.
|---------------------------------------|
|Executed at : 2017-06-02 10:09:16.914  |
|---------------------------------------|
Hardware statistics on module 03:
|--------------------------------------------------------------------------|
| Device:Flanker Eth Mac Driver     Role:MAC                       Mod: 3  |
| Last cleared @ Fri Jun  2 00:28:46 2017                                  |
|--------------------------------------------------------------------------|
Instance:0
Cntr  Name                                                       Value             Ports
----- -----                                                      -----             -----
0     igr in upm: pkts rcvd, len(>= 64B, <= mtu) with bad crc    0000000000000001  3 -
1     igr rx pl: received error pkts from mac                    0000000000000001  3 -
2     igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000004  3 -
3     igr rx pl: cbl drops                                       0000000000002818  3 -
4     igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000002  4 -
Instance:1
Cntr  Name                                                       Value             Ports
----- -----                                                      -----             -----
5     igr in upm: pkts rcvd, len > MTU with bad CRC              0000000000000001  10 -
6     igr in upm: pkts rcvd, len > MTU with bad CRC              0000000000000001  11 -
7     igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000002  9 -
8     igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000011  10 -
9     igr rx pl: cbl drops                                       0000000000000004  10 -
10    igr rx pl: received error pkts from mac                    0000000000000001  11 -
11    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000017  11 -
12    igr rx pl: cbl drops                                       0000000000002812  11 -
Instance:3
Cntr  Name                                                       Value             Ports
----- -----                                                      -----             -----
13    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000003  26 -
14    igr rx pl: cbl drops                                       0000000000000008  26 -
15    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000001  31 -
Instance:4
Cntr  Name                                                       Value             Ports
----- -----                                                      -----             -----
16    igr in upm: pkts rcvd, len > MTU with bad CRC              0000000000000027  35 -
17    igr in upm: pkts rcvd, len > MTU with bad CRC              0000000000000044  36 -
18    igr in upm: pkts rcvd, len(>= 64B, <= mtu) with bad crc    0000000000000001  36 -
19    igr in upm: pkts rcvd, len > MTU with bad CRC              0000000000005795  37 -
20    igr in upm: pkts rcvd, len > MTU with bad CRC              0000000000000034  38 -
21    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000008  33 -
22    igr rx pl: cbl drops                                       0000000000002801  33 -
23    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000004  34 -
24    egr out pl: total pkts dropped due to cbl                  0000000000001769  34 -
25    igr rx pl: received error pkts from mac                    0000000000000003  35 -
26    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000200  35 -
27    igr rx pl: cbl drops                                       0000000000002813  35 -
28    igr rx pl: dropped pkts cnt                                0000000000000017  35 -
29    igr rx pl: received error pkts from mac                    0000000000000093  36 -
30    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000002515  36 -
31    igr rx pl: cbl drops                                       0000000000002894  36 -
32    igr rx pl: dropped pkts cnt                                0000000000000166  36 -
33    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000047337  37 -
34    igr rx pl: dropped pkts cnt                                0000000000001371  37 -
35    igr rx pl: EM-IPL i/f dropped pkts cnt                     0000000000000212  38 -
36    igr rx pl: dropped pkts cnt                                0000000000000012  38 -
Instance:5
Cntr  Name                                                       Value             Ports
----- -----                                                      -----             -----
5     igr ib_500: de drops, shared by parser and de              0000000000000004  41-48 -
6     igr ib_500: vq ib pkt drops                                0000000000000004  41-48 -
7     igr vq: l2 pkt drop count                                  0000000000000004  41-48 -
8     igr vq: total pkts dropped                                 0000000000000004  41-48 -
|--------------------------------------------------------------------------|
| Device:Lightning                  Role:ARB-MUX                   Mod: 3  |
| Last cleared @ Fri Jun  2 00:28:46 2017                                  |
|--------------------------------------------------------------------------|
Communication among the supervisor card, line cards, and fabric cards
occurs over the Ethernet out-of-band channel (EOBC). If errors occur on
the EOBC, the Nexus switch can experience packet loss and major service
loss. EOBC errors are verified using the command show hardware
internal cpu-mac eobc stats. The Error Counters section displays a list of
errors that have occurred on the EOBC interface. In most instances,
physically reseating the line card fixes the EOBC errors. Example 3-13
displays the EOBC stats for Error Counters on a Nexus 7000 switch. To
check just the error counters, filter the output using the grep keyword (see
Example 3-13).
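A filtered invocation might look like the following sketch; the number of
lines to print after the match (25 here) is an assumption to adjust per
platform, because the grep next num option prints the matching line plus
the num lines that follow it:
N7K-1# show hardware internal cpu-mac eobc stats | grep next 25 Error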
Nexus platforms also provide in-band stats for packets that the central
processing unit (CPU) processes. If an error counter in the in-band stats
increases frequently, it could indicate a problem with the supervisor card
and might lead to packet loss. To view the CPU in-band statistics, use the
command show hardware internal cpu-mac inband stats. This command
displays various statistics on the packets and packet lengths received by or
sent from the CPU, interrupt counters, error counters, and present and
maximum punt statistics. Example 3-14 displays the output of the in-band
stats on the Nexus 7000 switch. This command is also available on the
Nexus 9000 switch, as the second output shows.
RMON counters Rx Tx
----------------------+--------------------+--------------------
total packets 1154193 995903
good packets 1154193 995903
64 bytes packets 0 0
65-127 bytes packets 432847 656132
128-255 bytes packets 429319 8775
256-511 bytes packets 236194 328244
512-1023 bytes packets 619 18
1024-max bytes packets 55214 2734
broadcast packets 0 0
multicast packets 0 0
good octets 262167681 201434260
total octets 0 0
XON packets 0 0
XOFF packets 0 0
management packets 0 0
Interrupt counters
-------------------+--
Assertions 1176322
Rx packet timer 1154193
Rx absolute timer 0
Rx overrun 0
Rx descr min thresh 0
Tx packet timer 0
Tx absolute timer 1154193
Tx queue empty 995903
Tx descr thresh low 0
Error counters
--------------------------------+--
CRC errors ..................... 0
Alignment errors ............... 0
Symbol errors .................. 0
Sequence errors ................ 0
RX errors ...................... 0
Missed packets (FIFO overflow) 0
Single collisions .............. 0
Excessive collisions ........... 0
Multiple collisions ............ 0
Late collisions ................ 0
Collisions ..................... 0
Defers ......................... 0
Tx no CRS ..................... 0
Carrier extension errors ....... 0
Rx length errors ............... 0
FC Rx unsupported .............. 0
Rx no buffers .................. 0
Rx undersize ................... 0
Rx fragments ................... 0
Rx oversize .................... 0
Rx jabbers ..................... 0
Rx management packets dropped .. 0
Tx TCP segmentation context .... 0
Tx TCP segmentation context fail 0
Throttle statistics
-----------------------------+---------
Throttle interval ........... 2 * 100ms
Packet rate limit ........... 64000 pps
Rate limit reached counter .. 0
Tick counter ................ 193078
Active ...................... 0
Rx packet rate (current/max) 3 / 182 pps
Tx packet rate (current/max) 2 / 396 pps
NAPI statistics
----------------+---------
Weight Queue 0 ......... 512
Weight Queue 1 ......... 256
Weight Queue 2 ......... 128
Weight Queue 3 ......... 16
Weight Queue 4 ......... 64
Weight Queue 5 ......... 64
Weight Queue 6 ......... 64
Weight Queue 7 ......... 64
Poll scheduled . 1176329
Poll rescheduled 0
Poll invoked ... 1176329
Weight reached . 0
Tx packets ..... 995903
Rx packets ..... 1154193
Rx congested ... 0
Rx redelivered . 0
qdisc stats:
----------------+---------
Tx queue depth . 10000
qlen ........... 0
packets ........ 995903
bytes .......... 197450648
drops .......... 0
Inband stats
----------------+---------
Tx src_p stamp . 0
N9396PX-5# show hardware internal cpu-mac inband stats
================ Packet Statistics ======================
Packets received: 58021524
Bytes received: 412371530221
Packets sent: 57160641
Bytes sent: 409590752550
Rx packet rate (current/peak): 0 / 281 pps
Peak rx rate time: 2017-03-08 19:03:21
Tx packet rate (current/peak): 0 / 289 pps
Peak tx rate time: 2017-04-24 14:26:36
Note
The output varies among Nexus platforms. For instance, the previous
output is brief and comes from the Nexus 9396 PX switch. The same
command output on the Nexus 9508 switch is similar to the output
displayed for the Nexus 7000 switch. This command is available on
all Nexus platforms.
In the previous output, the in-band stats command on the Nexus 9396,
though brief, displays the time when the traffic hit the peak rate; this
information is not available in the command output on the Nexus 7000
switch. The Nexus 7000 instead provides the show hardware internal
cpu-mac inband events command, which displays the event history of the
traffic rate in the ingress (Rx) or egress (Tx) direction of the CPU,
including the peak rate. Example 3-15 displays the in-band events history
for the traffic rate in the ingress or egress direction of the CPU. The time
stamp of the peak traffic rate is useful when investigating high CPU or
packet loss on the Nexus 7000 switches.
NX-OS also provides a brief in-band counters CLI that displays the
number of in-band packets in both the ingress (Rx) and egress (Tx)
directions, errors, dropped counters, overruns, and more. This output is
used to quickly determine whether the in-band traffic is getting dropped.
Example 3-16 displays the output of the command show hardware
internal cpu-mac inband counters. If nonzero counters appear for
errors, drops, or overruns, use the diff keyword to determine whether they
are increasing frequently. This command is available on all platforms.
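For example, running the command twice through the diff utility
highlights only the counters that changed between invocations; this sketch
assumes the Nexus 9396PX prompt from Example 3-14:
N9396PX-5# show hardware internal cpu-mac inband counters | diff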
Packet drops on the Nexus switch happen because of various errors in the
hardware. The drops happen either at the line card or on the supervisor
module itself. To view the various errors and their counters across all the
modules on a Nexus switch, use the command show hardware internal
errors [all | module slot]. Example 3-17 displays the hardware internal
errors on the Nexus 7000 switch. Note that the command is applicable to
all Nexus platforms.
|------------------------------------------------------------------------|
| Device:Clipper XBAR                   Role:QUE               Mod: 1    |
| Last cleared @ Wed May 31 12:59:42 2017
| Device Statistics Category :: ERROR
|------------------------------------------------------------------------|
|------------------------------------------------------------------------|
| Device:Clipper FWD                    Role:L2                Mod: 1    |
| Last cleared @ Wed May 31 12:59:42 2017
| Device Statistics Category :: ERROR
|------------------------------------------------------------------------|
! Output omitted for brevity
Note
Each Nexus platform has different ASICs where errors or drops are
observed. However, these are outside the scope of this book. It is
recommended to capture show tech-support detail and tac-pac
command output during problematic states, to identify the platform-
level problems leading to packet loss.
Note
Compatibility between an FEX and its parent switch is documented in
the release notes of the NX-OS software version running on the Nexus
switch.
Note
Chapter 4, “Nexus Switching,” has more details on the FEX supported
and nonsupported designs.
To enable FEX, NX-OS first requires installing the feature set using the
command install feature-set fex. Then the feature set must be enabled
using the command feature-set fex. If the FEX is being enabled on the
Nexus 7000, the FEX feature set is installed in the default VDC along
with the command no hardware ip verify address reserved; the
feature-set fex command is then configured under the relevant VDC. The
command no hardware ip verify address reserved is required only when
the intrusion detection system (IDS) reserved address check is enabled.
This is verified using the command show hardware ip verify. If the
check is already disabled, the command no hardware ip verify address
reserved does not need to be configured.
When the feature-set fex is enabled, interfaces are configured as FEX
fabric ports using the command switchport mode fex-fabric. The next
step is to assign an ID to the FEX, which is used to distinguish each FEX
on the switch. Example 3-18 illustrates the configuration on the Nexus
switch for connecting to an FEX.
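The following is a minimal sketch of such a configuration, assuming FEX
ID 101 is attached over interface Ethernet3/1; the FEX ID and interface
are illustrative:
N7K-1(config)# install feature-set fex
N7K-1(config)# no hardware ip verify address reserved
N7K-1(config)# feature-set fex
N7K-1(config)# interface ethernet 3/1
N7K-1(config-if)# switchport mode fex-fabric
N7K-1(config-if)# fex associate 101
N7K-1(config-if)# no shutdown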
Further details on the FEX are viewed using the command show fex fex-
number detail. This command displays the status of the FEX and all the
FEX interfaces. Additionally, it displays the details of pinning mode and
information regarding the FEX fabric ports. Example 3-20 displays the
detailed output of the FEX 101.
When the FEX satellite ports are available, they can be configured as
either Layer 2 or Layer 3 ports; they can also operate as active-active
ports by making them part of a vPC configuration.
If issues arise with the fabric ports or the satellite ports, the state change
information is viewed using the command show system internal fex info
fport [all | interface-number] or show system internal fex info satport
[all | interface-number]. Example 3-21 displays the internal information of
both the satellite and fabric ports on the Nexus 7000 switch. In the first
section of the output, the command displays a list of events that the
system goes through to bring up the FEX. It lists all the finite state
machine events, which is useful while troubleshooting in case the FEX
does not come up and gets stuck in one of the states. The second section of
the output displays information about the satellite ports and their status
information.
Note
If any issues arise with the FEX, it is useful to collect show tech-
support fex fex-number during the problematic state. The issue might
also result from the Ethpm component on the Nexus switch, because the
FEX sends state change messages to Ethpm. Thus, capturing the show
tech-support ethpm output during the problematic state could also be
relevant. Ethpm is discussed later in this chapter.
vdc-default
-------------
Resource Min Max
---------- ----- -----
monitor-rbs-product 0 12
monitor-rbs-filter 0 12
monitor-session-extended 0 12
monitor-session-mx-exception-src 0 1
monitor-session-inband-src 0 1
port-channel 0 768
monitor-session-erspan-dst 0 23
monitor-session 0 2
vlan 16 4094
anycast_bundleid 0 16
m6route-mem 5 20
m4route-mem 8 90
u6route-mem 4 4
u4route-mem 8 8
vrf 2 4096
N7K-1(config)# vdc resource template DEMO-TEMPLATE
N7K-1(config-vdc-template)# limit-resource port-channel minimum 1 maximum 4
N7K-1(config-vdc-template)# limit-resource vrf minimum 5 maximum 100
N7K-1(config-vdc-template)# limit-resource vlan minimum 20 maximum 200
N7K-1# show vdc resource template DEMO-TEMPLATE
DEMO-TEMPLATE
---------------
Resource Min Max
---------- ----- -----
vlan 20 200
vrf 5 100
port-channel 1 4
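A template takes effect only when it is attached to a VDC. As a sketch,
assuming a VDC named N7k-2 already exists, the template is applied as
follows:
N7K-1(config)# vdc N7k-2 template DEMO-TEMPLATE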
Single-Supervisor HA Policy    Dual-Supervisor HA Policy
Bringdown                      Bringdown
Restart (default)              Restart
Reset                          Switchover (default)
N7K-1(config-vdc)#
N7K-1(config-vdc)# limit-resource module-type f3
This will cause all ports of unallowed types to be removed from this vdc. Continue (y/n)? [yes] yes
N7K-1(config-vdc)# allocate interface ethernet 3/1
Entire port-group is not present in the command. Missing ports will be included automatically
Additional Interfaces Included are :
Ethernet3/2
Ethernet3/3
Ethernet3/4
Ethernet3/5
Ethernet3/6
Ethernet3/7
Ethernet3/8
Moving ports will cause all config associated to them in source vdc to be removed.
Are you sure you want to move the ports (y/n)? [yes] yes
N7K-1(config-vdc)# ha-policy dual-sup ?
bringdown Bring down the vdc
restart Bring down the vdc, then bring the vdc back up
switchover Switchover the supervisor
N7K-1(config-vdc)# ha-policy dual-sup restart
N7K-1(config-vdc)# ha-policy single-sup bringdown
N7K-1(config-vdc)# limit-resource port-channel minimum 3 maximum 5
N7K-1(config-vdc)# limit-resource vlan minimum 20 maximum 100
N7K-1(config-vdc)# limit-resource vrf minimum 5 maximum 10
VDC Initialization
The VDC is initialized before any VDC-specific configuration is applied.
Before VDC initialization, perform a copy run start after the VDC is
created so that the newly created VDC is part of the startup configuration.
The VDC is initialized using the switchto vdc name command from the
default or admin VDC (see Example 3-24). The initialization process of
the VDC involves steps similar to bringing up a new Nexus switch: It
prompts for the admin password and then for the basic configuration
dialog. Use this dialog to perform the basic configuration setup for the
VDC, or configure the VDC manually by replying no to the basic
configuration dialog. The command switchback is used to switch back to
the default or admin VDC.
This setup utility will guide you through the basic configuration of
the system. Setup configures only enough connectivity for management
of the system.
In Example 3-24, after the VDC is initialized, the host name of the VDC
is seen as N7k-1-N7k-2; that is, the host names of the default VDC and
the new VDC are concatenated. To avoid this behavior, configure the
command no vdc combined-hostname in the default or admin VDC.
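A minimal sketch of the relevant configuration from the default VDC
follows; after this command, each nondefault VDC prompt displays only
its own host name (N7k-2 in this example):
N7K-1(config)# no vdc combined-hostname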
Out-of-Band and In-Band Management
The Cisco NX-OS software provides a virtual management interface for
out-of-band management for each VDC. Each virtual management interface
is configured with a separate IP address that is accessed through the
physical mgmt0 interface. Using the virtual management interface enables
you to use only one management network, which shares the AAA servers
and syslog servers among the VDCs.
VDCs also support in-band management. The VDC is accessed using one
of the Ethernet interfaces that are allocated to the VDC. Using in-band
management involves using separate management networks, which
ensures separation of the AAA servers and syslog servers among the
VDCs.
VDC Management
NX-OS software provides a CLI to easily manage the VDCs when
troubleshooting problems. The VDC configuration of all the VDCs is seen
from default or admin VDC. Use the command show run vdc to view all
the VDC-related configuration. Additionally, when saving the
configuration, use the command copy run start vdc-all to copy the
configuration done on all VDCs.
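As a quick sketch, both commands are run from the default or admin
VDC:
N7K-1# show run vdc
N7K-1# copy run start vdc-all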
NX-OS provides a CLI to view further details of the VDC without looking
at the configuration. Use the command show vdc [detail] to view the
details of each VDC. The show vdc detail command displays various lists
of information for each VDC, such as ID, name, state, HA policy, CPU
share, creation time and uptime of the VDC, VDC type, and line cards
supported by each VDC (see Example 3-25). On a Nexus 7000 switch, some
VDCs might be running critical services. By default, NX-OS allocates an
equal CPU share (CPU resources) to all the VDCs. On SUP2 and SUP2E
supervisor cards, NX-OS allows users to allocate a specific amount of the
switch’s CPU, to prioritize more critical VDCs.
vdc id: 1
vdc name: N7k-1
vdc state: active
vdc mac address: 50:87:89:4b:c0:c1
vdc ha policy: RELOAD
vdc dual-sup ha policy: SWITCHOVER
vdc boot Order: 1
CPU Share: 5
CPU Share Percentage: 50%
vdc create time: Fri Apr 21 05:57:30 2017
vdc reload count: 0
vdc uptime: 1 day(s), 0 hour(s), 35 minute(s), 41 second(s)
vdc restart count: 1
vdc restart time: Fri Apr 21 05:57:30 2017
vdc type: Ethernet
vdc supported linecards: f3
vdc id: 2
vdc name: N7k-2
vdc state: active
vdc mac address: 50:87:89:4b:c0:c2
vdc ha policy: RESTART
vdc dual-sup ha policy: SWITCHOVER
vdc boot Order: 1
CPU Share: 5
CPU Share Percentage: 50%
vdc create time: Sat Apr 22 05:05:59 2017
vdc reload count: 0
vdc uptime: 0 day(s), 1 hour(s), 28 minute(s), 12 second(s)
vdc restart count: 1
vdc restart time: Sat Apr 22 05:05:59 2017
vdc type: Ethernet
vdc supported linecards: f3
To further view the details of resources allocated to each VDC, use the
command show vdc resource [detail]. This command displays the
configured minimum and maximum value and the used, unused, and
available values for each resource. The output is run for individual VDCs
using the command show vdc name resource [detail]. Example 3-26
displays the resource configuration and utilization for each VDC on the
Nexus 7000 chassis running two VDCs (for instance, N7k-1 and N7k-2).
        N7k-2             0          2          0          0          2

 vrf             5 used    0 unused    4091 free    4091 avail    4096 total
 -----
        Vdc             Min        Max       Used     Unused      Avail
        ---             ---        ---       ----     ------      -----
        N7k-1             2       4096          3          0       4091
        N7k-2             2       4096          2          0       4091

        N7k-2             8          8          1          7          7
! Output omitted for brevity
Based on the kind of line cards the VDC supports, interfaces are allocated
to each VDC. To view the member interfaces of each VDC, use the
command show vdc membership. Example 3-27 displays the output of the
show vdc membership command. In Example 3-27, notice the various
interfaces that are part of VDC 1 (N7k-1) and VDC 2 (N7k-2). If a
particular VDC is deleted, the interfaces become unallocated and are thus
shown under the VDC ID 0.
NX-OS also provides internal event history logs to view errors or messages
related to a VDC. Use the command show vdc internal event-history
[errors | msgs | vdc_id id] to view the debugging information related to
VDCs. Example 3-28 demonstrates creating a new VDC (N7k-3) and shows
relevant event history logs that display events the VDC creation process
goes through before the VDC is created and active for use. The events in
Example 3-28 show the VDC creation in progress and then show that it
becomes active.
Note
For more details on supported module combinations and the behavior
of modules running in different modes, refer to the CCO
documentation listed in the “References” section, at the end of the
chapter.
Troubleshooting NX-OS System Components
Nexus is a distributed architecture platform, so it runs features that are both
platform independent (PI) and platform dependent (PD). In troubleshooting
PI features such as the routing protocol control plane, knowing the feature
helps in easily isolating the problem; for features in which PD
troubleshooting is required, however, understanding the NX-OS system
components helps.
Troubleshooting PD issues requires having knowledge about not only
various system components but also dependent services or components. For
instance, Route Policy Manager (RPM) is a process that is dependent on the
Address Resolution Protocol (ARP) and Netstack processes (see Example
3-29). These processes are further dependent on other processes. The
hierarchy of dependency is viewed using the command show system
internal sysmgr service dependency srvname name.
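As a sketch, assuming the internal service name for Route Policy Manager
is rpm, the dependency tree is queried as follows; the output lists the
dependent services such as ARP and Netstack, each of which can be
queried the same way:
N7K-1# show system internal sysmgr service dependency srvname rpm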
Note
A client is required to know the server’s SAP (usually a static SAP) to
communicate with the server.
An MTS address is divided into two parts: a 4-byte node address and a 2-
byte SAP number. Because an MTS domain provides services to the
processes associated with that domain, the node address in the MTS
address is used to decide the destination MTS domain. Thus, the SAP
number resides in the MTS domain identified by the node address. If the
Nexus switch has multiple VDCs, each VDC has its own MTS domain; this
is reflected as SUP for VDC1, SUP-1 for VDC2, SUP-2 for VDC3, and so
on.
MTS also has various operational codes to identify different kinds of
payloads in the MTS message:
sync: This is used to synchronize information to standby.
notification: The operations code is used for one-way notification.
request_response: The message carries a token to match the request
and response.
switchover_send: The operational code can be sent during switchover.
switchover_recv: The operational code can be received during
switchover.
seqno: The operational code carries a sequence number.
Various symptoms can indicate problems with MTS, and different
symptoms mean different problems. If a feature or process is not
performing as expected, high CPU utilization is noticed on the Nexus
switch, or ports are bouncing on the switch for no reason, then MTS
messages might be stuck in a queue. The easiest way to verify this is to
check the MTS buffer utilization, using the command show system
internal mts buffer summary. This output needs to be taken several
times to see which queues are not clearing. Example 3-30 demonstrates
how the MTS buffer summary looks when the queues are not clearing.
The process with SAP number 2938 seems to be stuck because the
messages are stuck in the receive queue; the other process, with SAP
number 2592, seems to have cleared the messages from the receive queue.
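As a sketch of the summary format, the columns map to the queue types
described in the table that follows; the counter values here are illustrative
assumptions matching the scenario just described (SAP 2938 stuck in its
receive queue):
N7K-1# show system internal mts buffer summary
node     sapno      recv_q      pers_q      npers_q     log_q
sup      2938       2500        0           0           0
sup      2592       0           0           0           0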
recv_q     Receive Queue
pers_q     Persistent Queue      Messages in this queue survive through the crash. MTS replays the message after the crash.
npers_q    Nonpersistent Queue   Messages do not survive the crash.
log_q      Log Queue             MTS logs the message when an application sends or receives the message. The application uses logging for transaction recovery in restart. The application retrieves logged messages explicitly after restart.
Messages stuck in the queue lead to various impacts on the device. For
instance, if the device is running BGP, you might randomly see BGP flaps
or BGP peering not even coming up, even though the BGP peers might have
reachability and correct configuration. Alternatively, the user might not be
able to perform a configuration change, such as adding a new neighbor
configuration.
After determining that the messages are stuck in one of the queues, identify
the process associated with the SAP number. The command show system
internal mts sup sap sapno description obtains this information. The
same information also can be viewed from the sysmgr output using the
command show system internal sysmgr service all. For details about all
the queued messages, use the command show system internal mts buffers
detail. Example 3-31 displays the description of the SAP 2938, which
shows the statsclient process. The statsclient process is used to collect
statistics on supervisor or line card modules. The second section of the
output displays all the messages present in the queue.
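For instance, the lookup for SAP 2938, matching the description in
Example 3-31, resembles the following sketch:
N7K-1# show system internal mts sup sap 2938 description
statsclient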
Note
The SAP description information in Example 3-31 is taken from the
default VDC. For the information on a nondefault VDC, use the
command show system internal mts node sup-[vnode-id] sap sapno
description.
The first and most important fields to check in the previous output are the
SAP number and its age. If a message has been stuck in the queue for a
fairly long time, it needs to be investigated; such messages might be
causing services to misbehave on the Nexus platform. The other field to
look at is OPC, which refers to the operational code. After the messages
in the queue are verified from the buffers detail output, use the command
show system internal sup opcodes to determine the operational code
associated with the message, to understand the state of the process.
SAP statistics are also viewed to verify different queue limits of various
SAPs and to check the maximum queue limit that a process has reached.
This is done using the command show system internal mts sup sap sapno
stats (see Example 3-32).
The MTS errors are also reported in the MTS event history logs and can be
viewed using the command show system internal mts event-history
errors.
If the MTS queue is stuck or an MTS buffer leak is observed, performing
a supervisor switchover clears the MTS queues and helps recover from
service outages caused by a stuck MTS queue.
Note
If SAP number 284 appears in the MTS buffer queue, ignore it: It
belongs to the TCPUDP process client and is thus expected.
Netstack and Packet Manager
Netstack is the NX-OS implementation of the user-mode Transmission
Control Protocol (TCP)/Internet Protocol (IP) stack, which runs only on the
supervisor module. The Netstack components are implemented in user
space processes. Each Netstack component runs as a separate process with
multiple threads. In-band packets and features specific to NX-OS, such as
vPC- and VDC-aware capabilities, must be processed in software. Netstack
is the NX-OS component in charge of processing software-switched
packets. As stated earlier, the Netstack process has three main roles:
Pass in-band packets to the correct control plane process application
Forward in-band punted packets through software in the desired
manner
Maintain in-band network stack configuration data
Netstack is made up of both Kernel Loadable Module (KLM) and user
space components. The user space components are VDC local processes
containing the Packet Manager, which is the Layer 2 processing
component; IP Input, the Layer 3 processing component; and the
TCP/UDP functions, which handle the Layer 4 packets. The Packet
Manager (PktMgr) component is mostly isolated from IP Input and
TCP/UDP, even though they share the same process space. Figure 3-1
displays the Netstack architecture and the components that are part of
KLM and user space.
Figure 3-1 Netstack Architecture
If the packets being sent to the supervisor are from a particular interface,
verify the PktMgr statistics for the interface using the command show
system internal pktmgr interface interface-id (see Example 3-36). This
example explicitly shows how many unicast, multicast, and broadcast
packets were sent and received.
--------------------------------------------
Driver:
--------------------------------------------
State: Up
Filter: 0x0
For IP processing, Netstack queries the URIB—that is, the routing table
and all other necessary components, such as the Route Policy Manager
(RPM)—to make a forwarding decision for the packet. Netstack performs
all the accounting in the show ip traffic command output. The IP traffic
statistics are used to track fragmentation, Internet Control Message
Protocol (ICMP), TTL, and other exception packets. This command also
displays the RFC 4293 traffic statistics. An easy way to figure out whether
the IP packets are hitting the NX-OS Netstack component is to observe the
statistics for exception punted traffic, such as fragmentation. Example 3-38
illustrates the different sections of the show ip traffic command output.
Necessary details of the TCP socket connection are verified using the
command show sockets connection tcp [detail]. The output with the detail
option provides information such as TCP windowing information, the MSS
value for the session, and the socket state. The output also provides the
MTS SAP ID. If the TCP socket is having a problem, look up the MTS SAP
ID in the buffer to see whether it is stuck in a queue. Example 3-40 displays
the socket connection details for BGP peering between two routers.
Netstack socket clients are monitored with the command show sockets
client detail. This command explains the socket client behavior and shows
how many socket library calls the client has made. This command is useful
in identifying issues a particular socket client is facing because it also
displays the Errors section, where errors are reported for a problematic
client. As Example 3-41 illustrates, the output displays two clients, syslogd
and bgp. The output shows the associated SAP ID with the client and
statistics on how many socket calls the process has made. The Errors
section is empty because no errors are seen for the displayed sockets.
TCP v4 Received:
  402528 total packets received, 203911 packets received in sequence,
  3875047 bytes received in sequence, 8 out-of-order packets received,
  10 rcvd duplicate acks, 208189 rcvd ack packets,
  3957631 bytes acked by rcvd acks, 287 Dropped no inpcb,
  203911 Fast recv packets enqueued, 16 Fast TCP can not recv more,
  208156 Fast TCP data ACK to app,
TCP v4 Sent:
  406332 total packets sent, 20 control (SYN|FIN|RST) packets sent,
  208162 data packets sent, 3957601 data bytes sent,
  198150 ack-only packets sent,
INPCB Statistics:
in_pcballoc: 38 in_pcbbind: 9
in_pcbladdr: 18 in_pcbconnect: 14
in_pcbdetach: 19 in_pcbdetach_no_rt: 19
in_setsockaddr: 13 in_setpeeraddr: 14
in_pcbnotify: 1 in_pcbinshash_ipv4: 23
in_pcbinshash_ipv6: 5 in_pcbrehash_ipv4: 18
in_pcbremhash: 23
INPCB Errors:
IN6PCB Statistics:
in6_pcbbind: 5
in6_pcbdetach: 4 in6_setsockaddr: 1
in6_pcblookup_local: 2
IN6PCB Errors:
Multiple clients (ARP, STP, BGP, EIGRP, OSPF, and so on) interact with
the Netstack component. Thus, while troubleshooting control plane issues,
if you are able to see the packet in Ethanalyzer but the packet is not
received by the client component itself, the issue might be related to
Netstack or the Packet Manager (PktMgr). Figure 3-2 illustrates the
control plane packet flow and the placement of the Netstack and PktMgr
components in the system.
Note
If an issue arises with any Netstack component or Netstack component
clients, such as OSPF or TCP failure, collect output from the
commands show tech-support netstack and show tech-support
pktmgr, along with the relevant client show tech-support outputs, to
aid in further investigation by the Cisco TAC.
ARP and Adjacency Manager
The ARP component handles ARP functionality for the Nexus switch
interfaces. The ARP component registers with PktMgr as a Layer 2
component and provides a few other functionalities:
Manages Layer 3–to–Layer 2 adjacency learning and timers
Manages static ARP entries
Punts the glean adjacency packets to the CPU, which then triggers
ARP resolution
Adds ARP entries into the Adjacency Manager (AM) database
Manages virtual addresses registered by first-hop redundancy
protocols (FHRP), such as Virtual Router Redundancy Protocol
(VRRP), Hot Standby Router Protocol (HSRP), and Gateway Load-
Balancing Protocol (GLBP)
Has clients listening for ARP packets such as ARP snooping, HSRP,
VRRP, and GLBP
All the messaging and communication with the ARP component happens
with the help of MTS. ARP packets are sent to PktMgr via MTS. The ARP
component does not support the Reverse ARP (RARP) feature, but it does
support features such as proxy ARP, local proxy ARP, and sticky ARP.
Note
If the router receives packets destined to another host in the same
subnet and local proxy ARP is enabled on the interface, the router does
not send the ICMP redirect messages. Local proxy ARP is disabled by
default.
If the Sticky ARP option is set on an interface, any new ARP entries
that are learned are marked so that they are not overwritten by a new
adjacency (for example, gratuitous ARP). These entries also do not get
aged out. This feature helps prevent a malicious user from spoofing an
ARP entry.
Glean adjacencies can cause packet loss and also cause excessive packets to
get punted to CPU. Understanding the treatment of packets when a glean
adjacency is seen is vital. Let’s assume that a switch receives IP packets
where the next hop is a connected network. If an ARP entry exists but no
host route (/32 route) is installed in the FIB or in the AM shared database,
the FIB lookup points to glean adjacency. The glean adjacency packets are
rate-limited. If no network match is found in FIB, packets are silently
dropped in hardware (known as a FIB miss).
To protect the CPU from high bandwidth flows with no ARP entries or
adjacencies programmed in hardware, NX-OS provides rate-limiters for
glean adjacency traffic on Nexus 7000 and 9000 platforms. The
configuration for the preset hardware rate-limiters for glean adjacency
traffic is viewed using the command show run all | include glean.
Example 3-43 displays the hardware rate-limiters for glean traffic.
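As a sketch, the relevant line of the Example 3-43 output resembles the
following; the 100 pps value is an illustrative assumption, tunable with the
hardware rate-limiter layer-3 glean configuration command:
N7K-1# show run all | include glean
hardware rate-limiter layer-3 glean 100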
IP ARP Table
Total number of entries: 2
Address Age MAC Address Interface
10.1.12.10 00:10:20 5087.894b.bb41 Vlan10
10.1.12.2 00:00:09 INCOMPLETE Vlan10
To view the forwarding adjacency, use the command show forwarding ipv4
adjacency interface-type interface-num [module slot]. If the adjacency for
a particular next hop appears as unresolved, there is no adjacency; FIB then
matches the network glean adjacency and performs a punt operation.
Example 3-46 illustrates the output of the show forwarding ipv4
adjacency command with an unresolved adjacency entry.
Note
The ARP packets are also captured using Ethanalyzer in both ingress
and egress directions.
The ARP component is closely coupled with the Adjacency Manager (AM)
component. The AM takes care of programming the /32 host routes in the
hardware. AM provides the following functionalities:
Exports Layer 3 to Layer 2 adjacencies through shared memory
Generates adjacency change notification, including interface deletion
notification, and sends updates via MTS
Adds host routes (/32 routes) into URIB/U6RIB for learned
adjacencies
Performs IP/IPv6 lookups in the AM database while forwarding packets
out of the interface
Handles adjacency restarts by maintaining the adjacency SDB for
restoration of the AM state
Provides a single interface for URIB/UFDM to learn routes from
multiple sources
When an ARP entry is learned, it is added to the AM SDB. AM then
communicates directly with URIB and UFDM to install a /32 adjacency in
hardware. The AM database is queried for the state of active ARP entries.
The ARP table is not persistent across a process restart and thus must
requery the AM SDB. AM registers various clients that can install
adjacencies. To view the registered clients, use the command show
system internal adjmgr client (see Example 3-48). One of the most
common clients of AM is ARP.
Example 3-48 Adjacency Manager Clients
Address : 10.1.12.10
MacAddr : 5087.894b.bb41
Preference : 50
Source : arp
Interface : Vlan10
Physical Interface : Ethernet2/1
Packet Count : 0
Byte Count : 0
Best : Yes
Throttled : No
! Unresolved Adjacency
N7k-1# show ip adjacency 10.1.12.2 detail
! Output omitted for brevity
Address : 10.1.12.2
MacAddr : 0000.0000.0000
Preference : 255
Source : arp
Interface : Vlan10
Physical Interface : Vlan10
Packet Count : 0
Byte Count : 0
Best : Yes
Throttled : No
Note
If an issue arises with the ARP or AM components, capture the show
tech arp and show tech adjmgr outputs during the problematic state.
CLIENT: ospf-100
    index mask: 0x0000000000008000
    epid: 23091    MTS SAP: 320    MRU cache hits/misses: 2/1
    Stale Time: 2100
    Routing Instances:
      VRF: "default"    routes: 1, rnhs: 0, labels: 0
    Messages received:
      Register: 1    Convergence-notify: 1    Modify-route: 1
    Messages sent:
      Modify-route-ack: 1
Each routing protocol has its own region of shared URIB memory space.
When a routing protocol learns routes from its neighbor, it installs those
learned routes in its own region of shared URIB memory space. URIB then
copies updated routes to its own protected region of shared memory, which
is read-only memory and is readable only to Netstack and other
components. The routing decisions are made from the entry present in
URIB shared memory. It is vital to note that URIB itself does not perform
any of the add, modify, or delete operations in the routing table. URIB
clients (the routing protocols and Netstack) handle all updates, except when
the URIB client process crashes. In such a case, URIB might then delete
abandoned routes.
OSPF CLI provides users with the command show ip ospf internal txlist
urib to view the OSPF routes sent to URIB. For all other routing protocols,
the information is viewed using event history commands. Example 3-52
displays the output, showing the source SAP ID of OSPF process and the
destination SAP ID for MTS messages.
Server up : L3VM|IFMGR|RPM|AM|CLIS|URIB|U6RIB|IP|IPv6|SNMP
Server required : L3VM|IFMGR|RPM|AM|CLIS|URIB|IP|SNMP
Server registered: L3VM|IFMGR|RPM|AM|CLIS|URIB|IP|SNMP
Server optional : none
Early hello : OFF
Force write PSS: FALSE
OSPF mts pkt sap 324
OSPF mts base sap 320
9: 10.1.12.0/24
10: 1.1.1.1/32
11: 2.2.2.2/32
11: RIB marker
N7k-1# show system internal mts sup sap 320 description
ospf-100
N7k-1# show system internal mts sup sap 324 description
OSPF pkt MTS queue
The routes being updated from an OSPF process or any other routing
process to URIB are recorded in the event history logs. To view the updates
copied by OSPF from OSPF process memory to URIB shared memory, use
the command show ip ospf internal event-history rib. Use the command
show routing internal event-history msgs to examine URIB updating the
globally readable shared memory. Example 3-53 shows the learned OSPF
routes being processed and updated to URIB and also the routing event
history showing the routes being updated to shared memory.
Example 3-53 Routing Protocol and URIB Updates
N7k-1# show ip ospf internal event-history rib
OSPF RIB events for Process "ospf-100"
2017 May 14 03:12:14.711449 ospf 100 [23091]: : Done sending routes to URIB
2017 May 14 03:12:14.711447 ospf 100 [23091]: : Examined 3 OSPF routes
2017 May 14 03:12:14.710532 ospf 100 [23091]: : Route (mbest) does not have any next-hop
2017 May 14 03:12:14.710531 ospf 100 [23091]: : Path type changed from nopath to intra
2017 May 14 03:12:14.710530 ospf 100 [23091]: : Admin distance changed from 255 to 110
2017 May 14 03:12:14.710529 ospf 100 [23091]: : Mbest metric changed from 4294967295 to 41
2017 May 14 03:12:14.710527 ospf 100 [23091]: : Processing route 2.2.2.2/32 (mbest)
2017 May 14 03:12:14.710525 ospf 100 [23091]: : Done processing next-hops for 2.2.2.2/32
2017 May 14 03:12:14.710522 ospf 100 [23091]: : Route 2.2.2.2/32 next-hop 10.1.12.2 added to RIB.
2017 May 14 03:12:14.710515 ospf 100 [23091]: : Path type changed from nopath to intra
2017 May 14 03:12:14.710513 ospf 100 [23091]: : Admin distance changed from 255 to 110
2017 May 14 03:12:14.710511 ospf 100 [23091]: : Ubest metric changed from 4294967295 to 41
2017 May 14 03:12:14.710509 ospf 100 [23091]: : Processing route 2.2.2.2/32 (ubest)
! Output omitted for brevity
2017 May 14 03:12:14.710430 ospf 100 [23091]: : Start sending routes to URIB and summarize
N7k-1# show routing internal event-history msgs
! Output omitted for brevity
6) Event:E_MTS_TX, length:60, at 710812 usecs after Sun May 14 03:12:14 2017
    [NOT] Opc:MTS_OPC_URIB(52225), Id:0X0036283B, Ret:SUCCESS
    Src:0x00000101/253, Dst:0x00000101/320, Flags:None
    HA_SEQNO:0X00000000, RRtoken:0x00000000, Sync:NONE, Payloadsize:312
    Payload:
    0x0000: 04 00 1a 00 53 0f 00 00 53 0f 00 00 ba 49 07 00
7) Event:E_MTS_RX, length:60, at 710608 usecs after Sun May 14 03:12:14 2017
    [NOT] Opc:MTS_OPC_URIB(52225), Id:0X00362839, Ret:SUCCESS
    Src:0x00000101/320, Dst:0x00000101/253, Flags:None
    HA_SEQNO:0X00000000, RRtoken:0x00000000, Sync:NONE, Payloadsize:276
    Payload:
    0x0000: 04 00 19 00 33 5a 00 00 33 5a 00 00 ba 49 07 00
N7k-1# show system internal mts sup sap 253 description
URIB queue
After the routes are installed in the URIB, they can be viewed using the
command show ip route routing-process detail, where routing-process is
the NX-OS process name for the respective routing protocol, as in
Example 3-53 (ospf-100).
Note
URIB stores all routing information in shared memory. Because the
memory space is shared, it can be exhausted by large-scale routing
issues or memory leak issues. Use the command show routing
memory statistics to view the shared URIB memory space.
Note
NX-OS no longer has Cisco Express Forwarding (CEF). It now relies
on hardware FIB, which is based on AVL Trees, a self-balancing
binary search tree.
The UFDM component distributes AM, FIB, and RPF updates to IPFIB on
each line card in the VDC and then sends an acknowledgment route-ack to
URIB. This is verified using the command show system internal ufdm
event-history debugs (see Example 3-54).
After the hardware FIB has been programmed, the forwarding information
is verified using the command show forwarding route ip-address/len
[detail]. The command output displays the next hop used to reach the
destination prefix and the outgoing interface, as well as the destination
MAC information. The same information is also verified at the platform
level, for more detail from the hardware perspective, using the command
show forwarding ipv4 route ip-address/len platform [module slot].
Then the information must be propagated in the relevant line card. This is
verified using the command show system internal forwarding route ip-
address/len [detail]. This command output also provides interface
hardware adjacency information; this is further verified using the command
show system internal forwarding adjacency entry adj, where the adj
value is the adjacency value received from the previous command.
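As a sketch, the complete verification sequence for the prefix used in this
section (2.2.2.2/32) looks like the following; the adjacency value 0x5f
matches the AdjIndex column of the platform output that appears after the
next note, and the module keyword can be appended as described
previously:
N7k-1# show forwarding route 2.2.2.2/32 detail
N7k-1# show forwarding ipv4 route 2.2.2.2/32 platform
N7k-1# show system internal forwarding route 2.2.2.2/32 detail
N7k-1# show system internal forwarding adjacency entry 0x5f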
Note
Note that the previous outputs can be collected on the supervisor card
as well as at the line card level by logging into the line card console
using the command attach module slot and then executing the
forwarding commands as already described.
----+---------------------+----------+----------+-----------
Dev | Prefix | PfxIndex | AdjIndex | LIF
----+---------------------+----------+----------+-----------
0 2.2.2.2/32 0x6320 0x5f 0x3
Note
In case of any forwarding issues, collect the following show tech
outputs during problematic state:
show tech routing ip unicast
show tech-support forwarding l3 unicast [module slot]
show tech-support detail
bundle_bringup_id(5)
service_xconnect(0)
current state [ETH_PORT_FSM_ST_L2_UP]
xfp(inserted), status(ok) Extended info (present and valid)
Platform Information:
Local IOD(0xd7), Global IOD(0) Runtime IOD(0xd7)
Capabilities:
Speed(0xc), Duplex(0x1), Flowctrl(r:0x3,t:0x3),
LinkDebounce(0x1)
udld(0x1), SFPCapable(0x1), TrunkEncap(0x1), AutoNeg(0x1)
channel(0x1), suppression(0x1), cos_rewrite(0x1),
tos_rewrite(0x1)
dce capable(0x4), l2 capable(0x1), l3 capable(0x2) qinq
capable(0x10)
ethertype capable(0x1000000), Fabric capable (y), EFP capable
(n)
slowdrain congestion capable(y), slowdrain pause capable (y)
slowdrain slow-speed capable(y)
Num rewrites allowed(104)
eee capable speeds () and eee flap flags (0)
eee max wk_time rx(0) tx(0) fb(0)
Operational Vlans: 10
Pacer Information:
Pacer State: released credits
ISSU Pacer State: initialized
For these link events, relevant messages are seen in the port-client event
history logs for the specified port using the line card-level command show
system internal port-client event-history port port-num.
Note
If issues arise with ports not coming up on the Nexus chassis, collect
the output of the command show tech ethpm during problematic state.
HWRL, CoPP, and System QoS
Denial of service (DoS) attacks take many forms and affect both servers
and infrastructure in any network environment, especially in data centers.
Attacks targeted at infrastructure devices generate IP traffic streams at very
high data rates. These IP data streams contain packets that are destined for
processing by the control plane of the route processor (RP). Based on the
high rate of rogue packets presented to the RP, the control plane is forced to
spend an inordinate amount of time processing this DoS traffic. This
scenario usually results in one of the following issues:
Loss of line protocol keepalives, which causes a link to go down and
leads to route flaps and major network transitions.
Excessive packet processing because packets are being punted to the
CPU.
Loss of routing protocol updates, which leads to route flaps and major
network transitions.
An unstable Layer 2 network.
Near 100% CPU utilization that locks up the router and prevents it
from completing high-priority processing (resulting in other negative
side effects).
An RP at near 100% utilization, which slows the response time at the
user command-line interface (CLI) or locks out the CLI, preventing the
user from taking corrective action to respond to the attack.
Consumption of resources such as memory, buffers, and data
structures, causing negative side effects.
Backup of packet queues, leading to indiscriminate drops of important
packets.
Router crashes.
To overcome the challenges of DoS/DDoS attacks and excessive packet
processing, NX-OS gives users two-stage policing:
Rate-limiting packets in hardware on a per-module basis before
sending the packets to the CPU
Policy-based traffic policing using control plane policing (CoPP) for
traffic that has passed rate-limiters
The hardware rate-limiters and the CoPP policy together increase device
security by protecting the CPU (route processor) from unnecessary traffic
or DoS attacks and by giving priority to relevant traffic destined for the
CPU. Note that the hardware rate-limiters are available only on the Nexus
7000 and Nexus 9000 series switches and are not available on other
Nexus platforms.
Packets that hit the CPU or reach the control plane are classified into these
categories:
Received packets: These packets are destined for the router (such as
keepalive messages)
Multicast packets: These packets are further divided into three
categories:
Directly connected sources
Multicast control packets
Copy packets: For supporting features such as ACL-log, a copy of the
original packet is made and sent to the supervisor. Thus, these are
called copy packets.
ACL-log copy
FIB unicast copy
Multicast copy
NetFlow copy
Exception packets: These packets need special handling. The hardware
either is unable to process them or detects an exception, so they are
sent to the supervisor for further processing. The following are some of
the exceptions that fall under this category:
Same interface check
TTL expiry
MTU failure
Dynamic Host Configuration Protocol (DHCP) ACL redirect
ARP ACL redirect
Source MAC IP check failure
Unsupported rewrite
Stale adjacency error
Glean packets: When an L2 MAC for the destination IP or next hop is
not present in the FIB, the packet is sent to the supervisor. The
supervisor then takes care of generating an ARP request for the
destination host or next hop.
Broadcast, non-IP packets: The following packets fall under this
category:
Broadcast MAC + non-IP packet
Broadcast MAC + IP unicast
Multicast MAC + IP unicast
Remember that both the CoPP policy and the rate-limiters are applied on
a per-module, per-forwarding engine (FE) basis.
Note
On the Nexus 7000 platform, CoPP policy is supported on all line
cards except F1 series cards. F1 series cards exclusively use rate-
limiters to protect the CPU. HWRL is supported on Nexus 7000/7700
and Nexus 9000 series platforms.
Example 3-59 displays the output of the command show hardware rate-
limiter [module slot], which is used to view the rate-limiter configuration
and statistics for each line card module present in the chassis.
Module: 3
  R-L Class            Config        Allowed        Dropped          Total
+------------------+--------+---------------+---------------+---------------+
  L3 mtu                 500              0              0               0
  L3 ttl                 500              0              0               0
  L3 control           10000              0              0               0
  L3 glean               100              0              0               0
  L3 mcast dirconn      3000              1              0               1
  L3 mcast loc-grp      3000              0              0               0
  L3 mcast rpf-leak      500              0              0               0
  L2 storm-ctrl      Disable
  access-list-log        100              0              0               0
  copy                 30000          54649              0           54649
  receive              30000         292600              0          292600
  L2 port-sec            500              0              0               0
  L2 mcast-snoop       10000           2242              0            2242
  L2 vpc-low            4000              0              0               0
  L2 l2pt                500              0              0               0
  L2 vpc-peer-gw        5000              0              0               0
  L2 lisp-map-cache     5000              0              0               0
  L2 dpss                100              0              0               0
  L3 glean-fast          100              0              0               0
  L2 otv                 100              0              0               0
  L2 netflow             500              0              0               0

Module: 2
  R-L Class            Config        Allowed        Dropped          Total
+------------------+--------+---------------+---------------+---------------+
  L3 glean               100              0              0               0
  L3 mcast loc-grp      3000              0              0               0
  access-list-log        100              0              0               0
  bfd                  10000              0              0               0
  exception               50              0              0               0
  fex                   3000              0              0               0
  span                    50              0              0               0
  dpss                  6400              0              0               0
  sflow                40000              0              0               0
For verifying the rate-limiter statistics on an F1 module on Nexus 7000
switches, use the command show hardware rate-limiter [f1 rl-1 | rl-2 |
rl-3 | rl-4 | rl-5].
The Nexus 7000 series switches also enable you to view the rate-limiters
for the supervisor-bound traffic and their usage. Different modules
determine which exceptions match each rate-limiter. These differences are
viewed using the command show hardware internal forwarding rate-
limiter usage [module slot]. Example 3-60 displays the output of this
command, showing not only the different rate-limiters but also which
packet streams or exceptions are handled by CoPP or by the L2 or L3
rate-limiters.
-------------------------+------+------+----------+------+----------+---------
Packet streams           | CAP1 | CAP2 |    DI    | CoPP |  L3 RL   |  L2 RL
-------------------------+------+------+----------+------+----------+---------
L3 control (224.0.0.0/24)  Yes     x     sup-hi     x      control    copy
L2 broadcast               x       x     flood      x      x          strm-ctl
ARP request                Yes     x     sup-lo     Yes    x          copy
Mcast direct-con           Yes     x     x          Yes    m-dircon   copy
ISIS                       Yes     x     sup-lo     x      x          x
L2 non-IP multicast        x       x     x          x      x          x
Access-list log            x       Yes   acl-log    x      x          acl-log
L3 unicast control         x       x     sup-hi     Yes    x          receive
L2 control                 x       x     x          x      x          x
Glean                      x       x     sup-lo     x      x          glean
Port-security              x       x     port-sec   x      x          port-sec
IGMP-Snoop                 x       x     m-snoop    x      x          m-snoop
-------------------------+------+------+----------+------+----------+---------
Exceptions               | CAP1 | CAP2 |    DI    | CoPP |  L3 RL   |  L2 RL
-------------------------+------+------+----------+------+----------+---------
IPv4 header options        0       0     x          Yes    x
FIB TCAM no route          0       0     x          Yes    x
Same interface check       0       0     x          x      ttl        x
IPv6 scope check fail      0       0     drop       x      x
Unicast RPF more fail      0       0     drop       x      x
Unicast RPF fail           0       0     drop       Yes    x
Multicast RPF fail         0       0     drop       x      x
Multicast DF fail          0       0     drop       x      x
TTL expiry                 0       0     x          x      ttl        x
Drop                       0       0     drop       x      x
L3 ACL deny                0       0     drop       x      x
L2 ACL deny                0       0     drop       x      x
IPv6 header options        0       0     drop       Yes    x
MTU fail                   0       0     x          x      mtu        x
DHCP ACL redirect          0       0     x          Yes    mtu        x
ARP ACL redirect           0       0     x          Yes    mtu        x
Smac IP check fail         0       0     x          x      mtu        x
Hardware drop              0       0     drop       x      x
Software drop              0       0     drop       x      x
Unsupported RW             0       0     x          x      ttl        x
Invalid packet             0       0     drop       x      x
L3 proto filter fail       0       0     drop       x      x
Netflow error              0       0     drop       x      x
Stale adjacency error      0       0     x          x      ttl        x
Result-bus drop            0       0     drop       x      x
Policer drop               0       0     x          x      x
Information about specific exceptions is seen using the command show
hardware internal forwarding l3 asic exceptions exception detail
[module slot].
The configuration settings for both the L2 and L3 ASIC rate-limiters are
viewed using the command show hardware internal forwarding [l2 |
l3] asic rate-limiter rl-name detail [module slot], where the rl-name
variable is the name of the rate-limiter. Example 3-61 displays the output
for L3 ASIC exceptions, as well as the L2 and L3 rate-limiters. The first
output shows the configuration and statistics for packets that fail the RPF
check. The second and third outputs show the rate-limiter and exception
configuration for packets that fail the MTU check.
Match fields:
Cap1 bit: 0
Cap2 bit: 0
DI select: 0
DI: 0
Flood bit: 0
class copp-system-p-class-management
set cos 2
police cir 10000 kbps bc 250 ms conform transmit violate drop
class copp-system-p-class-normal
set cos 1
police cir 680 kbps bc 250 ms conform transmit violate drop
class copp-system-p-class-exception
set cos 1
police cir 360 kbps bc 250 ms conform transmit violate drop
class copp-system-p-class-monitoring
set cos 1
police cir 130 kbps bc 1000 ms conform transmit violate drop
class class-default
set cos 0
police cir 100 kbps bc 250 ms conform transmit violate drop
To view the differences in the different CoPP profiles, use the command
show copp diff profile profile-type profile profile-type. The command
displays the policy-map configuration differences of both specified
profiles.
Note
Starting with NX-OS Release 6.2(2), the copp-system-p-class-
multicast-router, copp-system-p-class-multicast-host, and copp-
system-p-class-normal classes were added for multicast traffic. Before
Release 6.2(2), this was achieved through custom user configuration.
Both HWRL and CoPP are done at the forwarding engine (FE) level. An
aggregate amount of traffic from multiple FEs can still overwhelm the
CPU. Thus, both the HWRL and CoPP are best-effort approaches. Another
important point to keep in mind is that the CoPP policy should not be too
aggressive; it also should be designed based on the network design and
configuration. For example, if the rate at which routing protocol packets
are hitting the CoPP policy is more than the policed rate, even the
legitimate sessions can be dropped and protocol flaps can be seen. If the
predefined CoPP policies must be modified, create a custom CoPP policy
by copying a preclassified CoPP policy and then edit the new custom
policy. None of the predefined CoPP profiles can be edited. Additionally,
the CoPP policies are hidden from the show running-config output. The
CoPP policies are viewed with the show running-config all or show
running-config copp all commands. Example 3-63 shows how to view
the CoPP policy configuration and create a custom strict policy.
Example 3-63 Viewing a CoPP Policy and Creating a Custom CoPP Policy
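A minimal sketch of the workflow follows, assuming the strict profile is
cloned with the prefix CUSTOM; the generated policy-map and class-map
names follow the prefix convention, and the 800-kbps rate is an
illustrative assumption:
N7K-1(config)# copp copy profile strict prefix CUSTOM
N7K-1(config)# policy-map type control-plane CUSTOM-copp-policy-strict
N7K-1(config-pmap)# class CUSTOM-copp-class-normal
N7K-1(config-pmap-c)# police cir 800 kbps bc 250 ms conform transmit violate drop
N7K-1(config-pmap-c)# exit
N7K-1(config-pmap)# exit
N7K-1(config)# control-plane
N7K-1(config-cp)# service-policy input CUSTOM-copp-policy-strict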
One problem with the access lists that are part of the CoPP policy is that
the statistics per-entry command is not supported for IP and MAC
access control lists (ACL) and thus has no effect when applied under those
ACLs. To view the CoPP policy-referenced IP and MAC ACL counters on
an input/output (I/O) module, use the command show system internal
access-list input entries detail. Example 3-65 displays the output of the
command show system internal access-list input entries detail, showing
the hits on the MAC ACL for the FabricPath MAC address 0180.c200.0041.
Linecard Configuration:
-----------------------
Scale Factors
Module 1: 1.00
Module 2: 1.00
Module 3: 0.50
Module 4: 1.00
Module 5: 1.00
Module 6: 1.00
Module 7: 1.00
Module 8: 1.00
Module 9: 1.00
Note
Refer to the CCO documentation for the appropriate scale factor
recommendation for the appropriate Nexus 7000 chassis.
A few best practices need to be kept in mind for NX-OS CoPP policy
configuration:
Use the strict CoPP profile.
Reapply the copp profile strict command after each NX-OS upgrade, or
at least after each major NX-OS upgrade. If a CoPP policy modification
was previously made, it must be reapplied after the upgrade.
The dense CoPP profile is recommended when the chassis is fully
loaded with F2 series modules or loaded with more F2 series modules
than any other I/O modules.
Disabling CoPP is not recommended. Tune the default CoPP, as
needed.
Monitor unintended drops, and add or modify the default CoPP policy
in accordance with the expected traffic.
Because traffic patterns constantly change in a data center, customization
of CoPP is an ongoing process.
MTU Settings
The MTU settings on a Nexus platform work differently than on other
Cisco platforms. Two kinds of MTU settings exist: Layer 2 (L2) MTU and
Layer 3 (L3) MTU. The L3 MTU is manually configured under the
interface using the mtu value command. On the other hand, the L2 MTU is
configured either through the network QoS policy or by setting the MTU on
the interface itself on the Nexus switches that support per-port MTU. The
L2 MTU settings are defined under the network-qos policy type, which is
then applied under the system qos policy configuration. Example 3-68
displays the sample configuration to enable jumbo L2 MTU on the Nexus
platforms.
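The following is a minimal sketch of the network QoS method, assuming
a Nexus 5500 and a jumbo MTU of 9216 bytes; the policy name JUMBO
is illustrative:
N5K-1(config)# policy-map type network-qos JUMBO
N5K-1(config-pmap-nqos)# class type network-qos class-default
N5K-1(config-pmap-nqos-c)# mtu 9216
N5K-1(config-pmap-nqos-c)# exit
N5K-1(config-pmap-nqos)# exit
N5K-1(config)# system qos
N5K-1(config-sys-qos)# service-policy type network-qos JUMBO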
Note
Not all platforms support jumbo L2 MTU at the port level. The port-
level L2 MTU configuration is supported only on the Nexus 7000,
7700, 9300, and 9500 platforms. All the other platforms (such as
Nexus 3048, 3064, 3100, 3500, 5000, 5500, and 6000) support only
network QoS policy-based jumbo L2 MTU settings.
The MTU settings on the Nexus 3000, 7000, 7700, and 9000 (platforms that
support per-port MTU settings) can be viewed using the command show
interface interface-type x/y. On the Nexus 3100, 3500, 5000, 5500, and
6000 (platforms supporting network QoS policy-based MTU settings),
these are verified using the command show queuing interface interface-
type x/y.
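As a sketch, on a platform that supports per-port MTU, the setting is
applied and verified as follows; the interface and MTU value are
illustrative:
N9K-1(config)# interface ethernet 1/1
N9K-1(config-if)# mtu 9216
N9K-1(config-if)# end
N9K-1# show interface ethernet 1/1 | include MTU
  MTU 9216 bytes, BW 10000000 Kbit, DLY 10 usec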
Note
Beginning with NX-OS Version 6.2, the per-port MTU configuration
on FEX ports is not supported on Nexus 7000 switches. A custom
network QoS policy is required to configure these (see Example 3-69).
In NX-OS, the Ethernet Port Manager (ethpm) process manages the port-
level MTU configuration. The MTU information under the ethpm process
is verified using the command show system internal ethpm info interface
interface-type x/y (see Example 3-71).
The MTU settings also can be verified on the Earl Lif Table Manager
(ELTM) process, which maintains Ethernet state information. The ELTM
process also takes care of managing the logical interfaces, such as switch
virtual interfaces (SVI). To verify the MTU settings under the ELTM
process on a particular interface, use the command show system internal
eltm info interface interface-type x/y (see Example 3-72).
Note
If MTU issues arise across multiple devices or a software issue is
noticed with the ethpm process or MTU settings, capture the show
tech-support ethpm and show tech-support eltm [detail] output in a
file and open a TAC case for further investigation.
Summary
This chapter focused on troubleshooting various hardware- and software-
related problems on Nexus platforms. From the hardware troubleshooting
perspective, this chapter covered the following topics:
GOLD tests
Line card and process crashes
Packet loss and platform errors
Interface errors and drops
Troubleshooting for Fabric Extenders
This chapter detailed how VDCs work and explored how to troubleshoot VDC-related issues, including issues that arise with certain combinations of modules within a VDC. This chapter also demonstrated how to limit the resources of a VDC and covered in depth various NX-OS components, such as Netstack, UFDM and IPFIB, EthPM, and Port-Client. Finally, the chapter
addressed CoPP and how to troubleshoot for any drops in the CoPP policy,
including how to fix any MTU issues on the Ethernet and FEX ports.
References
Cisco, Cisco Nexus 7000 Series: Configuring Online Diagnostics,
http://www.cisco.com.
Cisco, Cisco Nexus Fabric Extenders, http://www.cisco.com.
Cisco, Cisco Nexus 7000 Series: Virtual Device Context Configuration
Guide, http://www.cisco.com.
Part II
Troubleshooting Layer 2
Forwarding
Chapter 4
Nexus Switching
When Cisco launched the Nexus product line, it introduced a new category
of networking devices called data center switching. Data center switching
products provide high-density, high-speed switching capacity to serve the
needs of the servers (physical and virtual) in the data center. This chapter
focuses on the core components of network switching and how to verify
which components are working properly to isolate and troubleshoot Layer 2
forwarding issues.
Network Layer 2 Communication Overview
The Ethernet protocol first used technologies such as Thinnet (10Base2) or
Thicknet (10Base5), which connected all the network devices via the same
cable. This caused problems when two devices tried to talk at the same
time, because Ethernet devices use Carrier Sense Multiple Access/Collision
Detect (CSMA/CD) to ensure that only one device talked at a time in a
collision domain. If a device detected that another device was transmitting
data, it delayed transmitting packets until the cable was quiet.
As more devices were added to a cable, the network became less efficient. All these devices were in the same collision domain (CD). Network hubs compounded the problem because they added port density while repeating traffic; hubs do not have any intelligence in them to direct network traffic.
Network switches enhance scalability and stability in a network through the creation of virtual channels. Switches maintain a table that associates a host's Ethernet MAC address with the port that sourced the network traffic.
Instead of flooding all traffic out of every port, a switch uses the MAC
address table to forward network traffic only to the destination port
associated to the destination MAC address of the packet. Packets are
forwarded out of all network ports for that LAN only if the destination
MAC address is not known on the switch (known as unicast flooding).
Network broadcasts (MAC address: ff:ff:ff:ff:ff:ff) cause the switch to broadcast the packet out of every LAN switch port interface. This is disruptive because it diminishes the efficiency of a network switch to that of a hub: because of CSMA/CD, communication between network devices must stop while the broadcast is transmitted. Network broadcasts do not cross Layer 3
boundaries (that is, from one subnet to another subnet). All devices that
reside in the same Layer 2 (L2) segment are considered to be in the same
broadcast domain.
Figure 4-1 displays PC-A’s broadcast traffic that is being advertised to all
devices on that network, which include PC-B, PC-C, and R1. R1 does not
forward the broadcast traffic from one broadcast domain (192.168.1.0/24)
to the other broadcast domain (192.168.2.0/24).
Figure 4-1 Broadcast Domains
The local MAC address table contains the list of MAC addresses and the ports on which those MAC addresses were learned. The MAC address table is
displayed with the command show mac address-table [address mac-
address]. To ensure that the switch hardware ASICs are programmed
correctly, the hardware MAC address table is displayed with the command
show hardware mac address-table module [dynamic] [address mac-
address].
Example 4-1 displays the MAC address table on a Nexus switch. Locating
the switch port the network device is attached to is the first step of
troubleshooting L2 forwarding. If multiple MAC addresses appear on the same port, it indicates that another switch is connected to that port, and connecting to that switch may be required as part of the troubleshooting process to identify the port the network device is attached to.
Note
The terms network device and host are considered interchangeable in
this text.
Virtual LANs
Adding a router between LAN segments helps shrink broadcast domains
and provides for optimal network communication. Host placement on a
LAN segment varies because of network addressing. This could lead to
inefficient usage of hardware because some switch ports could go unused.
Virtual LANs (VLAN) provide a logical segmentation by creating multiple
broadcast domains on the same network switch. VLANs provide higher
utilization of switch ports because a port could be associated to the
necessary broadcast domain, and multiple broadcast domains can reside on
the same switch. Network devices in one VLAN cannot communicate with
devices in a different VLAN via traditional L2 or broadcast traffic.
VLANs are defined in the Institute of Electrical and Electronics Engineers
(IEEE) 802.1Q standard, which states that 32 bits are added to the packet
header and are composed of the following:
Tag protocol identifier (TPID): 16-bit field set to 0x8100 to identify
the packet as an 802.1Q packet.
Priority code point (PCP): A 3-bit field to indicate a class of service
(CoS) as part of Layer 2 quality of service (QoS) between switches.
Drop Eligible Indicator (DEI): A 1-bit field that indicates if the
packet can be dropped when there is bandwidth contention.
VLAN identifier (VID): A 12-bit field that specifies the VLAN
associated to a network packet.
Figure 4-2 displays the VLAN packet structure.
Figure 4-2 VLAN Packet Structure
The VLAN identifier has only 12 bits, which provide 4094 unique VLANs.
NX-OS uses the following logic for VLAN identifiers:
VLAN 0 is reserved for 802.1P traffic and cannot be modified or
deleted.
VLAN 1 is the default VLAN and cannot be modified or deleted.
VLANs 2 to 1005 are in the normal VLAN range and can be added,
deleted, or modified as necessary.
VLANs 1006 to 3967 and 4048 to 4093 are in the extended VLAN
range and can be added, deleted, or modified as necessary.
VLANs 3968 to 4047 and 4094 are considered internal VLANs and are
used internally by NX-OS. These cannot be added, deleted, or
modified.
VLAN 4095 is reserved by the 802.1Q standard and cannot be used.
VLAN Creation
VLANs are created by using the global configuration command vlan vlan-
id. A friendly name (32 characters) is associated to the VLAN by using the
VLAN submode configuration command name name. The VLAN is not created until the CLI moves back to the global configuration context or on to a different VLAN identifier. Example 4-2 demonstrates the
creation of VLAN 10 (Accounting), VLAN 20 (HR), and VLAN 30
(Security) on NX-1.
NX-1(config)# vlan 10
NX-1(config-vlan)# name Accounting
NX-1(config-vlan)# vlan 20
NX-1(config-vlan)# name HR
NX-1(config-vlan)# vlan 30
NX-1(config-vlan)# name Security
VLANs and their port assignment are verified with the show vlan [id vlan-
id] command, as demonstrated in Example 4-3. The output is reduced to a
specific VLAN by using the optional id keyword. Notice that the output is
broken into three separate areas: Traditional VLANs, Remote Switched
Port Analyzer (RSPAN) VLANs, and Private VLANs.
Note
Most engineers assume that a VLAN maintains a one-to-one ratio of
subnet-to-VLAN. Multiple subnets can exist in the same VLAN by
assigning a secondary IP address to a router’s interface or by
connecting multiple routers to the same VLAN. In situations like this,
both subnets are part of the same broadcast domain.
Access Ports
Access ports are the fundamental building block of a managed switch. An
access port is assigned to only one VLAN. It carries traffic from the VLAN
to the device connected to it, or from the device to other devices on the
same VLAN on that switch.
NX-OS places an L2 switch port as an access port by default. The port is
configured as an access port with the command switchport mode access. A
specific VLAN is associated to the port with the command switchport
access vlan vlan-id. If the VLAN is not specified, it defaults to VLAN 1.
The 802.1Q tags are not included on packets transmitted or received on
access ports.
The switchport mode access command does not appear when looking at
the traditional running configuration and requires the optional all keyword,
as shown in Example 4-4.
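The body of Example 4-4 is not reproduced here; a minimal sketch of configuring an access port and verifying it with the all keyword might look like this (the interface and VLAN values are illustrative):

NX-1(config)# interface Eth1/2
NX-1(config-if)# switchport mode access
NX-1(config-if)# switchport access vlan 10
NX-1# show run interface Eth1/2 all | include switchport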
The command show interface interface-id displays the mode that the port is using. The assigned VLAN for the port is viewed with the show vlan command, as shown earlier in Example 4-3, or with show interface status. Example 4-5 demonstrates the verification of an access port and the associated VLAN. It is important to verify that both hosts are on the same VLAN for L2 forwarding to work properly.
--------------------------------------------------------------------------------
Port          Name               Status      Vlan       Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected   routed     full    1000    --
Eth1/1        --                 connected   trunk      full    1000    10g
Eth1/2        --                 connected   10         full    1000    10g
Trunk Ports
Trunk ports can carry multiple VLANs across them. Trunk ports are
typically used when multiple VLANs need connectivity between a switch
and another switch, router, or firewall. VLANs are identified by including
the 802.1Q headers in the packets as the packet is transmitted across the
link. The headers are examined upon the receipt of the packet, associated to
the proper VLAN, and then removed.
Trunk ports must be statically defined on Nexus switches with the interface
command switchport mode trunk. Example 4-6 displays Eth1/1 being
converted to a trunk port.
NX-1# config t
Enter configuration commands, one per line. End with CNTL/Z.
NX-1(config)# int eth1/1
NX-1(config-if)# switchport mode trunk
NX-1# show interface eth1/1 | include Port
Port mode is trunk
--------------------------------------------------------------------------------
! Section 2 displays all of the VLANs that are allowed to be transmitted
! across the trunk port
Port          Vlans Allowed on Trunk
--------------------------------------------------------------------------------
Eth1/1        1-4094
--------------------------------------------------------------------------------
! Section 3 displays ports that are disabled due to an error.
Port          Vlans Err-disabled on Trunk
--------------------------------------------------------------------------------
Eth1/1        none
--------------------------------------------------------------------------------
! Section 4 displays all of the VLANs that are allowed across the trunk and
! are in a spanning tree forwarding state
Port          STP Forwarding
--------------------------------------------------------------------------------
Eth1/1        1,10,20,30,99
--------------------------------------------------------------------------------
Port          Vlans in spanning tree forwarding state and not pruned
--------------------------------------------------------------------------------
Feature VTP is not enabled
Eth1/1        1,10,20,30,99
Native VLANs
Traffic on a trunk port’s native VLAN does not include the 802.1Q tags.
The native VLAN is a port-specific configuration and is changed with the
interface command switchport trunk native vlan vlan-id.
The native VLAN should match on both ports, or traffic can change VLANs. Although connectivity between hosts is feasible in that case (assuming that the hosts are on the two different VLAN numbers), this causes confusion for most network engineers and is not a best practice.
Note
All switch control-plane traffic is advertised using VLAN 1. As part of
Cisco’s security hardening guide, it is recommended to change the
native VLAN to something other than VLAN 1. More specifically, it
should be set to a VLAN that is not used at all to prevent VLAN
hopping.
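Following that recommendation, a minimal sketch of moving a trunk port's native VLAN to an otherwise unused VLAN (99 here is illustrative) looks like this:

NX-1(config)# interface Eth1/1
NX-1(config-if)# switchport trunk native vlan 99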
Allowed VLANs
As stated earlier, VLANs can be restricted from certain trunk ports as a
method of traffic engineering. This can cause problems if traffic between
two hosts is expected to traverse a trunk link, and the VLAN is not allowed
to traverse that trunk port. The interface command switchport trunk allowed vlan vlan-ids specifies the VLANs that are allowed to traverse the link. Example 4-8 displays a sample configuration that limits the VLANs that can cross the Eth1/1 trunk link to 1, 10, 30, and 99.
Example 4-8 Viewing the VLANs that Are Allowed on a Trunk Link
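The body of Example 4-8 is not reproduced here; a minimal sketch of restricting the allowed VLAN list and then viewing it would look like the following:

NX-1(config)# interface Eth1/1
NX-1(config-if)# switchport trunk allowed vlan 1,10,30,99
NX-1# show interface Eth1/1 trunk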
Private VLANS
Some network designs require segmentation between network devices. This
is easily accomplished by two techniques:
Creating unique subnets for every security domain and restricting
network traffic with an ACL. Using this technique can waste IP
addresses when the host count does not align with subnet boundaries (that is, a security zone with 65 hosts requires a /25 and results in 63 wasted IP addresses; this figure does not take into consideration the network and broadcast addresses).
Using private VLANs.
Private VLANs (PVLAN) provide a two-tier hierarchy (primary and secondary VLANs) to restrict traffic between ports from an L2 perspective. An explicit mapping between the primary VLAN and secondary VLAN is required to allow communication outside of the PVLAN. Ports are assigned to one of three categories:
Promiscuous: Ports associated to this VLAN are a primary PVLAN
(the first tier) and are allowed to communicate to all hosts. Typically,
these are ports assigned to a router, firewall, or server that is providing
centralized services (DHCP, DNS, and so on).
Isolated: These ports are in a secondary PVLAN (in the second tier of
the hierarchy) and are allowed to communicate only with ports
associated to the promiscuous PVLAN. Traffic is not transmitted
between ports in the same isolated VLAN.
Community: These ports are in a secondary PVLAN and are allowed
to communicate with other ports in this VLAN and ports associated to
the promiscuous VLAN.
Figure 4-3 demonstrates the usage of PVLANs for a service provider. R1 is
the router for every host in the 10.0.0.0/24 network segment and is
connected with a promiscuous PVLAN. Host-2 and Host-3 are from different companies and should not be able to communicate with any other host; they should be able to communicate only with R1.
Host-4 and Host-5 are from the same third company and need to talk with
each other along with R1. Host-6 and Host-7 are from the same fourth
company and need to talk with each other along with R1. All other
communication is not allowed.
Table 4-1 displays the communication capability between hosts. Notice that Host-4 and Host-5 can communicate with each other but cannot communicate with Host-2, Host-3, Host-6, and Host-7.

Table 4-1 Communication Capability Between Hosts

         R1   Host-2  Host-3  Host-4  Host-5  Host-6  Host-7
R1       N/A  ✓       ✓       ✓       ✓       ✓       ✓
Host-2   ✓    N/A     X       X       X       X       X
Host-3   ✓    X       N/A     X       X       X       X
Host-4   ✓    X       X       N/A     ✓       X       X
Host-5   ✓    X       X       ✓       N/A     X       X
Host-6   ✓    X       X       X       X       N/A     ✓
Host-7   ✓    X       X       X       X       ✓       N/A
Isolated Private VLANs
Isolated PVLANs allow communication only with promiscuous ports; therefore, only one isolated PVLAN is needed per L3 domain. The
process for deploying isolated PVLANs on a Nexus switch is as follows:
Step 1. Enable the private VLAN feature. Enable the PVLAN feature with
the command feature private-vlan in the global configuration
mode.
Step 2. Define the isolated PVLAN. Create the isolated PVLAN with the
command vlan vlan-id. Underneath the VLAN configuration
context, identify the VLAN as an isolated PVLAN with the
command private-vlan isolated.
Step 3. Define the promiscuous PVLAN. Create the promiscuous PVLAN
with the command vlan vlan-id. Underneath the VLAN
configuration context, identify the VLAN as a promiscuous PVLAN
with the command private-vlan primary.
Step 4. Associate the isolated PVLAN to the promiscuous PVLAN.
Underneath the promiscuous PVLAN configuration context,
associate the secondary (isolated or community) PVLANs with the command private-vlan association secondary-pvlan-id. If multiple secondary PVLANs are used, separate them with commas.
Step 5. Configure the switchport(s) for the promiscuous PVLAN. Change
the configuration context to the switch port for the promiscuous
host with the command interface interface-id. Change the switch
port mode to promiscuous PVLAN with the command switchport
mode private-vlan promiscuous.
The switch port must then be associated to the promiscuous
PVLAN with the command switchport access vlan promiscuous-
vlan-id. A mapping between the promiscuous PVLAN and any
secondary PVLANs must be performed using the command
switchport private-vlan mapping promiscuous-vlan-id secondary-
pvlan-vlan-id. If multiple secondary PVLANs are used, delineate
with the use of a comma.
Step 6. Configure the switchport(s) for the isolated PVLAN. Change the
configuration context to the switch port for the isolated host with
the command interface interface-id. Change the switch port mode
to the secondary PVLAN type with the command switchport mode
private-vlan host.
The switch port must then be associated to the promiscuous
PVLAN with the command switchport access vlan isolated-vlan-
id. A mapping between the promiscuous PVLAN and the isolated
PVLAN must be performed using the command switchport private-vlan host-association promiscuous-vlan-id isolated-pvlan-vlan-id.
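Putting those steps together, a minimal configuration sketch for the isolated PVLAN deployment follows; the port assignments are illustrative (one promiscuous port toward R1 and one host port in the isolated PVLAN):

NX-1(config)# feature private-vlan
NX-1(config)# vlan 20
NX-1(config-vlan)# private-vlan isolated
NX-1(config-vlan)# vlan 10
NX-1(config-vlan)# private-vlan primary
NX-1(config-vlan)# private-vlan association 20
NX-1(config)# interface Eth1/4
NX-1(config-if)# switchport mode private-vlan promiscuous
NX-1(config-if)# switchport access vlan 10
NX-1(config-if)# switchport private-vlan mapping 10 20
NX-1(config)# interface Eth1/1
NX-1(config-if)# switchport mode private-vlan host
NX-1(config-if)# switchport access vlan 20
NX-1(config-if)# switchport private-vlan host-association 10 20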
Example 4-9 displays the deployment of VLAN 20 as an isolated PVLAN
on NX-1, according to Figure 4-3. VLAN 10 is the promiscuous PVLAN.
! Notice how there are not any ports listed in the regular VLAN section
! because they are all in the PVLAN section.
VLAN Name                             Status    Ports
---- -------------------------------- --------- -------------------------------
1    default                          active    Eth1/4, Eth1/5, Eth1/6, Eth1/7
10   PVLAN-PROMISCOUS                 active
20   PVLAN-ISOLATED                   active
..
Primary Secondary Type            Ports
------- --------- --------------- ---------------------------------------------
10      20        isolated        Eth1/1, Eth1/2, Eth1/3
Note
An isolated or community VLAN can be associated with only one
primary VLAN.
PVLAN ports require a different port type and are set by the switchport
mode private-vlan {promiscuous | host} command. This setting is
verified by examining the interface using the show interface command.
Example 4-11 displays the verification of the PVLAN switch port type
setting.
Another technique is to verify that the isolated PVLAN host devices can
reach the promiscuous host device. This is achieved with a simple ping
test, as shown in Example 4-12.
Note
VLAN 20 was kept as part of the promiscuous port configuration to demonstrate how isolated and community PVLANs co-exist; this continues the previous configuration to provide the solution shown in Figure 4-3.
Example 4-15 provides basic verification that all hosts in the isolated and community PVLANs can reach R1. Hosts in the isolated PVLAN are not allowed to reach any other host, whereas hosts in community PVLANs can reach only hosts in the same community PVLAN.
! Verification that both hosts can ping other hosts in the same community PVLAN
Host-4# ping 10.0.0.5
Sending 5, 100-byte ICMP Echos to 10.0.0.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/9 ms
vlan 10
name PVLAN-PROMISCOUS
private-vlan primary
private-vlan association 20,30,40
vlan 20
name PVLAN-ISOLATED
private-vlan isolated
vlan 30
name PVLAN-COMMUNITY1
private-vlan community
vlan 40
name PVLAN-COMMUNITY2
private-vlan community
Example 4-18 demonstrates the connectivity between the hosts with the
promiscuous PVLAN SVI. The two promiscuous devices (NX-1 and R1)
can ping each other. In addition, all the hosts (demonstrated by Host-2)
ping both NX-1 and R1 without impacting the PVLAN functionality
assigned to isolated or community PVLAN ports.
! host (R1)
NX-1# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
64 bytes from 10.0.0.1: icmp_seq=0 ttl=254 time=2.608 ms
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=2.069 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=2.241 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=254 time=2.157 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=254 time=2.283 ms
Note
In all three scenarios, regular VLANs are transmitted across the trunk
link.
Note
Not all Nexus platforms support the promiscuous or isolated PVLAN
trunk ports. Check www.cisco.com for feature parity.
Note
A switch tries to establish an RSTP handshake with the device
connected to the port. If a handshake does not occur, the other device
is assumed to be non-RSTP compatible, and the port defaults to
regular 802.1D behavior. This means that host devices such as
computers and printers still encounter a significant transmission delay
(~50 seconds) after the network link is established.
Note
RSTP is enabled by default for any L2 switch port with a basic
configuration. Additional configuration can be applied to the switch to
further tune RSTP.
Note
Generally, older switches have a lower MAC address and are therefore preferred as the root. Configuration changes can be made to optimize placement of the root switch in an L2 topology.
Figure 4-4 provides a simple topology to demonstrate some important
spanning-tree concepts. In this topology, NX-1, NX-2, NX-3, NX-4, and
NX-5 all connect to each other. The configuration on the switches does not include any customizations for Spanning Tree Protocol, and the focus is
primarily on VLAN 1, but VLANs 10, 20, and 30 exist in the topology. NX-
1 has been identified as the root bridge because its system MAC address
(5e00.4000.0007) is the lowest in the topology.
The root bridge is identified with the command show spanning-tree root.
Example 4-19 demonstrates the command being executed on NX-1. The
output includes the VLAN number, root bridge identifier, root path cost,
hello time, max age time, and forwarding delay. Because NX-1 is the root
bridge, all ports are designated, so the Root Port field displays This bridge
is root.
The same command is run on NX-2 and NX-3 with the output displayed in
Example 4-20. The Root ID field is the same as NX-1; however, the root
path cost has changed to 2 because both switches must use the 10 Gbps link
to reach NX-1. Eth1/1 has been identified on both of these switches as the
root port.
Note
Step 3 is the last step of the selection process. If a switch has multiple
links toward the root switch, the downstream switch always identifies
the RP. All other ports will match the criteria for Step 2 or Step 3 and
are placed into a blocking state.
VLAN0001
  Spanning tree enabled protocol rstp
! The section displays the relevant information for the STP Root Bridge
  Root ID     Priority    32769
              Address     5e00.4000.0007
              This bridge is the root
              Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
! The section displays the relevant information for the Local STP Bridge
  Bridge ID   Priority    32769  (priority 32768 sys-id-ext 1)
              Address     5e00.4000.0007
              Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
Note
If the Type field includes *TYPE_Inc–, this indicates a port configuration mismatch between the Nexus switch and the switch to which it is connected: either the port type or the port mode (access versus trunk) is misconfigured.
Example 4-23 displays the Spanning Tree Protocol topology from NX-2
and NX-3. Notice that in the first root bridge section, the output provides
the total root path cost and the port on the switch that is identified as the
RP.
All the ports on NX-2 are in a forwarding state, but port Eth1/2 on NX-3 is
in a blocking (BLK) state. Specifically, that port has been designated as an
alternate port to reach the root in the event that Eth1/1 connection fails.
The reason that NX-3’s Eth1/2 port was placed into a blocking state versus
NX-2’s Eth1/3 port is that NX-2’s system MAC address (5e00.4001.0007)
is lower than NX-3’s system MAC address (5e00.4002.0007). This was
deduced by looking at Figure 4-4 and the system MAC addresses in the
output.
VLAN0001
  Spanning tree enabled protocol rstp
  Root ID     Priority    32769
              Address     5e00.4000.0007
              Cost        2
              Port        1 (Ethernet1/1)
              Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
Example 4-24 Viewing VLANs Participating with Spanning Tree Protocol on an Interface
Note
The best way to prevent erroneous devices from taking over the root
role is to set the priority to zero on the desired root bridge switch.
Example 4-25 demonstrates NX-1 being set as the root primary and NX-2
being set as the root secondary. Notice on NX-2’s output that it displays the
root system priority, which is different from its system priority.
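The configuration in Example 4-25 is not reproduced here; a minimal sketch of designating the root primary and secondary switches (the VLAN list is illustrative) follows:

NX-1(config)# spanning-tree vlan 1,10,20,30 root primary
NX-2(config)# spanning-tree vlan 1,10,20,30 root secondary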
Note
Notice that the priority on NX-1 is off by one. That is because the
priority in the BPDU packets is the priority plus the value of the Sys-
Id-Ext (which is the VLAN number). So the priority for VLAN 1 is
24,577, and the priority for VLAN 10 is 24,586.
Root Guard
Root guard is a Spanning-Tree Protocol feature that prevents a configured
port from becoming a root port by placing a port in ErrDisabled state if a
superior BPDU is received on a configured port. Root guard prevents a
downstream switch (often misconfigured or rogue) from becoming a root
bridge in a topology.
Root guard is enabled on an interface-by-interface basis with the interface command spanning-tree guard root. Root guard is placed on designated ports toward other switches that should never become a root bridge. In the sample topology, root guard should be placed on NX-2's Eth1/4 port and NX-3's Eth1/5 port. This prevents NX-4 and NX-5 from ever becoming a root bridge but still allows NX-2 to maintain connectivity to NX-1 via NX-3 in the event that the link between NX-1 and NX-2 becomes incapacitated.
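A minimal sketch of that placement on NX-2 looks like this:

NX-2(config)# interface Eth1/4
NX-2(config-if)# spanning-tree guard root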
Modifying Spanning Tree Protocol Root Port and Blocked Switch Port
Locations
The Spanning Tree Protocol port cost is used for calculating the Spanning
Tree Protocol tree. When a switch generates the BPDUs, the total path cost
includes only its calculated metric to the root and does not include the port cost of the interface that the BPDU is advertised out of. The receiving switch then adds the port cost of the interface on which the BPDU was received to the total path cost carried in the BPDU.
In Figure 4-4, NX-1 advertises its BPDUs to NX-3 with a total path cost of
zero. NX-3 receives the BPDU and adds its Spanning Tree Protocol port
cost of 2 to the total path cost in the BPDU (zero), resulting in a value of 2.
NX-3 then advertises the BPDU toward NX-5 with a total path cost of 2,
which NX-5 then adds to its port cost of 2. NX-5 reports a cost of 4 to
reach the root bridge via NX-3. The logic is confirmed in the output of
Example 4-26. Notice that there is not a total path cost in NX-1’s output.
Modify the port priority on NX-4 with the command spanning-tree [vlan
vlan-id] port-priority priority. The optional vlan keyword allows changing
the priority on a VLAN-by-VLAN basis. Example 4-29 displays changing
the port priority on NX-4's Eth1/5 port to 64, and the impact it has on NX-5. Notice how NX-5's Eth1/4 port is now the RP.
Example 4-29 Verification of Port Priority Impact on a Spanning Tree Protocol Topology
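The output of Example 4-29 is not shown here; a minimal sketch of the port-priority change it describes follows:

NX-4(config)# interface Eth1/5
NX-4(config-if)# spanning-tree port-priority 64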
Generating TCNs for host ports does not make sense because hosts generally have only one connection to the network. Restricting TCN creation to ports that connect to other switches and network devices increases the L2 network's stability and efficiency. The Spanning Tree Protocol portfast feature disables TCN generation for access ports.
Another benefit of the Spanning Tree Protocol portfast feature is that the
access ports bypass the earlier 802.1D Spanning Tree Protocol states
(learning and listening) and forward traffic immediately. This is beneficial
in environments where computers use dynamic host configuration protocol
(DHCP) or preboot execution environment (PXE).
The portfast feature is enabled on a specific port with the command
spanning-tree port type edge, or globally on all access ports with the
command spanning-tree port type edge default.
Example 4-32 demonstrates enabling portfast for NX-1’s Eth1/6 port along
with its verification. Notice how the portfast ports are displayed with Edge
P2P. The last section demonstrates how portfast is enabled globally for all access ports.
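A minimal sketch of both forms of the configuration follows:

NX-1(config)# interface Eth1/6
NX-1(config-if)# spanning-tree port type edge
NX-1(config)# spanning-tree port type edge default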
MST Tuning
MST supports the tuning of port cost and port priority. The interface
configuration command spanning-tree mst instance-number cost cost sets
the interface cost. Example 4-38 demonstrates the configuration of NX-3’s
Eth1/1 port being modified to a cost of 1, and verification of the interface
cost before and after the change.
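A minimal sketch of the cost change follows; the MST instance number 0 is illustrative:

NX-3(config)# interface Eth1/1
NX-3(config-if)# spanning-tree mst 0 cost 1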
Note
Some platforms do not display the MAC notifications by default and
require the following additional configuration commands:
logging level spanning-tree 6
logging level fwm 6
logging monitor 6
BPDU Guard
BPDU guard is a safety mechanism that shuts down ports configured with
Spanning Tree Protocol portfast upon receipt of a BPDU. This ensures that
loops cannot accidentally be created if an unauthorized switch is added to a
topology.
BPDU guard is enabled globally on all Spanning Tree Protocol portfast
ports with the command spanning-tree port type edge bpduguard
default. BPDU guard can be enabled or disabled on a specific interface
with the command spanning-tree bpduguard {enable | disable}. Example
4-42 displays the BPDU guard configuration for a specific port and globally on all access ports. Upon examination of the spanning-tree port details, the by default keyword indicates that the global configuration is what applied BPDU guard to that port.
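A minimal sketch of the two configuration options follows:

NX-1(config)# spanning-tree port type edge bpduguard default
NX-1(config)# interface Eth1/6
NX-1(config-if)# spanning-tree bpduguard enable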
Note
BPDU guard should be configured on all host facing ports. However,
do not enable BPDU guard on PVLAN promiscuous ports.
By default, ports that are put in ErrDisabled because of BPDU guard do not
automatically restore themselves. The Error Recovery service can be used
to reactivate ports that are shut down for a specific problem, thereby
reducing administrative overhead. The Error Recovery service recovers
ports shut down by BPDU guard with the command errdisable recovery cause bpduguard. The interval at which the Error Recovery service checks the ports is configured with the command errdisable recovery interval time-seconds.
Example 4-43 demonstrates the configuration of the Error Recovery service
for BPDU guard and Error Recovery in action.
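A minimal sketch of the Error Recovery configuration follows; the 60-second interval is illustrative:

NX-1(config)# errdisable recovery cause bpduguard
NX-1(config)# errdisable recovery interval 60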
BPDU Filter
BPDU filter quite simply blocks BPDUs from being transmitted out of a
port. BPDU filter can be enabled globally or on a specific interface. The
behavior changes depending upon the configuration:
If BPDU filter is enabled globally with the command spanning-tree port type edge bpdufilter default, the port sends a series of at least 10
BPDUs. If the remote port has BPDU guard on it, that generally shuts
down the port as a loop prevention mechanism.
If BPDU filter is enabled on a specific interface with the command
spanning-tree bpdufilter enable, the port does not send any BPDUs
on an ongoing basis. However, the switch sends a series of at least 10
BPDUs when a port first becomes active. If the remote port has BPDU
guard on it, that generally shuts down the port as a loop prevention
mechanism.
Note
Be careful with the deployment of BPDU filter because it could cause
problems. Most network designs do not require BPDU filter, and the
use of BPDU filter adds an unnecessary level of complexity while
introducing risk.
Example 4-44 verifies that BPDU filter was applied to the Eth1/1 interface through the global setting. This configuration sends the 10 BPDUs when the port first becomes active.
Placing BPDU filter on NX-2’s Eth1/1 port that connects to the NX-1 (the
root bridge) triggers loop guard. This is demonstrated in Example 4-46.
UDLD must be enabled on the remote switch as well. Once configured, the
status of UDLD for an interface is checked using the command show udld
interface-id. Example 4-49 displays the output of UDLD status for an
interface. The output contains the current state, Device-IDs (Serial
Numbers), originating interface-IDs, and return interface-IDs.
Interface Ethernet1/49
--------------------------------
Port enable administrative configuration setting: enabled
Port enable operational state: enabled
Current bidirectional state: bidirectional
Current operational state: advertisement - Single neighbor
detected
Message interval: 15
Timeout interval: 5
Entry 1
----------------
Expiration time: 35
Cache Device index: 1
Current neighbor state: bidirectional
Device ID: FDO1348R0VM
Port ID: Eth1/2
Neighbor echo 1 devices: FOC1813R0C
Neighbor echo 1 port: Ethernet1/1
Message interval: 15
Timeout interval: 5
CDP Device name: NX-2
After a UDLD failure, the interface state indicates that the port is down
because of UDLD failure, as shown in Example 4-50.
There are two common UDLD failures, which are described in the
following sections:
Empty Echo
Tx-Rx Loop
Empty Echo
The empty echo UDLD problem occurs under the following circumstances:
The UDLD session times out.
The remote switch does not process the UDLD packets.
The local switch does not transmit the UDLD packets.
Example 4-52 demonstrates the syslog messages that appear with a UDLD
Empty Echo Detection.
Tx-Rx Loop
This condition occurs when a UDLD frame appears to be received on the
same port that it was advertised on. This means that the system-ID and
port-ID in the received UDLD packet match the system-ID and port-ID on
the receiving switch (that is, what was transmitted by the other switch). The
Tx-Rx loop occurs in the following circumstances:
A misconfiguration or incorrect wiring in an intermediate device
(optical transport)
Incorrect wiring or media problem
Example 4-53 demonstrates the syslog messages that appear with a UDLD Tx-Rx loop detection.
Bridge Assurance
Bridge assurance overcomes some of the limitations of loop guard and UDLD. Bridge assurance works on Spanning Tree Protocol designated ports (which loop guard cannot) and overcomes issues when a port starts off in a unidirectional state. Bridge assurance makes Spanning Tree Protocol operate like a routing protocol (EIGRP, OSPF, and so on) in that it requires health-check packets to flow bidirectionally.
The bridge assurance process is enabled by default, but requires that the
trunk ports are explicitly configured with the command spanning-tree port
type network. Example 4-54 demonstrates bridge assurance being
configured on the interfaces connecting NX-1, NX-2, and NX-3 with each
other.
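A minimal sketch of the configuration on one of the inter-switch trunk ports follows:

NX-1(config)# interface Eth1/2
NX-1(config-if)# spanning-tree port type network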
Example 4-55 displays the Spanning Tree Protocol port type after
configuring bridge assurance. Notice how the Network keyword has been
added to the P2P type.
Example 4-55 Viewing the Spanning Tree Protocol Type of Ports with Bridge Assurance
The Spanning Tree Protocol port types now include the comment
*BA_Inc*, which refers to the fact that those interfaces are now in an
inconsistent port state for bridge assurance. Example 4-57 displays the new
interface port types.
And upon removal of the BPDU filter, bridge assurance disengages and
returns the port to a forwarding state, as shown in Example 4-59.
Note
Bridge assurance is the preferred method for protecting against unidirectional links and should be used when all platforms support it.
Summary
This chapter provided a brief review of the Ethernet communication
standards and the benefits that a managed switch provides to L2 topology.
Troubleshooting L2 forwarding issues involves many components.
The first step in troubleshooting L2 forwarding is to identify both the
source and destination switch ports. From there it is best to follow the
flowchart in Figure 4-5 for troubleshooting. Depending upon the outcome,
the flowchart will redirect you back to the appropriate section in this
chapter.
Figure 4-5 Flowchart for Troubleshooting L2 Forwarding Issues
References
Fuller, Ron, David Jansen, and Matthew McPherson. NX-OS and Cisco
Nexus Switching. Indianapolis: Cisco Press, 2013.
Cisco. Cisco NX-OS Software Configuration Guides, www.cisco.com.
Chapter 5
Proper network design takes into account single points of failure by ensuring
that alternate paths and devices can forward traffic in case of failure. Routing
protocols ensure that redundant paths can still be used through equal-cost multipath (ECMP). However, Spanning Tree Protocol (STP) stops forwarding on redundant links between switches to prevent forwarding loops. Although STP is beneficial, it limits the bandwidth that can be achieved between switches. Port-channels provide a way to combine multiple physical
links into a virtual link to increase bandwidth because all the member interfaces
can forward network traffic. This chapter explains port-channel operations and
the techniques to troubleshoot port-channels when they do not operate as
intended.
Port-Channels
Port-channels are a logical link that consists of one or more physical member links. Port-channels are defined in the IEEE 802.3ad link aggregation specification and are sometimes referred to as EtherChannels. The physical interfaces that are used to assemble the logical port-channel are called member interfaces. Port-channels operate as either Layer 2 (L2) switched or Layer 3 (L3) routed interfaces.
Figure 5-1 visualizes some of the key components of a port-channel (member
interface and logical interface), along with the advantages it provides over
individual links. A primary advantage of using port-channels is the reduction of
topology changes when a member link is added to or removed from a port-channel. Such a change might trigger an L2 STP tree calculation or L3 SPF calculation, but forwarding still occurs between the devices in the port-channel.
Note
The LACP packets in Step 7 happen independently of other switches,
assuming that the requirements are met.
Figure 5-3 demonstrates the exchange of LACP messages between NX-1 (source
switch) and NX-2 (destination switch).
Figure 5-3 LACP Negotiation
Note
This process occurs on every member link when it joins a port-channel
interface.
Basic Port-Channel Configuration
Port-channels are configured by going into interface configuration mode for the
member interfaces and then assigning them to a port-channel and statically
setting them to “on,” or with LACP dynamic negotiation. LACP operates with
two modes:
Passive: An interface does not initiate a port-channel to be established and
does not transmit LACP packets out of it. If the remote switch receives an
LACP packet, this interface responds and then establishes an LACP
adjacency. If both devices are LACP passive, no LACP adjacency forms.
Active: An interface tries to initiate a port-channel establishment and
transmits LACP packets out of it. Active LACP interfaces can establish an
LACP adjacency only if the remote interface is configured to active or
passive.
The LACP feature must first be enabled with the global command feature lacp.
Then the interface parameter command channel-group portchannel-number
mode {on | active | passive} converts a regular interface into a member
interface.
Example 5-1 demonstrates the configuration of port-channel 1 using the member interfaces Eth1/1 and Eth1/2. Notice that the port-channel is configured as a trunk interface, not the individual member interfaces.
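The body of Example 5-1 is not reproduced here; a minimal sketch, assuming Eth1/1 and Eth1/2 as the member interfaces, follows:

NX-1(config)# feature lacp
NX-1(config)# interface Eth1/1-2
NX-1(config-if-range)# channel-group 1 mode active
NX-1(config)# interface port-channel 1
NX-1(config-if)# switchport mode trunk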
When viewing the output of the show port-channel summary command, check
the port-channel status, which is listed below the port-channel interface. The
status should be “U,” as in Example 5-2.
Examine Table 5-1 to understand the port-channel flags.
Table 5-1 Logical Port-Channel Interface Status Fields

Field  Description
U      The port-channel interface is working properly.
D      The port-channel interface is down.
M      The port-channel interface has successfully established at least one LACP adjacency. However, the port-channel is configured to have a minimum number of active interfaces that exceeds the number of active participating member interfaces. Traffic does not forward across this port-channel. The command lacp min-links number-member-interfaces is configured on the port-channel interface.
S      The port-channel interface is configured for Layer 2 (L2) switching.
R      The port-channel interface is configured for Layer 3 (L3) routing.
Table 5-2 briefly explains the fields related to the member interfaces.

Table 5-2 Port-Channel Member Interface Status Fields

Field  Description
P      The interface is actively participating and forwarding traffic for this port-channel.
H      The port-channel is configured with a maximum number of active interfaces. This interface is participating with LACP with the remote peer, but it is acting as a hot-standby and does not forward traffic. The command lacp max-bundle number-member-interfaces is configured on the port-channel interface.
I      The member interface is treated as an individual and does not detect any LACP activity on this interface.
w      This field indicates the time left to receive a packet from this neighbor to ensure that it is still alive.
s      The member interface is in a suspended state.
r      The switch module associated with this interface has been removed from the chassis.
The logical interface is viewed with the command show interface port-channel
port-channel-id. The output includes data fields that are typically displayed with
a traditional Ethernet interface, with the exception of the member interfaces and
the fact that the bandwidth reflects the combined throughput of all active
member interfaces. As the bandwidth changes, factors such as QoS policies and interface costs for routing protocols adjust accordingly.
Example 5-3 displays the use of the command on NX-1. Notice that the
bandwidth is 20 Gbps and correlates to the two 10-Gbps interfaces in the port-
channel interface.
------------------------------------------------------------------------------
                         LACPDUs          Markers/Resp     LACPDUs
Port                   Sent    Recv       Recv    Sent     Pkts Err
------------------------------------------------------------------------------
port-channel1
Ethernet1/1            5753    5660       0       0        0
Ethernet1/2            5319    0          0       0        0

------------------------------------------------------------------------------
                         LACPDUs          Markers/Resp     LACPDUs
Port                   Sent    Recv       Recv    Sent     Pkts Err
------------------------------------------------------------------------------
port-channel1
Ethernet1/1            5755    5662       0       0        0
Ethernet1/2            5321    0          0       0        0
Another method involves using the command show lacp internal info interface
interface-id. This command includes a time stamp for the last time a packet was
transmitted or received out of an interface. Example 5-5 demonstrates the use of
this command.
Partner's information
                  Partner                  Partner       Partner
Port              System ID                Port Number   Age   Flags
Eth1/2            32768,18-9c-5d-11-99-80  0x139         985   SA
Note
Use the LACP system identifier to verify that the member interfaces are
connected to the same device and are not split between devices. The local
LACP system-ID is viewed using the command show lacp system-
identifier.
The NX-OS Ethanalyzer tool is used to view the LACP packets being
transmitted and received on the local Nexus switch by capturing packets with
the LACP MAC destination address. The command ethanalyzer local interface
inband capture-filter "ether host 0180.c200.0002" [detail] captures LACP
packets that are received. The optional detail keyword provides additional
information. Example 5-7 demonstrates the technique.
Capturing on inband
2017-10-23 03:58:11.213625 88:5a:92:de:61:58 -> 01:80:c2:00:00:02 LACP Link Aggregation Control Protocol
2017-10-23 03:58:11.869668 88:5a:92:de:61:59 -> 01:80:c2:00:00:02 LACP Link Aggregation Control Protocol
2017-10-23 03:58:23.381249 00:62:ec:9d:c5:1c -> 01:80:c2:00:00:02 LACP Link Aggregation Control Protocol
2017-10-23 03:58:24.262746 00:62:ec:9d:c5:1b -> 01:80:c2:00:00:02 LACP Link Aggregation Control Protocol
2017-10-23 03:58:41.218262 88:5a:92:de:61:58 -> 01:80:c2:00:00:02 LACP Link Aggregation Control Protocol
Note
The minimum number of port-channel member interfaces does not need to
be configured on both devices to work properly. However, configuring it on
both switches is recommended to accelerate troubleshooting and assist
operational staff.
NX-1# config t
Enter configuration commands, one per line. End with CNTL/Z.
NX-1(config)# lacp system-priority 1
LACP Fast
The original LACP standards sent out LACP packets every 30 seconds. A link is
deemed unusable if an LACP packet is not received after three intervals. This
results in potentially 90 seconds of packet loss for a link before that member
interface is removed from a port-channel.
An amendment to the standards was made so that LACP packets are advertised
every second. This is known as LACP fast because a link is identified and
removed in 3 seconds, compared to the 90 seconds of the initial LACP standard.
LACP fast is enabled on the member interfaces with the interface configuration
command lacp rate fast.
Note
All interfaces on both switches must be configured the same, either LACP
fast or LACP slow, for the port-channel to successfully come up.
Note
When using LACP fast, check your respective platform’s release notes to
ensure that in-service software upgrade (ISSU) and graceful switchover are
still supported.
Example 5-12 demonstrates identifying the current LACP state on the local and
neighbor interface, along with converting an interface to LACP fast.
Example 5-12 Configuring LACP Fast and Verifying LACP Speed State
NX-1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
NX-1(config)# interface Eth1/1
NX-1(config-if)# lacp rate fast
Graceful Convergence
Nexus switches have LACP graceful convergence enabled by default with the
port-channel interface command lacp graceful-convergence. When a Nexus
switch is connected to a non-Cisco peer device, its graceful failover defaults can
delay the time to bring down a disabled port.
Another scenario involves forming LACP adjacencies with devices that do not
fully support the LACP specification. For example, a non-compliant LACP
device might start to transmit data upon receiving the Sync LACP message (step
2 from forming LACP adjacencies) before transmitting the Collecting LACP
message to a peer. Because the local switch still has not reached a Collecting
state, these packets are dropped.
The solution involves removing LACP graceful convergence on port-channel
interfaces when connecting to noncompliant LACP devices with the no lacp
graceful-convergence command. The Nexus switch then waits longer for the
port to initialize before sending a Sync LACP message to the peer. This ensures
that the port receives packets upon sending the Sync LACP message.
Suspend Individual
By default, Nexus switches place an LACP port in a suspended state if it does
not receive an LACP PDU from the peer. Typically, this behavior helps prevent
loops that occur with a bad switch configuration. However, it can cause some
issues with some servers that require LACP to logically bring up the port.
This behavior is changed by disabling the feature with the port-channel interface
command no lacp suspend-individual.
Note
A full list of compatibility parameters that must match is included with the
command show port-channel compatibility-parameters.
The load-balancing hash is seen with the command show port-channel load-
balance, as Example 5-14 shows. The default system hash is source-dest-ip,
which calculates the hash based upon the source and destination IP address in
the packet header.
If the links are unevenly distributed, changing the hash value might provide a
different distribution ratio across member-links. For example, if the port-
channel is established with a router, using a MAC address as part of the hash
could impact the traffic flow because the router’s MAC address does not change
(the MAC address for the source or destination is always the router’s MAC
address). A better choice is to use the source/destination IP addresses or to base the hash on the Layer 4 session ports.
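A sketch of changing the hash follows; the exact keyword syntax varies by Nexus platform, so the form shown here should be checked against the platform's configuration guide:

NX-1(config)# port-channel load-balance src-dst ip-l4port
NX-1# show port-channel load-balance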
Note
Add member links to a port-channel in powers of 2 (2, 4, 8, 16) to ensure
that the hash is calculated consistently.
In rare cases, troubleshooting is required to determine which member link a
packet is traversing on a port-channel. This involves checking for further
diagnostics (optic, ASIC, and so on) when dealing with random packet loss. A
member link is identified with the command show port-channel load-balance [
forwarding-path interface port-channel number { . | vlan vlan_ID } [ dst-ip
ipv4-addr ] [ dst-ipv6 ipv6-addr ] [ dst-mac dst-mac-addr ] [ l4-dst-port dst-
port ] [ l4-src-port src-port ] [ src-ip ipv4-addr ] [ src-ipv6 ipv6-addr ] [ src-
mac src-mac-addr ]].
Example 5-15 demonstrates how the member link is identified on NX-1 for a
packet coming from 192.168.2.2 toward 192.168.1.1 on port-channel 1.
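Using the addresses from that scenario, the command would look something like this:

NX-1# show port-channel load-balance forwarding-path interface port-channel 1 src-ip 192.168.2.2 dst-ip 192.168.1.1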
Virtual Port-Channel
Port-channels lend many benefits to a design, but only two devices (one local and one remote) can participate. NX-OS includes a feature called virtual port-
channel (vPC) that enables two Nexus switches to create a virtual switch in what
is called a vPC domain. vPC peers then provide a logical Layer 2 (L2) port-
channel to a remote device.
Figure 5-4 provides a topology to demonstrate vPCs. NX-2 and NX-3 are
members of the same vPC domain and are configured with a vPC providing a
logical port-channel toward NX-1. From the perspective of NX-1, it is connected
to only one switch.
Figure 5-4 Virtual Port-Channel
Note
Unlike switch stacking or Virtual Switching System (VSS) clustering
technologies, the configuration of the individual switch ports remains
separate. In other words, the Nexus switches are configured independently.
vPC Fundamentals
Only two Nexus switches can participate in a vPC domain. The vPC feature also
includes a vPC peer-keepalive link, vPC member links, and the actual vPC
interface. Figure 5-5 shows a topology with these components.
vPC Domain
A Nexus switch can have regular port-channel and vPC interfaces at the same
time. A different LACP system ID is used in the LACP advertisements between
the port-channel and vPC interfaces. Both Nexus peer switches use a virtual
LACP system ID for the vPC member link.
One of the switches is the primary device and the other is the secondary device.
The Nexus switches select the switch with the lower role priority as the primary
device. If a tie occurs, the Nexus switch with the lower MAC address is
preferred. No pre-emption takes place in identifying the primary device, so the
concept of operational primary device and operational secondary device is
introduced.
This concept is demonstrated in the following steps by imagining that NX-2 and
NX-3 are in the same vPC domain, and NX-2 has a lower role priority.
Step 1. As both switches boot and initialize, neither switch has been elected as
the vPC domain primary device. Then NX-2 becomes the primary device
and the operational primary device, while NX-3 becomes the secondary
device and the operational secondary device.
Step 2. NX-2 is reloaded. NX-3 then becomes the primary device and the
operational primary device.
Step 3. When NX-2 completes its initialization, it again has the lower role
priority but does not preempt NX-3. At this stage, NX-2 is the primary
device and the operational secondary device, and NX-3 is the secondary
device and the operational primary device. Only when NX-3 reloads or
shuts down all vPC interfaces does NX-2 become the operational
primary device.
vPC Peer-Keepalive
The vPC peer-keepalive link monitors the health of the peer vPC device. It sends
keepalive messages on a periodic basis (system default of 1 second). The
heartbeat packet is 96 bytes in length, using UDP port 3200. If the peer link fails, connectivity is checked across the vPC peer-keepalive link. Not a lot of network traffic crosses the peer-keepalive link, so a 1-Gbps interface is sufficient.
A vPC peer device detects a peer failure by not receiving any peer-keepalive
messages. A hold-timeout timer starts as soon as the vPC peer is deemed
unavailable. During the hold-timeout period (system default of 5 seconds), the
secondary vPC device ignores any vPC keep-alive messages to ensure that the
network can converge before action is taken against vPC interfaces. After the
hold-timeout period expires, the timeout timer begins (system default of 3
seconds). If a vPC keep-alive message is not received during this interval, the
vPC interfaces on the secondary vPC switch are shut down. This behavior
prevents a split-brain scenario.
Note
Although using a VLAN interface for the peer-keepalive interface is
technically feasible, this approach is discouraged because it can cause
confusion. Additionally, the link should be directly connected where
possible (with the exception of the management ports).
vPC Peer Link
The vPC peer link is used to synchronize state and forward data between
devices. For example, imagine that a server is attached to NX-1 and is
communicating with a host attached to NX-2. Because of the port-channel hash on NX-1, traffic is sent out the Ethernet2/2 link toward NX-3. NX-3 uses the vPC
peer link to forward the packet toward NX-2 so that NX-2 can forward the
traffic toward the directly attached host.
The vPC peer link must be on a 10-Gbps or higher Ethernet port. Typically, a
port-channel is used to ensure that enough bandwidth exists for traffic sent from
one vPC peer to be redirected where appropriate to the remote vPC peer. In
addition, on modular Nexus switches, the links should be spread across different
line cards/modules to ensure that the peer link stays up during a hardware
failure.
Note
Using the management interface for the peer-keepalive link is possible, but
this requires a management switch to provide connectivity between peer
devices. If a system has multiple supervisors (as with Nexus 7000/9000),
both the active and standby management ports on each vPC peer need to
connect to the management switch.
Step 4. Configure the vPC domain. The vPC domain is the logical construct
that both Nexus peers use. The vPC domain is created with the command
vpc domain domain-id. The domain ID must match on both devices.
In the vPC domain context, the peer-keepalive interfaces must be
identified with the command peer-keepalive destination remote-nexus-
ip [hold-timeout secs | interval msecs {timeout secs} | source local-
nexus-ip | vrf name]. The source interface is optional, but statically
assigning it as part of the configuration is recommended. The peer-
keepalive advertisement interval, hold-timeout, and timeout values are
configured by using the optional keywords hold-timeout, interval, and
timeout.
NX-OS automatically creates a vPC system MAC address for the LACP
messaging, but the MAC address can be defined manually with the system-mac mac-address command. The LACP system priority for the vPC domain is 32768,
but it can be modified with the command system-priority priority to
increase or lower the virtual LACP priority.
Step 5. Configure the vPC device priority (optional). The vPC device priority
is configured with the command role priority priority. The priority can
be set from 1 to 65,535, with the lower value more preferred. The
preferred node is the primary vPC node; the other node is the secondary.
Step 6. Configure the vPC System priority (optional). Regular port-channel
negotiation between two switches must identify the master switch; the
same concept applies to vPC interfaces. The vPC LACP system priority
is configured with the domain configuration command system-priority
priority.
Step 7. Configure vPC autorecovery (optional but recommended). As a
safety mechanism, a vPC peer does not enable any vPC interfaces until
it detects the other vPC peer. In some failure scenarios, such as power
failures, both vPC devices are restarted and do not detect each other.
This can cause a loss of traffic because neither device forwards traffic.
The vPC autorecovery feature provides a method for one of the vPC
peers to start forwarding traffic. Upon initialization, if the vPC peer link
is down and three consecutive peer-keepalive messages are not
responded to, the secondary device assumes the operational primary role
and initializes vPC interfaces to allow some traffic to be forwarded. vPC
autorecovery is explained later in this chapter.
This feature is enabled with the vPC domain configuration command
auto-recovery [reload-delay delay]. The default delay is 240 seconds
before engaging this feature, but this can be changed using the optional
reload-delay keyword. The delay is a value between 240 and 3600.
Step 8. Configure the vPC. Ports are assigned to the port-channel with the command channel-group portchannel-number mode active.
The port-channel interface is assigned a unique vPC identifier with the
command vpc vpc-id. The vpc-id needs to match on the remote peer
device.
Example 5-16 demonstrates the vPC configuration of NX-2 from Figure 5-5.
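Because Example 5-16's listing is not reproduced here, the following minimal
sketch shows how the preceding steps fit together on one vPC peer. The domain
ID, interface numbers, keepalive addresses, and VRF are hypothetical and must
be adapted to the actual topology.

NX-2 (sketch)
feature lacp
feature vpc
!
vpc domain 100
  ! hypothetical keepalive addressing; vrf management assumes mgmt0 connectivity
  peer-keepalive destination 10.1.1.3 source 10.1.1.2 vrf management
  role priority 100
  auto-recovery
!
! vPC peer link
interface port-channel100
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link
!
! member port and vPC toward the downstream device (Step 8)
interface Ethernet1/1
  switchport mode trunk
  channel-group 1 mode active
!
interface port-channel1
  switchport mode trunk
  vpc 1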
vPC Verification
Now that both Nexus switches are configured, the health of the vPC domain
must be examined. The command show vpc displays the vPC status, as shown next.
vPC status
----------------------------------------------------------------------------
Id   Port           Status   Consistency   Reason      Active vlans
--   ------------   ------   -----------   ------      ------------
1    Po1            up       success       success     1
As stated earlier, the peer link should be in a forwarding state. This is verified
by examining the STP state with the command show spanning-tree, as
Example 5-18 demonstrates. Notice that the vPC peer link (port-channel 100)
is in a forwarding state and is identified as a network point-to-point
port.
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 28673
Address 885a.92de.617c
Cost 1
Port 4096 (port-channel1)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
If the status shows as “down,” verify that each switch can ping the other switch
from the configured VRF context. If the ping fails, perform basic connectivity
troubleshooting between the two switches.
vPC Consistency-Checker
Just as with port-channel interfaces, certain parameters must match on both
Nexus switches in the vPC domain. NX-OS contains a specific process called
the consistency-checker to ensure that the settings are compatible and to prevent
unpredictable packet loss. The consistency-checker has two types of errors:
Type 1
Type 2
Type 1
When a Type 1 vPC consistency-checker error occurs, the vPC instance and vPC
member ports on the operational secondary Nexus switch enter a suspended
state and stop forwarding network traffic. The operational primary Nexus switch
still forwards network traffic. These settings must match to avoid a Type 1
consistency error:
Port-channel mode: on, off, or active
Link speed per channel
Duplex mode per channel
Trunk mode per channel
Native VLAN
VLANs allowed on trunk
Tagging of native VLAN traffic
STP mode
STP region configuration for Multiple Spanning Tree
Same enable/disable state per VLAN
STP global settings
Bridge Assurance setting
Port type setting (recommended: setting all vPC peer link ports as
network ports)
Loop Guard settings
STP interface settings
Port type setting
Loop Guard
Root Guard
MTU
Allowed VLAN bit set
Note
NX-OS version 5.2 introduced a feature called graceful consistency checker
that changes the behavior for Type 1 inconsistencies. The graceful
consistency checker enables the operational primary device to forward
traffic. If this feature is disabled, the vPC is shut down completely. This
feature is enabled by default.
Type 2
A Type 2 vPC consistency-checker error indicates the potential for undesired
forwarding behavior, such as having a VLAN interface on one node and not
another.
vPC status
----------------------------------------------------------------------------
Id   Port           Status   Consistency   Reason                       Active vlans
--   ------------   ------   -----------   ------                       ------------
1    Po1            up       failed        Global compat check failed   1,10,20
Legend:
Type 1 : vPC will be suspended in case of mismatch
vPC Autorecovery
As a safety mechanism, a vPC peer does not enable any vPC interfaces until it
detects the other vPC peer. In some failure scenarios, such as power failures,
both vPC devices are restarted and do not detect each other. This can cause a
loss of traffic because neither device forwards traffic.
The vPC autorecovery feature provides a method for one of the vPC peers to
start forwarding traffic. Upon initialization, if the vPC peer link is down and
three consecutive peer-keepalive messages were not responded to, the secondary
device assumes the operational primary role and can initialize vPC interfaces to
allow some traffic to forward.
This feature is enabled with the vPC domain configuration command auto-
recovery [reload-delay delay]. The default delay is 240 seconds before
engaging this feature, but this can be changed with the optional reload-delay
keyword. The delay is a value between 240 and 3600 seconds. Example 5-26 displays the
configuration and verification of vPC autorecovery.
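Example 5-26 is not reproduced here; a minimal sketch follows, using a
hypothetical domain ID:

vpc domain 100
  ! enable autorecovery with the default 240-second reload delay
  auto-recovery reload-delay 240

The setting can then be confirmed in the show vpc output, which reports the
autorecovery status.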
When the web server sends a packet to the NAS device (172.32.100.22), it
computes a hash to identify which link it should send the packet on to reach the
NAS device. Assume that the web server sends the packet to NX-2, which then
changes the packet’s source MAC address to 00c1.5c00.0011 (part of the routing
process) and forwards the packet on to NX-1. NX-1 forwards (switches) the
packet on to the NAS device.
Now the NAS device creates the reply packet and, when generating the packet
headers, uses the destination MAC address of the HSRP gateway
00c1.1234.0001 and forwards the packet to NX-1. NX-1 computes a hash based
on the source and destination IP address and forwards the packet toward NX-3.
NX-2 and NX-3 both have the destination MAC address for the HSRP gateway
and can then route the packet for the 172.32.200.0/24 network and forward it
back to the web server. This is the correct and normal forwarding behavior.
The problem occurs when the NAS server enables a feature for optimizing
packet flow. After the NAS device receives the packet from the web server and
generates the reply packet headers, it just uses the source and destination MAC
addresses from the packet it originally received. When NX-1 receives the reply
packet, it calculates the hash and forwards the packet toward NX-3. Now NX-3
does not have the MAC address 00c1.5c00.0011 (NX-2’s VLAN 100 interface)
and cannot forward the packet toward NX-1. The packet is dropped because
packets received on a vPC member port cannot be forwarded across the peer
link, as a loop-prevention mechanism.
Enabling a vPC peer-gateway on NX-2 and NX-3 allows NX-3 to route packets
destined for NX-2’s MAC addresses, and vice versa. The vPC peer-gateway
feature is enabled with the command peer-gateway under the vPC domain
configuration. The vPC peer-gateway functionality is verified with the show vpc
command. Example 5-27 demonstrates the configuration and verification of the
peer-gateway feature.
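Example 5-27 is not reproduced here; the following minimal sketch
(hypothetical domain ID) shows where the command sits:

vpc domain 100
  ! permit this peer to route frames destined for the partner's router MAC
  peer-gateway

The same configuration is applied on both vPC peers, and show vpc then lists
the peer-gateway status.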
Note
In addition, NX-OS automatically disables IP redirects on SVIs where the
VLAN is enabled on a vPC trunk link.
Note
Packets that are forwarded by the peer-gateway feature have their time to
live (TTL) decremented. Packets carrying a TTL of 1 thus might get
dropped in transit because of TTL expiration.
If the vPC peer link is broken (physically or through an accidental change that
triggers a Type 1 consistency checker error), NX-2 suspends activity on its vPC
member port and shuts down the SVI for VLAN 200. NX-3 drops its routing
protocol adjacency with NX-2 and then cannot provide connectivity to the
corporate network for the web server. Any packets from the web server for the
corporate network received by NX-3 are dropped.
This scenario is overcome by deploying a dedicated L3 connection between the
vPC peers, either as individual links or as an L3 port-channel interface.
Note
Remember that the vPC peer link does not support the transmission of
routing protocols as transient traffic. For example, suppose that Eth1/22 on
NX-2 is a switch port that belongs to VLAN 200 and R4’s Gi0/0 interface is
configured with the IP address of 172.32.200.5. R4 pings NX-3, but it does
not establish an OSPF adjacency with NX-3 because the OSPF packets are
not transmitted across the vPC peer link. This is resolved by deploying the
second solution listed previously.
Note
L3 Routing over vPC is specific only to unicast and does not include
support for multicast network traffic.
Figure 5-8 demonstrates the concept in which NX-2 and NX-3 want to exchange
routes using OSPF with R4 across the vPC interface. NX-2 and NX-3 enable
Layer 3 routing over vPC to establish an Open Shortest Path First (OSPF)
neighborship with R4. In essence, this design places NX-2, NX-3, and R4 on the
same LAN segment.
Figure 5-8 Layer 3 Routing over vPC
Layer 3 routing over vPC is configured under the vPC domain with the
command layer3 peer-router. The peer-gateway is enabled when using this
feature. The feature is verified with the command show vpc.
Example 5-29 demonstrates the configuration and verification of Layer 3
routing over vPC.
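Example 5-29 is not reproduced here; a minimal sketch follows (hypothetical
domain ID). As noted previously, peer-gateway must be enabled along with this
feature:

vpc domain 100
  peer-gateway
  ! allow routing protocol peering with routers attached to vPC member ports
  layer3 peer-router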
Note
If vPC peering is not being established or vPC inconsistencies result,
collect the show tech vpc command output and contact Cisco technical
support.
FabricPath
Traditionally, L2 networks were built with STP to provide a loop-free
topology. However, the STP-based L2 network design introduces some
limitations. One limitation is the inability of STP to leverage parallel
forwarding paths: STP blocks redundant paths, forcing traffic onto a single
path, because STP forms a forwarding tree rooted at a single device even
though redundant paths are physically available. Other limitations include the
following:
STP convergence is disruptive.
MAC address tables don’t scale.
The tree topology provides limited bandwidth.
The tree topology introduces suboptimal paths.
Host flooding impacts the whole network.
Local problems have a network-wide impact, making troubleshooting
difficult.
To overcome these challenges, vPC was introduced in 2008. An Ethernet device
then could connect simultaneously to two discrete Nexus switches while
bundling these links into a logical port-channel. vPC provided users with active-
active forwarding paths, thus overcoming the limitation of STP. Still, although
vPC overcame most of the challenges, others remained. For example, there was
no provision for adding a third or fourth aggregation-layer switch to
further increase the density or bandwidth on the downstream switch. In addition,
vPC does not overcome the traditional STP design limitation of extending
VLANs.
The Cisco FabricPath feature provides a foundation for building a simplified,
scalable, and multipath-enabled L2 fabric. From the control plane perspective,
FabricPath uses a shortest path first (SPF)–based routing protocol, which helps
with best path selection to reach a destination within the FabricPath domain. It
uses the L2 IS-IS protocol, which provides all IS-IS capabilities for handling
unicast, broadcast, and multicast packets. A separate process does not need to
be enabled for L2 IS-IS; it is automatically enabled on the FabricPath-enabled
interfaces.
FabricPath provides Layer 3 routing benefits to flexible L2 bridged Ethernet
networks. It provides the following benefits of both routing and switching
domains:
Routing
Multipathing (ECMP), with up to 256 links active between any two
devices
Fast convergence
High scalability
Switching
Easy configuration
Plug and Play
Provision flexibility
Because the FabricPath core runs on L2 IS-IS, no STP is enabled between the
spine and the leaf nodes, thus providing reliable L2 any-to-any connectivity. A
single MAC address lookup at the ingress edge device identifies the exit port
across the fabric. The traffic is then switched using the shortest path available.
FabricPath-based design allows hosts to leverage the benefit of multiple active
Layer 3 default gateways, as Figure 5-9 shows. The hosts see a single default
gateway. The fabric provides forwarding toward the active default gateways
transparently and simultaneously, thus extending the multipathing from inside
the fabric to the Layer 3 domain outside the fabric.
The FP core ports provide connectivity to the spine and are FabricPath-enabled
interfaces. The FP core network is used to perform the following functions:
Send and receive FP frames
Avoid STP; FP core ports perform no MAC learning and maintain no MAC
address table
Decide the best path by using a routing table computed by IS-IS
The CE edge ports are regular trunk or access ports that provide connectivity to
the hosts or other classical switches. The CE ports perform the following
functions:
Send and receive regular Ethernet frames
Run STP, perform MAC address learning, and maintain a MAC address
table
The FP edge device maintains the association of MAC addresses and switch-IDs
(which IS-IS automatically assigns to all switches). FP also introduces a new
data plane encapsulation by adding a 16-byte FP header on top of the classical
Ethernet header. Figure 5-11 displays the FP encapsulation header, which is also
called the MAC-in-MAC header. The external FP header consists of Outer
Destination Address, Outer Source Address, and FP tag. Important fields of the
Outer Source or Destination address fields within the FP header include the
following:
Switch-ID (SID): Identifies each FP switch by a unique number
Sub-Switch ID (sSID): Identifies devices and hosts connected via vPC+
Local ID (LID): Identifies the exact port that sourced the frame or to
which the frame is destined. The egress FP switch uses LID to determine
the output interface, thus removing the requirement for MAC learning on
FP core ports. LID is locally significant.
Note
If more than 1024 topologies are required, the FTAG value is set to 0 and
the VLAN is used to identify the topology for multidestination trees.
The following steps describe the packet flow for ARP reply from B to A across
the fabric.
Step 1. Host B with MAC address B sends the ARP reply back to host A. In the
ARP response, the source MAC is set to B and the destination MAC is
set to A.
Step 2. When the packet reaches the leaf switch S300, it updates its MAC
address table with MAC address B, but it still does not have information
about MAC address A. This makes the packet an unknown unicast
packet.
Step 3. S300, the ingress FP switch, determines which tree to use. Unknown
unicast typically uses the first Tree ID (Ftag 1). The Tree ID 1 points to
all the FP core interfaces on switch S300 (po10, po20, po30, and po40).
The ingress FP switch also sets the outer destination MAC address to the
well-known “flood to fabric” multicast address represented as MC1—
01:0F:FF:C1:01:C0.
Step 4. The FP encapsulated unknown unicast packet is sent to all the spine
switches. Other FP switches honor the Tree ID selected by the ingress
switch (Tree 1, in this case). When the packet reaches the root for Tree 1
(S10), it uses the same Ftag 1 and forwards the packet out of interfaces
po100 and po200. (It does not forward the packet on po300 because this
is the interface from which the frame was received).
Step 5. When the packet reaches S100, it performs a lookup on the FP trees and
uses Tree ID 1, which is set to po10. Because the packet from S10 was
received on po10 on the S100 switch, the packet is not forwarded back
again to the fabric.
Step 6. The FP header is then decapsulated and the ARP reply is forwarded to
the host with MAC A. At this point, the MAC address table on S100 is
updated with MAC address B, with the IF/SID pointing to S300. This is
because the destination MAC is known inside the frame.
The next time host A sends a packet to host B, the packet from A is sent with
source MAC A and destination MAC B. The switch S100 receives the packet on
the CE port, and the destination MAC is already known and points to the switch
S300 in an FP-enabled network. The FP routing table is looked up to find the
shortest path to S300 using a flow-based hash because multiple paths to S300
exist. The packet is encapsulated with the FP header with a source switch-ID
(SWID) of S100 and a destination SWID of S300, and the FTAG is set to 1. The
packet is received on one of the spine switches. The spine switch then performs
an FP routing lookup for S300 and sends the packet to an outgoing interface
toward S300. When the packet reaches S300, the MAC address for A is updated
in the MAC address table with the IF/SID pointing to S100.
FabricPath Configuration
To configure FabricPath and verify a FabricPath-enabled network, examine the
topology shown in Figure 5-14. This figure has two spine nodes (NX-10 and
NX-20) and three leaf nodes (NX-1, NX-2, and NX-3). The end host nodes, host
A and host B, are connected to leaf nodes NX-1 and NX-3.
Figure 5-14 FabricPath-Enabled Topology
Enabling the FabricPath feature is a bit different from enabling other features.
First the FabricPath feature set is installed, then the feature-set fabricpath is
enabled, and then the FabricPath feature is enabled. Example 5-30 demonstrates
the configuration for enabling the FabricPath feature. FabricPath uses the Dynamic
Resource Allocation Protocol (DRAP) for the allocation of switch-IDs.
However, a switch-ID can be manually configured on a Nexus switch using the
command fabricpath switch-id [1-4094]. Every switch in the FabricPath
domain must be configured with a unique switch-ID.
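Example 5-30's listing is not reproduced here; the following minimal sketch is
grounded in the vPC+ configuration shown later in Example 5-41 (the switch-ID,
VLAN, and interface values are hypothetical):

install feature-set fabricpath
feature-set fabricpath
!
! statically assign a unique switch-ID rather than relying on DRAP
fabricpath switch-id 100
!
! FP core port toward a spine
interface Ethernet6/5
  switchport mode fabricpath
!
! FP VLANs must be placed in fabricpath mode
vlan 100
  mode fabricpath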
Various timers can also be configured with FabricPath, ranging from 1 to 1200
seconds:
allocate-delay: This timer is used when a new switch-ID is allocated and is
required to be propagated throughout the network. The allocate-delay
defines the delay before the new switch-ID is propagated and becomes
available and permanent.
linkup-delay: This timer configures the delay before the link is brought up,
to detect any conflicts in the switch-ID.
transition-delay: This command sets the delay for propagating the
transitioned switch-ID value in the network. During this period, all old and
new switch-ID values exist in the network.
Note
The default value of all these timers is 10 seconds.
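Assuming these timers hang off a fabricpath timers command (a hedged sketch;
exact keyword placement may vary by NX-OS release, and the 30-second values
are arbitrary):

fabricpath timers allocate-delay 30
fabricpath timers linkup-delay 30
fabricpath timers transition-delay 30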
Interface: Ethernet6/5
Status: protocol-up/link-up/admin-up
Index: 0x0003, Local Circuit ID: 0x01, Circuit Type: L1
No authentication type/keychain configured
Authentication check specified
Extended Local Circuit ID: 0x1A284000, P2P Circuit ID: 0000.0000.0000.00
Retx interval: 5, Retx throttle interval: 66 ms
LSP interval: 33 ms, MTU: 1500
P2P Adjs: 1, AdjsUp: 1, Priority 64
Hello Interval: 10, Multi: 3, Next IIH: 00:00:03
Level   Adjs   AdjsUp   Metric   CSNP   Next CSNP   Last LSP ID
1       1      1        40       60     Inactive    ffff.ffff.ffff.ff-ff
Topologies enabled:
  Level   Topology   Metric   MetricConfig   Forwarding
  0       0          40       no             UP
  1       0          40       no             UP
The IS-IS adjacency between the leaf and the spine nodes is also verified using
the command show fabricpath isis adjacency [detail]. Example 5-35 displays
the adjacency on NX-1 and NX-10. The command displays the system ID,
circuit type, interface participating in IS-IS adjacency for FabricPath, topology
ID, and forwarding state. The command also displays the last time the
FabricPath adjacency transitioned to the current state (that is, the last time the
adjacency flapped).
Next, validate whether the necessary FabricPath VLANs are configured on the
edge/leaf switches. This is verified by using the command show fabricpath isis
vlan-range. When the FP VLANs are configured and CE-facing interfaces are
configured, the edge devices learn about the MAC addresses of the hosts
attached to the edge node. This is verified using the traditional command show
mac address-table vlan vlan-id. Example 5-36 verifies the FP VLAN and the
MAC addresses learned from the hosts connected to the FP VLAN 100.
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen, + - primary entry using vPC Peer-Link,
(T) - True, (F) - False, ~~~ - use 'hardware-age' keyword to retrieve age info

VLAN     MAC Address       Type      age   Secure   NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+-----+--------+------+--------------------
* 100    30e4.db97.e8bf    dynamic   ~~~   F        F      Eth6/6
  100    30e4.db98.0e7f    dynamic   ~~~   F        F      300.0.97
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen, + - primary entry using vPC Peer-Link,
(T) - True, (F) - False, ~~~ - use 'hardware-age' keyword to retrieve age info

VLAN     MAC Address       Type      age   Secure   NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+-----+--------+------+--------------------
  100    30e4.db97.e8bf    dynamic   ~~~   F        F      100.0.85
* 100    30e4.db98.0e7f    dynamic   ~~~   F        F      Eth6/18
Similar to Layer 3 IS-IS, Layer 2 IS-IS maintains multiple topologies within the
network. Each topology is represented as a tree ID in the FabricPath domain;
these trees are the multidestination trees within the fabric. To view the
IS-IS topologies in a FabricPath domain, use the command show fabricpath isis
topology [summary]. Example 5-37 displays the different IS-IS topologies in
the present topology.
If issues arise with traffic forwarding or MAC addresses not being learned, it is
important to check whether the FP IS-IS adjacency has been established and
whether the FP IS-IS routes are present in the Unicast Routing Information Base
(URIB). This is easily validated through the command show fabricpath route
[detail | switchid switch-id]. This command displays the routes for the remote
nodes (leaf or spine nodes). The route is seen in the form of ftag/switch-
id/subswitch-id. In Example 5-38, the route for remote edge device NX-3 is seen
with FTAG 1, switch-ID 300, and Subswitch-ID 0 (because no vPC+
configuration was enabled).
The previous output makes it clear that the route for NX-3 has an FTAG value of
1. After the route is verified in the URIB, validate that it is installed in the
Forwarding Information Base (FIB). To verify the route in the FIB, use
the line card command show fabricpath unicast routes vdc vdc-number [ftag
ftag] [switchid switch-id]. This command displays hardware route information
along with its RPF interface in the software table on the line card. As part of the
platform-dependent information, the command output returns the hardware table
address, which is further used to verify the hardware forwarding information for
the route. In the output shown in Example 5-39, the software table shows that
the route is a remote route with the RPF interface of Ethernet6/5. It also returns
the hardware table address of 0x18c0.
Example 5-39 Verifying Software Table in Hardware for FP Route
Note
The commands in Example 5-39 are relevant for F2 and F3 line card
modules on Nexus 7000/7700 series switches. The verification commands
vary among line cards and also platforms (for instance, Nexus 5500).
Using the hardware address in the software table, execute the command show
hardware internal forwarding instance instance-id table sw start hw-entry-
addr end hw-entry-addr. The instance-id value is obtained from the FE num
field in the previous example. The hw-entry-addr address is the address
highlighted in the previous example output. This command output displays the
switch-ID (swid), the Subswitch-ID (sswid), and various other fields. One of the
important fields to note is ssw_ctrl. If the ssw_ctrl field is 0x0 or 0x3, the
switch does not have subswitch-IDs (available only in the case of vPC+). If
vPC+ configuration is available, the value is usually 0x1. Another field to look
at is the local field. If the local field is set to n, multipath is available for the
route, so a multipath table is required for verification. Example 5-40
demonstrates this command.
[18c0]| DATA
[18c0]| valid : y                    mp_mod : 1
[18c0]| mp_base : 24                 local : n
[18c0]| cp_to_sup1 : n               cp_to_sup2 : n
[18c0]| drop : n                     dc3_si : 11c1
[18c0]| data_tbl_ptr : 0             ssw_ctrl : 0
[18c0]| iic_port_idx : 54
[18c0]| l2tunnel_remote (CR only) : 0
Duplicate switch-IDs can cause forwarding issues and instability in the
FabricPath-enabled network. To check whether the network has duplicate or
conflicting switch-IDs, use the command show fabricpath conflict all. In case
of any FabricPath-related errors, event-history logs for a particular switch-ID
can be verified using the command show system internal fabricpath switch-id
event-history errors. Alternatively, the show tech-support fabricpath
command output can be collected for further investigation.
Note
If an issue arises with FabricPath, collect the following show tech-support
outputs during the problematic state:
show tech u2rib
show tech pixm
show tech eltm
show tech l2fm
show tech fabricpath isis
show tech fabricpath topology
show tech fabricpath switch-id
Along with these show tech outputs, the show tech details output is useful in
investigating issues in the FabricPath environment.
FabricPath Devices
FabricPath is supported on Nexus 7000/7700 and Nexus 5500 series switches.
Check the FabricPath Configuration Guide for scalability and supported switch
modules.
Emulated Switch and vPC+
In modern data center design, servers are usually connected to multiple edge
devices to provide redundancy. In a FabricPath-enabled network, only the edge
switches learn the L2 MAC addresses. The learning consists of mapping the
MAC address to a switch-ID, a one-to-one association performed in the data
plane. This association detects a MAC move when the host changes its location
to another switch. A vPC allows links that are physically connected to two
different Nexus switches to appear as a single port-channel to a server or
another switch. This provides a loop-free topology, eliminates spanning tree
blocked ports, and maximizes bandwidth usage.
In a FabricPath-enabled network, it is paramount to support the same
configuration, with a host or Ethernet switch connected through a port-channel
to two FabricPath edge switches. A dual-connected host can send a packet into
the FabricPath network from both edge switches, leading to a MAC flap. Figure
5-15 illustrates the MAC flap caused by a dual-connected host. Host A is dual-
connected to both edge switches S1 and S2 and is communicating with host B.
When the frames are sent from switch S1, MAC-A is associated with switch S1;
when frames are sent from S2, MAC-A is associated with switch S2. This results
in a MAC flap. To address the problem, FabricPath implements the emulated
switch.
vPC+ Configuration
To configure vPC+, two primary features must be enabled on the Nexus switch:
FabricPath
vPC
To understand how the vPC+ feature works, examine the topology in Figure 5-
16. In this topology, NX-1 and NX-2 are forming a vPC with SW-12, and NX-3
and NX-4 are forming a vPC with SW-34. All the links among the four Nexus
switches are FabricPath-enabled links, including the vPC peer link.
Figure 5-16 vPC+ Topology
Examine the vPC and FabricPath configuration for NX-1 and NX-3 in Example
5-41. Most of the configuration is similar to the configuration shown in the
section on vPC and FabricPath. The main differentiating configuration is the
fabricpath switch-id switch-id command configured under vPC configuration
mode. The same switch-ID value is assigned to both peers of each emulated
switch: NX-1 and NX-2 are assigned switch-ID 100, and NX-3 and NX-4 are
assigned switch-ID 200.
Example 5-41 vPC+ Configuration on NX-1 and NX-3
NX-1
install feature-set fabricpath
feature-set fabricpath
feature vpc
vlan 100,200,300,400,500
mode fabricpath
!
fabricpath switch-id 100
!
vpc domain 10
peer-keepalive destination 10.12.1.2 source 10.12.1.1 vrf default
fabricpath switch-id 100
!
interface port-channel1
switchport mode fabricpath
vpc peer-link
!
interface Ethernet6/4
switchport mode fabricpath
!
interface Ethernet6/5
switchport mode fabricpath
!
interface port-channel10
switchport
switchport mode trunk
vpc 10
NX-3
install feature-set fabricpath
feature-set fabricpath
feature vpc
vlan 100,200,300,400,500
mode fabricpath
!
fabricpath switch-id 200
!
vpc domain 20
peer-keepalive destination 10.34.1.4 source 10.34.1.3 vrf default
fabricpath switch-id 200
!
interface port-channel1
switchport mode fabricpath
vpc peer-link
!
interface Ethernet6/16
switchport mode fabricpath
!
interface Ethernet6/17
switchport mode fabricpath
!
interface port-channel20
switchport
switchport mode trunk
vpc 20
When both edge devices forming an emulated switch learn the MAC addresses
from remote edge nodes, the address is shown against the FP MAC route of the
remote emulated switch rather than against a physical interface. Example 5-44
displays the MAC address table on both the NX-1 and NX-3 nodes. Notice that
on NX-1, the MAC address of the remote host connected to the NX-3/NX-4 vPC
link is learned with FP MAC route 200.11.65535. Likewise, on NX-3, the host
connected to the NX-1/NX-2 vPC link is learned with FP MAC route
100.11.65535.
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen, + - primary entry using vPC Peer-Link, E - EVPN entry
(T) - True, (F) - False, ~~~ - use 'hardware-age' keyword to retrieve age info

VLAN/BD   MAC Address       Type      age   Secure   NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+-----+--------+------+--------------------
  100     0022.56b9.007f    dynamic   ~~~   F        F      200.11.65535
* 100     24e9.b3b1.8cff    dynamic   ~~~   F        F      Po10
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen, + - primary entry using vPC Peer-Link, E - EVPN entry
(T) - True, (F) - False, ~~~ - use 'hardware-age' keyword to retrieve age info

VLAN/BD   MAC Address       Type      age   Secure   NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+-----+--------+------+--------------------
* 100     0022.56b9.007f    dynamic   ~~~   F        F      Po20
  100     24e9.b3b1.8cff    dynamic   ~~~   F        F      100.11.65535
If issues arise with MAC learning, check that an IS-IS adjacency exists between
the devices. The IS-IS adjacency is established with the vPC peer device and the
other spines or edge nodes based on their connectivity. After verifying the
adjacency, the FP routes are learned through IS-IS. The route for the emulated
switch from the vPC peer is learned through the vPC Manager (vPCM) and is
seen in the URIB as learned through vpcm, as Example 5-45 shows. Looking
deeper into the URIB, notice that the route learned from the remote emulated
switch has the Flag or Attribute set to E.
Example 5-45 Verifying Local and Remote FP Routes in URIB
The edge ports are usually vPC ports in a vPC+-based design, so verifying the
vPCM information related to the vPC link is vital. This is verified using the
command show system internal vpcm info interface interface-id. This
command displays the outer FP MAC address of the port-channel interface,
VLANs, vPC peer information, and the information stored in the persistent
storage service (PSS). Note that the PSS information helps restore this state after
any link flaps or VDC/switch reloads. Example 5-46 displays the vPCM
information for port-channel 10 on NX-1 node, highlighting the FP MAC
addresses and the information from vPC peers.
IF Elem Information:
IF Index: 0x16000009
MCEC NUM: 10 Is MCEC
Allowed/Config VLANs : 6 - [1,100,200,300,400,500]
Allowed/Config BDs : 0 - []
MCECM DB Information:
IF Index : 0x16000009
vPC number : 10
Num members : 0
vPC state : Up
Internal vPC state: Up
Compat Status :
Old Compat Status : Pass
Current Compat Status: Pass
Reason Code : SUCCESS
Param compat reason code:0(SUCCESS)
Individual mode: N
Flags : 0x0
Is AutoCfg Enabled : N
Is AutoCfg Sync Complete : N
Number of members: 0
FEX Parameters:
vPC is a non internal-vpc
vPC is a non fabric-vpc
FPC bringup: FALSE
Parent vPC number: 0
Card type : F2
Hardware prog state : No R2 prog
Fabricpath outer MAC address info: 100.11.65535
Designated forwarder state: Allow
Assoc flag: Disassociated
Is switchport: Yes Is shared port: FALSE
Up VLANs : 5 - [100,200,300,400,500]
Suspended VLANs : 1 - [1]
Compat check pass VLANs: 4096 - [0-4095]
Compat check fail VLANs: 0 - []
Up BDs : 0 - []
Suspended BDs : 0 - []
Compat check pass BDs : 0 - []
Compat check fail BDs : 0 - []
Compat check pass VNIs : 0 - []
Compat check fail VNIs : 0 - []
vPC Peer Information:
Peer Number : 10
Peer IF Index: 0x16000009
Peer state : Up
Card type : F2
Fabricpath outer MAC address info of peer: 100.11.0
Peer configured VLANs : 6 - [1,100,200,300,400,500]
Peer Up VLANs : 5 - [100,200,300,400,500]
Peer configured VNIs : 0 - []
Peer Up BDs : 0 - []
PSS Information:
IF Index : 0x16000009
vPC number: 10
vPC state: Up
Internal vPC state: Up
Old Compat Status: Pass
Compat Status: Pass
Card type : F2
Fabricpath outer MAC address info: 100.11.65535
Designated forwarder state: Allow
Up VLANs : 5 - [100,200,300,400,500]
Suspended VLANs : 1 - [1]
Up BDs : 0 - []
Suspended BDs : 0 - []
Note
The platform-dependent commands vary among platforms and also depend
on the line card present on the Nexus 7000/7700 chassis. If you encounter
any issues with vPC+, collect the following show tech-support command
outputs:
show tech-support fabricpath
show tech-support vpc
Other show tech-support commands are collected as covered in the
FabricPath section.
Summary
This chapter covered the technologies and features that provide resiliency and
increased capacity between switches from an L2 forwarding perspective. Port-
channels and virtual port-channels enable switches to create a logical interface
with physical member ports. Inconsistent port configuration across member
ports is the most common problem when troubleshooting these features. This
chapter detailed additional techniques and error messages to look for when
troubleshooting these issues.
FabricPath provides a different approach for removing spanning tree while
increasing link throughput and scalability and minimizing broadcast issues
related to spanning tree. Quite simply, FabricPath involves routing packets in an
L2 realm in an encapsulated state; the packet is later decapsulated before being
forwarded to a host. Troubleshooting packet forwarding in a FabricPath
topology uses some of the basic concepts from troubleshooting STP and port
forwarding while combining them with concepts involved in troubleshooting an
IS-IS network.
References
Fuller, Ron, David Jansen, and Matthew McPherson. NX-OS and Cisco
Nexus Switching (Indianapolis: Cisco Press, 2013).
Cisco NX-OS Software Configuration Guides. http://www.cisco.com.
Part III
Troubleshooting Layer 3
Routing
Chapter 6
IP SLA
IP Service Level Agreement (SLA) is a network performance monitoring
application that enables users to perform service-level monitoring,
troubleshooting, and resource planning. It is an application-aware synthetic
operation agent that monitors network performance by measuring response
time, network reliability, resource availability, application performance,
jitter, connect time, and packet loss. The statistics gained from this feature
help with SLA monitoring, troubleshooting, problem analysis, and network
topology design. The IP SLA feature consists of two main entities:
IP SLA sender: The IP SLA sender generates active measurement
traffic based on the operation type, as configured by the user, and
reports metrics. It also detects threshold violations and sends
notifications. Figure 6-1 shows
the various measurements for different operation types.
For ICMP echo probes, configuring the IP SLA responder on the destination
device is not required. After the probe is
started, the statistics for the probe are verified using the command show ip
sla statistics [number] [aggregated | details]. The aggregated option
displays the aggregated statistics, whereas the details option displays the
detailed statistics. Example 6-2 displays the statistics of the ICMP echo
probe configured in Example 6-1. In the show ip sla statistics command
output, carefully verify fields such as the RTT value, return code, number
of successes, and number of failures. In the aggregated command output,
the RTT value is shown as an aggregated value (for example, the
Min/Avg/Max values of RTT for the probe).
RTT Values:
Number Of RTT: 694    RTT Min/Avg/Max: 2/2/7 milliseconds
Number of successes: 694
Number of failures: 0
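Example 6-1 is not reproduced above; the following minimal sketch mirrors the
ICMP echo probe configured later in this chapter for object tracking. The
address is hypothetical, and the sketch assumes the SLA sender feature is
enabled first:

NX-1 (Sender)
feature sla sender
ip sla 10
  icmp-echo 192.168.2.2 source-interface loopback0
  frequency 5
ip sla schedule 10 life forever start-time now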
Note
The configuration for an IP SLA probe can be viewed using either the
command show running-config sla sender or the command show ip
sla configuration number.
UDP Echo Probe
The sender sends a single UDP packet of a user-defined size to a
destination (responder) and measures the RTT. The sender creates a UDP
packet, opens a socket with user-specified parameters, sends it to another
node (the responder), and marks down the time. The responder receives the
probe and sends back the same packet to the sender with the same socket
parameters. The sender receives this packet, notes the time of receipt,
calculates the RTT, and stores the statistics.
To define a UDP Echo IP SLA probe, use the command udp-echo [dest-ip-
address | dest-hostname] dest-port-number source-ip [src-ip-address | src-
hostname] source-port src-port-number [control [enable | disable]].
Example 6-3 illustrates a UDP Echo probe on the NX-1 switch and a
responder configured on the NX-2 switch. This section of the output also
displays the statistics after the probe is enabled. Note that unless the
responder is configured on the remote end, the probe results in failures. To
configure the IP SLA responder, use the command ip sla responder. To
configure the UDP Echo probe responder, use the command ip sla
responder udp-echo ipaddress ip-address port port-number.
NX-1 (Sender)
ip sla 11
udp-echo 192.168.2.2 5000 source-ip 192.168.1.1 source-port
65000
tos 180
frequency 10
ip sla schedule 11 start-time now
NX-2 (Responder)
ip sla responder
ip sla responder udp-echo ipaddress 192.168.2.2 port 5000
NX-1
NX-1# show ip sla statistics 11 details
Note
When a UDP Echo probe responder is configured, the responder
device continuously listens on the specified UDP port on the responder
node.
Example 6-4 illustrates the configuration of UDP jitter probe using the
g729a codec, which is set with a type of service (ToS) value of 180. Specify
the life of the probe along with the ip sla schedule command by specifying
the command option life [time-in-seconds | forever]. For a UDP jitter
probe, more detailed information is maintained as part of the statistics.
Statistical information of one-way latency, jitter time, packet loss, and
voice score values is maintained for the UDP jitter probe.
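Example 6-4 is not reproduced here; a hedged sketch follows in the style of
the other probes in this chapter. The probe number and destination port are
hypothetical, and the udp-jitter option ordering may vary by NX-OS release:

NX-1 (Sender)
feature sla sender
ip sla 15
  udp-jitter 192.168.2.2 65051 codec g729a
  tos 180
ip sla schedule 15 life forever start-time now

NX-2 (Responder)
feature sla responder
ip sla responder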
Note
To cause an IP SLA operation to check each reply packet for data
corruption, use the verify-data command under ip sla configuration
mode.
As latency and jitter increase, the MOS score goes down. Such
statistics help the network design and implementation team optimize the
network for its applications.
TCP Connect Probe
A TCP Connect Probe makes a nonblocking TCP connection for a given
source and destination IP address and port, marking the current time as the
operation time. If it gets a successful response from the responder
immediately, it updates the RTT using the difference between the current
time and the operation time and then uses this to update its statistics
components. If the sender does not get a successful response immediately,
it sets a timer and waits for a callback triggered by a response from the
destination IP. Upon receiving the callback, it updates the RTT and
statistics as before.
The TCP connection operation is used to discover the time required to
connect to the target device. This operation is used to test virtual circuit
availability or application availability. If the target is a Cisco router, the IP
SLA probe makes a TCP connection to any port number the user specifies.
If the destination is a non-Cisco IP host, you must specify a known target
port number (for example, 21 for File Transfer Protocol [FTP], 23 for
Telnet, or 80 for Hypertext Transfer Protocol [HTTP] server). This
operation is useful in testing Telnet or HTTP connection times.
To define a TCP connect IP SLA probe, use the command tcp-connect
[dest-ip-address | dest-hostname] dest-port-number source-ip [src-ip-
address | src-hostname] source-port src-port-number [control [enable |
disable]]. For the TCP connect probe, the responder must be configured on
the destination router/switch using the command ip sla responder tcp-
connect ipaddress ip-address port port-number. Example 6-5
demonstrates the configuration of an IP SLA TCP connect probe to probe a
TCP connection between NX-1 and NX-2 switches.
NX-1 (Sender)
ip sla 20
tcp-connect 192.168.2.2 10000 source-ip 192.168.1.1
ip sla schedule 20 life forever start-time now
NX-2 (Responder)
ip sla responder tcp-connect ipaddress 192.168.2.2 port 10000
NX-1
NX-1# show ip sla statistics 20 details
Note
Refer to Nexus Cisco Connection Online (CCO) documentation for
additional information on other command options available with IP
SLA.
Object Tracking
Several IP and IPv6 services, such as First-Hop Redundancy Protocol
(FHRP), are deployed in a network for reliability and high availability
purposes, to ensure load balancing and failover capability. In spite of these
capabilities, network uptime is not guaranteed: a WAN link failure, for
example, is more likely to occur than a router failure and can still result in
considerable downtime.
Object tracking offers a flexible and customizable mechanism for affecting
and controlling the failovers in the network. With this feature, you can
track specific objects in the network and take necessary action when any
object’s state change affects the network traffic. The main objective of the
object tracking feature is to allow the processes and protocols in a router
system to monitor the properties of other unrelated processes and protocols
in the same system, to accomplish the following goals:
Provide an optimal level of service
Increase the availability and speed of recovery of a network
Decrease network outages and their duration
Clients such as Hot Standby Router Protocol (HSRP), Virtual Router
Redundancy Protocol (VRRP), and Gateway Load Balancing Protocol
(GLBP) can register their interest in specific tracked objects and take
action when the state of the object changes. Along with these protocols,
other clients that use this feature include the following:
Embedded Event Manager (EEM)
Virtual Port-Channel (vPC)
Object tracking is configured for tracking the following objects:
Line protocol state change
Route reachability
Object track list
Object tracking has the configuration syntax of track number <object-
type> <object-instance> <object-parameter>, where the number
value ranges from 1 to 1000. The object-type indicates one of the supported
tracked objects (interface, ip route, or track list). Object-instance refers to
an instance of a tracked object (interface-name, route prefix, mask, and so
on). The object-parameter indicates the parameters related to the object-
type.
Object Tracking for the Interface
The command track number interface interface-id [line protocol | ip
routing | ipv6 routing] creates an object to track the interface status and
either the line protocol status or the IP/IPv6 routing on the interface.
Example 6-6 demonstrates the configuration of interface tracking for the
line protocol and the IP routing status. In the first test, when the interface is
shut down, the track status for both the line protocol and IP routing goes
down. In the second test, if the interface is made an L2 port (that is, it is
configured with the switchport command), only track 2, which is for IP
routing, goes down because IP routing is now disabled on the port Eth2/5.
Use the command show track to check the status of the configured tracking
objects.
NX-1
NX-1(config)# track 1 interface ethernet 2/5 line-protocol
NX-1(config)# track 2 interface ethernet 2/5 ip routing
NX-1(config)# interface ethernet2/5
NX-1(config-if)# shut
Track 2
Interface Ethernet2/5 IP Routing
IP Routing is DOWN
2 changes, last change 00:00:08
Track 2
Interface Ethernet2/5 IP Routing
IP Routing is DOWN
4 changes, last change 00:00:42
NX-1
NX-1(config)# track 5 ip route 192.168.2.2/32 reachability
NX-1(config-track)# delay down 3
NX-1(config-track)# delay up 1
NX-1# show track 5
Track 5
IP Route 192.168.2.2/32 Reachability
Reachability is UP
3 changes, last change 00:02:07
Delay up 1 secs, down 3 secs
A tracking object can also be configured for an IP SLA probe using the
command track number ip sla [reachability | status]. The tracking
object can thus indirectly verify reachability to the remote prefix. The
benefit of using IP SLA probes is that network operators can use IP SLA
not only to verify reachability, but also to track the status of other probes
for UDP echo, UDP jitter, and TCP connection.
Example 6-8 displays the configuration for object tracking with IP SLA
probes. Notice that the show track command output not only displays the
state information, but also returns the operation code and RTT information,
which is actually part of the show ip sla statistics command output.
NX-1
ip sla 10
icmp-echo 192.168.2.2 source-interface loopback0
request-data-size 1400
frequency 5
ip sla schedule 10 start-time now
!
track 10 ip sla 10 state
NX-1# show track 10
Track 10
IP SLA 10 State
State is UP
1 changes, last change 00:01:01
Latest operation return code: OK
Latest RTT (millisecs): 3
NX-1
! Previous track configurations
NX-1# show run track
track 1 interface Ethernet2/5 line-protocol
track 2 interface Ethernet2/5 ip routing
track 5 ip route 192.168.2.2/32 reachability
delay up 1 down 3
track 10 ip sla 10
! Track List with Boolean AND for matching track 1 and not
matching track 2.
NX-1(config)# track 20 list boolean and
NX-1(config-track)# object 1
NX-1(config-track)# object 2 not
Note
If any issues with object tracking arise, collect the show tech track
command output and share it with the Cisco Technical Assistance
Center (TAC).
IPv4 Services
NX-OS contains a wide array of critical network services that provide
flexibility, scalability, reliability, and security in the network and solve
critical problems that enterprise or data centers face. This section discusses
the following IP services:
DHCP relay
DHCP snooping
Dynamic ARP inspection
IP source guard
Unicast RPF
DHCP Relay
Unlike traditional Cisco IOS or Cisco IOS XE software, NX-OS does not
support the Dynamic Host Configuration Protocol (DHCP) server feature.
However, you can enable the NX-OS device to function as a DHCP relay
agent, a device that relays DHCP requests and replies between the DHCP
client and the DHCP server when they are on different subnets. The relay
agent listens for the client's request and adds vital data, such as the client's
link information, which the server needs to allocate an address for the client.
When the server replies, the relay agent forwards the reply back to the client.
The DHCP relay agent is a useful feature, but some security concerns do
arise:
A host on one port cannot see other hosts' traffic on other ports.
Hosts connected to the metro port can no longer be trusted. Therefore,
a mechanism is needed to identify them more securely.
Protection from network spoofing attacks by malicious hosts (IP
exhaustion, IP spoofing, denial of service [DoS] attacks, and so on).
DHCP option 82 helps overcome these issues. Defined in RFC 3046, DHCP
option 82 is a new type of container option that contains suboption
information gathered by the relay agent. Figure 6-2 shows the format of the
DHCP relay agent information option.
NX-1
NX-1(config)# feature dhcp
NX-1(config)# ip dhcp relay
NX-1(config)# interface e7/1
NX-1(config-if)# ip dhcp relay address 192.168.2.2
When the configuration is done and the client tries to request an IP address,
the DHCP relay agent helps exchange the messages between the client and
the server. Use the command show ip dhcp relay to verify that the
interface is enabled with DHCP relay. After the messages are exchanged,
verify the statistics of all the messages received and forwarded by the relay
agent in both directions (between server and client) using the command
show ip dhcp relay statistics. Example 6-13 examines the DHCP relay
configuration and statistics on NX-1. The show ip dhcp relay statistics
command output displays the statistics for all the different kinds of DHCP
packets received, forwarded, and dropped by the relay agent. Along with
this information, the command output displays the various reasons why a
relay agent drops the packet, along with its statistics.
DHCP L3 FWD:
Total Packets Received : 0
Total Packets Forwarded : 0
Total Packets Dropped : 0
Non DHCP:
Total Packets Received : 0
Total Packets Forwarded : 0
Total Packets Dropped : 0
DROP:
DHCP Relay not enabled : 0
Invalid DHCP message type : 0
Interface error : 0
Tx failure towards server : 0
Tx failure towards client : 0
Unknown output interface : 0
Unknown vrf or interface for server : 0
Max hops exceeded : 0
Option 82 validation failed : 0
Packet Malformed : 0
Relay Trusted port not configured : 0
* - These counters will show correct value when switch receives DHCP request packet with destination ip as broadcast address. If request is unicast it will be HW switched
Example 6-14 Verifying ACL on the Line Card for DHCP Relay
INSTANCE 0x0
---------------
No egress policies
No Netflow profiles in egress direction
When the ACL is programmed on the line card, view the hardware statistics
for the ACL using the command show system internal access-list input
statistics [module slot]. Example 6-15 displays the statistics for the DHCP
relay ACL, where five hits match the traffic coming from source port 67. If
during regular operation DHCP is not functioning properly, use the
command show system internal access-list input statistics [module slot]
and the command in the previous example to ensure that both the DHCP
relay ACL is programmed in hardware and the statistics counters are
incrementing.
Example 6-15 Verifying ACL Statistics on the Line Card for DHCP Relay
INSTANCE 0x0
---------------
DHCP Snooping
DHCP snooping is an L2 security feature. It mitigates certain types of DoS
attacks that can be engineered through DHCP messages and helps prevent IP
spoofing, in which a malicious host tries to use the IP address of another
host. DHCP snooping works at two levels:
Discovery
Enforcement
Discovery includes the functions of intercepting DHCP messages and
building a database of {IP address, MAC address, Port, VLAN} records.
This database is called the binding table. Enforcement includes the
functions of DHCP message validation, rate limiting, and conversion of
DHCP broadcasts to unicasts.
DHCP snooping provides the following security features:
Prevention of DoS attacks through DHCP messages
DHCP message validation
Creation of a DHCP binding table that helps validate DHCP messages
Option 82 insertion/removal
Rate limiting on the number of DHCP messages on an interface
Note
DHCP snooping is associated with the DHCP relay agent, which helps
extend the same security features when the DHCP client and server are
in different subnets.
NX-1
NX-1(config)# ip dhcp snooping
NX-1(config)# ip dhcp snooping vlan 100
NX-1(config)# interface e7/13
NX-1(config-if)# ip dhcp snooping trust
NX-1# show ip dhcp snooping
Switch DHCP snooping is enabled
DHCP snooping is configured on the following VLANs:
100
DHCP snooping is operational on the following VLANs:
100
Insertion of Option 82 is disabled
Verification of MAC address is enabled
DHCP snooping trust is configured on the following interfaces:
Interface Trusted
------------ -------
Ethernet7/13 Yes
After the requests/replies are exchanged between the client and the server,
a binding entry is built on the device with DHCP snooping configuration
for the untrusted port. The binding table is also used by IP source guard
(IPSG) and the Dynamic ARP Inspection (DAI) feature. To view the
binding table, use the command show ip dhcp snooping binding (see
Example 6-17). In this example, notice that the entry is built for the
untrusted port Eth7/1 and also shows the IP address assigned to the host
with the listed MAC address.
INSTANCE 0x0
---------------
INSTANCE 0x0
---------------
Vlan : 100
-----------
ARP Req Forwarded = 2
ARP Res Forwarded = 3
ARP Req Dropped = 0
ARP Res Dropped = 0
DHCP Drops = 0
DHCP Permits = 5
SMAC Fails-ARP Req = 0
SMAC Fails-ARP Res = 0
DMAC Fails-ARP Res = 0
IP Fails-ARP Req = 0
IP Fails-ARP Res = 0
For DAI, an ARP snooping ACL (VACL) is programmed on the line card.
Note that because the DAI feature is enabled along with the DHCP
snooping feature, both the ACLs are seen on the line card. Example 6-20
displays the ACL programmed on the line card and the relevant statistics
for the same.
INSTANCE 0x0
---------------
Tcam 1 resource usage:
----------------------
Label_b = 0x202
Bank 0
------
IPv4 Class
Policies: DHCP(Snooping) [Merged]
Netflow profile: 0
Netflow deny profile: 0
5 tcam entries
ARP Class
Policies: ARP(Snooping) [Merged]
Netflow profile: 0
Netflow deny profile: 0
3 tcam entries
INSTANCE 0x0
---------------
ARP Class
Policies: ARP(Snooping) [Merged]
Netflow profile: 0
Netflow deny profile: 0
Entries:
[Index] Entry [Stats]
---------------------
[0062:0018:0018] prec 1 redirect(0x0) arp/response ip 0.0.0.0/0 0.0.0.0/0 0000.0000.0000 0000.0000.0000 [2]
[0063:0019:0019] prec 1 redirect(0x0) arp/request ip 0.0.0.0/0 0.0.0.0/0 0000.0000.0000 0000.0000.0000 [1]
[0064:001a:001a] prec 1 permit arp-rarp/all ip 0.0.0.0/0 0.0.0.0/0 0000.0000.0000 0000.0000.0000 [0]
ARP ACLs
In non-DHCP scenarios (no DHCP snooping enabled), you can define ARP
ACLs to filter out malicious ARP requests and responses. No packets are
redirected to the supervisor; ARP packets arriving on the line card are
forwarded or dropped on the line card itself, based on the ACL that the user
configures as the ARP inspection filter. The ARP ACL filters are configured on
a per-VLAN basis. An ARP ACL is configured using the command arp access-
list acl-name. It accepts entries in the format [permit | deny]
[request | response] ip ip-address subnet-mask mac [mac-address mac-
address-range]. Example 6-21 demonstrates an ARP ACL that is applied as
an ARP inspection filter for VLAN 100; a hedged sketch follows this paragraph.
After it is configured, the ARP ACL is programmed in the hardware, and you
can verify the statistics in hardware using the same show system internal
access-list input statistics [module slot] command.
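Example 6-21 is not reproduced here. The sketch below follows the entry
format given previously; the ACL name, host address, and MAC address are
hypothetical, and the exact option ordering may vary by release:

arp access-list VLAN100-FILTER
  permit request ip 10.1.100.10 255.255.255.255 mac 0050.56a1.0001
  permit response ip 10.1.100.10 255.255.255.255 mac 0050.56a1.0001
!
! apply the ACL as the ARP inspection filter for VLAN 100
ip arp inspection filter VLAN100-FILTER vlan 100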
Example 6-23 SAL Database Info and FIB Verification for IPSG
NX-1# show system internal sal info database vlan 100
----+---------------------+----------+----------+-----------
Dev | Prefix | PfxIndex | AdjIndex | LIF
----+---------------------+----------+----------+-----------
0 10.12.1.3/32 0x406 0x4d 0xfff
Unicast RPF
Unicast Reverse Path Forwarding (URPF) is a technique that matches on
source IP addresses to drop spoofed traffic at the edge of the network. In other
words, URPF protects the network from source IP spoofing attacks, which
allows legitimate sources to send their traffic toward the destination
server. URPF is implemented in two different modes:
Loose mode: A loose mode check is successful when a lookup of a
packet source address in the FIB returns a match and the FIB result
indicates that the source is reachable through at least one real
interface. The ingress interface through which the packet is received is
not required to match any of the interfaces in the FIB result.
Strict mode: A strict mode check is successful when Unicast RPF
finds a match in the FIB for the packet source address and the ingress
interface through which the packet is received matches one of the
Unicast RPF interfaces in the FIB match. If this check fails, the packet
is discarded. Use this type of Unicast RPF check when packet flows
are expected to be symmetrical.
Strict mode URPF is used on up to eight ECMP interfaces; if more than
eight are in use, it reverts to loose mode. Loose mode URPF is used on up
to 16 ECMP interfaces. URPF is applied on L3 interfaces, SVI, L3 port-
channels, and subinterfaces. One caveat of URPF strict mode is that it is
incompatible with /32 ECMP routes. Thus, using URPF strict mode on the
uplink to the core is not recommended, because the /32 route could be
dropped.
URPF is configured using the command ip verify unicast source
reachable-via [any [allow-default] | rx]. The rx option enables strict
mode; the any option enables loose mode URPF. The allow-default option
is used with loose mode to include IP addresses that are not specifically
contained in the routing table. Example 6-24 demonstrates the
configuration for enabling URPF strict mode on an L3 interface. After
configuration, use the command show ip interface interface-id to check
whether URPF has been enabled on the interface. In the following example,
the URPF mode enabled on interface Eth7/1 is strict mode.
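Example 6-24 is not reproduced here; a minimal sketch follows (the interface
number and address are hypothetical):

interface Ethernet7/1
  ip address 10.1.1.1/24
  ! rx enables strict mode; the any keyword would enable loose mode instead
  ip verify unicast source reachable-via rx

The show ip interface output for the interface then reports the URPF mode
that is in effect.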
Neighbor Discovery
Defined in RFC 4861, IPv6 Neighbor Discovery (ND) is a set of messages
and processes that determine the relationships between two neighboring
IPv6 nodes. IPv6 ND is built on top of ICMPv6, which is defined in RFC
2463. IPv6 ND replaces protocols such as ARP, ICMP redirect, and ICMP
router discovery messages used in IPv4. Both IPv6 ND and ICMPv6 are
critical for the operation of IPv6.
IPv6 ND defines five ICMPv6 packets that provide the nodes with the
information they need before establishing communication:
Router Solicitation (ICMPv6 Type 133, code 0)
Router Advertisement (ICMPv6 Type 134, code 0)
Neighbor Solicitation (ICMPv6 Type 135, code 0)
Neighbor Advertisement (ICMPv6 Type 136, code 0)
Redirect Message (ICMPv6 Type 137, code 0)
When an interface is enabled, hosts can send out a Router Solicitation (RS)
that requests routers to generate Router Advertisements immediately
instead of at their next scheduled time. When an RS message is sent, the
source address field in the Ethernet header is set to the MAC address of
the sending network interface card (NIC), and the destination address field
is set to 33:33:00:00:00:02. In the IPv6 header, the source address field is
set to either the link-local IPv6 address assigned to the sending interface
or the IPv6 unspecified address (::). The destination address is set to the
All Routers multicast address with link-local scope (FF02::2), and the hop
limit is set to 255.
Routers advertise their presence together with various link and Internet
parameters either periodically or in response to a Router Solicitation
message. Router Advertisements (RAs) contain prefixes that are used for
on-link determination and/or address configuration, a suggested hop limit
value, maximum transmission unit (MTU), and so on. In the Ethernet
header of the RA message, the source address field is set to the MAC
address of the sending NIC; the destination address field is set to
33:33:00:00:00:01 or to the unicast MAC address of the host that sent an
RS message from a unicast address. Similar to the RS message, the source
address field is set to the link-local address assigned to the sending
interface; the destination address is set to either the all-nodes multicast
address with link-local scope (FF02::1) or the unicast IPv6 address of the
host that sent the RS message. The hop limit field is set to 255.
A Neighbor Solicitation (NS) is sent by a node to determine the link-layer
address of a neighbor or to verify that a neighbor is still reachable via a
cached link-layer address. Neighbor Solicitations are also used for
duplicate address detection (DAD). In the Ethernet header of a multicast
NS message, the destination MAC address corresponds to the solicited-node
address of the target. In a unicast NS message, the destination address field
is set to the unicast MAC address of the neighbor. In the IPv6 header, the
source address is set to the IPv6 address of the sending interface or, during
DAD, the unspecified address (::). For a multicast NS, the destination
address is set to the solicited-node address of the target. For a unicast NS,
the destination is set to the IPv6 unicast address of the target.
A Neighbor Advertisement (NA) is a response to a Neighbor Solicitation
message. A node can also send unsolicited Neighbor Advertisements to
announce a link-layer address change. In the Ethernet header of a solicited
NA, the destination MAC is set to the unicast MAC address of the initial
NS sender. For an unsolicited NA, the destination MAC is set to
33:33:00:00:00:01, which is the link-local scope all-nodes multicast
address. In the IPv6 header, the source address is set to an IPv6 unicast
address assigned to the sending interface. The destination IPv6 address for
a solicited NA is set to the IPv6 unicast address of the sender of the initial
NS message. For an unsolicited NA, the destination field is set to the link-
local scope all-nodes multicast address (FF02::1).
A Redirect Message (RM) is used by routers to inform hosts of a better
first hop for a destination. In the Ethernet header, the destination MAC is
set to the unicast MAC of the originating sender. In the IPv6 header, the
source address field is set to the unicast IPv6 address of the sending
interface and the destination address is set to the unicast address of the
originating host.
To enable neighbor discovery, the first step is to enable IPv6 or configure
an IPv6 address on an interface. An IPv6 address is configured using either
the command ipv6 address ipv6-address [eui64] or the command ipv6
address use-link-local-only. The command option eui64 configures the
IPv6 address in EUI64 format. The command option use-link-local-only
manually configures a link-local address on the interface instead of using
the automatically assigned link-local address. Examine an IPv6-enabled
link between two switches NX-1 and NX-2, as in Figure 6-5. In this
topology, the link is configured with the IPv6 address of subnet
2002:10:12:1::/64.
When the IPv6 address is configured on both sides of the link and one of
the sides initiates a ping, the ND process starts and an IPv6 neighborship is
established. An IPv6 neighbor is viewed using the command show ipv6
neighbor [detail]. Example 6-25 demonstrates an IPv6 neighborship
between two switches. Notice that when the IPv6 address is configured and
the user initiates a ping to either the IPv6 unicast address or the link-local
address of the remote peer, the IPv6 ND process is initiated, messages are
exchanged, and an IPv6 neighborship is formed.
NX-1
NX-1(config)# interface Eth4/1
NX-1(config-if)# ipv6 address 2002:10:12:1::1/64
NX-2
NX-2(config)# interface Eth4/13
NX-2(config-if)# ipv6 address 2002:10:12:1::2/64
NX-1
! IPv6 neighbor output after initiating ipv6 ping
NX-1# show ipv6 neighbor
NX-1
NX-1# ethanalyzer local interface inband display-filter "ipv6"
limit-captured-frames 0
Capturing on inband
2017-10-14 21:25:51.314297 2002:10:12:1::1 -> ff02::1:ff00:2
ICMPv6 86 Neighbor Solicitation for 2002:10:12:1::2 from
00:01:00:01:00:12
4 2017-10-14 21:25:51.315476 2002:10:12:1::2 -> 2002:10:12:1::1
ICMPv6 86 Neighbor Advertisement 2002:10:12:1::2 (rtr, sol, ovr)
is at 00:02:00:02:00:12
2017-10-14 21:25:53.319291 2002:10:12:1::1 -> 2002:10:12:1::2
ICMPv6 118 Echo (ping) request id=0x1eaf, seq=1, hop limit=255
2017-10-14 21:25:53.319620 2002:10:12:1::2 -> 2002:10:12:1::1
ICMPv6 118 Echo (ping) reply id=0x1eaf, seq=1, hop limit=2
(request in 2580)
! Output omitted for brevity
Note
Other FHS techniques exist, but this chapter does not address them.
Refer to the Cisco.com documentation for more details.
RA Guard
RA Guard is a feature that enables the operator of a Layer 2 switch to
specify which switch ports face routers. Router Advertisements received
on any other port are dropped, so they never reach the end hosts on the link.
RA Guard also performs deeper packet inspection to validate the source of
the RA, the prefix list, the preference, and any other information carried.
RA Guard is specified in RFC 6105. The goal of this feature is to inspect
router Neighbor Discovery (ND) traffic (such as Router Solicitations [RS],
Router Advertisements [RA], and redirects) and to drop bogus messages.
The feature introduces the capability to block unauthorized messages based
on policy configuration (for example, RAs are not allowed on a host port).
To enable IPv6 RA Guard, first an RA Guard policy is defined and then the
policy is applied on an interface. The RA Guard policy is defined using the
command ipv6 nd raguard policy policy-name. Table 6-3 displays all the
options available as part of the RA Guard policy.
Table 6-3 RA Guard Policy Subconfiguration Options

device-role [host | router | monitor | switch]: Defines the role of the
device attached to the port, which can be a host, router, monitor, or switch.

hop-limit [maximum | minimum limit]: Verifies the specified hop-count
limit. If this is not configured, the check is bypassed.
When the policy is defined, it is applied to the interface using the interface-
level configuration command ipv6 nd raguard attach-policy policy-name.
Example 6-30 displays the sample RA Guard configuration. The command
show ipv6 nd raguard policy policy-name shows the RA Guard policy
attached on different interfaces.
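A minimal sketch of an RA Guard policy might look like the following; the
policy name and interface are hypothetical:

ipv6 nd raguard policy HOST_POLICY
  device-role host
interface Ethernet1/10
  ipv6 nd raguard attach-policy HOST_POLICY

With the device-role set to host, RAs received on Ethernet1/10 are dropped
before they can reach end hosts on the link.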
Note
To debug any issues with IPv6 RA Guard, use the debug command
debug ipv6 snooping raguard, which is captured in a debug logfile.
IPv6 Snooping
IPv6 Snooping is a combination of two features: ND Snooping and
DHCPv6 Snooping. IPv6 ND Snooping analyzes IPv6 neighbor discovery
traffic and determines whether it is harmless for nodes on the link. During
this inspection, it gleans address bindings (IP, MAC, port) when available
and stores them in a binding table. The binding entry is then used to
determine address ownership in case of contention between two clients.
IPv6 DHCP Snooping traps DHCPv6 packets between the client and the
server. From the snooped packets, assigned addresses are learned and stored
in the binding table. The IPv6 Snooping feature can also limit the number
of addresses that any node on the link can claim. This helps protect the
switch binding table against DoS flooding attacks. Figure 6-6 explains the
role of IPv6 snooping and shows how it protects the device from invalid or
unwanted hosts.
Figure 6-6 IPv6 Snooping
An IPv6 snooping policy is defined using the command ipv6 snooping
policy policy-name. The following options are available under the policy:

device-role [node | switch]: Specifies the device role of the device
attached to the port. By default, the device role is "node." The device role
(combined with the trusted-port command) has a direct influence on the
preference level of an entry learned from the interface where this policy
applies. The device-role node has an inherent preference of access port,
and the device-role switch has a preference of trunk port.

tracking [enable [reachable-lifetime value] | disable [stale-lifetime
value]]: Overrides the default tracking policy on the port where this policy
applies. This is especially useful on trusted ports where tracking entries is
not desired but the entries should be present in the binding table to prevent
address stealing. In this case, configure the command tracking disable
stale-lifetime infinite.

trusted-port: When receiving messages on ports with this policy, limited to
no verification is performed. Nevertheless, to protect against address
spoofing, messages are still analyzed so that the binding information they
carry can be used to maintain the binding table. Bindings discovered from
these ports are considered more trustworthy than bindings received from
untrusted ports.

validate source-mac: When receiving Neighbor Discovery Protocol (NDP)
messages that contain a link-layer address option, the source MAC address
is checked against the link-layer address option. The packet is dropped if
they are different.

protocol [dhcp | ndp]: Specifies which protocol should be redirected to the
snooping component for analysis.

security-level [glean | inspect | guard]: Specifies the security level
enforced by the IPv6 snooping feature. The default is guard.
glean: Learns bindings but does not drop packets.
inspect: Learns bindings and drops packets if it detects an issue.
guard: Works like inspect, but also drops IPv6 ND RA and IPv6 DHCP
server packets in case of a threat.
When the policy is defined, it can be attached using the command ipv6
snooping attach-policy policy-name under the vlan configuration vlan-id
subconfiguration mode. Example 6-31 displays the configuration of IPv6
snooping policy for VLAN 100.
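A minimal sketch of such a policy might look like this; the policy name is
hypothetical:

ipv6 snooping policy SNOOP_POLICY
  security-level guard
  device-role node
vlan configuration 100
  ipv6 snooping attach-policy SNOOP_POLICY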
Note
DHCP Guard and the DHCP relay agent essentially work together only
at the first hop. In later hops, DHCP relay agent is given priority over
DHCP Guard. Statistics are maintained separately for both features.
To configure a DHCPv6 Guard policy, use the command ipv6 dhcp guard
policy policy-name. Under the policy, the first step is to define the device
role, which can be client, server, or monitor. Then you define the minimum
and maximum allowed advertised server preference. You can also specify
whether the device is connected on a trusted port using the trusted-port
command option. After configuring the policy, use the command ipv6 dhcp
guard attach-policy policy-name to attach the policy to a port or a VLAN.
Example 6-33 shows the configuration of DHCPv6 Guard. To check the
policy configuration, use the command show ipv6 dhcp guard policy, or
use the command show ipv6 snooping policies to verify both the IPv6
snooping and DHCPv6 Guard policies; DHCPv6 Guard works in
conjunction with IPv6 snooping.
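A minimal sketch of a DHCPv6 Guard policy for a port facing a legitimate
DHCPv6 server might look like the following; the policy name and interface
are hypothetical, and the server preference options described above are
omitted for brevity:

ipv6 dhcp guard policy DHCP_SERVER_POLICY
  device-role server
  trusted-port
interface Ethernet1/20
  ipv6 dhcp guard attach-policy DHCP_SERVER_POLICY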
HSRP
Defined in RFC 2281, Hot Standby Router Protocol (HSRP) provides
transparent failover of the first-hop device, which typically acts as a
gateway for the hosts. HSRP provides routing redundancy for IP hosts on
Ethernet networks configured with a default gateway IP address. A
minimum of two devices is required to enable HSRP: one device acts as the
active device and takes care of forwarding the packets, and the other acts
as a standby, ready to take over the role of active device in case of any
failure.
On a network segment, a virtual IP is configured on each HSRP-enabled
interface that belongs to the same HSRP group. HSRP selects one of the
interfaces to act as the HSRP active router. Along with the virtual IP, a
virtual MAC address is assigned for the group. The active router receives
and routes the packets destined for the virtual MAC address of the group.
When the HSRP active device fails, the HSRP standby device assumes
control of the virtual IP and MAC address of the group. If more than two
devices are part of the HSRP group, a new HSRP standby device is
selected. Network operators control which device acts as the HSRP active
device by defining the interface priority (the default is 100); the device
with the highest priority becomes the HSRP active device.
HSRP-enabled interfaces send and receive multicast UDP-based hello
messages to detect failures and to designate the active and standby routers.
If the standby device stops receiving hello messages from the active
device, it takes over the HSRP active role, and the device with the next-
highest priority becomes the new standby. The transition of the HSRP
active role between the devices is transparent to all hosts on the segment.
HSRP supports two versions: version 1 and version 2. Table 6-5 includes
some of the differences between HSRP versions.
Table 6-5 HSRP Version 1 Versus Version 2

Timers: Version 1 does not support millisecond timer values; version 2
supports millisecond timer values.
Group Range: Version 1 supports 0–255; version 2 supports 0–4095.
Multicast Address: Version 1 uses 224.0.0.2; version 2 uses 224.0.0.102.
MAC Address Range: Version 1 uses 0000.0C07.ACxy, where xy is a hex
value representing an HSRP group number; version 2 uses 0000.0C9F.F000
to 0000.0C9F.FFFF.
Authentication: Version 1 does not support authentication; version 2
supports MD5 authentication.
Note
Transitioning from HSRP version 1 to version 2 can be disruptive,
given the change in MAC address between both versions.
When HSRP is configured on the segment and the active and standby
devices have been chosen, the HSRP control packets contain the following
fields:
Source MAC: Virtual MAC of the active device or the interface MAC
of the standby or listener device
Destination MAC: 0100.5e00.0002 for version 1 and 0100.5e00.0066
for version 2
Source IP: Interface IP
Destination IP: 224.0.0.2 for version 1 and 224.0.0.102 for version 2
UDP port 1985
To understand the functioning of HSRP, examine the topology in Figure 6-
7. Here, HSRP is running on VLAN 10.
To enable HSRP, use the command feature hsrp. When configured, HSRP
runs as version 1 by default. To manually change the HSRP version, use
the command hsrp version [1 | 2] under the interface where HSRP is
configured.
Example 6-34 illustrates the configuration of HSRP for VLAN 10. In this
example, HSRP is configured with the group number 10 and a VIP of
10.12.1.1. NX-1 is set to a priority of 110, which means that NX-1 acts as
the active HSRP gateway. HSRP is also configured with preemption; if
NX-1 fails and the HSRP active role fails over to NX-2, NX-1 regains the
active role when it becomes available again.
NX-1
interface Vlan10
no shutdown
no ip redirects
ip address 10.12.1.2/24
hsrp version 2
hsrp 10
preempt
priority 110
ip 10.12.1.1
NX-2
interface Vlan10
no shutdown
no ip redirects
ip address 10.12.1.3/24
hsrp version 2
hsrp 10
ip 10.12.1.1
To view the status of HSRP groups and determine which device is acting as
an active or standby HSRP device, use the command show hsrp brief. This
command displays the group information, the priority of the local device,
the active and standby HSRP interface address, and also the group address,
which is the HSRP VIP. You can also use the command show hsrp [detail]
to view more details about the HSRP groups. This command not only
details information about the HSRP group, but it also lists the timeline of
the state machine a group goes through. This command is useful when
troubleshooting any HSRP finite state machine issues. The show hsrp
[detail] command also displays any authentication configured for the
group, along with the virtual IP (VIP) and virtual MAC address for the
group. Example 6-35 displays both the show hsrp brief and show hsrp
detail command outputs. One important point to note in the following
output is that if no authentication is configured, the show hsrp detail
command displays it as Authentication text “cisco”.
The active HSRP gateway device also populates the ARP table with the
virtual IP and the virtual MAC address, as in Example 6-37. Notice that the
virtual IP 10.12.1.1 maps to MAC address 0000.0c9f.f00a, which is the
virtual MAC of group 10.
Example 6-37 HSRP Virtual MAC and Virtual IP Address in ARP Table
NX-1# show ip arp vlan 10
IP ARP Table
Total number of entries: 1
Address Age MAC Address Interface
10.12.1.1 - 0000.0c9f.f00a Vlan10
If HSRP is down or flapping between the two devices, or if HSRP has not
established the proper states between the two devices (for example, both
devices show the Active state), it might be worth enabling a packet capture
or running a debug to investigate whether the HSRP hello packets are
making it to the other end and whether they are being generated locally on
the switch. Because the HSRP control packets are destined for the CPU,
use Ethanalyzer to capture them. The display-filter of hsrp helps capture
HSRP control packets and determine whether any are not being received.
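Following the same pattern as the IPv6 ND capture shown earlier, a capture
along these lines should display the periodic hellos from both the active
and standby devices:

NX-1# ethanalyzer local interface inband display-filter "hsrp"
limit-captured-frames 0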
Along with Ethanalyzer, you can enable HSRP debug to see whether the
hello packet is being received. The HSRP debug for the hello packet is
enabled using the command debug hsrp engine packet hello interface
interface-id group group-number. The command displays the hello packet
from and to the peer, along with other information such as authentication,
hello, and the hold timer.
Example 6-38 displays the Ethanalyzer and HSRP debug for capturing hello
packets. Note that HSRP version 2 assigns a 6-byte ID to identify the
sender of the HSRP hello packet, which is usually the interface MAC
address.
One of the most common problems with HSRP is the group remaining in
down state. This can happen for the following reasons:
The virtual IP is not configured.
The interface is down.
The interface IP is not configured.
Thus, while troubleshooting any HSRP group down-state issues, these
points should all be checked.
HSRPv6
HSRP for IPv6 (HSRPv6) provides the same functionality to IPv6 hosts as
HSRP for IPv4. An HSRP IPv6 group has a virtual MAC address that is
derived from the HSRP group number and has a virtual IPv6 link-local
address that is, by default, derived from the HSRP virtual MAC address.
When the HSRPv6 group is active, periodic RA messages are sent for the
HSRP virtual IPv6 link-local address. These RA messages stop after a final
RA is sent, when the group leaves the active state (moves to standby state).
HSRPv6 has a different MAC address range and UDP port than HSRP for
IPv4. Consider some of these values:
HSRP version 2
UDP port: 2029
MAC address range: 0005.73A0.0000 to 0005.73A0.0FFF
Hellos Multicast Address: FF02::66 (link-local scope multicast
address)
Hop limit: 255
No separate feature is required to enable HSRPv6; the feature hsrp
command enables HSRP for both the IPv4 and IPv6 address families.
Example 6-39 illustrates the configuration of HSRPv6 between NX-1 and
NX-2 on VLAN 10. In this example, NX-2 is set with a priority of 110,
which means NX-2 acts as the active switch and NX-1 acts as the standby.
A virtual IPv6 address is defined using the command ip ipv6-address, but
this virtual IPv6 address is a secondary virtual IP address. The primary
virtual IPv6 address is automatically derived for the group.
NX-1
interface Vlan10
no shutdown
no ipv6 redirects
ipv6 address 2001:db8::2/48
hsrp version 2
hsrp 20 ipv6
ip 2001:db8::1
NX-2
interface Vlan10
no shutdown
no ipv6 redirects
ipv6 address 2001:db8::3/48
hsrp version 2
hsrp 20 ipv6
preempt
priority 110
ip 2001:db8::1
HSRPv6 does not come up unless the virtual IPv6 address is configured and
assigned on the interface. This is verified using the command show ipv6
interface interface-id. In addition, the virtual IPv6 address and virtual
MAC address must be added to ICMPv6. This is validated using the
command show ipv6 icmp vaddr [link-local | global]. The keyword
link-local displays the primary virtual IPv6 address, which is
automatically calculated from the virtual MAC. The keyword global
displays the manually configured virtual IPv6 address. Example 6-41
examines the output of both commands.
NX-2
NX-2# show ipv6 interface vlan 10
IPv6 Interface Status for VRF "default"(1)
Vlan10, Interface status: protocol-up/link-up/admin-up, iod: 121
IPv6 address:
2001:db8::3/48 [VALID]
IPv6 subnet: 2001:db8::/48
IPv6 link-local address: fe80::e6c7:22ff:fe1e:9642 (default)
[VALID]
IPv6 virtual addresses configured:
fe80::5:73ff:fea0:14 2001:db8::1
IPv6 multicast routing: disabled
! Output omitted for brevity
NX-2# show ipv6 icmp vaddr link-local
Virtual IPv6 addresses exists:
Interface: Vlan10, context_name: default (1)
Group id: 20, Protocol: HSRP, Client UUID: 0x196, Active: Yes
(1) client_state:1
Virtual IPv6 address: fe80::5:73ff:fea0:14
Virtual MAC: 0005.73a0.0014, context_name: default (1)
For flapping HSRPv6 neighbors, the same Ethanalyzer trigger can be used
as for IPv4. Example 6-42 displays the Ethanalyzer output for HSRPv6
control packets, showing packets from both HSRP active and standby
switches.
Note
For any failure or problem with HSRP or HSRPv6, collect the show
tech hsrp output while in the problematic state.
VRRP
Virtual Router Redundancy Protocol (VRRP) was initially defined in RFC
2338, which defines version 1. RFC 3768 and RFC 5798 define version 2
and version 3, respectively. NX-OS supports only VRRP version 2 and
version 3. VRRP works on a concept similar to HSRP. VRRP provides box-
to-box redundancy by enabling multiple devices to elect a member as the
VRRP master, which assumes the role of default gateway, thus eliminating a
single point of failure. The nonmaster VRRP members form the VRRP
group and take the role of backup. If the VRRP master fails, a VRRP backup
assumes the role of VRRP master and acts as the default gateway.
VRRP is enabled using the command feature vrrp. VRRP has a
configuration similar to HSRP and is configured using the command vrrp
group-number. Under the interface VRRP configuration mode, network
operators can define the virtual IP, priority, authentication, and so on. A no
shutdown is necessary under the VRRP configuration to enable the VRRP
group. Example 6-43 displays the VRRP configuration between NX-1 and
NX-2.
NX-1
interface Vlan10
no shutdown
no ip redirects
ip address 10.12.1.2/24
vrrp 10
priority 110
authentication text cisco
address 10.12.1.1
no shutdown
NX-2
interface Vlan10
no shutdown
no ip redirects
ip address 10.12.1.3/24
vrrp 10
authentication text cisco
address 10.12.1.1
no shutdown
To verify the VRRP state, use the command show vrrp [master | backup].
The master and backup options display information on the respective
nodes. The show vrrp [detail] command output is used to gather more
details about the VRRP. Example 6-44 displays the detailed VRRP output,
as well as VRRP state information. Notice that, in this example, the
command show vrrp detail output displays the virtual IP as well as the
virtual MAC address. The VRRP virtual MAC address is of the format
0000.5e00.01xy, where xy is the hex representation of the group number.
For any VRRP flapping issues, use the command show vrrp statistics to
determine whether the flapping is the result of some kind of error or a
wrongly received packet. The command displays the number of times the
device has become master, along with other error statistics such as TTL
errors, invalid packet length, and a mismatch in the address list. Example
6-45 displays the output of show vrrp statistics. Notice that NX-1
recorded five authentication failures for group 10.
VRRP version 2 has support only for the IPv4 address family, but VRRP
version 3 (VRRP3) has support for both IPv4 and IPv6 address families. On
NX-OS, both VRRP and VRRPv3 cannot be enabled on the same device. If
the feature VRRP is already enabled on the Nexus switch, enabling the
feature VRRPv3 displays an error stating that VRRPv2 is already enabled.
Thus, a migration must be performed from VRRP to VRRPv3, which has
minimal impact on the services. Refer to the following steps to perform the
migration from VRRP version 2 to version 3.
Step 1. Disable the feature VRRP using the command no feature vrrp.
Step 2. Enable the feature VRRPv3 using the command feature vrrpv3.
Step 3. Under the interface, configure the VRRPv3 group using the
command vrrpv3 group-number address-family [ipv4 | ipv6].
Step 4. Use the address command to define the VRRPv3 primary and
secondary virtual IP.
Step 5. Use the command vrrpv2 to enable backward compatibility with
VRRP version 2. This helps in exchanging state information with
other VRRP version 2 devices.
Step 6. Perform a no shutdown on the VRRPv3 group.
Example 6-46 illustrates a migration configuration from VRRPv2 to
VRRPv3 on the NX-1 switch.
Example 6-46 VRRPv3 Migration Configuration
NX-1
NX-1(config)# feature vrrpv3
Cannot enable VRRPv3: VRRPv2 is already enabled
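Following the migration steps listed above, the remainder of the migration
might look like the following sketch; the prompt strings and the primary
keyword on the address command are assumptions:

NX-1(config)# no feature vrrp
NX-1(config)# feature vrrpv3
NX-1(config)# interface Vlan10
NX-1(config-if)# vrrpv3 10 address-family ipv4
NX-1(config-if-vrrpv3-group)# address 10.12.1.1 primary
NX-1(config-if-vrrpv3-group)# vrrpv2
NX-1(config-if-vrrpv3-group)# no shutdown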
The command show vrrpv3 [brief | detail] verifies the information of the
VRRPv3 groups. The show vrrpv3 brief command option displays the
brief information related to the group, such as group number, address
family, priority, preemption, state, master address, and group address
(which is the virtual group IP). The show vrrpv3 detail command displays
additional information, such as advertisements sent and received for both
VRRPv2 and VRRPv3, virtual MAC address, and other statistics related to
errors and transition states. Example 6-47 displays both the brief and
detailed command output of show vrrpv3.
You can also use the show vrrpv3 statistics command output to view
error statistics. This command displays counters for packets dropped for
various reasons, such as invalid TTL, invalid checksum, or invalid message
type. The second half of the output is similar to the show vrrpv3 detail
output. Example 6-48 displays the output of the command show vrrpv3
statistics.
Example 6-48 show vrrpv3 statistics Command Output
NX-1
NX-1# show vrrpv3 statistics
Note
For any failure or issues with VRRP, collect the output from the
commands show tech vrrp [brief] or show tech vrrpv3 [detail]
during the problematic state for further investigation by Cisco TAC.
GLBP
As the name suggests, Gateway Load Balancing Protocol (GLBP) provides
gateway redundancy and load balancing to the network segment. It provides
redundancy with an active/standby gateway and supplies load balancing by
distributing hosts across the members of the GLBP group, each of which
forwards traffic for the hosts assigned to it. GLBP is enabled on NX-OS
using the command feature glbp. When defining a GLBP group, the
following parameters can be configured:
Group number, primary and secondary IP addresses
Priority value for selecting the Active Virtual Gateway (AVG)
Preemption time and preemption delay time
Priority value and preemption delay time for virtual forwarders
Initial weighting value, upper and lower threshold values for a
secondary gateway to become AVG
Gateway load-balancing method
MD5 and clear-text authentication attributes
GLBP timer values
Interface tracking
GLBP provides three load-balancing mechanisms, plus a none option:
None: Functionality is similar to HSRP.
Host-dependent: The host MAC address is used to decide which
virtual forwarder MAC the packet is redirected to. This method
ensures that the host uses the same virtual MAC address, as long as
the number of virtual forwarders does not change within the group.
Round-robin: Each virtual forwarder MAC address is used to
sequentially reply for the virtual IP address.
Weighted: Weights are determined for each device in the GLBP
group, to define the ratio of load balancing between the devices.
Example 6-49 displays the GLBP configuration between NX-1 and NX-2.
Example 6-49 GLBP Configuration
NX-1
NX-1(config)# interface vlan 10
NX-1(config-if)# glbp 10
NX-1(config-if-glbp)# timers 1 4
NX-1(config-if-glbp)# priority 110
NX-1(config-if-glbp)# preempt
NX-1(config-if-glbp)# load-balancing ?
host-dependent Load balance equally, source MAC determines
forwarder choice
round-robin Load balance equally using each forwarder in
turn
weighted Load balance in proportion to forwarder
weighting
Similar to HSRP version 2, GLBP communicates its hello packets over the
multicast address 224.0.0.102. However, it uses the UDP source and
destination port number of 3222.
To view the details of the GLBP group, use the command show glbp
[brief]. The command displays the configured virtual IP, the group state,
and all the other information related to the group. The command output
also displays information regarding the forwarders, their MAC address, and
their IP addresses. Example 6-50 examines the output of both the command
show glbp and the command show glbp brief, displaying the information
of the GLBP group 10 along with its forwarder information and their states.
Example 6-50 show glbp and show glbp brief Command Output
NX-1
NX-1# show glbp
Vlan10 - Group 10
State is Active
4 state change(s), last state change(s) 00:01:54
Virtual IP address is 10.12.1.1
Hello time 1 sec, hold time 4 sec
Next hello sent in 990 msec
Redirect time 600 sec, forwarder time-out 14400 sec
Preemption enabled, min delay 0 sec
Active is local
Standby is 10.12.1.3, priority 100 (expires in 3.905 sec)
Priority 110 (configured)
Weighting 100 (default 100), thresholds: lower 1, upper 100
Load balancing: host-dependent
Group members:
5087.8940.2042 (10.12.1.2) local
E4C7.221E.9642 (10.12.1.3)
There are 2 forwarders (1 active)
Forwarder 1
State is Active
2 state change(s), last state change 00:01:50
MAC address is 0007.B400.0A01 (default)
Owner ID is 5087.8940.2042
Preemption enabled, min delay 30 sec
Active is local, weighting 100
Forwarder 2
State is Listen
1 state change(s), last state change 00:00:40
MAC address is 0007.B400.0A02 (learnt)
Owner ID is E4C7.221E.9642
Redirection enabled, 599.905 sec remaining (maximum 600 sec)
Time to live: 14399.905 sec (maximum 14400 sec)
Preemption enabled, min delay 30 sec
Active is 10.12.1.3 (primary), weighting 100 (expires in 3.905
sec)
NX-1# show glbp brief
Interface   Grp  Fwd  Pri  State    Address      Active rtr   Standby rtr
Vlan10      10   -    110  Active   10.12.1.1    local        10.12.1.3
Note
In case of an issue with GLBP, collect the show tech glbp command
output for further investigation by Cisco TAC.
Summary
NX-OS supports multiple IP and IPv6 services that complement the routing
and switching capabilities of the Nexus platforms within the data center
and help position the Nexus switches at different layers. This chapter
detailed how IP SLA is leveraged to track reachability and jitter between a
specified source and destination, supporting both UDP- and TCP-based
probes. Along with IP SLA, the object tracking feature is leveraged to
perform conditional actions in the system. The object tracking feature
supports tracking an interface, an IP or IPv6 route, and a track list, as well
as using them with static routes.
As part of the IPv4 services, NX-OS provides support for DHCP relay,
snooping, and other IPv4 security–related features. This chapter covered in
detail how DHCP Relay and DHCP Snooping can be used in data center
environments to extend the capability of a DHCP server and, at the same
time, protect the network from attacks. The DHCP Relay feature can be
used when the DHCP server and the host are on different VLANs or
subnets. This chapter also showed how to use security features such as
DAI, IP Source Guard, and URPF. When enabling all these services,
NX-OS configures ACLs in the hardware to permit relevant traffic.
For IPv6 services, this chapter covered the IPv6 neighbor discovery process
and IPv6 first-hop security features such as RA Guard, IPv6 snooping, and
DHCPv6 Guard. Additionally, the chapter looked at FHRP protocols such as
HSRP for both IPv4 and IPv6, VRRP, and GLBP. The FHRP protocols
provide hosts with gateway redundancy. Finally, the chapter looked at how
different FHRP protocols work and how to configure and troubleshoot
them.
Chapter 7
Troubleshooting Enhanced
Interior Gateway Routing
Protocol (EIGRP)
Table 7-1 contains key terms, definitions, and their correlation to Figure 7-
1.
Table 7-1 EIGRP Terminology

Successor Route: The route with the lowest path metric to reach a
destination. The successor route for NX-1 to reach 10.4.4.0/24 on NX-4 is
NX-1→NX-3→NX-4.

Successor: The first next-hop router for the successor route. The successor
for 10.4.4.0/24 is NX-3.

Feasible Distance (FD): The metric value for the lowest-metric path to
reach a destination. The feasible distance is calculated locally using the
formula shown in the Path Metric Calculation section later in this chapter.
The FD calculated by NX-1 for the 10.4.4.0/24 network is 3,328
(256+256+2,816).

Reported Distance (RD): The distance reported by a router to reach a
prefix. The reported distance value is the feasible distance for the
advertising router. NX-3 advertises the 10.4.4.0/24 prefix with an RD of
3,072 (256+2,816). NX-4 advertises 10.4.4.0/24 to NX-1 and NX-2 with an
RD of 2,816.

Feasibility Condition: For a route to be considered a backup route, the
reported distance received for that route must be less than the feasible
distance calculated locally. This logic guarantees a loop-free path.
EIGRP uses K values to define which coefficients the formula uses and the
associated impact of each coefficient when calculating the metric. A
common misconception is that the K values directly apply to bandwidth,
load, delay, or reliability; this is not accurate. For example, K1 and K2 both
reference bandwidth (BW).
BW represents the slowest link in the path, scaled against a 10 Gigabit per
second link (10^7 Kbps). Link speed is collected from the configured
interface bandwidth on an interface. Delay is the total measure of delay in
the path, measured in tens of microseconds (μs).
The EIGRP formula is based on the IGRP metric formula, except the
output is multiplied by 256 to change the metric from 24 bits to 32 bits.
Taking these definitions into consideration, the formula for EIGRP is
shown in Figure 7-4.
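The classic EIGRP composite metric formula that Figure 7-4 depicts is
commonly written as:

metric = 256 × [K1 × BW + (K2 × BW) / (256 − load) + K3 × delay]
             × [K5 / (K4 + reliability)]

The trailing K5 / (K4 + reliability) factor is applied only when K5 is
nonzero. With the default K values (K1 = K3 = 1 and K2 = K4 = K5 = 0),
the formula reduces to metric = 256 × (BW + delay).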
Note
EIGRP includes a second formula to address high-speed interfaces
called EIGRP wide metrics, which add a sixth K value. EIGRP wide
metrics is explained later in the chapter.
The EIGRP update packet includes path attributes associated with each
prefix. The EIGRP path attributes can include hop count, cumulative delay,
minimum bandwidth link speed, and reported distance. The attributes are
updated at each hop along the way, allowing each router to independently
identify the shortest path.
Table 7-2 shows some of the common network types, link speeds, delays,
and EIGRP metrics using the streamlined formula from Figure 7-5.
Table 7-2 Default EIGRP Interface Metrics
Interface Type Link Speed (Kbps) Delay Metric
Serial 64 20,000 μs 40,512,000
T1 1,544 20,000 μs 2,170,031
Ethernet 10,000 1,000 μs 281,600
FastEthernet 100,000 100 μs 28,160
GigabitEthernet 1,000,000 10 μs 2,816
10 GigabitEthernet 10,000,000 10 μs 512
Note
Notice that the delay is the same for the Serial and T1 interfaces, so
the only granularity comes from the link speed. In addition, there is no
differentiation of delay between the Gigabit Ethernet and 10 Gigabit
Ethernet interfaces.
Using the topology from Figure 7-1, the metric from NX-1 for the
10.4.4.0/24 network is calculated using the formula in Figure 7-5. The link
speed for both Nexus switches is 1 Gbps, and the total delay is 30 μs (10 μs
for the 10.4.4.0/24 link, 10 μs for the 10.34.1.0/24 link, and 10 μs for the
10.13.1.0/24 link to NX-3).
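Working through the arithmetic with the streamlined (default K value) form
of the formula, where BW is 10^7 divided by the slowest link speed in Kbps
and delay is expressed in tens of microseconds, reproduces the FD of 3,328
listed in Table 7-1:

metric = 256 × (10^7 / 1,000,000 + 30 / 10) = 256 × (10 + 3) = 3,328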
The EIGRP metric for a specific prefix is queried directly from EIGRP’s
topology table with the command show ip eigrp topology network/prefix-
length. Example 7-1 shows NX-1’s topology table output for the 10.4.4.0/24
network. Notice that the output includes the successor route, any feasible
successor paths, and the EIGRP state for the prefix. Each path contains the
EIGRP attributes minimum bandwidth, total delay, interface reliability,
load, and hop count.
Note
The EIGRP topology table maintains other paths besides the successor
and feasible successor. The command show ip eigrp topology all-
links displays the other ones.
EIGRP Communication
EIGRP uses five packet types to communicate with other routers, as shown
in Table 7-3. EIGRP uses its own IP protocol number (88), and uses
multicast packets where possible and unicast packets when necessary.
Communication between EIGRP devices is accomplished using the
multicast group address of 224.0.0.10 or MAC address of
01:00:5e:00:00:0a when possible.
Table 7-3 EIGRP Packet Types
Type Packet Name Function
EIGRP uses the Reliable Transport Protocol (RTP) to ensure that packets
are delivered in order and that routers receive specific packets. A sequence
number is included in all EIGRP packets. A sequence value of zero
does not require a response from the receiving EIGRP router; all other
values require an Acknowledgement packet that includes the original
sequence number.
Ensuring that packets are received makes the transport method reliable. All
Update, Query, and Reply packets are deemed reliable, whereas Hello
and Acknowledgement packets do not require acknowledgement and are
considered unreliable.
If the originating router does not receive an acknowledgement packet from
the neighbor before the retransmit timeout expires, it notifies the
nonresponsive router to stop processing its multicast packets. The
originating router sends all traffic via unicast, until the neighbor is fully
synchronized. Upon complete synchronization, the originating router
notifies the destination router to start processing multicast packets again.
All unicast packets require acknowledgement. EIGRP will retry up to 16
times for each packet that requires confirmation and will reset the neighbor
relationship when the neighbor reaches the retry limit of 16.
Baseline EIGRP Configuration
The EIGRP configuration process on an NX-OS switch requires
configuration under the EIGRP process and under the interface
configuration submode. The following steps explain the process for
configuring EIGRP on an NX-OS device.
Step 1. Enable the EIGRP feature. The EIGRP feature must be enabled with
the global configuration command feature eigrp.
Step 2. Define an EIGRP process tag. The EIGRP process must be defined
with the global configuration command router eigrp process-tag.
The process-tag can be up to 20 alphanumeric characters in length.
Step 3. Define the Router-ID (optional). The Router-ID (RID) is a 32-bit
unique number that identifies an EIGRP router. EIGRP uses the
RID as a loop prevention mechanism. The RID can be set manually
or dynamically, but should be unique for each EIGRP process. The
command router-id router-id is used to statically set the RID.
If a RID is not manually configured, the Loopback 0 IP address is
always preferred. If the Loopback 0 does not exist, NX-OS selects
the IP address for the first loopback interface in the configuration.
If no loopback interfaces exist, NX-OS selects the IP address for
the first physical interface in the configuration.
Step 4. Define the address family. EIGRP supports IPv4 and IPv6 address-
families under the same EIGRP process. Therefore, the address-
family should be defined with the command address-family [ipv4 |
ipv6] unicast.
This step is optional for exchanging IPv4 addresses on Nexus
switches.
Step 5. Define the Autonomous System Number (ASN) for the EIGRP
process. The autonomous system must be defined for the EIGRP
process with the command autonomous-system as-number.
This step is optional if the EIGRP process tag is only numeric and
matches the ASN used by the EIGRP process.
Step 6. Enable EIGRP on interfaces. The interface that EIGRP is enabled
on is selected with the command interface interface-id. The EIGRP
process is then enabled on that interface with the command ip
router eigrp process-tag.
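Putting the steps together, a baseline configuration might look like the
following sketch; the process tag, ASN, router ID, and interface are
illustrative values only:

feature eigrp
router eigrp NXOS
  autonomous-system 12
  router-id 192.168.100.1
interface Ethernet1/1
  ip router eigrp NXOS

Because the process tag NXOS is not numeric, the autonomous-system
command is required here; a numeric tag matching the ASN would make it
optional, as noted in Step 5.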
Note
Unlike IOS devices, enabling EIGRP on an interface advertises any
secondary connected network into the topology table.
Table 7-4 provides a brief explanation of the key fields shown in Example
7-3.
Table 7-4 EIGRP Neighbor Columns

Address: IP address of the EIGRP neighbor.
Interface: Interface the neighbor was detected on.
Holdtime: Time left to receive a packet from this neighbor to ensure it is
still alive.
SRTT: Time for a packet to be sent to a neighbor and a reply from that
neighbor to be received, in milliseconds.
RTO: Timeout for retransmission (waiting for ACK).
Q Cnt: Number of packets (update/query/reply) in queue for sending.
Seq Num: Sequence number that was last received from this router.
Table 7-5 provides a brief explanation of the key fields shown with the
EIGRP interfaces.
Table 7-5 EIGRP Interface Fields

Interface: Interfaces running EIGRP.
Peers: Number of peers detected on that interface.
Xmt Queue Un/Reliable: Number of unreliable/reliable packets remaining
in the transmit queue. A value of zero is an indication of a stable network.
Mean SRTT: Average time for a packet to be sent to a neighbor and a reply
from that neighbor to be received, in milliseconds.
Pacing Time Un/Reliable: Used to determine when EIGRP packets should
be sent out of the interface (for unreliable and reliable packets).
Multicast Flow Timer: Maximum time (seconds) that the router sent
multicast packets.
Pending Routes: Number of routes in the transmit queue that need to be
sent.
Passive Interface
Some network topologies require advertising a network segment into
EIGRP, but need to prevent neighbors from forming adjacencies on that
segment. Example scenarios involve advertising access layer networks in a
campus topology.
To illustrate how this can cause problems, consider that NX-1 and NX-2
cannot establish an EIGRP adjacency. Viewing the EIGRP interfaces on
both switches shows that the peering link E1/1 is not displayed as
expected, as shown in Example 7-5.
NX-1# show run eigrp
interface Vlan10
ip router eigrp NXOS
no ip passive-interface eigrp NXOS
interface loopback0
ip router eigrp NXOS
interface Ethernet1/1
ip router eigrp NXOS
NX-2# show run eigrp
interface Vlan20
ip router eigrp NXOS
interface loopback0
ip router eigrp NXOS
interface Ethernet1/1
ip router eigrp NXOS
ip passive-interface eigrp NXOS
Note
In addition to placing an interface into a passive state, an interface can
have EIGRP temporarily shut down with the command ip eigrp
process-tag shutdown. This disables EIGRP on that interface while
leaving the EIGRP configuration on the interface intact.
Performing EIGRP debugs shows only the packets that have reached the
supervisor CPU. If packets are not displayed in the debugs, further
troubleshooting must be done by examining quality of service (QoS)
policies, access control lists (ACLs), and control plane policing (CoPP),
or simply by verifying that the packet is leaving or entering an interface.
QoS policies may or may not be deployed on an interface. If they are
deployed, the policy-map must be examined for any dropped packets,
which must then be referenced to a class-map that matches the EIGRP
routing protocol. The same logic applies to CoPP policies because they are
based on QoS settings.
Example 7-10 displays the process for checking the CoPP policy with the
following logic:
Examine the CoPP policy with the command show running-config
copp all. This displays the relevant policy-map name, classes defined,
and the police rate for each class.
Investigate the class-maps to identify the conditional matches for that
class-map.
After the class-map has been verified, examine the policy-map drops
for that class with the command show policy-map interface control-
plane.
Note
This CoPP policy was taken from a Nexus 7000 switch; the policy-
name and class-maps may vary depending on the platform.
ip access-list EIGRP
statistics per-entry
permit eigrp 10.12.100.200/32 any any
permit eigrp any any
permit icmp any any
permit ip any any
interface vlan 10
ip access-group EIGRP in
NX-1# show ip access-list
IP access list EIGRP
statistics per-entry
10 permit eigrp 10.12.100.200/32 any log [match=100]
20 permit eigrp any any [match=200]
30 permit icmp any any [match=0]
40 permit ip any any [match=5]
In addition, NX-1 keeps changing the neighbor state for NX-2 (10.12.1.200)
after a retry limit was exceeded, as shown in Example 7-15.
NX-1
13:28:06 NX-1 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26809] (default-base) IP-EIGRP(0) 12: Neighbor 10.12.1.200
(Ethernet1/1) is down: retry limit exceeded
13:28:09 NX-1 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26809] (default-base) IP-EIGRP(0) 12: Neighbor 10.12.1.200
(Ethernet1/1) is up: new adjacency
21:19:00 NX-1 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26809] (default-base) IP-EIGRP(0) 123: Neighbor 10.12.1.200
(Ethernet1/1) is down: retry limit exceeded
21:19:00 NX-1 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26809] (default-base) IP-EIGRP(0) 123: Neighbor 10.12.1.200
(Ethernet1/1) is up: new adjacency
Note
NX-OS does not provide the syslog message “is blocked: not on
common subnet” that is included with IOS routers.
Remember that EIGRP will retry up to 16 times for each packet that
requires confirmation, and it will reset the neighbor relationship when the
neighbor reaches the retry limit of 16. The actual retry values are examined
on NX-OS by using the command show ip eigrp neighbor detail, as
demonstrated in Example 7-16.
The next step is to try to ping the primary IP address between nodes, as
shown in Example 7-17.
NX-1 cannot ping NX-2, and NX-2 cannot ping NX-1 because it does not
have a route to the host. This also means that NX-1 might have been able to
send the packets to NX-2, but NX-2 did not have a route to send the ICMP
response.
Example 7-18 displays the routing table on NX-1 and NX-2 to help locate
the reason.
router eigrp 12
autonomous-system 1234
interface Ethernet1/1
ip router eigrp 12
Note
Specifying the AS in the EIGRP configuration removes any potential
for confusion by network engineers of all skill levels. This is
considered a best practice.
Mismatch K Values
EIGRP uses K values to define which factors the best-path formula
uses. To ensure consistent routing logic and prevent routing loops from
forming, all EIGRP neighbors must use the same K values. The K values
are included as part of the EIGRP Hello packets.
Example 7-21 displays the syslog message that indicates a mismatch of K
values. The K values are identified on the local router by looking at the
EIGRP process with the command show ip eigrp.
The K values on Nexus switches are configured with the command metric
weights TOS K1 K2 K3 K4 K5[K6] under the EIGRP process. The K6 value is
optional unless EIGRP wide metrics are configured. TOS is not used and
should be set to zero. Example 7-22 displays an EIGRP configuration with
custom K values.
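A sketch of such a configuration might look like this, enabling the
reliability coefficients (K4 and K5) in addition to the default bandwidth
and delay; the values are illustrative and must match on all neighbors:

router eigrp NXOS
  metric weights 0 1 0 1 1 1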
NX-2
03:11:35 NX-2 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26807] (default-base) IP-EIGRP(0) 12: Neighbor 10.12.1.100
(Ethernet1/1) is down: Interface Goodbye received
03:11:39 NX-2 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26807] (default-base) IP-EIGRP(0) 12: Neighbor 10.12.1.100
(Ethernet1/1) is up: new adjacency
03:11:54 NX-2 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26807] (default-base) IP-EIGRP(0) 12: Neighbor 10.12.1.100
(Ethernet1/1) is down: Interface Goodbye received
03:11:59 NX-2 %$ VDC-1 %$ %EIGRP-5-NBRCHANGE_DUAL: eigrp-NXOS
[26807] (default-base) IP-EIGRP(0) 12: Neighbor 10.12.1.100
(Ethernet1/1) is up: new adjacency
The EIGRP Hello and Hold timers for an interface are seen with the
command show ip eigrp interface [interface-id] [vrf {vrf-name | all}]. The
optional brief keyword cannot be used to view the timers. Example 7-24
displays sample output for NX-1 and NX-2.
interface Ethernet1/1
ip router eigrp NXOS
ip hello-interval eigrp NXOS 120
Note
The EIGRP interface Hold timer is modified with the command ip
hold-time eigrp process-tag hold-time.
EIGRP encrypts the password with an MD5 hash using the keychain
function. Keychains allow the configuration of multiple passwords and
sequences with validity periods so that passwords can be rotated. When
using time-based keychains, it is important that the Nexus switches' time
is synchronized with NTP and that some overlap is provided between key
iterations.
The hash is composed of the key number and a password. EIGRP
authentication does not encrypt the entire EIGRP packet, just the password.
The password is seen with the command show key chain [mode decrypt].
The optional keywords mode decrypt display the password in plain text
between a pair of quotation marks, which is helpful to detect unwanted
characters such as spaces. Example 7-27 displays how the keychain
password is verified.
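A keychain matching the interface configuration shown below might be
defined as follows; the key number is arbitrary (but must match on both
neighbors, per the note that follows), and the key-string value is
hypothetical:

key chain EIGRP
  key 2
    key-string SECURE_PASS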
Note
The hash does not match between EIGRP devices if the key number is
different, even if the password is identical. So the key number and
password must match.
interface Ethernet1/1
ip router eigrp NXOS
ip authentication key-chain eigrp NXOS EIGRP
ip authentication mode eigrp NXOS md5
Note
Interface-based authentication settings override any global EIGRP
authentication settings.
EIGRP routes that are installed into the RIB are seen with the command
show ip route [eigrp]. The optional eigrp keyword only shows EIGRP
learned routes. EIGRP routes are indicated by the eigrp-process-tag.
EIGRP routes originating within the autonomous system have an
administrative distance (AD) of 90 and have the internal flag listed after
the process-tag. Routes that originate from outside of the AS are external
EIGRP routes. External EIGRP routes have an AD of 170, and have the
external flag listed after the process-tag. Placing external EIGRP routes
into the RIB with a higher AD acts as a loop prevention mechanism.
Example 7-31 displays the EIGRP routes from the sample topology in
Figure 7-7. The metric for the selected route is the second number in
brackets.
Example 7-31 Viewing EIGRP Routes on NX-1
Load Balancing
EIGRP allows multiple successor routes (same metric) to be installed into
the RIB. Installing multiple paths into the RIB for the same prefix is called
equal-cost multipath (ECMP) routing. At the time of this writing, the
default maximum ECMP paths value for Nexus nodes is eight.
The default ECMP setting is changed with the command maximum-paths
maximum-paths under the EIGRP process, which can increase the value up
to 16.
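For example, assuming the process tag used throughout this chapter:

router eigrp NXOS
  maximum-paths 16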
NX-OS does not support EIGRP unequal-cost load balancing, which allows
installation of both successor routes and feasible successors into the
RIB. Unequal-cost load balancing is supported in other Cisco operating
systems with the variance command.
Stub
EIGRP stub functionality allows an EIGRP router to conserve router
resources. An EIGRP stub router announces itself as a stub within the
EIGRP Hello packet. Neighboring routers detect the stub field and update
the EIGRP neighbor table to reflect the router’s stub status.
If a route goes active, EIGRP does not send EIGRP Queries to an EIGRP
stub router. This provides faster convergence within an EIGRP AS because
it decreases the size of the Query domain for that prefix.
EIGRP stubs do not advertise routes that they learn from other EIGRP
peers. By default, EIGRP stubs advertise only connected and summary
routes, but can be configured so that they only receive routes or advertise
any combination of redistributed routes, connected routes, or summary
routes.
The routing tables in Example 7-32 look different on NX-1 and NX-6 from
the baseline routing table that was displayed in Example 7-30.
All the routers have established adjacency. Using the optional detail
keyword may provide more insight into the problem. Example 7-34 displays
the command show ip eigrp neighbors detail.
NX-1 was able to detect that the 10.12.1.2 peer (NX-2) has the EIGRP stub
feature configured. The stub feature prevented NX-2 from advertising
routes learned on the E1/2 interface toward the E1/1 interface and vice
versa.
The next step is to verify and remove the EIGRP configuration. The EIGRP
command eigrp stub {direct | leak-map leak-map-name |receive-only |
redistributed | static | summary} configures stub functionality on a
switch and is displayed in Example 7-35. Removing the stub configuration
allows for the routes to transit across NX-2.
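A sketch of the stub configuration and its removal might look like the
following; placement of the command under the EIGRP process is an
assumption based on the syntax above:

! Stub configuration on NX-2 limiting advertisements
router eigrp NXOS
  eigrp stub direct summary
! Removing stub functionality so routes can transit NX-2
router eigrp NXOS
  no eigrp stub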
Note
The receive-only option cannot be combined with other EIGRP stub
options. Give the network design special consideration to ensure
bidirectional connectivity for any networks connected to an EIGRP
router with the receive-only stub option to ensure that routers know
how to send return traffic.
interface Ethernet1/1
ip router eigrp NXOS
interface Ethernet1/2
ip router eigrp NXOS
Note
At the time of this writing, full EIGRP support is available only in
Enterprise Services, whereas only EIGRP Stub functionality is
included in LAN Base licensing for specific platforms. Please check
current licensing options, because this could cause issues.
Maximum-Hops
EIGRP is a hybrid distance vector routing protocol and keeps track of
hop counts.
In addition to filtering by prefixes, EIGRP supports filtering by hop count.
By default, an EIGRP router allows only routes up to 100 hops away to be
installed into the EIGRP topology table. Routes with an EIGRP hop count
path attribute higher than 100 are not installed into the EIGRP topology
table. The maximum hop count is changed with the EIGRP configuration
command metric maximum-hops hop-count.
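For example (the hop-count value here is arbitrary):

router eigrp NXOS
  metric maximum-hops 50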
Just as before, a change is notated in the routing table of NX-1 where paths
appear to have disappeared. The routing table for NX-1 and NX-6 is
provided in Example 7-36.
NX-1 is missing the upper (NX-1 → NX-2 → NX-3 → NX-6) path for the
10.6.6.0/24 network, whereas NX-6 maintains full paths to the 10.1.1.0/24
and 10.11.11.0/24 network. This means that there is connectivity in both
directions and that EIGRP stub functionality has not been deployed. It also
states that there is EIGRP adjacency along all paths, so some form of
filtering or path manipulation was performed.
Examining the EIGRP configuration on NX-1, NX-2, NX-3, and NX-6
identifies the cause of the problem. NX-2 has configured the maximum-
hops feature and set it to 1, as shown in Example 7-37, which prevents the
relevant routes (from NX-6's perspective) from being installed. Removing
the metric maximum-hops command or changing the value back to a
normal value returns the routing table to normal.
interface Ethernet1/1
ip router eigrp NXOS
interface Ethernet1/2
ip router eigrp NXOS
Distribute List
EIGRP supports filtering of routes with a distribute list that is placed on an
individual interface. The distribute list uses the command ip distribute-list
eigrp process-tag {route-map route-map-name | prefix-list prefix-list-
name} {in | out}. The following rules apply:
If the direction is set to in, inbound filtering drops routes prior to
DUAL processing; therefore, the routes are not installed into the RIB.
If the direction is set to out, the filtering occurs during outbound route
advertisement; the routes are processed by DUAL and installed into the
local RIB of the receiving router.
Any routes that pass the prefix-list are advertised or received. Routes
that do not pass the prefix-list are filtered.
In lieu of specifying a prefix-list, a route-map can be specified to modify
path attributes, in addition to filtering.
A network engineer has identified that a path for the 10.1.1.0/24 route has
disappeared on NX-6 while the 10.11.11.0/24 route has both paths in it.
Example 7-38 displays the current routing table of NX-6, which is different
from the original routing table displayed in Example 7-30.
Because the 10.11.11.0/24 network has two paths and it is connected to the
same Nexus switch (NX-1), some form of path manipulation is enabled.
Checking the routing table along the missing path should identify the router
causing this behavior.
Example 7-39 displays NX-2’s routing table that shows the path for
10.1.1.0/24 coming from NX-3 when the path from NX-1 appears to be
more optimal.
NX-2
interface Ethernet1/2
description To NX-1
ip router eigrp NXOS
ip distribute-list eigrp NXOS prefix-list DISTRIBUTE out
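The contents of the DISTRIBUTE prefix-list are not shown in this output; a
minimal sketch of what it might look like, assuming the intent is to filter
only the 10.1.1.0/24 prefix (the sequence numbers and catch-all entry are
hypothetical):
ip prefix-list DISTRIBUTE seq 5 deny 10.1.1.0/24
ip prefix-list DISTRIBUTE seq 10 permit 0.0.0.0/0 le 32
Routes denied by the prefix-list are filtered from the outbound
advertisements, and all other prefixes pass.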
Offset Lists
Modifying the EIGRP path metric provides a method of traffic engineering
in EIGRP. Modifying the delay setting for an interface affects all routes that
are received and advertised on that interface. Offset lists allow for the
modification of route attributes based on the direction of the update, a
specific prefix, or a combination of direction and prefix. The offset list is
applied under the interface with the command ip offset-list eigrp process-
tag {route-map route-map-name | prefix-list prefix-list-name} {in | out}
offset-value. The following rules apply:
If the direction is set to in, the offset value is added as routes are
added to the EIGRP topology table.
If the direction is set to out, the path metric increases by the offset
value specified in the offset list as advertised to the EIGRP neighbor.
Any routes that pass the route-map or the prefix-list will have the
metric added to the path attributes.
The offset-value is calculated from an additional delay value that is added
to the existing delay in the EIGRP path attribute. Figure 7-8 shows the
modified path metric formula when an offset delay is included.
Example 7-41 displays an offset list configuration on NX-2 that adds 256 to
the path metric to only the 10.1.1.0/24 prefix received from NX-1.
NX-2
interface Ethernet1/2
description To NX-1
ip router eigrp NXOS
ip offset-list eigrp NXOS prefix-list OFFSET in 256
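The OFFSET prefix-list referenced above is not shown; a minimal sketch of
what it might contain, assuming it matches only the 10.1.1.0/24 prefix (the
sequence number is hypothetical):
ip prefix-list OFFSET seq 5 permit 10.1.1.0/24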
Example 7-42 displays the topology for the 10.1.1.0/24 prefix that is
advertised from NX-1 toward NX-2 in Figure 7-8. Notice that the path
metric has increased from 768 to 1,024 and that the delay increased by 10
microseconds.
The metric value added in Example 7-41 was explicitly calculated using the
EIGRP path metric formula so that a delay value of 10 was added. Adding a
metric value at one point in the path may not produce the same metric
increase farther along the path, depending on whether the bandwidth
changes downstream on that path.
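As a worked sketch of the arithmetic (classic metrics with default K values,
where delay is measured in tens of microseconds): an additional delay of 10
μs increases the scaled delay by 10 / 10 = 1, which increases the composite
metric by 1 × 256 = 256, matching the change from 768 to 1,024 seen in
Example 7-42.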
Example 7-43 displays how the increase of the metric (256) has impacted
only the path for 10.1.1.0/24 and not the path for 10.11.11.0/24.
Note
As stated earlier, the path metric can be manipulated with an EIGRP
offset list or the use of a distribute-list when a route-map is used. In
both scenarios, EIGRP modifies the metric through the total delay
path attribute. When small values are scaled for EIGRP, precision can be
lost on IOS-based routers because they use integer math. These devices
may not be able to register a difference between the values of 4007 and
4008, whereas a Nexus switch can. In general, use larger values so that
rounding does not affect the path decision, and be sure to account for
decisions that could be affected farther away from where the change is
being made.
Redistribution
Every routing protocol has a different methodology for calculating the best
path for a route. For example, EIGRP can use bandwidth, delay, load, and
reliability for calculating its best path, whereas OSPF primarily uses the
path metric for calculating the shortest path first (SPF) tree (SPT). OSPF
cannot calculate the SPF tree using EIGRP path attributes, and EIGRP
cannot run the Diffusing Update Algorithm (DUAL) using only the total path
metric. The source protocol must provide relevant metrics to the
destination protocol so that the destination protocol can calculate the best
path for the redistributed routes.
Redistributing into EIGRP uses the command redistribute [bgp asn |
direct | eigrp process-tag | isis process-tag | ospf process-tag | rip process-
tag | static] route-map route-map-name. A route-map is required as part of
the redistribution process on Nexus switches.
Every protocol provides a seed metric at the time of redistribution that
allows the destination protocol to calculate a best path. EIGRP uses the
following logic when setting the seed metric:
The default seed metric on Nexus switches is 100,000 Kbps for
minimum bandwidth, 1000 μs of delay, reliability of 255, load of 1,
and MTU of 1492.
The default seed metric is not needed, and path attributes are
preserved when redistributing between EIGRP processes.
Note
The default seed metric behavior on Nexus switches is different from
IOS and IOS XR routers that use a default seed value of infinity.
Setting the seed metric to infinity prevents routes from being installed
into the topology table.
The default seed metrics can be changed to different values for bandwidth,
load, delay, reliability, and maximum transmission unit (MTU) if desired.
The EIGRP process command default-metric bandwidth delay
reliability load mtu changes the values for all routes that are redistributed
into that process, or the command set metric bandwidth delay
reliability load mtu can be used for selective manipulation within a route-
map.
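A hypothetical sketch of both methods (the metric values shown are
illustrative only, not recommendations):
router eigrp NXOS
default-metric 100000 1000 255 1 1500
!
route-map REDIST-METRIC permit 10
set metric 100000 1000 255 1 1500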
Example 7-44 provides the necessary configuration to demonstrate the
process of redistribution. NX-1 redistributes the connected routes for
10.1.1.0/24 and 10.11.11.0/24 in lieu of them being advertised with the
EIGRP routing protocol. Notice that the route-map can be a simple permit
statement without any conditional matches.
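The full example is not reproduced here; a minimal sketch of the
configuration described (NX-OS uses the direct keyword to redistribute
connected routes; the route-map name is assumed):
route-map REDIST permit 10
!
router eigrp NXOS
redistribute direct route-map REDIST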
Example 7-45 displays the routing table on NX-2. The 10.1.1.0/24 and
10.11.11.0/24 routes are tagged as external, and the AD is set to 170. The
topology table is shown to display the EIGRP path metrics. Notice that
EIGRP contains an attribute for the source protocol (Connected) as part of
the route advertisement from NX-1.
Note
EIGRP router-ids are used as a loop prevention mechanism for
external routes. An EIGRP router does not install an external route
that contains the router-id that matches itself. Ensuring unique router-
ids on all devices in an EIGRP AS prevents problems with external
EIGRP routes.
The following calculations illustrate how the classic scaled metrics lose
granularity as interface speeds exceed 10 Gbps:
GigabitEthernet:
Scaled Bandwidth = 10,000,000 / 1,000,000 = 10
Scaled Delay = 10 / 10 = 1
Composite Metric = (10 + 1) × 256 = 2,816
10 GigabitEthernet:
Scaled Bandwidth = 10,000,000 / 10,000,000 = 1
Scaled Delay = 10 / 10 = 1
Composite Metric = (1 + 1) × 256 = 512
11 GigabitEthernet:
Scaled Bandwidth = 10,000,000 / 11,000,000 = 0
Scaled Delay = 10 / 10 = 1
Composite Metric = (0 + 1) × 256 = 256
20 GigabitEthernet:
Scaled Bandwidth = 10,000,000 / 20,000,000 = 0
Scaled Delay = 10 / 10 = 1
Composite Metric = (0 + 1) × 256 = 256
EIGRP includes support for a second set of metrics, known as wide metrics,
that addresses the issue of scalability with higher-capacity interfaces.
EIGRP wide metrics are supported on NX-OS but must be explicitly
enabled.
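On NX-OS, wide metrics are enabled under the EIGRP process. A minimal
sketch, assuming the metric version 64bit process command (verify the
exact syntax for your platform and release):
router eigrp NXOS
metric version 64bit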
Note
IOS routers support EIGRP wide metrics only in named configuration
mode, and IOS-XR routers use wide metrics by default.
Figure 7-9 shows the explicit EIGRP wide metric formula. Notice that an
additional K value (K6) is included that adds an extended attribute to
measure jitter, energy, or other future attributes.
Figure 7-9 EIGRP Wide Metric Formula
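Because the figure itself is not reproduced here, the wide metric
calculation can be summarized roughly as follows (per RFC 7868;
throughput and latency are scaled by 65,536 instead of the classic 256):
Throughput = (65,536 × 10^7) / Bandwidth (kbps)
Latency = (Delay in picoseconds × 65,536) / 10^6
Wide Metric = [K1 × Throughput + (K2 × Throughput) / (256 − Load) + K3 × Latency + K6 × Extended] × K5 / (K4 + Reliability)
As with the classic formula, the K5 / (K4 + Reliability) term is treated as 1
when K5 is 0.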
Note
The metric style used by a Nexus switch is identified with the
command show ip eigrp. If a K6 metric is present, the router is using
wide style metrics.
EIGRP can detect when a peer is using classic metrics and unscales the
wide metric using the formula in Figure 7-11.
Figure 7-11 Formula for Calculating Unscaled EIGRP Metrics
Note
It is important to note the conversion from microseconds (10^-6) to
picoseconds (10^-12). A value of 10 microseconds is equal to 10,000,000
picoseconds. If you recheck the delay value for the NX-1 metric, the
decimal place has shifted (that is, a zero was removed).
Example 7-48 displays the EIGRP topology table for the 10.1.1.0/24
network on NX-6 while classic metric values are configured on the entire
network. Notice that both paths have the same FD of 1,280.
Example 7-49 displays the EIGRP topology table for the 10.1.1.0/24
network on NX-6 after wide metrics have been enabled on NX-1 and NX-2.
EIGRP classic metric values are configured on the remaining switches in
the network. Notice that the total delay on the path NX-1 → NX-2 → NX-3
→ NX-6 has changed to 30 μs. This is because the first two hops of this
path were calculated using picoseconds instead of microseconds, resulting
in a 10 μs reduction. NX-6 now uses only this path for forwarding traffic.
Example 7-50 displays the EIGRP topology table for the 10.1.1.0/24
network on NX-5 and NX-6 after wide metrics have been enabled on
NX-1, NX-2, and NX-3. The delay is now reduced to 20 μs along the NX-1
→ NX-2 → NX-3 → NX-6 path. The path NX-1 → NX-4 → NX-5 → NX-6
no longer passes the feasible successor condition on NX-6 and does not
show up in the topology table.
Notice that NX-5 has now calculated the path NX-1 → NX-2 → NX-3 →
NX-6 → NX-5 as having the same amount of delay as NX-1 → NX-4 →
NX-5. When load balanced, a portion of the traffic is forwarded
suboptimally along the longer path.
Example 7-52 displays the EIGRP topology table for the 10.1.1.0/24
network on NX-6 and NX-5 after wide metrics have been enabled on NX-1,
NX-2, NX-3, and NX-6. NX-6 contains only the wide metric path, and the
delay is shown only in picoseconds.
NX-5 now calculates the path NX-1 → NX-2 → NX-3 → NX-6 → NX-5 as
the best path because of the unscaling formula. All traffic to the 10.1.1.0/24
network takes the longer path.
If a feasible successor is not available for the prefix, DUAL must perform
a new route calculation. The route state changes from Passive (P) to Active
(A) in the EIGRP topology table.
Active Query
The router detecting the topology change sends out Query packets to
EIGRP neighbors for the route. The Query packet includes the network
prefix with the delay set to infinity so that other routers are aware that it
has gone Active. When the router sends the EIGRP Query packets, it sets
the Reply status flag for each neighbor on a per-prefix basis.
Upon receipt of a Query packet, an EIGRP router does one of the following:
Replies to the Query, stating that it does not have a route to the prefix.
If the Query did not come from the successor for that route, the router
detects the delay set to infinity but ignores it because the Query did
not come from the successor. The receiving router replies with the
EIGRP attributes for that route.
If the Query came from the successor for the route, the receiving
router detects the delay set to infinity, sets the prefix as Active in the
EIGRP topology, and sends out a Query packet to all downstream
EIGRP neighbors for that route.
The Query process continues from router to router until a router establishes
the Query boundary. A Query boundary is established when a router does
not mark the prefix as Active, meaning that it responds to a query with the
following:
Not having a route to the prefix
Replying with EIGRP attributes because the query did not come from
the successor
When a router receives a Reply for all its downstream queries, it completes
the DUAL algorithm, changes the route to Passive, and sends a Reply
packet to any upstream routers that sent a Query packet to it. Upon
receiving a Reply packet for a prefix, the router notes the Reply for that
neighbor and prefix. The Reply process continues upstream until the
originating router's Queries are all answered.
Figure 7-13 represents a topology where the link between NX-1 and NX-2
has failed.
Figure 7-13 EIGRP Convergence Topology
The following steps are processed in order from the perspective of NX-2
calculating a new route to the 10.1.1.0/24 network.
Step 1. NX-2 detects the link failure. NX-2 did not have a feasible
successor for the route, set the 10.1.1.0/24 prefix as active, and sent
Queries to NX-3 and NX-4.
Step 2. NX-3 receives the Query from NX-2 and processes the delay field
that is set to infinity. NX-3 does not have any other EIGRP
neighbors and sends a Reply to NX-2 that a route does not exist.
NX-4 receives the Query from NX-2 and processes the delay field
that is set to infinity. Because the Query was received from the
successor, and a feasible successor for the prefix does not exist,
NX-4 marks the route as Active and sends a Query to NX-5.
Step 3. NX-5 receives the Query from NX-4 and detects that the delay field
is set to infinity. Because the Query was received from a
nonsuccessor, and a successor exists on a different interface, a
Reply for the 10.1.1.0/24 network is sent back to NX-4 with the
appropriate EIGRP attributes.
Step 4. NX-4 receives NX-5’s Reply, acknowledges the packet, and
computes a new path. Because this is the last outstanding Query
packet on NX-4, NX-4 sets the prefix as passive. With all Queries
satisfied, NX-4 responds to NX-2’s query with the new EIGRP
metrics.
Step 5. NX-2 receives NX-4's Reply, acknowledges the packet, and
computes a new path. Because this is the last outstanding Query
packet on NX-2, NX-2 sets the prefix as passive.
Stuck in Active
DUAL is very efficient at finding loop-free paths quickly and normally
finds a backup path in seconds. Occasionally, an EIGRP Query is delayed
because of packet loss, slow neighbors, or a large hop count. EIGRP waits
half of the active timer (90 seconds, by default) for a Reply. If the router
does not receive a response within those 90 seconds, the originating router
sends a Stuck In Active (SIA) Query to the EIGRP neighbors that have not
responded. Upon receipt of an SIA-Query, the router should respond within
90 seconds with an SIA-Reply. An SIA-Reply contains the route
information or provides information on the Query process itself. If a router
fails to respond to an SIA-Query by the time the active timer expires,
EIGRP deems the router Stuck In Active (SIA). If the SIA state is declared for a
neighbor, DUAL deletes all routes from that neighbor and treats the
situation as if the neighbor responded with unreachable messages for all
routes. Active Queries are shown with the command show ip eigrp
topology active.
Figure 7-14 shows a topology where the link between NX-1 and NX-2 has
failed. NX-2 sends out Queries to NX-4 and NX-3 for the 10.1.1.0/24 and
10.12.1.0/24 networks. NX-4 sends a Reply back to NX-2, and NX-3 sends
a Query on to R5, which then sends a Query on to R6.
Figure 7-14 EIGRP SIA Topology
A network engineer sees the syslog message for the down link, immediately
runs the show ip eigrp topology active command on NX-2, and sees the
output in Example 7-54.
The “r” next to the 10.23.1.3 indicates that NX-2 is still waiting on the
reply from NX-3. NX-1 is registered as down, and the path is set to infinity.
The show ip eigrp topology command can then be executed on NX-3,
which indicates that it is waiting on a response from R5. The command can
then be run on R5, which indicates that it is waiting on R6. Executing the
command on R6 does not show any active prefixes, implying that R6 never
received a Query from R5. R5's Query could have been dropped on the
wireless connection.
After the 90-second window has passed, the switch sends out an SIA-Query,
which can be seen by examining the EIGRP traffic counters. Example 7-55
displays the traffic counters before and after the 90-second window.
Example 7-55 EIGRP Traffic Counters with SIA Queries and Replies
Example 7-56 displays the EIGRP topology table after the SIA Replies are
received. Shortly after that, the SIA message appears in the syslog, and the
EIGRP peering is reset.
Having an invalid route stuck in the routing table because of a busy router
can be frustrating. There are two possible solutions:
Change the active timer to a different value with the command timers
active-time {disabled | 1-65535} (in minutes) under the EIGRP process,
as shown in the sketch after this list.
Use network summarization within the network design. EIGRP
summarization is useful for creating Query boundaries to reduce the
realm in which a Query is executed.
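A hypothetical sketch of both options (the timer value and summary prefix
are illustrative only):
router eigrp NXOS
timers active-time 2
interface Ethernet1/1
ip summary-address eigrp NXOS 10.0.0.0/8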
The active timer is shown by examining the EIGRP process with the show
ip eigrp command. The SIA timer is displayed in the Active Timer field.
Example 7-57 displays the active timer value of three minutes.
References
RFC 7868, Cisco’s Enhanced Interior Gateway Routing Protocol (EIGRP).
Savage, D., J. Ng, S. Moore, et al. IETF, https://tools.ietf.org/html/rfc7868,
May 2016.
Edgeworth, Brad, Aaron Foss, and Ramiro Garza Rios. IP Routing on Cisco
IOS, IOS XE and IOS XR. Indianapolis: Cisco Press, 2014.
Cisco. Cisco NX-OS Software Configuration Guides,
http://www.cisco.com.
Chapter 8
Troubleshooting Open
Shortest Path First (OSPF)
Inter-Router Communication
OSPF runs directly over IP with its own protocol number (89) and uses
multicast where possible to reduce unnecessary traffic. The two OSPF
multicast addresses are as follows:
AllSPFRouters: IPv4 Address 224.0.0.5 or MAC 01:00:5E:00:00:05
All routers running OSPF should be able to receive these packets.
AllDRouters: IPv4 Address 224.0.0.6 or MAC 01:00:5E:00:00:06
Communication with Designated Routers uses this address.
Within the OSPF protocol are five types of packets. Table 8-1 provides an
overview of the OSPF packet types and a brief description for each type.
Table 8-1 OSPF Packet Types
Type  Packet Name                Functional Overview
1     Hello                      Discover and maintain neighbors. Packets are
                                 sent out periodically on all OSPF interfaces to
                                 discover new neighbors while ensuring other
                                 neighbors are still online.
2     Database Description       Summarize database contents. Packets are
      (DBD or DDP)               exchanged when an OSPF adjacency is first
                                 being formed. These packets describe the
                                 contents of the LSDB.
3     Link State Request (LSR)   Database download. When a router thinks that
                                 part of its LSDB is stale, it may request a
                                 portion of a neighbor's database using this
                                 packet type.
4     Link State Update (LSU)    Database update. An explicit LSA for a
                                 specific network link, normally sent in direct
                                 response to an LSR.
5     Link State Ack             Flooding acknowledgment. These packets are
                                 sent in response to the flooding of LSAs,
                                 making the flooding a reliable transport feature.
Note
Neighbors are selected as the DR and BDR based on the highest OSPF
priority, followed by the highest Router ID (RID) when the priority is a tie.
The OSPF priority is set on an interface with the command ip ospf
priority 0-255. Setting the value to zero prevents that router from
becoming a DR for that segment.
Areas
OSPF provides scalability for the routing table by using multiple OSPF
areas within the routing domain. Each OSPF area provides a collection of
connected networks and hosts that are grouped together. OSPF uses a two-
tier hierarchical architecture where Area 0 is a special area known as the
backbone, and all other OSPF areas must connect to Area 0. In other words,
Area 0 provides transit connectivity between nonbackbone areas.
Nonbackbone areas advertise routes into the backbone, and the backbone
then advertises routes into other nonbackbone areas.
The exact topology of the area is invisible from outside of the area while
still providing connectivity to routers outside of the area. This means that
routers outside the area do not have a complete topological map for that
area, which reduces OSPF network traffic in that area. By segmenting an
OSPF routing domain into multiple areas, it is no longer true that all OSPF
routers will have identical LSDBs; however, all routers within the same
area will have identical area LSDBs. The reduction in routing traffic uses
less router memory and fewer resources, providing scalability.
Area Border Routers (ABR) are OSPF routers connected to Area 0 and
another OSPF area. ABRs are responsible for advertising routes from one
area and injecting them into a different OSPF area. Every ABR needs to
participate in Area 0; otherwise, routes do not advertise into another area.
When a router redistributes external routes into an OSPF domain, the router
is called an Autonomous System Boundary Router (ASBR). An ASBR can be
any OSPF router, and the ASBR function is independent of the ABR
function.
Note
Every LSA contains the advertising router’s RID. The router RID
represents the router and is how links are connected to each other.
Note
The Cisco Press book IP Routing on Cisco IOS, IOS XE and IOS XR
describes OSPF LSAs and how a router builds the actual topology
table using LSAs in a visual manner.
Table 8-5 provides a brief overview of the fields that appear in Example 8-
3.
Table 8-5 OSPF Neighbor State Fields
Field Description
Table 8-6 provides an overview of the fields in the output from Example 8-
3.
Table 8-6 OSPF Interface Columns
Field      Description
Interface  Interfaces with OSPF enabled.
Area       The area that the interface is associated with, always displayed
           in dotted-decimal format.
Cost       The cost used by the SPF algorithm to calculate a metric for a path.
State      Current interface state: DR, BDR, DROTHER, LOOP, or Down.
Neighbor   Number of neighboring OSPF routers on a segment that have
           established an adjacency.
Status     The protocol line status for that interface. A down value reflects
           an interface that is not reachable.
Example 8-5 displays the output of the show ip ospf interface command in
nonbrief format. It is important to note that the primary IP address,
interface network type, DR, BDR, and OSPF interface timers are included
as part of the information provided.
Now that a passive interface has been identified, the configuration must be
examined for the following:
The interface parameter command ip ospf passive-interface, which
makes only that interface passive.
The global OSPF configuration command passive-interface default,
which makes all interfaces under that OSPF process passive. The
interface parameter command no ip ospf passive-interface takes
precedence over the global command and makes that interface active.
Example 8-7 displays the configuration on NX-1 and NX-2 that prevents
the two Nexus switches from forming an OSPF adjacency. The Ethernet1/1
interfaces must be active on both switches for an adjacency to form. Move
the command ip ospf passive-interface from Eth1/1 to VLAN10 on NX-1,
and the command no ip ospf passive-interface from VLAN20 to Interface
Eth1/1 on NX-2 to allow an adjacency to form.
NX-1
interface loopback0
ip router ospf NXOS area 0.0.0.0
interface Ethernet1/1
ip ospf passive-interface
ip router ospf NXOS area 0.0.0.0
interface VLAN10
ip router ospf NXOS area 0.0.0.0
NX-2
router ospf NXOS
passive-interface default
interface loopback0
ip router ospf NXOS area 0.0.0.0
interface Ethernet1/1
ip router ospf NXOS area 0.0.0.0
interface VLAN20
no ip ospf passive-interface
ip router ospf NXOS area 0.0.0.0
Note
Debug output can also be redirected to a logfile, as shown earlier in
Chapter 2, “NX-OS Troubleshooting Tools.”
Table 8-7 provides a brief description of the fields that are provided in the
debug output from Example 8-9.
Table 8-7 Relevant Fields for OSPF Debug
Field Description
ivl Provides the OSPF Hello and Dead Timers in the Hello packet.
options Identifies the area associated with that interface as a regular OSPF
area, an OSPF stub area, or an OSPF NSSA area. These values are shown
in hex; this chapter later explains how to verify them.
mask The subnet mask of the primary IP address on that interface.
priority The interface priority for DR/BDR elections.
dr The router-id of the DR.
bdr The router-id of the BDR.
nbrs The number of neighbors detected on that network segment.
Debug commands are generally the least preferred method for finding root
cause because of the amount of data that can be generated while the
debug is enabled. NX-OS provides event-history logs that run in the
background without a performance hit, providing another method of
troubleshooting. The command show ip ospf event-history [hello |
adjacency | event] provides helpful information when troubleshooting
OSPF adjacency problems. The hello keyword provides the same
information as the debug command in Example 8-9.
Example 8-10 displays the show ip ospf event-history hello command.
Examine the difference in the sample output on NX-1.
Performing OSPF debugs on a switch shows only the packets that have
reached the supervisor. If packets are not displayed in the debugs or event-
history, further troubleshooting must be performed by examining quality of
service (QoS) policies, control plane policing (CoPP), or simply verifying
that the packets are leaving or entering an interface.
QoS policies may or may not be deployed on an interface. If they are
deployed, the policy-map must be examined for any dropped packets, which
must then be referenced to a class-map that matches the OSPF routing
protocol. The same process applies to CoPP policies, because they are based
on QoS settings as well.
Example 8-11 displays the process for checking a switch’s CoPP policy
with the following logic:
1. Examine the CoPP policy with the command show running-config
copp all. This displays the relevant policy-map name, classes defined,
and the police rate.
2. Investigate the class-maps to identify the conditional matches for that
class-map.
3. After the class-map has been verified, examine the policy-map drops
for that class with the command show policy-map interface control-
plane.
Note
This CoPP policy was taken from a Nexus 7000 switch, and the policy-
name and class-maps may vary depending on the platform.
Because CoPP operates at the RP level, it is possible that the packets were
received on an interface but were not forwarded to the RP. The next phase is
to identify whether packets were transmitted or received on an interface.
This technique involves creating a specific access control entry (ACE) for
the OSPF protocol. The ACE for OSPF should appear before any other
ambiguous ACEs to ensure a proper count. The ACL configuration
command statistics per-entry is required to display the specific hits that
are encountered per ACE.
Example 8-12 demonstrates the configuration of an ACL to detect OSPF
traffic on the Ethernet1/1 interface. Notice that the ACL includes a permit
ip any any command to allow all traffic to pass through this interface.
Failing to do so could result in the loss of traffic.
Note
There are three ACE entries for OSPF. The first two are tied to the
multicast groups for DR and BDR communication. The third ACE
applies to the initial Hello packets.
Note
Example 8-12 uses an Ethernet interface, which generally indicates a
one-to-one relationship, but on multi-access interfaces like switched
virtual interfaces (SVI), also known as interface VLANs, the neighbor
may need to be specified in a specific ACE.
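The full example is not reproduced here; a minimal sketch of the ACL
described in Example 8-12 (the ACL name is hypothetical):
ip access-list OSPF-COUNT
statistics per-entry
permit ospf any host 224.0.0.5
permit ospf any host 224.0.0.6
permit ospf any any
permit ip any any
interface Ethernet1/1
ip access-group OSPF-COUNT in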
In the event that the problem is due to a blatant subnet mismatch, the
Hello packets are not recognized in the OSPF debug or event-history.
Verifying connectivity with ping neighbor-ipaddress or show ip route
neighbor-ipaddress reveals that the devices are not on matching networks.
Ensure that the OSPF routers' primary interfaces are on a common subnet
for proper communication.
Note
OSPF RFC 2328 allows neighbors to form an adjacency using
disjointed networks only when using the ip unnumbered command on
point-to-point OSPF network types. NX-OS does not support IP
unnumbered addressing, so this use case is not applicable.
MTU Requirements
The OSPF header of the DBD packets includes the interface MTU. OSPF
DBDs are exchanged in the EXSTART and EXCHANGE Neighbor State.
Routers check the interface’s MTU that is included in the DBD packets to
ensure that they match. If the MTUs do not match, the OSPF devices do not
form an adjacency.
Example 8-15 displays that NX-1 and NX-2 started to form a neighbor
adjacency more than 3 minutes ago and are stuck in the EXSTART state.
Examine the OSPF event-history to identify the reason the switches are
stuck in the EXSTART state. Example 8-16 displays the OSPF adjacency
event-history on NX-1, in which the MTU from NX-2 has been detected as
larger than the MTU on NX-1’s interface.
Note
The MTU messages appear only on the device with the smaller MTU.
The OSPF protocol itself does not know how to handle fragmentation. It
relies on IP fragmentation when packets are larger than the interface MTU. It is
possible to ignore the MTU safety check by placing the interface parameter
command ip ospf mtu-ignore on the switch with the smaller MTU.
Example 8-18 displays the configuration command on NX-1 that allows it
to ignore the larger MTU from NX-2.
NX-1
interface Ethernet1/1
ip ospf mtu-ignore
ip router ospf NXOS area 0.0.0.0
interface VLAN10
ip ospf passive-interface
ip router ospf NXOS area 0.0.0.0
This technique allows for adjacencies to form, but may cause problems
later. The simplest solution is to change the MTU to match on all devices.
Note
If the OSPF interface is a VLAN interface (SVI), make sure that all
the Layer 2 (L2) ports support the MTU configured on the SVI. For
example, if VLAN 10 has an MTU of 9000, configure all the trunk
ports to support an MTU of 9000 as well.
Unique Router-ID
The RID provides a unique identifier for an OSPF router. A Nexus switch
drops packets that have the same RID as itself as part of a safety
mechanism. The syslog message using our routerid, packet dropped is
displayed along with the interface and RID of the other device. Example 8-
19 displays what the syslog message looks like on NX-1.
The RID is checked by viewing the OSPF process with the command show
ip ospf, as displayed in Example 8-20.
Using the command router-id router-id in the OSPF process sets the RID
statically and is considered a best practice. After changing the RID on one
of the Nexus switches, an adjacency should form.
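A minimal sketch, using NX-1's RID from this chapter as the illustrative
value:
router ospf NXOS
router-id 192.168.1.1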
Note
The RID is a key component of the OSPF topology table that is built
from the LSDB. All OSPF devices should maintain a unique RID.
More information on how to interpret the OSPF topology table is
found in Chapter 7, “Advanced OSPF” of the Cisco Press book IP
Routing on Cisco IOS, IOS-XE, and IOS XR.
Interface Area Numbers Must Match
OSPF requires that the area-id match in the OSPF Hello packets to form an
adjacency. The syslog message received for wrong area is displayed along
with the interface and area-id of the other device.
Example 8-21 displays what the syslog message looks like on NX-1 and
NX-2.
Example 8-21 Syslog Message with Neighbors Configured with Different Areas
When this happens, check the OSPF interfaces to detect which area-ids are
configured by using the command show ip ospf interface brief. Example
8-22 shows the output from NX-1 and NX-2. Notice that the area is
different on NX-1 and NX-2 for the Ethernet1/1 interface.
Changing the interface areas to the same value on NX-1 and NX-2 allows
for an adjacency to form between them.
Note
The area-id is always stored in dot-decimal format on Nexus switches.
This may cause confusion when working with other devices that store
the area-id in decimal format. To convert decimal to dot-decimal,
follow these steps:
Step 1. Convert the decimal value to binary.
Step 2. Split the binary value into four octets starting with the furthest
right number.
Step 3. Add zeroes as required to complete each octet.
Step 4. Convert each octet to decimal format, which provides dot-
decimal format.
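For example, applying these steps to area 271: 271 in binary is 100001111;
split from the right and padded, the octets become 00000001 and 00001111;
converting each octet yields the dot-decimal area-id 0.0.1.15.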
OSPF Stub (Area Flags) Settings Must Match
The OSPF Hello packet contains an Options field, specifically the E-bit,
which reflects the area's ability to contain Type-5 LSAs (stub capability).
Interfaces in an area must agree on one of the following area types to form
an adjacency:
Normal: External routes (Type-5 LSAs) are allowed in this area.
Stubby/Totally Stubby: External LSAs (Type-5 LSAs) are not
allowed in this area. No redistribution is allowed in this area.
Not So Stubby Area (NSSA)/Totally NSSA: External LSAs (Type-5
LSAs) are not allowed in this area. Redistribution is allowed in this
area.
The OSPF Hello event-history detects a mismatched OSPF area setting.
Example 8-23 displays the concept where NX-1 has detected a different
area flag from what is configured on its interface.
Verify the area settings on the two routers that cannot form an adjacency.
Example 8-24 displays that NX-1 has Area 1 configured as a stub, whereas
NX-2 does not.
Setting the area to the same stub setting on both routers allows for the area
flag check to pass and the routers to form an adjacency.
DR Requirements
Different media types can provide different characteristics or might limit
the number of nodes allowed on a segment. Table 8-8 defines the five OSPF
network types, which ones are configurable on NX-OS, and which network
types can peer with each other.
Table 8-8 OSPF Network Types on NX-OS
Interface Type        Configurable   DR/BDR Field     Can Establish Peering With
                      on NX-OS       in OSPF Hellos
Broadcast             Yes            Yes              Broadcast, no changes necessary;
                                                      Non-Broadcast, OSPF timers need
                                                      modification
Non-Broadcast         No             Yes              Non-Broadcast, no changes
                                                      necessary; Broadcast, OSPF timers
                                                      need modification
Point-to-Point        Yes            No               Point-to-Point, no changes
                                                      necessary; Point-to-Multipoint,
                                                      OSPF timers need modification
Point-to-Multipoint   No             No               Point-to-Multipoint, no changes
                                                      necessary; Point-to-Point, OSPF
                                                      timers need modification
Loopback              No             N/A              N/A
Note
On OSPF network segments that require a DR (Broadcast/Non-
Broadcast), an adjacency does not form if a router cannot be elected a
DR because the OSPF priority has been set to zero for all interfaces.
Neighbors are stuck in a 2WAY state in this scenario.
There are times when a Nexus switch forms only one OSPF adjacency on
an interface; an example is two Ethernet ports configured as Layer 3 (L3)
with a direct cable between them. In scenarios like this, setting the OSPF
network type to point-to-point (P2P) provides the advantages of faster
adjacency formation (no DR election) and not wasting CPU cycles on DR
functionality.
OSPF can form an adjacency only if the DR and BDR Hello options match.
Example 8-25 displays NX-1 stuck in INIT state with NX-2. NX-2 does not
consider NX-1 an OSPF neighbor. Scenarios like this indicate
incompatibility in OSPF network types.
The OSPF network type needs to be changed on one of the devices, because
both Nexus switches are using L3 Ethernet ports. Configuring both
switches to use an OSPF point-to-point network type is recommended. The
command ip ospf network point-to-point configures NX-1’s Ethernet1/1
interface as an OSPF point-to-point network type. This allows for both
switches to form an adjacency. Example 8-27 displays the configuration for
NX-1 and NX-2 that allows them to form an adjacency.
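The example itself is not reproduced here; a minimal sketch of the
configuration described, applied to both switches:
interface Ethernet1/1
ip ospf network point-to-point
ip router ospf NXOS area 0.0.0.0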
Note
The default OSPF Hello Timer interval varies depending on the OSPF
network type. Changing the Hello Timer interval modifies the default Dead
Interval, too.
If a router does not receive a Hello before the OSPF Dead Interval Timer
reaches zero, the neighbor state changes to Down. The OSPF router
immediately sends out the appropriate LSA reflecting the topology change,
and the SPF algorithm processes on all routers within the area.
The OSPF Hello Time and OSPF Dead Interval Time must match when
forming an adjacency. In the event the timers do not match, timers are
displayed in the OSPF Hello packet event history. Example 8-28 shows that
NX-1 is receiving a Hello packet with different OSPF timers.
Note
IOS routers support OSPF fast-packet Hellos for subsecond detection
of neighbors with issues. Nexus and IOS XR do not support OSPF fast-
packet Hellos. The use of bidirectional forwarding detection (BFD)
provides fast convergence across IOS, IOS XR, and Nexus devices and
is the preferred method of subsecond failure detection.
Authentication
OSPF supports two types of authentication: plaintext and an MD5
cryptographic hash. Plaintext mode provides little security, because anyone
with access to the link can see the password with a network sniffer. MD5
authentication uses a cryptographic hash instead, so the password is never
sent across the wire; this technique is widely accepted as the more secure
mode.
OSPF authentication operates on an interface-by-interface basis or for all
interfaces in an area. The password is set only as an interface parameter
and must be set for every interface. Missing an interface sets that
interface's password to a null value.
Plaintext authentication is enabled for an OSPF area with the command
area area-id authentication, and the interface parameter command ip ospf
authentication sets plaintext authentication only on that interface. The
plaintext password is configured with the interface parameter command ip
ospf authentication-key password.
Example 8-31 displays plaintext authentication on NX-1’s Ethernet1/1
interface and all Area 0 interfaces on NX-2 using both commands
explained previously.
NX-1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
NX-1(config)# int eth1/1
NX-1(config-if)# ip ospf authentication
NX-1(config-if)# ip ospf authentication-key CISCO
NX-1 %OSPF-4-AUTH_ERR: ospf-NXOS [8792] (default) Received packet
from 10.12.1.200 on Ethernet1/1 with bad authentication 0
NX-2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
NX-2(config)# router ospf NXOS
NX-2(config-router)# area 0 authentication
NX-2(config-router)# int eth1/1
NX-2(config-if)# ip ospf authentication-key CISCO
Notice the authentication error that NX-1 produced upon enabling
authentication. When there is a mismatch of OSPF authentication
parameters, the Nexus switch produces the syslog message that contains
bad authentication, which requires verification of the authentication
settings.
Authentication is verified by examining the OSPF interface and looking for
the authentication option. Example 8-32 verifies the use of OSPF plaintext
passwords on the NX-1 and NX-2 interfaces.
interface loopback0
ip router ospf NXOS area 0.0.0.0
interface Ethernet1/1
ip ospf authentication-key 3 bdd0c1a345e1c285
ip router ospf NXOS area 0.0.0.0
MD5 authentication is enabled for an OSPF area with the command area
area-id authentication message-digest, and the interface parameter
command ip ospf authentication message-digest sets MD5 authentication
for that interface. The MD5 password is configured with the interface
parameter command ip ospf message-digest-key key# md5 password or
set by using a key-chain with the command ip ospf authentication key-
chain key-chain-name. The MD5 authentication hash is computed from the
key number and password combined. If either does not match, the hash
differs between the nodes.
Note
Detailed instruction on key chain creation was provided in Chapter 7,
“Troubleshooting Enhanced Interior Gateway Routing Protocol
(EIGRP).”
NX-2# conf t
NX-2(config)# key chain OSPF-AUTH
NX-2(config-keychain)# key 2
NX-2(config-keychain-key)# key-string CISCO
NX-2(config-keychain-key)# router ospf NXOS
NX-2(config-router)# area 0 authentication message-digest
NX-2(config-router)# int eth1/1
NX-2(config-if)# ip ospf authentication key-chain OSPF-AUTH
Discontiguous Network
Network engineers who do not fully understand OSPF design may create a
topology such as the one illustrated in Figure 8-3. Although NX-2 and NX-
3 have OSPF interfaces in Area 0, traffic from Area 12 must cross Area 23
to reach Area 34. An OSPF network with this design is discontiguous
because interarea traffic is trying to cross a nonbackbone area.
Figure 8-3 Discontiguous Network
Example 8-37 shows that NX-2 and NX-3 appear to have full connectivity
to all networks in the OSPF domain. NX-2 maintains connectivity to the
10.34.1.0/24 network and 192.168.4.4/32 network, and NX-3 maintains
connectivity to the 10.12.1.0/24 network and 192.168.1.1/32 network.
OSPF ABRs use the following logic for Type-3 LSAs when entering
another OSPF Area:
Type-1 LSAs received from a nonbackbone area create Type-3 LSAs
into backbone area and nonbackbone areas.
Type-3 LSAs received from Area 0 are created for the nonbackbone
area.
Type-3 LSAs received from a nonbackbone area only insert into the
LSDB for the source area. ABRs do not create a Type-3 LSA for the
other nonbackbone areas.
The simplest fix for a discontiguous network is to install a virtual link
between NX-2 and NX-3. Virtual links overcome the ABR limitations by
extending Area 0 into a nonbackbone area. It is similar to running a virtual
tunnel for OSPF between an ABR and another multi-area OSPF router. The
virtual link extends Area 0 across Area 23, making Area 0 a contiguous
OSPF area.
The virtual link configuration is applied to the OSPF routing process with
the command area area-id virtual-link endpoint-rid. The configuration is
applied on both end devices as shown in Example 8-39.
NX-2
router ospf NXOS
area 0.0.0.23 virtual-link 192.168.3.3
NX-3
router ospf NXOS
area 0.0.0.23 virtual-link 192.168.2.2
Example 8-40 displays the routing table of NX-1 after the virtual link is
configured between NX-2 and NX-3. Notice that the 192.168.4.4 network is
present. In addition, the virtual link appears as an OSPF interface.
Duplicate Router ID
Router IDs (RID) play a critical role in the creation of the topology. If two
adjacent routers have the same RID, an adjacency does not form, as shown
earlier. However, if two routers with the same RID are separated by an
intermediary router, the duplicate RID prevents their routes from being
installed in the topology.
The RID acts as a unique identifier in the OSPF LSAs. When two different
routers advertise LSAs with the same RID, it causes confusion in the OSPF
topology, which can result in routes not populating or packets being
forwarded toward the wrong router. It also prevents LSA propagation,
because a receiving router may assume that a loop exists.
Figure 8-4 provides a sample topology in which all Nexus switches are
advertising their peering network and their loopback addresses in the
192.168.0.0/16 network space. NX-2 and NX-4 have been configured with
the same RID of 192.168.4.4. NX-3 sits between NX-2 and NX-4 and has a
different RID, therefore allowing NX-2 and NX-4 to establish full neighbor
adjacencies with their peers.
Figure 8-4 Duplicate Router ID Topology
From NX-1’s perspective, the first apparent issue is that NX-4’s loopback
interface (192.168.4.4/32) is missing. Example 8-41 displays NX-1’s
routing table.
Example 8-41 NX-1’s Routing Table with Missing NX-4’s Loopback Interface
On NX-2 and NX-4, there are complaints about LSAs and Possible router-
id collision syslog messages, as shown in Example 8-42.
Filtering Routes
NX-OS provides multiple methods of filtering networks after they are
entered into the OSPF database. Filtering of routes occurs on ABRs for
internal OSPF networks and on ASBRs for external OSPF networks. The
following configurations should be examined when routes are present in
one area but not present in a different area:
Area Filtration: Routes are filtered upon receipt or advertisement to
an ABR with the process level configuration command area area-id
filter-list route-map route-map-name {in|out}.
Route Summarization: Internal routes are summarized on ABRs
using the command area area-id range summary-network [not-
advertise]. If the not-advertise keyword is configured, a Type-3 LSA
is not generated for any of the component routes, thereby limiting
those routes to the area of origination.
External routes are summarized on ASBRs using the command summary-
address summary-network [not-advertise]. The not-advertise keyword
stops the generation of any Type-5/Type-7 LSAs for component routes
within the summary network.
Note
ABRs for NSSA areas act as an ASBR when Type-7 LSAs are
converted to Type-5 LSAs. External summarization is performed only
on ABRs when they match this scenario.
Redistribution
Redistributing into OSPF uses the command redistribute [bgp asn | direct
| eigrp process-tag | isis process-tag | ospf process-tag | rip process-tag |
static] route-map route-map-name. A route-map is required as part of the
redistribution process on Nexus switches.
Every protocol provides a seed metric at the time of redistribution that
allows the destination protocol to calculate a best path. OSPF uses the
following default settings for seed metrics:
The network is configured as an OSPF Type-2 external network.
The default redistribution metric is set to 20 unless the source protocol
is BGP, which provides a default seed metric of 1.
The default seed metrics can be changed to different values for OSPF
external network type (1 versus 2), redistribution metric, and a route-tag if
desired.
Example 8-45 provides the necessary configuration to demonstrate the
process of redistribution. NX-1 redistributes a static route for the
172.16.1.0/24 network in lieu of it being advertised with the OSPF routing
protocol. Notice that the route-map can be a simple permit statement
without any conditional match statements.
NX-1
ip route 172.16.1.0/24 10.120.1.10
!
route-map REDIST permit 10
set metric-type type-1
!
router ospf NXOS
redistribute static route-map REDIST
log-adjacency-changes
!
interface Ethernet1/1
ip router ospf NXOS area 0.0.0.0
Example 8-47 displays the Type-5 LSA for the external route for the
172.16.1.0/24 network to the proxy server. The ASBR is identified as NX-1
(192.168.1.1), which is the device that all Nexus switches forward packets
to in order to reach the 172.16.1.0/24 network. Notice that the forwarding
address is the default value of 0.0.0.0.
LS age: 199
Options: 0x2 (No TOS-capability, No DC)
LS Type: Type-5 AS-External
Link State ID: 172.16.1.0 (Network address)
Advertising Router: 192.168.1.1
LS Seq Number: 0x80000002
Checksum: 0x7c98
Length: 36
Network Mask: /24
Metric Type: 1 (Same units as link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 0
Traffic from NX-2 (and NX-4) takes the non-optimal route (NX-2→NX-
4→NX-3→ NX-1→FW), as shown in Example 8-48. The optimal route
would allow NX-2 to use the directly connected 10.120.1.0/24 network
toward the firewall.
The forwarding address in OSPF Type-5 LSAs is specified in RFC 2328 for
scenarios such as this. When the forwarding address is 0.0.0.0, all routers
forward packets to the ASBR, introducing the potential for suboptimal
routing.
The OSPF forwarding address changes from 0.0.0.0 to the next-hop IP
address in the source routing protocol when the following occurs:
OSPF is enabled on the ASBR’s interface that points to the next-hop IP
address. In this scenario, NX-1’s VLAN120 interface has OSPF
enabled, which correlates to the 172.16.1.0/24 static route’s next-hop
address of 10.120.1.10.
That interface is not set to passive.
That interface is a broadcast or nonbroadcast OSPF network type.
Now OSPF is enabled on NX-1's and NX-2's VLAN120 interfaces, which
have been associated with area 120. Figure 8-6 illustrates the current
topology. VLAN interfaces default to the broadcast OSPF network type, so
all conditions were met to set the FA to an explicit IP address.
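A minimal sketch of the change described, using the interface and area
values from this scenario:
interface VLAN120
ip router ospf NXOS area 0.0.0.120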
Figure 8-6 OSPF Forwarding Address Nondefault
Example 8-49 displays the Type-5 LSA for the 172.16.1.0/24 network. Now
that OSPF is enabled on NX-1’s 10.120.1.1 interface and the interface is a
broadcast network type, the forwarding address changed from 0.0.0.0 to
10.120.1.10.
Example 8-50 verifies that connectivity from NX-2 and NX-4 now takes
the optimal path because the forwarding address changed to 10.120.1.10.
NX-2
router ospf NXOS
area 0.0.0.120 range 10.0.0.0/8 not-advertise
log-adjacency-changes
After the junior network engineer made the change, the 172.16.1.0/24
network disappeared on all the routers in Area 0. Only the other peering
network is present, as shown in Example 8-52.
If the Type-5 LSA forwarding address is not the default value, the address
must be resolvable as an intra-area or inter-area OSPF route. If the FA is
not resolved, the LSA is ignored and is not installed into the RIB. The FA
provides a mechanism to introduce multiple paths to the external next-hop
address; otherwise, there would be no reason to include the FA in the LSA.
Removing the filtering on NX-1 and NX-2 restores connectivity.
Note
In the scenario provided, there was not any redundancy to provide
connectivity in the event that NX-1 failed. Typically, the configuration
is repeated on other routers, which provides resiliency. Be considerate
of the external networks when applying filtering of routes on ABRs.
Intra-Area Routes
Routes advertised via a Type-1 LSA for an Area are always preferred over
Type-3 and Type-5 LSAs. If multiple intra-area routes exist, the path with
the lowest total path metric is installed in the RIB. If there is a tie in
metric, both routes install into the RIB.
Note
Even if the path metric for an intra-area route is higher than an inter-
area path metric, the intra-area path is selected.
Inter-Area Routes
Inter-area routes take the lowest total path metric to the destination. If
there is a tie in metric, both routes install into the RIB. All inter-area paths
for a route must go through Area 0.
In Figure 8-7, NX-1 is computing the path to NX-6. NX-1 uses the path
NX-1→ NX-3→NX-5→NX-6 because its total path metric is 35 versus the
NX-1→NX-2→ NX-4→NX-6 path with a metric of 40.
Note
There is an option with NSSA areas that prevents the redistributed
routes from being advertised outside of the NSSA area (setting the P-
bit to zero), which may change the behavior. This concept is outside of
the scope of this book; it is explained in depth in RFC 2328 and 3101.
Figure 8-8 shows the topology for NX-1 and NX-3 computing a path to the
external network (100.65.0.0/16) that is being redistributed on NX-6 and
NX-7.
Example 8-54 NX-1 External OSPF Path Selection for Type-2 Network
Example 8-55 NX-3 External OSPF Path Selection for Type-2 Network
Figure 8-9 External Type-2 Route Selection with Nexus and IOS Devices
NX-3 selects R7 as the ASBR for the 100.65.0.0/16 network using RFC
2328 standards and forwards packets toward R5. R5 uses RFC 1583
standards and forwards packets back to NX-3, causing a loop. Example 8-
56 verifies that the loop exists using a simple traceroute from NX-3 toward
the 100.65.0.0/16 network.
The solution involves placing the Nexus switches into RFC 1583 mode with
the OSPF command rfc1583compatibility. Example 8-57 displays the
configuration to remove the routing loop.
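A minimal sketch of the command described, applied on the Nexus
switches:
router ospf NXOS
rfc1583compatibility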
Note
Another significant change between RFC 1583 and RFC 2328 is the
summarization metric. With RFC 1583, an ABR uses the lowest
metric from any of the component routes for the metric of the
summarized network. RFC 2328 uses the highest metric from any of
the component routes for the metric of the summarized route.
Deploying rfc1583compatibility on the ABR changes the behavior.
Example 8-58 displays the routing table of R1 with the default reference
bandwidth. Traffic between 172.16.1.0/24 and 172.32.2.0/24 flows across
the backup 1 Gigabit link (10.12.1.0/24), which does not follow the
intended traffic patterns. Notice that the OSPF path metric is 2 to the
172.32.2.0/24 network using the 1 Gigabit link.
Example 8-58 R1’s Routing Table with Default OSPF Auto-Cost Bandwidth
R1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)# int gi0/1
R1(config-if)# shut
16:04:43.107: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.2.2 on
GigabitEthernet0/1 from FULL to DOWN, Neighbor Down: Interface
down or detached
16:04:45.077: %LINK-5-CHANGED: Interface GigabitEthernet0/1,
changed state to administratively down
16:04:46.077: %LINEPROTO-5-UPDOWN: Line protocol on Interface
GigabitEthernet0/1, changed state to down
R1(config-if)# do show ip route ospf | b Gatewa
Gateway of last resort is not set
R2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)# router ospf 1
R2(config-router)# auto-cost reference-bandwidth 40000
% OSPF: Reference bandwidth is changed.
Please ensure reference bandwidth is consistent across all
routers.
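As a quick check of the arithmetic: OSPF interface cost is the reference
bandwidth divided by the interface bandwidth (both in Mbps), so with a
reference bandwidth of 40,000, a 10 Gigabit interface costs 40,000 / 10,000
= 4, whereas a 1 Gigabit interface costs 40,000 / 1,000 = 40, making the 10
Gigabit path preferred.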
Now let’s examine the new OSPF metric cost using the 10 Gigabit path, and
then reactivate the 1 Gigabit link on R1. Example 8-61 demonstrates this
change and then verifies which path is now used to connect the
172.16.1.0/24 and 172.32.2.0/24 networks.
Example 8-61 Verification of New Path After New Reference OSPF Bandwidth Is Configured on
R1 and R2
Note
Another solution involves statically setting the OSPF cost on an
interface with the command ip ospf cost 1-65535 for NX-OS and IOS
devices.
Summary
This chapter provided a brief review of the OSPF routing protocol and
then explored methods for troubleshooting adjacency issues between
devices, missing routes, and path selection.
The following parameters must be compatible for the two routers to
become neighbors:
Interfaces must be Active.
Connectivity between devices must exist using the primary subnet.
MTU matches between devices.
Router-IDs are unique.
Interface Area must match.
Need for Designated Router matches based on OSPF network types.
OSPF stub area flags match.
OSPF is a link state routing protocol that builds a complete map based on
LSAs. Routes are missing from the OSPF routing domain typically because
of bad network design or through filtering of routes as they are advertised
across area boundaries. This chapter provided some common bad OSPF
designs that cause loss of path information.
OSPF builds a loop-free topology from the computing router to all
destination networks. All routers use the same logic to calculate the
shortest-path for each network. Path selection prioritizes paths by using the
following logic:
Intra-Area
Inter-Area
External Type-1
External Type-2
When the redistribution metric is the same, Nexus switches select external
paths using RFC 2328 by default, which states to prefer intra-area
connectivity over inter-area connectivity when multiple ASBRs are present.
Cisco IOS and IOS XR routers use RFC 1583 external path selection, which
selects an ASBR by the lowest forwarding cost. This can cause routing
loops when Nexus switches are intermixed with IOS or IOS XR routers, but
the Nexus switches can be placed in RFC 1583 compatibility mode.
References
RFC 1583, OSPF Version 2. IETF, http://www.ietf.org/rfc/rfc1583.txt,
March 1997.
RFC 2328, OSPF Version 2. IETF, http://www.ietf.org/rfc/rfc2328.txt, April
1998.
Edgeworth, Brad, Aaron Foss, Ramiro Garza Rios. IP Routing on Cisco
IOS, IOS XE and IOS XR. Indianapolis: Cisco Press, 2014.
Cisco. Cisco NX-OS Software Configuration Guides,
http://www.cisco.com.
Chapter 9
Troubleshooting Intermediate
System-to-Intermediate System
(IS-IS)
IS-IS Fundamentals
IS-IS uses a two-level hierarchy consisting of Level 1 (L1) and Level 2
(L2) connections. IS-IS communication occurs at L1, L2, or both (L1-L2).
L2 routers communicate only with other L2 routers, and L1 routers
communicate only with other L1 routers. L1-L2 routers provide
connectivity between the L1 and L2 levels. An L2 router can communicate
with L2 routers in the same or a different area, whereas an L1 router
communicates only with other L1 routers within the same area. The
following list indicates the type of adjacencies that are formed between IS-
IS routers:
L1 ← → L1
L2 ← → L2
L1-L2 ← → L1
L1-L2 ← → L2
L1-L2 ← → L1-L2
Note
The terms L1 and L2 are used frequently in this chapter, and refer only
to the IS-IS levels. They should not be confused with the OSI model.
IS-IS uses the link-state packets (LSP) for building a link-state packet
database (LSPDB) similar to OSPF’s link-state database (LSDB). IS-IS
then runs the Dijkstra Shortest Path First (SPF) algorithm to construct a
loop-free topology of shortest paths.
Areas
OSPF and IS-IS both use a two-level hierarchy, but the two protocols
implement it differently. OSPF provides connectivity between areas by allowing a router
to participate in multiple areas, whereas IS-IS places the entire router and
all its interfaces in a specific area. OSPF’s hierarchy is based on areas
advertising prefixes into the backbone, which then are advertised into
nonbackbone areas. Level 2 is the IS-IS backbone and can cross multiple
areas, unlike OSPF, as long as the L2 adjacencies are contiguous.
Figure 9-1 demonstrates these basic differences between OSPF and IS-IS.
Notice that the IS-IS backbone extends across four areas, unlike OSPF’s
backbone, which is limited to Area 0.
All L1 IS-IS routers in the same area maintain an identical copy of the
LSPDB, and L1 routers do not know about any routers or networks
outside of their level (area). In a similar fashion, L2 routers maintain a
separate LSPDB that is identical with other L2 routers. L2 routers are
aware only of other L2 routers and networks in the L2 LSPDB.
L1-L2 routers inject L1 prefixes into the L2 topology. L1-L2 routers do not
advertise L2 routes into the L1 area, but they set the attached bit in their L1
LSP, indicating that the router has connectivity to the IS-IS backbone
network. If an L1 router does not have a route for a network, it searches the
LSPDB for the closest router with the attached bit, which acts as a route of
last resort.
NET Addressing
IS-IS routers share an area topology through link-state packets (LSP) that
allows them to build the LSPDB. IS-IS uses NET addresses to build the
LSPDB topology. The NET address is included in the IS header for all the
LSPs. Ensuring that a router is unique in an IS-IS routing domain is
essential for properly building the LSPDB. NET addressing is based on the
OSI model's Network Service Access Point (NSAP) address structure, which is
between 8 and 20 bytes in length. NSAP addressing is variable in length
based on the logic for addressing domains.
The variable length of the Initial Domain Part (IDP) portion of the NET
address causes unnecessary confusion. Instead of reading the NET address
left to right, most network engineers read the NET address from right to
left. In the most simplistic form, the first byte is always the selector (SEL)
(with a value of 00), with the next 6 bytes as the system ID, and the
remaining 1 to 13 bytes are the Area Address, as shown in Figure 9-3.
Note
In essence, the router's system ID is equivalent to EIGRP's or OSPF's
router-id. The NET address is used to construct the network topology
and must be unique.
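As an illustration, the following sketch configures a NET address; the
process name NXOS, area 49.0012, and system ID 0000.0000.0001 are borrowed
from examples later in this chapter:
NX-1# configure terminal
NX-1(config)# router isis NXOS
! Read right to left: SEL byte 00, system ID 0000.0000.0001,
! and the remaining bytes (49.0012) form the area address
NX-1(config-router)# net 49.0012.0000.0000.0001.00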
Inter-Router Communication
Unlike other routing protocols, intermediate system (IS) communication is
protocol independent because inter-router communication is not
encapsulated in the third layer (network) of the OSI model. IS
communication uses the second layer of the OSI model. IP, IPv6, and other
protocols all use the third layer addressing in the OSI model.
IS protocol data units (PDU) (packets) follow a common header structure
that identifies the type of the PDU. Data specific to each PDU type follows
the header, and the last fields use optional variable-length fields that
contain information specific to the IS PDU type.
IS packets are categorized into three PDU types, with each type
differentiating between L1 and L2 routing information:
IS-IS Hello (IIH) Packets: IIH packets are responsible for
discovering and maintaining neighbors.
Link State Packets (LSP): LSPs provide information about a router
and associated networks. Similar to an OSPF LSA, except OSPF uses
multiple LSAs.
Sequence Number Packets (SNP): Sequence number packets (SNP)
control the synchronization process of LSPs between routers.
Complete sequence number packets (CSNP) provide the LSP
headers for the LSPDB of the advertising router to ensure the
LSPDB is synchronized.
Partial sequence number packets (PSNP) acknowledge receipt of
an LSP on point-to-point networks and request missing link-state
information when the LSPDB is identified as being out of sync.
IS Protocol Header
Every IS packet includes a common header that describes the PDU. All
eight fields are 1 byte long and are present in all IS-IS packets.
Table 9-1 provides an explanation for the fields listed in the IS Protocol
Header.
Table 9-1 IS Protocol Header Fields
Field Description
IS PDU Addressing
Communication between IS devices uses Layer 2 addresses. The source
address is always the network interface’s Layer 2 address, and the
destination address varies depending upon the network type. Nexus
switches are Ethernet based and therefore use Layer 2 MAC addresses for
IS-IS communication.
ISO standards classify network media into two categories: broadcast and
general topology.
Broadcast networks provide communication to multiple devices with a
single packet. Broadcast interfaces communicate in a multicast fashion
using well-known Layer 2 addresses so that only the nodes running IS-IS
process the traffic. IS-IS does not send unicast traffic on broadcast network
types, because all routers on the segment should be aware of what is
happening with the network.
Table 9-2 provides a list of destination MAC addresses used for IS
communication.
Table 9-2 IS-IS Destination MAC Addresses
Name Destination MAC Address
AllL1ISs (all L1 intermediate systems) 01:80:c2:00:00:14
AllL2ISs (all L2 intermediate systems) 01:80:c2:00:00:15
General topology networks are based on network media that allows
communication with only one other device when a single packet is sent.
General topology networks are often referred to in IS-IS documentation as
point-to-point networks. Point-to-point networks communicate with a
directed destination address that matches the Layer 2 address of the
remote device. NBMA technologies such as Frame Relay may not
guarantee communication to all devices with a single packet. A common
best practice is to use point-to-point subinterfaces on NBMA technologies
to ensure proper communication between IS-IS nodes.
Table 9-4 provides a brief list of information included in the IIH Hello
Packet.
Table 9-4 Fields in IIH Packets
Type Description
Link-State Packets
Link-state packets (LSP) are similar to OSPF LSAs in that they advertise
neighbors and attached networks, except that IS-IS uses only two types of
LSPs. IS-IS defines an LSP type for each level: L1 LSPs are flooded
throughout the area in which they originate, and L2 LSPs are flooded
throughout the Level 2 network.
LSP ID
The LSP ID is a fixed 8-byte field that provides a unique identification of
the LSP originator. The LSP ID is composed of the following:
System ID (6 bytes): The system ID is extracted from the NET
address configured on the router.
Pseudonode ID (1 byte): The pseudonode ID identifies the LSP for a
specific pseudonode (virtual router) or for the physical router. LSPs
with a pseudonode ID of zero describe the links from the system and
can be called non-pseudonode LSPs.
LSPs with a nonzero number indicate that the LSP is a pseudonode
LSP. The pseudonode ID correlates to the router’s circuit ID for the
interface performing the designated intermediate system (DIS)
function. The pseudonode ID is unique among any other broadcast
segments for which the same router is the DIS on that level.
Pseudonodes and DIS are explained later in this chapter.
Fragment ID (1 byte): If an LSP is larger than the MTU of the
interface out of which it must be sent, the LSP must be fragmented.
IS-IS fragments the LSP as it is created, and the fragment ID allows
the receiving router to process fragmented LSPs.
Figure 9-5 shows two LSP IDs. The LSP ID on the left indicates that it is
for a specific IS router, and the LSP ID on the right indicates that it is for
the DIS because the pseudonode ID is not zero.
Attribute Fields
The last portion of the LSP header is an 8-bit section that references four
components of the IS-IS specification:
Partition Bit: The partition bit identifies whether a router supports
the capability for partition repair. Partition repair allows a broken L1
area to be repaired by L2 routers that belong to the same area as the L1
routers. Cisco and most other network vendors do not support partition
repair.
Attached Bit: The next four bits reflect the attached bit, set by an
L1-L2 router connected to other areas via the L2 backbone. The attached
bit is set in L1 LSPs.
Overload Bit: The overload bit indicates when a router is in an
overloaded condition. During SPF calculation, routers should avoid
sending traffic through this router. Upon recovery, the router
advertises a new LSP without the overload bit, and the SPF calculation
occurs normally without avoiding routes through the previously
overloaded node.
Router Type: The last two bits indicate whether the LSP is from a L1
or L2 router.
Note
There is a natural tendency to associate IS-IS DIS behavior with
OSPF’s designated router (DR) behavior, but they operate in a
different nature. All IS-IS routers form a full neighbor adjacency with
each other. Any router can advertise non-pseudonode LSPs to all other
IS-IS routers on that segment, whereas OSPF specifies that LSAs are
sent to the DR to be advertised to the network segment.
The DIS advertises a pseudonode LSP that indicates the routers that attach
to the pseudonode. The pseudonode LSP acts like an OSPF Type-2 LSA
because it indicates the attached neighbors and informs the nodes which
router is acting as the DIS. The system IDs of the routers connected to the
pseudonode are listed in the IS Reachability TLV with an interface metric
set to zero because SPF uses the metric for the non-pseudonode LSPs for
calculating the SPF tree.
The pseudonode advertises the complete sequence number packets (CSNP)
every 10 seconds. IS-IS routers check their LSPDBs to verify that all LSPs
listed in the CSNP exist, and that the sequence number matches the version
in the CSNP.
If an LSP is missing or the router has an outdated (lower sequence
number) LSP than what is contained in the CSNP, the router advertises
a partial sequence number packet (PSNP) requesting the correct or
missing LSP. All IS-IS routers receive the PSNP, but only the DIS
sends out the correct LSP, thereby reducing traffic on that network
segment.
If a router detects that the sequence number in the CSNP is lower than
the sequence number for any LSP that is stored locally in its LSPDB,
it advertises the local LSP with the higher sequence number. All IS-IS
routers receive the LSP and process it accordingly. The DIS should
send out an updated CSNP with the updated sequence number for the
advertised LSP.
Path Selection
Note that the IS-IS path selection is quite straightforward after reviewing
the following key definitions:
Intra-area routes are routes that are learned from another router
within the same level and area address.
Inter-area routes are routes that are learned from another L2 router
that came from a L1 router or from a L2 router from a different area
address.
External routes are routes that are redistributed into the IS-IS domain.
External routes can choose between two metric types:
Internal metrics are directly comparable with IS-IS path metrics
and are selected by default by Nexus switches. IS-IS treats these
routes with the same preferences as those advertised normally via
TLV #128.
External metrics are not directly comparable with internal path
metrics.
IS-IS best-path selection follows the processing order shown in the
following steps to identify the route with the lowest path metric for each
stage.
Step 1. L1 intra-area routes
L1 external routes with internal metrics
Step 2. L2 intra-area routes
L2 external routes with internal metrics
L1→L2 inter-area routes
L1→L2 inter-area external routes with internal metrics
Step 3. Leaked routes (L2→L1) with internal metrics
Step 4. L1 external routes with external metrics
Step 5. L2 external routes with external metrics
L1→L2 inter-area external routes with external metrics
Step 6. Leaked routes (L2→L1) with external metrics
Note
Under normal IS-IS configuration, only the first three steps are used.
External routes with external metrics require the external metric-type
to be explicitly specified in the route-map at the time of redistribution.
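As a sketch of what that looks like, the following configuration
redistributes static routes with the external metric-type; the route-map
name and the source protocol are assumptions for illustration:
NX-1(config)# route-map REDIST-EXTERNAL permit 10
! Tag redistributed routes with the external metric-type
NX-1(config-route-map)# set metric-type external
NX-1(config-route-map)# router isis NXOS
NX-1(config-router)# redistribute static route-map REDIST-EXTERNAL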
Troubleshooting IS-IS Neighbor Adjacency
IS-IS requires that neighboring routers form an adjacency before LSPs are
processed. The IS-IS neighbor adjacency process consists of three states:
down, initializing, and up. This section explains the process for
troubleshooting IS-IS neighbor adjacencies on Nexus switches.
Figure 9-6 provides a simple topology with two Nexus switches that are
used to explain how to troubleshoot IS-IS adjacency problems.
Table 9-7 provides a brief overview of the fields used in Example 9-2.
Notice that the Holdtime for NX-2 is relatively low because NX-2 is the
DIS for the 10.12.1.0/24 network.
Table 9-7 IS-IS Neighbor State Fields
Field Description
System ID The system ID extracted from the NET address.
Subnetwork Point of Addresses (SNPA) The Layer 2 hardware address of the IS-IS neighbor. Nexus switches always show a MAC address because the interfaces are Ethernet.
Level Type of adjacency formed with a neighbor: L1, L2, or L1-L2.
State Displays whether the neighbor is up or down.
Holdtime Time within which another IIH must be received to maintain the IS-IS adjacency.
Interface Interface used to peer with the neighbor router.
Note
Notice that the system ID actually references the router’s hostname
instead of the 6-byte system ID. IS-IS provides a name to system ID
mapping under the optional TLV #137 that is found as part of the LSP.
This feature is disabled under the IS-IS router configuration with the
command no hostname dynamic.
Example 9-3 displays the show isis adjacency command using the
summary and detail keywords. Notice that the optional detail keyword
provides accurate timers for transition states for a particular neighbor.
Example 9-3 Display of IS-IS Neighbors with summary and detail Keywords
NX-1# show isis adjacency summary
IS-IS process: NXOS VRF: default
IS-IS adjacency database summary:
Legend: '!': No AF level connectivity in given topology
P2P UP INIT DOWN All
L1 0 0 0 0
L2 0 0 0 0
L1-2 0 0 0 0
SubTotal 0 0 0 0
Total 2 0 0 2
NX-1# show isis adjacency detail
NX-2            0021.21ae.c123  2     UP    00:00:08  Ethernet1/1
Up/Down transitions: 1, Last transition: 00:38:30 ago
Circuit Type: L1-2
IPv4 Address: 10.12.1.200
IPv6 Address: 0::
Circuit ID: NX-2.01, Priority: 64
BFD session for IPv4 not requested
BFD session for IPv6 not requested
Restart capable: 1; ack 0;
Restart mode: 0; seen(ra 0; csnp(0; l1 0; l2 0)); suppress 0
Besides enabling IS-IS on the network interfaces on Nexus switches, the
following parameters must match for the two switches to become
neighbors:
IS-IS interfaces must be Active.
Connectivity between devices must exist using the primary subnet.
MTU matches.
L1 adjacencies require the area address to match the peering L1 router,
and the system ID must be unique between neighbors.
L1 routers can form adjacencies with L1 or L1-L2 routers, but not L2.
L2 routers can form adjacencies with L2 or L1-L2 routers, but not L1.
DIS requirements match.
IIH authentication type and credentials (if any) must match.
Interface    Type  Idx  State       Circuit    MTU   Metric   Priority  Adjs/AdjsUp
                                                     L1   L2  L1   L2   L1    L2
--------------------------------------------------------------------------------
Topology: TopoID: 0
Vlan10       Bcast 3    Down/Ready  0x02/L1-2  1500  4    4   64   64   0/0   0/0
Topology: TopoID: 0
loopback0    Loop  1    Up/Ready    0x01/L1-2  1500  1    1   64   64   0/0   0/0
Topology: TopoID: 0
VLAN10       Bcast 2    Up/Ready    0x01/L1-2  1500  4    4   64   64   1/0   1/0
Topology: TopoID: 0
VLAN10       Bcast 4    Up/Ready    0x03/L1-2  1500  4    4   64   64   0/0   0/0
The command show isis lists the IS-IS interfaces and provides an overview
of the IS-IS configuration for the router, which might be more efficient.
Example 9-5 displays the command output. Notice that the system ID, MTU,
metric styles, area address, and topology mode are provided.
NX-1# show run isis
interface loopback0
ip router isis NXOS
interface Ethernet1/1
ip router isis NXOS
isis passive-interface level-1
interface VLAN10
ip router isis NXOS
NX-2# show run isis
interface loopback0
ip router isis NXOS
interface Ethernet1/1
ip router isis NXOS
no isis passive-interface level-1
interface VLAN20
ip router isis NXOS
Debug commands are generally the least preferred method for finding root
cause because of the amount of data that can be generated while the
debug is enabled. NX-OS provides event-histories that run in the
background without a performance hit, which provides another method of
troubleshooting. The command show isis event-history [adjacency | dis |
iih | lsp-flood | lsp-gen] provides helpful information when troubleshooting
IS-IS. The iih keyword provides the same information as the debug
command in Example 9-9.
Example 9-10 displays the show isis event-history iih command. Compare
the sample output on NX-1 with the previous debug output; there is little
difference in the information provided.
Performing IS-IS debugs shows only the packets that have reached the
supervisor CPU. If packets are not displayed in the debugs or event-history,
further troubleshooting must be taken by examining quality of service
(QoS) policies, control plane policing (CoPP), or just verification of the
packet leaving or entering an interface.
QoS policies may or may not be deployed on an interface. If they are
deployed, the policy-map must be examined for any dropped packets, which
must then be referenced to a class-map that matches the IS-IS routing
protocol. The same process applies to CoPP policies because they are based
on QoS settings as well.
Example 9-11 displays the process for checking a switch’s CoPP policy
with the following logic:
1. Examine the CoPP policy with the command show running-config
copp all. This displays the relevant policy-map name, classes defined,
and the police rate.
2. Investigate the class-maps to identify the conditional matches for that
class-map.
3. After the class-map has been verified, examine the policy-map drops
for that class with the command show policy-map interface control-
plane. If drops are found, the CoPP policy needs to be modified to
accommodate a higher IS-IS packet flow.
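A condensed sketch of this workflow follows; output is omitted here, and
the class and policy names vary by platform and CoPP profile:
NX-1# show running-config copp all
! Identify the policy-map and the class-map that matches IS-IS
NX-1# show policy-map interface control-plane
! Look for nonzero drop counters under the class matching IS-IS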
Note
This CoPP policy was taken from a Nexus 7000 switch, and the policy-
name and class-maps may vary depending on the platform.
Another technique to see whether the packets are reaching the Nexus switch
is to use the built-in Ethanalyzer. Ethanalyzer is used because IS-IS uses
Layer 2 addressing, which restricts packet captures on Layer 3 ports. The
command ethanalyzer local interface inband [capture-filter “ether host
isis-mac-address”] [detail] is used. The capture-filter restricts traffic to
specific types of traffic, and the filter ether host isis-mac-address restricts
traffic to IS-IS based on the values from Table 9-2. The optional detail
provides a packet-level view of any matching traffic. The use of
Ethanalyzer is shown in Example 9-12 to identify L2 IIH packets.
NX-1# ethanalyzer local interface inband capture-filter "ether
host 01:80:c2:00:00:15"
Capturing on inband
09:08:42.979127 88:5a:92:de:61:7c -> 01:80:c2:00:00:15 ISIS L2 HELLO, System-ID: 0000.0000.0001
09:08:46.055807 88:5a:92:de:61:7c -> 01:80:c2:00:00:15 ISIS L2 HELLO, System-ID: 0000.0000.0001
09:08:47.489024 88:5a:92:de:61:7c -> 01:80:c2:00:00:15 ISIS L2 CSNP, Source-ID: 0000.0000.0001.00, Start LSP-ID: 0000.0000.0000.00-00, End LSP-ID: ffff.ffff.ffff.ff-ff
09:08:48.570401 00:2a:10:03:f2:80 -> 01:80:c2:00:00:15 ISIS L2 HELLO, System-ID: 0000.0000.0002
09:08:49.215861 88:5a:92:de:61:7c -> 01:80:c2:00:00:15 ISIS L2 HELLO, System-ID: 0000.0000.0001
09:08:52.219001 88:5a:92:de:61:7c -> 01:80:c2:00:00:15 ISIS L2 HELLO, System-ID: 0000.0000.0001
NX-1# ethanalyzer local interface inband capture-filter "ether
host 01:80:c2:00:00:15" detail
Capturing on inband
Frame 1 (1014 bytes on wire, 1014 bytes captured)
Arrival Time: May 22, 2017 09:07:16.082561000
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000
seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 1014 bytes
Capture Length: 1014 bytes
[Frame is marked: False]
[Protocols in frame: eth:llc:osi:isis]
IEEE 802.3 Ethernet
Destination: 01:80:c2:00:00:15 (01:80:c2:00:00:15)
Address: 01:80:c2:00:00:15 (01:80:c2:00:00:15)
.... ...1 .... .... .... .... = IG bit: Group address
(multicast/broadcast)
.... ..0. .... .... .... .... = LG bit: Globally unique
address (factory
default)
Source: 88:5a:92:de:61:7c (88:5a:92:de:61:7c)
Address: 88:5a:92:de:61:7c (88:5a:92:de:61:7c)
.... ...0 .... .... .... .... = IG bit: Individual address
(unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique
address (factory
default)
Length: 1000
Logical-Link Control
DSAP: ISO Network Layer (0xfe)
IG Bit: Individual
SSAP: ISO Network Layer (0xfe)
CR Bit: Command
Control field: U, func=UI (0x03)
000. 00.. = Command: Unnumbered Information (0x00)
.... ..11 = Frame type: Unnumbered frame (0x03)
ISO 10589 ISIS InTRA Domain Routeing Information Exchange Protocol
Intra Domain Routing Protocol Discriminator: ISIS (0x83)
PDU Header Length : 27
Version (==1) : 1
System ID Length : 0
PDU Type : L2 HELLO (R:000)
Version2 (==1) : 1
Reserved (==0) : 0
Max.AREAs: (0==3) : 0
ISIS HELLO
Circuit type : Level 2 only, reserved(0x00 ==
0)
System-ID {Sender of PDU} : 0000.0000.0001
Holding timer : 9
PDU length : 997
Priority : 64, reserved(0x00 == 0)
System-ID {Designated IS} : 0000.0000.0001.01
Area address(es) (4)
Area address (3): 49.0012
Protocols Supported (1)
NLPID(s): IP (0xcc)
IP Interface address(es) (4)
IPv4 interface address : 10.12.1.100 (10.12.1.100)
IS Neighbor(s) (6)
IS Neighbor: 00:2a:10:03:f2:80
Restart Signaling (1)
Restart Signaling Flags : 0x00
.... .0.. = Suppress Adjacency: False
.... ..0. = Restart Acknowledgment: False
.... ...0 = Restart Request: False
Padding (255)
Padding (255)
Padding (255)
Padding (171)
The next plan of action is to check the IS-IS event-history for adjacency
and IIH on NX-1 and NX-2. NX-1 has adjacency entries for NX-2, whereas
NX-2 does not have any adjacency entries. After checking the IIH event-
history, NX-2 displays that it cannot find a usable IP address, as shown in
Example 9-14.
The next step is to check and correct the IP addressing and subnet masks on
the two IS-IS routers' interfaces so that connectivity is established.
MTU Requirements
IS-IS hellos (IIH) are padded with TLV #8 to reach the maximum
transmission unit (MTU) size of the network interface. Padding IIHs
provides the benefit of detecting errors with large frames or a mismatched
MTU on remote interfaces. Broadcast interfaces transmit both L1 and L2
IIHs, which wastes bandwidth when both interfaces already use the same MTU.
To demonstrate the troubleshooting process for a mismatched MTU, the MTU
on NX-1 is set to 1000, whereas the MTU remains at 1500 on NX-2.
The first step is to check the IS-IS adjacency state as shown in Example 9-
15. NX-1 does not detect NX-2, whereas NX-2 detects NX-1.
The next step is to examine the IS-IS IIH event-history to identify the
problem. In Example 9-16, NX-1 is sending IIH packets with a length of
997, and they are received on NX-2. NX-2 is sending IIH packets with a
length of 1497 to NX-1, which are never received because they exceed
NX-1's MTU. The length of the IIH packets indicates an MTU problem.
The MTU is examined on both switches with the command show interface
interface-id by looking for the MTU value, as shown in Example 9-17. The
MTU on NX-2 is larger than on NX-1.
Cisco introduced a feature that disables the MTU padding after the router
sends the first five IIHs out of an interface. This eliminates wasted
bandwidth while still providing a mechanism for checking the MTU
between routers. Nexus switches disable the IIH padding with the interface
parameter command no isis hello padding [always]. The always keyword
does not pad any IIH packets, which allows NX-1 to form an adjacency but
could result in problems later. The best solution is to modify the interface
MTU to the highest MTU that is acceptable between the two devices'
interfaces.
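A minimal sketch of aligning NX-1's MTU with NX-2 in this scenario follows:
NX-1# configure terminal
NX-1(config)# interface Ethernet1/1
! Match the 1500-byte MTU configured on NX-2
NX-1(config-if)# mtu 1500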
Note
If the IS-IS interface is a VLAN interface (SVI), make sure that all the
L2 ports support the MTU configured on the SVI. For example, if
VLAN 10 has an MTU of 9000, all the trunk ports should be
configured to support an MTU of 9000 as well.
Unique System-ID
The System-ID provides a unique identifier for an IS-IS router in the same
area. A Nexus switch drops packets that have the same System-ID as itself
as part of a safety mechanism. The syslog message Duplicate system ID is
displayed along with the interface and System-ID of the other device.
Example 9-18 displays what the syslog message looks like on NX-2.
Through logical deduction, NX-1 and NX-2 are establishing and maintaining
bidirectional transmission of IS-IS packets, because the L2 adjacency is
established. The L1 failure therefore points to incorrect authentication
parameters, invalid timers, or area numbers that do not match.
Example 9-20 displays the IS-IS event-history for NX-1 and NX-2. Notice
that the error message No common area is displayed before the message
indicating that the L1 IIH is received.
The final step is to verify the configuration and check the NET Addressing.
Example 9-21 displays the NET entries for NX-1 and NX-2. NX-1 has an
area of 49.0012 and NX-2 has an area of 49.0002.
Changing the area portion of the NET address to match on either Nexus
switch allows for the L1 adjacency to form.
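A minimal sketch of that change on NX-2 follows, assuming NX-2's system ID
is 0000.0000.0002:
NX-2# configure terminal
NX-2(config)# router isis NXOS
! Change the area from 49.0002 to 49.0012 to match NX-1
NX-2(config-router)# net 49.0012.0000.0000.0002.00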
interface loopback0
ip router isis NXOS
interface Ethernet1/1
isis circuit-type level-1
ip router isis NXOS
interface Vlan10
ip router isis NXOS
If IIH packets are missing from the event-history, the IS-IS Router and
Interface-level settings need to be verified on both routers.
DIS Requirements
The default IS-IS network type on Nexus switches is broadcast, which
requires a DIS. IS-IS broadcast interfaces that directly connect only two
IS-IS routers do not benefit from the use of a pseudonode: resources are
wasted on electing a DIS, CSNPs are continuously flooded into the segment,
and an unnecessary pseudonode LSP is included in the LSPDB of all routers
in that level. IS-IS allows general topology interfaces to behave like a
point-to-point interface with the interface command isis network
point-to-point.
An adjacency does not form between IS-IS Nexus switches when one has a
broadcast interface and the other an IS-IS point-to-point interface.
Neither device shows an IS-IS adjacency, but the switch with the broadcast
interface reports the message Fail: Receiving P2P IIH over LAN interface xx
in the IS-IS IIH event-history, which indicates which neighbor has
advertised the P2P interface. When those messages are detected, the
interface type needs to be changed on one node so that both are consistent.
Example 9-24 displays NX-2’s IS-IS event-history and the relevant
configurations for NX-1 and NX-2.
Example 9-24 IS-IS Mismatch of Interface Types
interface loopback0
ip router isis NXOS
interface Ethernet1/1
isis network point-to-point
ip router isis NXOS
interface Vlan10
ip router isis NXOS
interface loopback0
ip router isis NXOS
interface Ethernet1/1
ip router isis NXOS
interface Vlan20
ip router isis NXOS
Adding the command isis network point-to-point to NX-2’s Ethernet1/1
interface sets both interfaces to the same type, and then an adjacency
forms.
IIH Authentication
IS-IS allows for authentication of the IIH packets that are used to form
an adjacency. IIH authentication is configured on an interface-by-interface
basis, and separate settings can be used for each IS-IS level.
Authenticating one PDU type is sufficient for most designs.
IS-IS provides two types of authentication: plaintext and an MD5
cryptographic hash. Plaintext mode provides little security, because anyone
with access to the link can see the password with a network sniffer. MD5
mode sends a cryptographic hash instead, so the password is never included
in the PDUs; this technique is widely accepted as the more secure mode.
All IS-IS authentication is stored in TLV #10, which is part of the IIH.
Nexus switches enable IIH authentication with the interface parameter
command isis authentication key-chain key-chain-name {level-1 | level-
2}. The authentication type is identified with the command isis
authentication-type {md5 | cleartext} {level-1 | level-2}.
Example 9-25 displays MD5 authentication on NX-1’s Ethernet1/1
interface.
NX-1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
NX-1(config)# key chain IIH-AUTH
NX-1(config-keychain)# key 2
NX-1(config-keychain-key)# key-string CISCO
NX-1(config-keychain-key)# interface Ethernet1/1
NX-1(config-if)# isis authentication key-chain IIH-AUTH level-1
NX-1(config-if)# isis authentication-type md5 level-1
Duplicate System ID
The IS-IS system ID plays a critical role in the creation of the topology.
If two adjacent routers have the same system ID in the same L1 area, an
adjacency does not form, as shown earlier. However, if two routers with the
same system ID in the same L1 area are separated by an intermediary router,
the duplicate prevents those routers' routes from being installed in the
topology.
Figure 9-7 provides a sample topology in which all Nexus switches are in
the same area with only L1 adjacencies. NX-2 and NX-4 have been
configured with the same system ID of 0000.0000.0002. NX-3 sits between
NX-2 and NX-4 and has a different system ID, therefore allowing NX-2 and
NX-4 to establish full neighbor adjacencies.
From NX-1’s perspective, the first apparent issue is that NX-4’s 10.4.4.0/24
network is missing, as shown in Example 9-29.
Example 9-29 NX-1’s Routing Table with Missing NX-4’s 10.4.4.0/24 Network
On NX-2 and NX-4, there are complaints about LSPs with duplicate system
IDs: L1 LSP - Possible duplicate system ID, as shown in Example 9-30.
Example 9-30 Syslog Messages with LSPs with Duplicate System IDs
15:45:26 NX-2 %ISIS-4-LSP_DUP_SYSID: isis-NXOS [15772] L1 LSP -
Possible duplicate system ID 0000.0000.0002 detected
15:41:47 NX-4 %ISIS-4-LSP_DUP_SYSID: isis-NXOS [23550] L1 LSP -
Possible duplicate system ID 0000.0000.0002 detected
Example 9-31 displays the routing table of the two Nexus switches with the
Possible duplicate system ID syslog messages. Notice that NX-2 is missing
only NX-4’s interface (10.4.4.0/24), whereas NX-4 is missing the
10.12.1.0/24 and NX-1’s Ethernet interface (10.1.1.0/24). Examining the
IS-IS database displays a flag (*) that indicates a problem with NX-2.
Example 9-33 R1's Routing Table with Default Interface Metrics
R1# show ip route isis | begin Gateway
Gateway of last resort is not set
Now one of the beautiful things about IS-IS is how it structures networks as
objects that exist on top of the routers themselves. Instead of viewing the
routing table, the IS-IS topology table is viewed with the command show
isis topology. The IS-IS topology table lists the total path metric to reach
the destination router, next-hop node, and outbound interface. Example 9-
34 displays the topology table from R1 and NX-3’s perspective. R1 is
selecting the path to R2 via the direct link on Gi0/1.
Example 9-34 R1’s and NX-3’s IS-IS Topology Table with Default Metric
Notice how R1 and NX-3 have conflicting metric values when they point to
each other. Three options ensure that routing takes the optimal path:
Statically set the IS-IS metric on IS-IS devices that are not Nexus
switches. IOS-based devices use the interface parameter command isis
metric metric-value.
Statically set the IS-IS metric on a Nexus interface to reflect network
links that are more preferred with the interface parameter command
isis metric metric-value {level-1 | level-2}.
Change the reference bandwidth on Nexus switches to a higher value
to make those links more preferred. The reference bandwidth is set
with the IS-IS process configuration command reference-bandwidth
reference-bw {gbps | mbps}.
There are not any intermediary routers between R1 and R2, so the only
option that makes sense is to modify the IS-IS metrics on R1 and R2.
Example 9-35 displays the metric for the 10.12.1.0/24 link being statically
set to 40, and the metric being set to 4 for the 10 Gbps interface. The value
correlates to a reference bandwidth of 40 Gbps.
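A sketch of the change on R1 follows; the interface names are assumptions
for illustration, and R2 is configured in the same fashion:
! Raise the metric on the 10.12.1.0/24 link to 40
R1(config)# interface GigabitEthernet0/1
R1(config-if)# isis metric 40
! Set the 10 Gbps interface metric to 4, matching a 40 Gbps
! reference bandwidth
R1(config)# interface TenGigabitEthernet0/2
R1(config-if)# isis metric 4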
Now that the change has been made, let’s examine the IS-IS routing table
and topology table on R1 and NX-3, as shown in Example 9-36. Now the
interface metrics match for the 10.13.1.0/24 and 10.24.1.0/24 networks. In
addition, R1 is now selecting the 10 Gbps path as the preferred path to
reach R2.
Example 9-36 IS-IS Routing and Topology Table After Static Metric Configuration
Example 9-37 displays R1’s and NX-2’s IS-IS routing entries. R1 does not
have any IS-IS routes in the routing table, whereas NX-2 has routes to all
the networks in the topology.
To confirm the theory, the metric types are checked on R1 and NX-2 by
looking at the IS-IS protocol, as shown in Example 9-39. R1 is set to accept
and generate only narrow metrics, whereas NX-OS accepts both narrow and
wide metrics while advertising only wide metrics.
The Nexus switches are placed in metric transition mode using the
command metric-style transition, which makes the Nexus switch populate
the LSP with narrow and wide metric TLVs. This allows other routers that
operate in narrow metric mode to compute a total path metric for a
topology.
Example 9-40 displays the configuration and verification on NX-2 for IS-IS
metric transition mode.
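The change is a single command under the IS-IS process; a minimal sketch on
NX-2 follows:
NX-2# configure terminal
NX-2(config)# router isis NXOS
! Populate LSPs with both narrow and wide metric TLVs
NX-2(config-router)# metric-style transition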
Example 9-41 displays the IS-IS topology table and routing table for the
IOS routers now that the Nexus switches are placed in IS-IS metric
transition mode.
Example 9-41 Verification of IOS Devices After NX-OS Metric Transition Mode
L1 to L2 Route Propagations
IS-IS operates on a two-level hierarchy. A primary function of the L1-L2
routers is to act as a gateway for L1 routers to the L2 IS-IS backbone.
Figure 9-10 displays a simple topology with NX-1 and NX-2 in Area
49.0012 while NX-3 and NX-4 are in Area 49.0034. NX-1’s 10.1.1.0/24
network should be advertised to Area 49.0034 by NX-2, and NX-4’s
10.4.4.0/24 network is advertised to Area 49.0012 by NX-3.
Figure 9-10 IS-IS Topology to Demonstrate IS-IS L1 to L2 Route Propagation
Example 9-42 displays all four Nexus switches' routing tables. Notice that
NX-3 is missing the 10.1.1.0/24 network. This network exists in NX-2’s
routing table as an IS-IS L1 route. The same behavior exists for NX-4's
10.4.4.0/24 network, which appears on NX-3 as an L1 route but is missing
from NX-2.
The next step is to examine the IS-IS database with the command show isis
database [level-1 | level-2] [detail] [lsp-id] to make sure that the
appropriate LSPs are in the LSPDB. LSPs are restricted by specifying an
IS-IS level or the specific LSPID for an advertising router.
Example 9-43 displays all the LSPs for L1 and L2 in NX-2’s LSPDB. From
the output, NX-2 has received NX-1’s L1 LSP and has received NX-3’s L2
LSP.
Note
Remember that the pseudonode portion of the LSP ID is zero for the
actual router and contains all its links. If the pseudonode portion of the
LSP ID is nonzero, it reflects the DIS for the segment and lists the
LSP IDs for the routers connected to it. The LSP ID NX-3.02-00 is the
DIS for the NX-2 to NX-3 network link.
The IS-IS LSPDB indicates that NX-1’s L1 routes are not propagating to
NX-2’s L2 database, and the same behavior is occurring between NX-4 and
NX-3. This is caused by a difference in operational behavior between NX-
OS and other Cisco operating systems (IOS, IOS XR, etc.). Nexus switches
require explicit configuration with the command distribute level-1 into
level-2 {all | route-map route-map-name} on L1-L2 routers to insert L1
routes into the L2 topology.
Example 9-45 displays the relevant IS-IS configuration on NX-2 and NX-3
to enable L1 route propagation into the L2 LSPDB.
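A minimal sketch of that configuration on NX-2 follows; NX-3 is configured
identically:
NX-2# configure terminal
NX-2(config)# router isis NXOS
! Propagate all L1 routes into the L2 LSPDB
NX-2(config-router)# distribute level-1 into level-2 all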
Example 9-46 displays NX-3’s LSP that was advertised to NX-2, now that
L1 route propagation has been configured on NX-2 and NX-3. Notice that it
now includes the L1 route 10.4.4.0/24.
Example 9-47 displays NX-2’s and NX-4’s routing table after the L1 route
propagation was configured on NX-2 and NX-3. Now the 10.1.1.0/24 and
10.4.4.0/24 network are reachable on both the L1-L2 switches.
Example 9-47 NX-2 and NX-4’s Routing Table After L1 Route Propagation
The problem comes from the suboptimal routing that occurs when NX-1
tries to connect with 10.6.6.0/24 network, as it crosses the higher cost
10.24.1.0/24 network link. The same problem occurs for NX-3 connecting
with the 10.5.5.0/24 network. Example 9-48 displays the suboptimal path
taken by both NX-1 and NX-3.
Example 9-49 displays the IS-IS database on NX-1 and NX-3. The attached
bit ‘A’ is detected for NX-2 and NX-4. In essence, the attached bit provides
a L1 default route toward the advertising L1-L2 router.
Now NX-1 and NX-3 must identify the closest router with the attached bit.
Normally this is a manual process of cross-referencing the IS-IS database
with the IS-IS topology table, but NX-OS does this for you automatically.
The IS-IS topology table for NX-1 and NX-3 is displayed in Example 9-50.
Example 9-51 displays the routing table of NX-1 and NX-3. Notice that an
entry does not exist for the 10.5.5.0/24 or 10.6.6.0/24 networks, so the
default network is used instead. Notice that the default route correlates with
the IS-IS topology table entry from Example 9-50.
Note
Route leaking normally uses a restrictive route map to control which
routes are leaked; otherwise, running all the area routers in L2 mode
makes more sense.
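The leaking configuration itself is not shown in this extract. A sketch of
what it might look like on NX-2 follows, with NX-4 mirrored; the
prefix-list and route-map names are assumptions:
NX-2(config)# ip prefix-list LEAKED seq 5 permit 10.5.5.0/24
NX-2(config)# ip prefix-list LEAKED seq 10 permit 10.6.6.0/24
NX-2(config)# route-map L2-TO-L1 permit 10
NX-2(config-route-map)# match ip address prefix-list LEAKED
NX-2(config-route-map)# router isis NXOS
! Leak only the matched L2 routes into the L1 area
NX-2(config-router)# distribute level-2 into level-1 route-map L2-TO-L1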
Let’s verify the change by checking the IS-IS database to see if the
10.5.5.0/24 and 10.6.6.0/24 networks are being advertised by NX-2 and
NX-4 into IS-IS L1 for Area 49.1234. After that is verified, check the
routing table to verify that those entries are added to the RIB. Example 9-
53 displays the IS-IS Database with L2 Route Leaking.
Example 9-54 verifies that NX-1 and NX-3 are forwarding traffic using the
optimal path.
Redistribution
Redistributing into IS-IS uses the command redistribute [bgp asn | direct |
eigrp process-tag | isis process-tag | ospf process-tag | rip process-tag |
static] route-map route-map-name. A route-map is required as part of the
redistribution process on Nexus switches. Every protocol provides a seed
metric at the time of redistribution that allows the destination protocol to
calculate a best path. IS-IS provides a default redistribution metric of 10.
Example 9-55 provides the necessary configuration to demonstrate the
process of redistribution. NX-1 redistributes the connected routes for
10.1.1.0/24 and 10.11.11.0/24 in lieu of them being advertised with the IS-
IS routing protocol. Notice that the route-map is a simple permit statement
without any conditional matches.
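A minimal sketch of NX-1's redistribution configuration follows; the
route-map name is an assumption:
NX-1(config)# route-map REDIST permit 10
NX-1(config-route-map)# router isis NXOS
! Redistribute the directly connected 10.1.1.0/24 and 10.11.11.0/24 networks
NX-1(config-router)# redistribute direct route-map REDIST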
The routes are redistributed on NX-1 and injected into the IS-IS database
with the 10.1.1.0/24 and 10.11.11.0/24 prefixes. The redistribution of prefixes
is verified by looking at the LSPDB on other devices, such as NX-2, as
shown in Example 9-56.
Example 9-56 Verification of Redistributed Networks
References
RFC 1195, Use of OSI IS-IS for Routing in TCP/IP and Dual Environments.
R. Callon. IETF, http://tools.ietf.org/html/rfc1195, December 1990.
RFC 3784, Intermediate System to Intermediate System (IS-IS) Extensions
for Traffic Engineering (TE). Tony Li, Henk Smit. IETF,
https://tools.ietf.org/html/rfc3784, June 2004.
Edgeworth, Brad, Aaron Foss, Ramiro Garza Rios. IP Routing on Cisco
IOS, IOS XE and IOS XR. Indianapolis: Cisco Press, 2014.
Cisco. Cisco NX-OS Software Configuration Guides,
http://www.cisco.com.
Chapter 10
Note
PACL can be applied only on ingress packets for L2/L3 physical
Ethernet interfaces (including L2 port-channel interfaces).
IP ACL
NX-1(config)# ip access-list TEST
NX-1(config-acl)# permit ip host 192.168.33.33 host 192.168.3.3
NX-1(config-acl)# permit ip any any
NX-1(config-acl)# statistics per-entry
IPv6 ACL
NX-1(config)# ipv6 access-list TESTv6
NX-1(config-ipv6-acl)# permit icmp host 2001::33 host 2001::3
NX-1(config-ipv6-acl)# permit ipv6 any any
NX-1(config-ipv6-acl)# statistics per-entry
MAC ACL
NX-1(config)# mac access-list TEST-MAC
NX-1(config-mac-acl)# permit 00c0.cf00.0000 0000.00ff.ffff any
NX-1(config-mac-acl)# permit any any
NX-1(config-mac-acl)# statistics per-entry
ARP ACL
NX-1(config)# arp access-list TEST-ARP
NX-1(config-arp-acl)# deny ip host 192.168.10.11 mac
00c0.cf00.0000 ffff.ff00.0000
NX-1(config-arp-acl)# permit ip any mac any
VLAN Access-map
NX-1(config)# vlan access-map TEST-VLAN-MAP
NX-1(config-access-map)# match ip address TEST
NX-1(config-access-map)# action drop
NX-1(config-access-map)# statistics per-entry
Note
Validate the ACL-related configuration using the command show run
aclmgr. This command displays both the ACL configuration and the
ACL attach points.
Example 10-2 illustrates the difference between the output of the show ip
access-list command when the statistics per-entry command is
configured, compared to when it is not configured. In the following
example, the ACL configuration that has the statistics per-entry command
configured displays the statistics for the confirmed hits.
As stated before, the ACLMGR takes care of creating the policies when an
ACL is attached to an attach point. The policies created by the ACLMGR
are verified using the command show system internal aclmgr access-lists
policies interface interface-id. This command displays the policy type and
interface index, which points to the interface where the ACL is attached, as
shown in Example 10-4.
Example 10-4 Verifying Access-List Counters in Hardware
NX-OS has a packet processing filter (PPF) API, which is used to filter the
security rules received and processed by the ACLMGR to the relevant
clients. The clients can be an interface, a port-channel, a VLAN manager,
VSH, and so on. It is important to remember that the ACLMGR stores all
the data in the form of a PPF database, where each element is a node. Based
on the node ID received from the previous command, more details about
the policy can be verified by performing a lookup in the PPF database on
that node.
Example 10-5 illustrates the use of the command show system internal
aclmgr ppf node node-id to perform a lookup in the PPF database of the
ACLMGR for the policy node created when the policy is attached to an
attach point. This command is useful when troubleshooting ACL/filtering-
related issues on the NX-OS platform, such as an ACL not filtering traffic
properly or not matching an ACL entry at all.
Note
When troubleshooting any ACL-related issues, it is recommended that
you collect the output of the command show tech aclmgr [detail] or show
tech aclqos [detail] at the time of the problem. The ACLQOS component on
the line card provides statistics for ACLs on a per-line-card basis and is
important when you are troubleshooting ACL-related issues.
Interior Gateway Protocol (IGP) Network Selection
When ACLs are used for the IGP network selection during redistribution,
the source fields of the ACL are used to identify the network, and the
destination fields identify the smallest prefix length allowed in the network
range. Table 10-1 provides sample ACL entries from within the ACL
configuration mode and specifies the networks that match with the
extended ACL. Notice that the subtle difference for the destination
wildcard for the 172.16.0.0 network affects the actual network ranges that
are permitted in the second and third rows of the table.
Note
Extended ACLs that are used for distribute-list use the source fields to
identify the source of the network advertisement, and the destination
fields identify the network prefix.
BGP Network Selection
Extended ACLs react differently when matching BGP routes than when
matching IGP routes. The source fields match against the network portion
of the route, and the destination fields match against the network mask, as
shown in Figure 10-1. Extended ACLs were originally the only match
criteria used by IOS with BGP before the introduction of prefix-lists.
Table 10-2 demonstrates the concept of the wildcard for the network and
subnet mask.
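As a hypothetical illustration of this logic, the following ACL entry
matches any /24 route inside 10.0.0.0/16: the source fields (10.0.0.0
0.0.255.255) match the network portion, and the destination fields
(255.255.255.0 0.0.0.0) match the subnet mask exactly:
NX-1(config)# ip access-list BGP-MATCH
! Source = network range, destination = required subnet mask
NX-1(config-acl)# permit ip 10.0.0.0 0.0.255.255 255.255.255.0 0.0.0.0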
Prefix Matching
The structure for a prefix match specification contains two parts: high-
order bit pattern and high-order bit count, which determine the high-order
bits in the bit pattern that are to be matched. Some documentation refers to
the high-order bit pattern as the address or network, and the high-order bit
count as length or mask length.
In Figure 10-2, the prefix match specification has the high-order bit pattern
of 192.168.0.0 and a high-order bit count of 16. The high-order bit pattern
has been converted to binary to demonstrate where the high-order bit count
lays. Because no additional matching length parameters are included, the
high-order bit count is an exact match.
The 10.168.0.0/13 prefix does not qualify because the prefix length is less
than the minimum of 24 bits, whereas the 10.168.0.0/24 prefix does meet
the matching length parameter. The 10.173.1.0/28 prefix qualifies because
the first 13 bits match the high-order bit pattern, and the prefix length is
within the matching length parameter. The 10.104.0.0/24 prefix does not
qualify because the high-order bit-pattern does not match within the high-
order bit count.
Figure 10-4 demonstrates a prefix match specification with a high-order bit
pattern of 10.0.0.0, high-order bit count of 8, and the matching length must
be between 22 and 26.
The 10.0.0.0/8 prefix does not match because the prefix length is too short.
The 10.0.0.0/24 qualifies because the bit pattern matches and the prefix
length is between 22 and 26. The 10.0.0.0/30 prefix does not match because
the prefix length is too long. Any prefix that starts with 10 in the first octet
and has a prefix length between 22 and 26 will match the prefix match
specification.
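Using the NX-OS syntax covered in the next section, this match
specification is expressed as the following prefix-list entry; the list
name is illustrative:
NX-1(config)# ip prefix-list DEMO seq 5 permit 10.0.0.0/8 ge 22 le 26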
Prefix Lists
Prefix lists contain multiple prefix matching specification entries that
contain a permit or deny action. Prefix lists process in sequential order in
a top-down fashion, and the first prefix match processes with the
appropriate permit or deny action.
NX-OS prefix lists are configured with the global configuration command
ip prefix-list prefix-list-name [seq sequence-number] {permit | deny}
high-order-bit-pattern/high-order-bit-count [{eq match-length-value | le
le-value | ge ge-value [le le-value]}].
If a sequence is not provided, the sequence number auto-increments by 5
based off the highest sequence number. The first entry is 5. Sequencing
allows the deletion of a specific entry. Because prefix lists cannot be
resequenced, it is advisable to leave enough space for insertion of sequence
numbers at a later time.
Example 10-6 provides a sample prefix list named RFC1918 that covers all
the networks in the RFC 1918 address range. The prefix list allows /32
prefixes only in the 192.168.0.0 network range and denies them in every
other network range.
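Example 10-6's content is not included in this extract; based on the
description above, the prefix list would look like the following sketch:
NX-1(config)# ip prefix-list RFC1918 seq 5 permit 192.168.0.0/13 ge 32
NX-1(config)# ip prefix-list RFC1918 seq 10 deny 0.0.0.0/0 ge 32
NX-1(config)# ip prefix-list RFC1918 seq 15 permit 10.0.0.0/8 le 32
NX-1(config)# ip prefix-list RFC1918 seq 20 permit 172.16.0.0/12 le 32
NX-1(config)# ip prefix-list RFC1918 seq 25 permit 192.168.0.0/16 le 32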
Notice that sequence 5 permits all /32 prefixes in the 192.168.0.0/13 bit
pattern, sequence 10 then denies all /32 prefixes in any bit pattern, and
sequences 15, 20, and 25 permit routes in the appropriate network ranges.
The order of the first two entries is important to ensure that only the
192.168.0.0 range contains /32 prefixes in the prefix list.
The command show ip prefix-list prefix-list-name high-order-bit-
pattern/high-order-bit-count first-match provides the capability for a
specific network prefix to be checked against a prefix-list to identify the
matching sequence, if any.
Example 10-7 displays the command being executed against three network
prefix patterns based upon the RFC1918 prefix-list created earlier. The first
command uses a high-order bit count of 32, which matches against
sequence 5, whereas the second command uses a high-order bit count of 16,
which matches against sequence 25. The last command matches against
sequence 10, which has a deny action.
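The specific prefixes from Example 10-7 are not included in this extract;
hypothetical commands matching the description would be:
NX-1# show ip prefix-list RFC1918 192.168.1.1/32 first-match
NX-1# show ip prefix-list RFC1918 192.168.0.0/16 first-match
NX-1# show ip prefix-list RFC1918 10.0.0.1/32 first-match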
Note
This command demonstrated in Example 10-7 is useful for verifying
that the network prefix matches the intended sequence in a prefix-list.
Route-Maps
Route-maps provide many features to a variety of routing protocols. At the
simplest level, route-maps filter networks similar to an ACL, but also
provide additional capability by adding or modifying a network attribute.
Route-maps must be referenced within a routing protocol to influence it.
Route-maps are critical to BGP because they are the main component for
modifying a unique routing policy on a neighbor-by-neighbor basis.
Route-maps are composed of four components:
Sequence Number: Dictates the processing order of the route-map.
Conditional Matching Criteria: Identifies prefix characteristics
(network, BGP path attribute, next-hop, and so on) for a specific
sequence.
Processing Action: Permits or denies the prefix.
Optional Action: Allows for manipulations dependent upon how the
route-map is referenced on the router. Actions include modification,
addition, or removal of route characteristics.
Route-maps use the following command syntax: route-map route-map-
name [permit | deny] [sequence-number]. The following rules apply to
route-map statements:
If a processing action is not provided, the default value of permit is
used.
If a sequence number is not provided, the sequence number defaults to
10.
If a conditional matching statement is not included, an implied all
prefixes is associated to the statement.
Processing within a route-map stops after all optional actions (if
configured) have been processed following a match of the conditional
matching criteria.
If a route is not conditionally matched, there is an implicit deny for
that route.
Example 10-8 provides a sample route-map to demonstrate the four
components of a route-map shown earlier. The conditional matching
criteria is based upon network ranges specified in an ACL. Comments have
been added to explain the behavior of the route-map in each sequence.
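Example 10-8's content is not included in this extract; a sketch matching
the description, with illustrative ACL and route-map names, might look
like the following:
! Sequence 10: permit routes matching ACL-ONE without modification
NX-1(config)# route-map EXAMPLE permit 10
NX-1(config-route-map)# match ip address ACL-ONE
! Sequence 20: permit routes matching ACL-TWO and modify the metric
NX-1(config-route-map)# route-map EXAMPLE permit 20
NX-1(config-route-map)# match ip address ACL-TWO
NX-1(config-route-map)# set metric 20
! Sequence 30: no match statement, so all remaining routes are permitted
NX-1(config-route-map)# route-map EXAMPLE permit 30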
Note
When deleting a specific route-map statement, include the sequence
number to prevent deleting the entire route-map.
Conditional Matching
Now that the components and processing order of a route-map were
explained, this section expands upon the aspect of how to match a route.
Example 10-9 shows the various options available within NX-OS.
match as-number {number [, number ...] | as-path-access-list acl-name}: Matches routes that come from peers with the matching ASNs.
match as-path acl-number: Selects prefixes based on a regex query to isolate the ASN in the BGP path attribute (PA) AS-Path. Allows for multiple match variables.
match community community-list-name: Selects prefixes based on BGP communities. Allows for multiple match variables.
Note
Sequence 20 is redundant because of the implicit deny for any prefixes
that are not matched in sequence 10. However, it provides clarity for
junior network engineers.
Complex Matching
Some network engineers find route-maps too complex if the conditional
matching criteria uses an ACL, AS-Path ACL, or prefix list that contains a
deny statement in it. Example 10-12 demonstrates a configuration where
the ACL uses a deny statement for the 172.16.1.0/24 network range.
Reading configurations like this must follow the sequence order first,
conditional matching criteria second, and only after a match occurs should
the processing action and optional action be used. Matching a deny
statement in the conditional match criteria excludes the route from that
sequence in the route-map.
The prefix 172.16.1.0/24 is denied by ACL-ONE, which implies that there is
not a match in sequences 10 and 20; therefore, the processing action (permit
or deny) is not evaluated. Sequence 30 does not contain a match clause, so
any remaining routes are permitted. The prefix 172.16.1.0/24 passes in
sequence 30 with the metric set to 20. The prefix 172.16.2.0/24 matches
ACL-ONE and passes in sequence 10.
Note
Route-maps process in the order of evaluation of the sequence,
conditional match criteria, processing action, and optional action, in
that order. Any deny statements in the match component are isolated
from the route-map sequence action.
Optional Actions
In addition to permitting the prefix to pass, route-maps modify route
attributes. Table 10-4 provides a brief overview of the most popular
attribute modifications.
Table 10-4 Route-Map Set Actions
Set Action Description
set as-path prepend {as-number-pattern | last-as 1-10}: Prepends the AS-Path for the network prefix with the pattern specified, or with multiple iterations of the neighboring AS.
set ip next-hop {ip-address | peer-address | self}: Sets the next-hop IP address for any matching prefix. BGP dynamic manipulation uses the peer-address or self keywords.
When routes are not installed in EIGRP as anticipated, the first step is to
check to make sure that any relevant policies were bound to the destination
routing protocol. The command show system internal rpm event-history
rsw displays low-level events that are handled by RPM.
Example 10-14 displays the command. Notice that two different route-
maps were applied to EIGRP: one was for OSPF and the other for directly
connected interfaces as the source redistribution protocols.
Policy-Based Routing
A router makes forwarding decisions based upon the destination address of
the IP packet. Some scenarios accommodate other factors, such as packet
length or source address, when deciding where the router should forward a
packet.
Policy-based routing (PBR) allows for conditional forwarding of packets
based on the following packet characteristics:
Routing by protocol type (Internet Control Message Protocol [ICMP],
Transmission Control Protocol [TCP], User Datagram Protocol [UDP]
and so on)
Routing by source IP address, destination IP address, or both
Manually assigning different network paths to the same destination
based upon tolerance for latency, link-speed, or utilization for specific
transient traffic
Packets are examined for PBR processing as they are received on the router
interface. PBR verifies the existence of the next-hop IP address and then
forwards packets using the specified next-hop address. Additional next-hop
addresses are configured so that in the event that the first next-hop address
is not in the RIB, the secondary next-hop addresses are used. If none of the
specified next-hop addresses exist in the routing table, the packets are not
conditionally forwarded.
Note
PBR policies do not modify the RIB because the policies are not
universal for all packets. This often complicates troubleshooting
because the routing table displays the next-hop address learned from
the routing protocol, but does not accommodate for a different next-
hop address for the conditional traffic.
NX-OS PBR configuration uses a route-map with match and set statements
that is then attached to the inbound interface. The following steps are
used:
Step 1. Enable the PBR feature. The PBR feature is enabled with the global
configuration command feature pbr.
Step 2. Define a route-map. The route-map is configured with the global
configuration command route-map route-map-name [permit |
deny] [sequence- number].
Step 3. Identify the conditional match criteria. The conditional match
criteria is based upon packet length with the command match
length minimum-length maximum-length, or by using the packet ip
address fields with an ACL using the command match ip address
{access-list-number | acl-name}.
Step 4. Specify the next-hop. The route-map configuration command set ip
[default] next-hop ip-address [... ip-address] is used to specify one
or more next-hops for packets that match the criteria. The optional
default keyword changes the behavior so that the next-hop address
specified by the route-map is used only if the destination address
does not exist in the RIB. If a viable route exists in the RIB, that is
the next-hop address that is used for forwarding the packet.
Step 5. Apply the route-map to the inbound interface. The route-map is
applied with the interface parameter command ip policy route-
map route-map-name.
Step 6. Enable PBR statistics (optional). Statistics of PBR
forwarding are enabled with the command route-map route-map-
name pbr-statistics.
Figure 10-6 displays a topology to demonstrate how PBR operates. The
default path between NX-1 and NX-6 is NX-1 → NX-2 → NX-3 → NX-5
→ NX-6 because the link cost of 10.23.1.0/24 is lower than 10.24.1.0/24
link. However, specific traffic sourced from NX-1’s Loopback 0
(192.168.1.1) to NX-6’s Loopback 0 (192.168.6.6) must not forward
through NX-3. These packets must forward through NX-4 even though it
has a higher path cost.
NX-2
feature pbr
!
ip access-list R1-TO-R6
10 permit ip 192.168.1.1/32 192.168.6.6/32
!
route-map PBR pbr-statistics
route-map PBR permit 10
match ip address R1-TO-R6
set ip next-hop 10.24.1.4
!
interface Ethernet2/1
description to NX-1
ip address 10.12.1.2/24
ip router ospf NXOS area 0.0.0.0
ip policy route-map PBR
PBR statistics were enabled on NX-2, which allows network engineers to
see how much traffic was forwarded by PBR. Example 10-21 displays
output for PBR statistics before and after traffic was conditionally
forwarded.
Note
The PBR configuration shown is for transient traffic. For PBR on
locally generated traffic, use the command ip local policy route-map
route-map-name.
Summary
This chapter covered several important building block features that are
necessary for understanding the conditional matching process used within
NX-OS route-maps:
Access control lists provide a method of identifying networks.
Extended ACLs provide the capability to select the network and
advertising router for IGP protocols, and to use wildcards for the
network and subnet mask for BGP routes.
Prefix lists identify networks based upon the high-order bit pattern,
high-order bit count, and required prefix length requirements.
Regular expressions (regex) provide a method of parsing output in a
systematic way. Regex is commonly used for BGP filtering, but it is
also used in the CLI for parsing output.
Route-maps filter routes similar to an ACL and provide the capability to
modify route attributes. Route-maps are composed of sequence numbers,
matching criteria, processing action, and optional modifying actions. They
use the following logic:
If matching criteria is not specified, all routes qualify for that route-
map sequence.
Multiple conditional matching requirements of the same type are a
Boolean OR, and multiple conditional matching requirements of
different types are a Boolean AND.
NX-OS uses RPM, which operates as a separate process and memory
space from the actual protocols. This provides an additional method
for diagnosing unintentional behaviors in a protocol.
Policy-based routing bypasses the default packet-forwarding decisions
made by routing protocols altogether, placing specific network traffic
onto a path other than the one selected by the routing protocol.
References
Edgeworth, Brad, Aaron Foss, and Ramiro Garza Rios. IP Routing on Cisco
IOS, IOS XE and IOS XR. Indianapolis: Cisco Press, 2014.
Cisco. Cisco NX-OS Software Configuration Guides, www.cisco.com.
Chapter 11
Troubleshooting BGP
BGP Fundamentals
Defined in RFC 1654, Border Gateway Protocol (BGP) is a path-vector
routing protocol that provides scalability, flexibility, and network stability.
When BGP was first developed, the primary design consideration was for
IPv4 inter-organizational routing information exchange across the public
networks, such as the Internet, or for private dedicated networks. BGP is
often referred to as the protocol for the Internet, because it is the only
protocol capable of holding the Internet routing table, which has more than
600,000 IPv4 routes and over 42,000 IPv6 routes, both of which continue to
grow.
From the perspective of BGP, an autonomous system (AS) is a collection of
routers under a single organization’s control. Organizations requiring
connectivity to the Internet must obtain an autonomous system number
(ASN). ASNs were originally 2 bytes (16-bit) providing 65,535 ASNs. Due
to exhaustion, RFC 4893 expands the ASN field to accommodate 4 bytes
(32-bit). This allows for 4,294,967,295 unique ASNs, providing quite a leap
from the original 65,535 ASNs. The Internet Assigned Numbers Authority
(IANA) is responsible for assigning all public ASNs to ensure that they are
globally unique.
Two blocks of private ASNs are available for any organization to use as
long as they are never exchanged publicly on the Internet. ASNs 64,512 to
65,535 are private ASNs within the 16-bit ASN range, and 4,200,000,000 to
4,294,967,294 are private ASNs within the extended 32-bit range.
Note
It is imperative that you use only the ASN assigned by IANA, the ASN
assigned by your service provider, or private ASNs. In addition, public
prefixes are mapped to the ASNs of the organizations that advertise them.
Thus, mistakenly or maliciously advertising a prefix using the wrong ASN
could result in traffic loss and cause havoc on the Internet.
Address Families
Originally, BGP was intended for routing of IPv4 prefixes between
organizations, but RFC 2858 added Multi-Protocol BGP (MP-BGP)
capability by adding extensions called address-family identifier (AFI). An
address-family correlates to a specific network protocol, such as IPv4,
IPv6, and so on, and additional granularity through subsequent address-
family identifier (SAFI), such as unicast and multicast. MBGP achieves this
separation by using the BGP path attributes (PA) MP_REACH_NLRI and
MP_UNREACH_NLRI. These attributes are carried inside BGP update
messages and are used to carry network reachability information for
different address families.
Note
Some network engineers refer to Multi-Protocol BGP as MP-BGP, whereas
other network engineers use the term MBGP. Both terms refer to the same
thing.
Path Attributes
BGP attaches path attributes (PA) associated with each network path. The
PAs provide BGP with granularity and control of routing policies within
BGP. The BGP prefix PAs are classified as follows:
Well-known mandatory
Well-known discretionary
Optional transitive
Optional nontransitive
Per RFC 4271, well-known attributes must be recognized by all BGP
implementations. Well-known mandatory attributes must be included with
every prefix advertisement, whereas well-known discretionary attributes
may or may not be included with the prefix advertisement.
Optional attributes do not have to be recognized by all BGP
implementations. Optional attributes can be set so that they are transitive
and stay with the route advertisement from AS to AS. Other PAs are
nontransitive and cannot be shared from AS to AS. In BGP, the Network
Layer Reachability Information (NLRI) is the routing update that consists
of the network prefix, prefix-length, and any BGP PAs for that specific
route.
Loop Prevention
BGP is a path vector routing protocol and does not contain a complete
topology of the network like link state routing protocols. BGP behaves
similar to distance vector protocols to ensure a path is a loop-free path.
The BGP attribute AS_PATH is a well-known mandatory attribute and
includes a complete listing of all the ASNs that the prefix advertisement
has traversed from its source AS. The AS_PATH is used as a loop-
prevention mechanism in the BGP protocol. If a BGP router receives a
prefix advertisement with its AS listed in the AS_PATH, it discards the
prefix because the router thinks the advertisement forms a loop.
Note
The other IBGP-related loop-prevention mechanisms are discussed
later in this chapter.
BGP Sessions
A BGP session refers to the established adjacency between two BGP
routers. BGP sessions are always point-to-point and are categorized into
two types:
Internal BGP (iBGP): Sessions established with an iBGP router that
are in the same AS or participate in the same BGP confederation.
iBGP sessions are considered more secure, and some of BGP’s
security measures are lowered in comparison to EBGP sessions. iBGP
prefixes are assigned an administrative distance (AD) of 200 upon
installing into the router’s Routing Information Base (RIB).
External BGP (EBGP): Sessions established with a BGP router that
are in a different AS. EBGP prefixes are assigned an AD of 20 upon
installing into the router’s RIB.
Note
Administrative distance (AD) is a rating of the trustworthiness of a
routing information source. If a router learns about a route to a
destination from more than one routing protocol and they all have the
same prefix length, AD is compared. The preference is given to the
route with the lower AD.
BGP uses TCP port 179 to communicate with other routers. Transmission
Control Protocol (TCP) allows for handling of fragmentation, sequencing,
and reliability (acknowledgement and retransmission) of communication
(control plane) packets. Although BGP can form neighbor adjacencies that
are directly connected, it can also form adjacencies that are multiple hops
away. Multihop sessions require that the router use an underlying route
installed in the RIB (static or from any routing protocol) to establish the
TCP session with the remote endpoint.
Note
BGP neighbors connected via the same network use the ARP table to
locate the IP address of the peer. Multihop BGP sessions require route
table information for finding the IP address of the peer. It is common
to have a static route or Interior Gateway Protocol (IGP) running
between iBGP peers for providing the topology path information for
establishing the BGP TCP session. A default route is not sufficient to
establish a multihop BGP session.
Note
It is a best practice to statically assign the BGP Router-ID.
BGP Messages
BGP communication uses four message types as shown in Table 11-2.
UPDATE
The UPDATE message advertises any feasible routes, withdraws previously
advertised routes, or can do both. The UPDATE message includes the
Network Layer Reachability Information (NLRI) that includes the prefix
and associated BGP PAs when advertising prefixes. Withdrawn NLRIs
include only the prefix. An UPDATE message can act as a KEEPALIVE
message to reduce unnecessary traffic.
NOTIFICATION
A NOTIFICATION message is sent when an error is detected with the BGP
session, such as a Hold Timer expiring, a change in neighbor capabilities,
or a requested BGP session reset. This causes the BGP connection to close.
Note
More details on the BGP messages are discussed during
troubleshooting sections.
KEEPALIVE
BGP does not rely upon the TCP connection state to ensure that the
neighbors are still alive. KEEPALIVE messages are exchanged every 1/3 of
the Hold Timer agreed upon between the two BGP routers. Cisco devices
have a default Hold Time of 180 seconds, so the default KEEPALIVE
interval is 60 seconds. If the Hold Time is set for zero, no KEEPALIVE
messages are sent between the BGP neighbors.
Idle
This is the first stage of the BGP FSM. BGP detects a start event and tries
to initiate a TCP connection to the BGP peer and also listens for a new
connect from a peer router.
If an error causes BGP to go back to the Idle state for a second time, the
ConnectRetryTimer is set to 60 seconds and must decrement to zero before
the connection is initiated again. Further failures to leave the Idle state
result in the ConnectRetryTimer doubling in length from the previous time.
Connect
In this state, BGP initiates the TCP connection. If the 3-way TCP
handshake completes, the BGP process resets the ConnectRetryTimer,
sends an OPEN message to the neighbor, and changes to the OpenSent state.
If the ConnectRetry timer depletes before this stage is complete, a new
TCP connection is attempted, the ConnectRetry timer is reset, and the state
is moved to Active. If any other input is received, the state is changed to
Idle.
During this stage, the neighbor with the higher IP address manages the
connection. The router initiating the request uses a dynamic source port,
but the destination port is always 179.
Note
Service providers consistently assign their customers the higher or
lower IP address for their networks. This helps the service provider
create proper instructions for ACLs or firewall rules, or for
troubleshooting them.
Active
In this state, BGP starts a new 3-way TCP handshake. If a connection is
established, an Open message is sent, the Hold Timer is set to 4 minutes,
and the state moves to OpenSent. If this attempt for TCP connection fails,
the state moves back to the Connect state and resets the
ConnectRetryTimer.
OpenSent
In this state, an Open message has been sent from the originating router and
is awaiting an Open message from the other router. After the originating
router receives the OPEN message from the other router, both OPEN
messages are checked for errors. The following items are being compared:
BGP versions must match.
The source IP Address of the OPEN message must match the IP
address that is configured for the neighbor.
The AS number in the OPEN message must match what is configured
for the neighbor.
BGP Identifiers (RID) must be unique. If a RID does not exist, this
condition is not met.
Security Parameters (Password, Time to Live [TTL], and so on)
If the Open messages do not have any errors, the Hold Time is negotiated
(using the lower value), and a KEEPALIVE message is sent (assuming the
value is not set to zero). The connection state is then moved to
OpenConfirm. If an error is found in the OPEN message, a Notification
message is sent, and the state is moved back to Idle.
If TCP receives a disconnect message, BGP closes the connection, resets
the ConnectRetryTimer, and sets the state to Active. Any other input in this
process results in the state moving to Idle.
OpenConfirm
In this state, BGP waits for a Keepalive or Notification message. Upon
receipt of a neighbor’s Keepalive, the state is moved to Established. If the
Hold Timer expires, a stop event occurs, or a Notification message is
received, the state is moved to Idle.
Established
In this state, the BGP session is established. BGP neighbors exchange
routes via Update messages. As Update and Keepalive messages are
received, the Hold Timer is reset. If the Hold Timer expires, an error is
detected, and BGP moves the neighbor back to the Idle state.
NX-4
feature bgp
router bgp 65000
  router-id 192.168.4.4
  address-family ipv4 unicast
    network 192.168.4.4/32
    redistribute direct route-map conn
  neighbor 10.46.1.6
    remote-as 65001
    address-family ipv4 unicast
  neighbor 192.168.1.1
    remote-as 65000
    update-source loopback0
    address-family ipv4 unicast
      next-hop-self
!
ip prefix-list connected-routes seq 5 permit 10.46.1.0/24
!
route-map conn permit 10
  match ip address prefix-list connected-routes
After the BGP peering is established, the BGP prefixes are verified using
the command show bgp afi safi. This command lists all the BGP prefixes
in the respective address families. Example 11-3 displays the output of the
BGP prefixes on NX-4. In the output, the BGP table holds locally
advertised prefixes with the next-hop value of 0.0.0.0, the next-hop IP
address, and a flag to indicate whether the prefix was learned from an IBGP
(i) or EBGP (e) peer.
   Network            Next Hop            Metric     LocPrf     Weight Path
*>r10.46.1.0/24       0.0.0.0                  0        100      32768 ?
*>i192.168.1.1/32     192.168.1.1                       100          0 i
*>l192.168.4.4/32     0.0.0.0                           100      32768 i
*>e192.168.6.6/32     10.46.1.6                                       0 65001 i
On NX-OS, the BGP process is instantiated the moment the router bgp asn
command is configured. The details of the BGP process and the
summarized configuration are viewed using the command show bgp
process. This command displays the BGP process ID, state, number of
configured and active BGP peers, BGP attributes, VRF information,
redistribution and relevant route-maps used with various redistribute
statements, and so on. If there is a problem with the BGP process, this
command can be viewed to verify the state of BGP along with the memory
information of the BGP process. Example 11-4 displays the output of the
command show bgp process, highlighting some of the important fields in
the output.
Redistribution
direct, route-map conn
Nexthop trigger-delay
critical 3000 ms
non-critical 10000 ms
Redistribution
None
Nexthop trigger-delay
critical 3000 ms
non-critical 10000 ms
Verifying Configuration
The very first step in troubleshooting BGP peering issues is verifying the
configuration and understanding the design. Many times, a basic
configuration mistake causes a BGP peering not to establish. The following
items should be checked when a new BGP session is configured:
Local AS number
Remote AS number
Verifying the network topology and other documentation
It is important to understand the traffic flow of BGP packets between peers.
By default, the source IP address of the BGP packets reflects the IP address
of the outbound interface. When a BGP packet is received, the router correlates
the source IP address of the packet to the BGP neighbor table. If the BGP
packet source does not match an entry in the neighbor table, the packet
cannot be associated to a neighbor and is discarded.
In most deployments, iBGP peering is established between loopback
interfaces, and if the update-source interface is not specified, the session
does not come up. The explicit sourcing of BGP packets from an interface
is verified by ensuring that the update-source interface-id command under
the neighbor ip-address configuration section is correctly configured for
the peer.
If there are multiple hops between the EBGP peers, then proper hop count
is required. Ensure the ebgp-multihop [hop-count] is configured with the
correct hop count. If the hop-count is not specified, the default value is set
to 255. Note that the default TTL value for IBGP sessions is 255 whereas
the default value of EBGP session is 1. If an EBGP peering is established
between two directly connected devices but over the loopback address,
users can also use the disable-connected-check command instead of using
the ebgp-multihop 2 command. This command disables the connection
verification mechanism, which by default, prevents the session from
getting established when the EBGP peer is not in the directly connected
segment.
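The following is a minimal sketch of the two options just described; the neighbor address and AS numbers are hypothetical.

router bgp 65000
  neighbor 192.168.6.6
    remote-as 65001
    update-source loopback0
    ! Option 1: allow enough TTL for an EBGP peer that is two hops away
    ebgp-multihop 2
    ! Option 2: for a directly connected peer using loopbacks,
    ! disable-connected-check could be used instead of ebgp-multihop 2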
Another configuration that is important, although optional, for successful
establishment of a BGP session is peer authentication. Misconfiguration or
typo errors in authentication passwords will cause the BGP session to fail.
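A minimal sketch of MD5 authentication on the peering follows; the shared secret is hypothetical, and the same password must be configured on both peers.

router bgp 65000
  neighbor 10.12.1.1
    remote-as 65001
    ! The MD5 password must match on both ends of the session
    password BGPSECRET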
Note
At times, users may experience packet loss when performing a ping
test. If a pattern is seen in the ping test, it is most likely due
to the CoPP policy dropping those packets.
Using the preceding ping methods, reachability is verified for both the
IBGP and EBGP peers. But if there is a problem with the reachability, use
the following procedure to isolate the problem or direction of the problem.
Identify the direction of packet loss. The show ip traffic command on NX-
OS is used to identify the packet loss or direction of the packet loss. If
there is a complete or random packet loss of the ping (ICMP) packets from
source to destination, use this method. The command output has the section
of ICMP Software Processed Traffic Statistics, which consists of two
subsections: Transmission and Reception. Both the sections consist of
statistics for echo request and echo reply packets. To perform this test, first
ensure that the sent and receive counters are stable (not incrementing) on
both the source and the destination devices. Then initiate the ping test
toward the destination by specifying the source interface or IP address.
After the ping is completed, verify the show ip traffic command to
validate the increase in counters on both sides to understand the direction
of the packet loss. Example 11-6 demonstrates the method for isolating the
direction of packet loss. In this example, the ping is initiated from NX-1 to
NX-4 loopback. The first output shows that the echo request received counter
and the echo reply sent counter are both at 10. After the ping test
from NX-1 to NX-4 loopback, the counters increase to 15 for both the echo
request received and echo reply sent counters.
NX-4
NX-4# show ip traffic | in Transmission:|Reception:|echo
Transmission:
Redirect: 0, unreachable: 0, echo request: 33, echo reply: 10,
Reception:
Redirect: 0, unreachable: 0, echo request: 10, echo reply: 29,
NX-1
NX-1# ping 192.168.4.4 source 192.168.1.1
PING 192.168.4.4 (192.168.4.4) from 192.168.1.1: 56 data bytes
64 bytes from 192.168.4.4: icmp_seq=0 ttl=253 time=3.901 ms
64 bytes from 192.168.4.4: icmp_seq=1 ttl=253 time=2.913 ms
64 bytes from 192.168.4.4: icmp_seq=2 ttl=253 time=2.561 ms
64 bytes from 192.168.4.4: icmp_seq=3 ttl=253 time=2.502 ms
64 bytes from 192.168.4.4: icmp_seq=4 ttl=253 time=2.571 ms
NX-4
NX-4# show ip traffic | in Transmission:|Reception:|echo
Transmission:
Redirect: 0, unreachable: 0, echo request: 33, echo reply: 15,
Reception:
Redirect: 0, unreachable: 0, echo request: 15, echo reply: 29,
Similarly, the outputs are verified on NX-1 as well for echo reply received
counters. In the previous example, the ping test is successful, and thus both
the echo request received and echo reply sent counters incremented, but in
situations when the ping test is failing, it is worth checking these counters
closely and with multiple iterations of test. If the ping to the destination
device is failing but still both the counters increment on the destination
device, the problem could be with the return path, and the users may have
to check the path for the return traffic.
ACLs prove to be really useful when troubleshooting packet loss or
reachability issues. Configuring an ACL matching the source and the
destination IP helps to confirm whether the packet has actually reached the
destination router. The only caution is that, when configuring the ACL, a
permit ip any any entry should be configured at the end; otherwise, the
implicit deny could drop other packets and cause a service impact.
In the access list named Out, both statements permitting the BGP packets
are not strictly required, but it is good practice to include both.
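The following is a minimal sketch of such an ACL, assuming hypothetical peer addresses 10.12.1.1 and 10.12.1.2 and a hypothetical ACL name; statistics per-entry allows the hit counters to confirm whether the BGP packets are actually arriving.

ip access-list BGP-CHECK
  statistics per-entry
  10 permit tcp host 10.12.1.1 host 10.12.1.2 eq 179
  20 permit tcp host 10.12.1.1 eq 179 host 10.12.1.2
  30 permit ip any any
!
interface Ethernet2/1
  ip access-group BGP-CHECK in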
Another problem users might run into with a firewall in the middle is with a
couple of features on an ASA firewall:
Sequence number randomization
Enabling TCP Option 19 for MD5 authentication
ASA firewalls by default perform sequence number randomization and thus
can cause BGP sessions to flap. Also, if the BGP peering is secured using
MD5 authentication, enable TCP option 19 on the firewall’s policy.
If the Telnet session is not sourced from the interface or IP address that the
remote device is configured to form a BGP neighborship with, the Telnet
request is refused. This is another way to confirm whether the peering
device configuration matches the documentation.
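A quick way to perform this test is to telnet to the BGP port directly. The sketch below is illustrative only; the peer and source addresses are hypothetical, and the availability of the source option on the telnet client should be verified on the platform.

NX-1# telnet 192.168.4.4 179 source 192.168.1.1
Trying 192.168.4.4...
! A refused connection here indicates the remote peer is not configured
! to accept a BGP session from this source address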
When troubleshooting TCP connection issues, it is also important to check
the event-history logs for the netstack process. Netstack is an
implementation of a Layer-2 to Layer-4 stack on NX-OS. It is one of the
critical components involved in the control plane on NX-OS. If there is a
problem with establishing a TCP session on a Nexus device, it could be a
problem with the netstack process. The show sockets internal event-
history events command helps understand what TCP state transitions
happened for the BGP peer IP.
Example 11-11 demonstrates the use of the show sockets internal event-
history events command to see the TCP session getting closed for BGP
peer IP 192.168.2.2, but it does not show any request coming in.
Note
For any problems encountered with TCP-based protocols such as BGP,
capture show tech netstack [detail] and share the information with
Cisco TAC.
During the initial BGP negotiation between the BGP speakers, certain
capabilities are exchanged. If either of the BGP speakers receives a
capability that it does not support, BGP detects an OPEN message error for
an unsupported capability (or unsupported optional parameter). For instance,
if one of the BGP speakers has the enhanced route refresh capability
but the BGP speaker on the receiving end is running older software that
does not support the capability, the receiver detects this as an OPEN message error.
The following optional capabilities are negotiated between the BGP
speakers:
Route Refresh capability
4-byte AS capability
Multiprotocol capability
Single/Multisession capability
To overcome the challenges of an unsupported capability, use the command
dont-capability-negotiate under the BGP neighbor configuration mode.
This command disables the capability negotiations between the BGP peers
and allows the BGP peer to come up.
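A minimal sketch of where this command sits follows; the neighbor address and AS numbers are hypothetical.

router bgp 65000
  neighbor 10.12.1.1
    remote-as 65001
    ! Skip capability negotiation with a peer running older software
    dont-capability-negotiate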
BGP Debugs
Running debugs should always be the last resort for troubleshooting any
network problem, because debugs can sometimes cause an impact on the
network if not used carefully. But sometimes they are the only option when
other troubleshooting techniques don't help in understanding the problem. Using
the NX-OS debug logfile, users can mitigate any impact from chatty debug
output. Along with the debug logfile, network operators can apply a filter
to the debugs using debug-filter, restricting the output to a specific
neighbor, prefix, or even address-family, thus removing any possibility of
an impact on the Nexus switch.
When a BGP peer is down and all the other troubleshooting steps have not
helped pinpoint the problem, enable debugs to see whether the
router is generating and sending the necessary BGP packets, and whether it is
receiving the relevant packets. However, debugs are often not required on
NX-OS because the BGP traces (event-histories) contain sufficient information
to debug the problem. Several debugs are available for BGP. Depending
on the state in which BGP is stuck, certain debug commands are helpful.
For a BGP peering down situation, one of the key debugs used is for BGP
keepalives. The BGP keepalive debug is enabled using the command debug
bgp keepalives. In the debug output, the two important factors to consider
for ensuring a successful BGP peering are as follows:
If the BGP keepalive is being generated at regular intervals
If the BGP keepalive is being received at regular intervals
If the BGP keepalive is being generated at regular intervals but the BGP
peering still remains down, it may be possible that the BGP keepalive
couldn’t make it to the other end, or it reached the peering router but was
not processed or dropped. In such cases, BGP keepalive debugs are useful.
Enable the debug command debug bgp keepalives to verify whether the
BGP keepalives are being sent and received. Example 11-13 illustrates the
use of BGP keepalive debug. The first output helps the user verify that the
BGP keepalive is being generated every 60 seconds. The second output
shows the keepalive being received from the remote peer 192.168.1.1.
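A minimal sketch of enabling this debug together with the debug logfile (so that the output is written to a file rather than the console) follows; the device name is illustrative.

NX-2# debug logfile bgp
NX-2# debug bgp keepalives
! ... wait one or two keepalive intervals ...
NX-2# show debug logfile bgp
NX-2# undebug all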
The Error Code and Error Subcode values are defined in RFC 4271. Table
11-3 shows all the Error Codes, Error Subcodes, and their interpretations.
Table 11-3 BGP Notification Error and Error-Subcode
Error Code   Subcode   Description
01           00        Message Header Error
01           01        Message Header Error—Connection Not Synchronized
01           02        Message Header Error—Bad Message Length
01           03        Message Header Error—Bad Message Type
02           00        OPEN Message Error
02           01        OPEN Message Error—Unsupported Version Number
02           02        OPEN Message Error—Bad Peer AS
02           03        OPEN Message Error—Bad BGP Identifier
02           04        OPEN Message Error—Unsupported Optional Parameter
02           05        OPEN Message Error—Deprecated
02           06        OPEN Message Error—Unacceptable Hold Time
03           00        Update Message Error
03           01        Update Message Error—Malformed Attribute List
03           02        Update Message Error—Unrecognized Well-Known Attribute
03           03        Update Message Error—Missing Well-Known Attribute
03           04        Update Message Error—Attribute Flags Error
03           05        Update Message Error—Attribute Length Error
03           06        Update Message Error—Invalid Origin Attribute
03           07        (Deprecated)
03           08        Update Message Error—Invalid NEXT_HOP Attribute
03           09        Update Message Error—Optional Attribute Error
03           0A        Update Message Error—Invalid Network Field
03           0B        Update Message Error—Malformed AS_PATH
04           00        Hold Timer Expired
05           00        Finite State Machine Error
06           00        Cease
06           01        Cease—Maximum Number of Prefixes Reached
06           02        Cease—Administrative Shutdown
06           03        Cease—Peer Deconfigured
06           04        Cease—Administrative Reset
06           05        Cease—Connection Rejected
06           06        Cease—Other Configuration Change
06           07        Cease—Connection Collision Resolution
06           08        Cease—Out of Resources
Whenever a notification is generated, the error code and the subcode are
always printed in the message. These notification messages are really
helpful when troubleshooting down peering issues or flapping peer issues.
Troubleshooting IPv6 Peers
With the depletion of IPv4 addresses, IPv6 adoption has picked up pace.
Most service providers have already upgraded or are planning to
upgrade their infrastructure to dual stack to support both IPv4 and IPv6
traffic and to offer IPv6-ready services to enterprise customers. Even
new applications are being developed with IPv6 compatibility or run
completely over IPv6. With such a pace, there is also a need for
appropriate techniques for troubleshooting IPv6 BGP neighbors.
The methodology for troubleshooting IPv6 BGP peers is the same as that
for IPv4 BGP peers. Here are a few steps you can use to troubleshoot down
peering issues for IPv6 BGP neighbors:
Step 1. Verify the configuration for correct peering IPv6 addresses, AS
numbers, update-source interface, authentication passwords, and EBGP
multihop configuration (a minimal peering sketch is shown after this list).
Step 2. Verify reachability using the ping ipv6 ipv6-neighbor-address
[source interface-id | ipv6-address].
Step 3. Verify the TCP connections using the command show socket
connection tcp on NX-OS. In case of IPv6, check for TCP
connections for source and destination IPv6 addresses and one of
the ports as port 179.
Step 4. Verify any IPv6 ACL’s in path. Like IPv4, the IPv6 ACLs in the
path should permit for TCP connections on port 179 and ICMPv6
packets that can help in verifying reachability.
Step 5. Debugs. On NX-OS switches, use the debug bgp ipv6 unicast
neighbors ipv6-neighbor-address debug command to capture IPv6
BGP packets. Before enabling the debugs, enable the debug logfile
for BGP debug. To filter the debugs for a particular IPv6
neighbor, use an IPv6 ACL to filter the debug output for that
particular neighbor.
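A minimal IPv6 peering sketch for reference while working through these steps; the addresses and AS numbers are hypothetical.

router bgp 65000
  neighbor 2001:db8:12::1
    remote-as 65001
    update-source loopback0
    address-family ipv6 unicast

The session state and received IPv6 prefixes are then checked with show bgp ipv6 unicast summary.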
BGP Peer Flapping Issues
When the BGP session is down, the state never reaches the Established state;
the session keeps flapping between Idle and Active. But when the BGP
peer is flapping, it is changing state after the session has been
established. In this case, the BGP state keeps flapping between the Idle and
Established states. Following are the two flapping states in BGP:
Idle/Active: Discussed in previous section
Idle/Established: Bad update, TCP problem (MSS size in multihop
deployment)
Flapping BGP peers could be due to one of several reasons:
Bad BGP update
Hold Timer expired
MTU mismatch
High CPU
Improper control-plane policing
Use the command debug bgp packets to view the BGP messages in
hexdump, which can be further decoded. If too many BGP updates and
messages are being exchanged on the NX-OS devices, a better option is to
perform an Ethanalyzer or SPAN capture to collect a malformed BGP update
packet for further analysis.
Note
The hexdump in the BGP message can be further analyzed using some
online tools, such as http://bgpaste.convergence.cx.
Note
MTS, CoPP, and other platform troubleshooting is covered in detail in
Chapter 3, “Troubleshooting Nexus Platform Issues.”
BGP Keepalive Generation
In networks, there are instances when a BGP peering might flap randomly.
Apart from scenarios such as packet loss or control-plane policy drops,
there might be other reasons that the BGP peering flaps while the reported
reason is still hold timer expiry. One such reason may be that the BGP
keepalives are not being generated in a timely manner. For troubleshooting
such instances, the first step is to understand whether there is any pattern to the
BGP flaps. This information is gathered by getting answers to the following
questions:
At what time of the day is the BGP flap happening?
How frequently is the flap happening?
How is the traffic load on the interface/system when the flap occurs?
Is the CPU high during the time of the flap? If yes, is it due to traffic
or a particular process?
These questions help lay out a pattern for the BGP flaps, and relevant
troubleshooting can be performed around the same time. To further
troubleshoot the problem, understand that the BGP flap is due to one of two
reasons:
Keepalives are generated at regular intervals but either do not leave
the router or do not make it to the other end.
Keepalives are not generated at regular intervals.
If the keepalives are getting generated at regular intervals but not leaving
the router, then notice that the OutQ for the BGP peer keeps piling up. The
OutQ keeps incrementing due to keepalive generation, but the MsgSent
does not increase, which may be an indication that the messages are stuck
in the OutQ. Example 11-15 illustrates such a scenario where the BGP
keepalives are generated at regular intervals but do not leave the router,
leading to a BGP flap due to hold timer expiry. Notice that in this example,
the OutQ value increases from 10 to 12, but the MsgSent counter is
stagnant at 3938. In this scenario, the peering may flap at every BGP hold
time interval.
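A hedged sketch of what to look for follows; the neighbor address is illustrative, and the counter values mirror the scenario described above.

NX-2# show bgp ipv4 unicast summary
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.1.1     4 65000    4032    3938       37    0   12 00:02:33 4
! MsgSent stays at 3938 while OutQ keeps growing, indicating that
! keepalives are being generated but are stuck in the output queue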
But if the device experiences random BGP flaps at irregular intervals,
it is possible that the BGP keepalives are not being generated at regular
intervals. For instance, a BGP peering might flap every 4 to 10 minutes.
These issues are hard to troubleshoot and may require a different technique
than just running show commands. The reason is that it is not easy to isolate
which device is not generating the keepalive in a timely manner, or whether
the keepalive is generated in a timely manner but a delay occurs before the
keepalive reaches the remote peer. To troubleshoot, follow this two-step
process between the two ends of the BGP connection.
Step 1. Enable BGP keepalive debug on both routers along with the debug
logfile.
Step 2. Enable Ethanalyzer on both routers.
The purpose of enabling Ethanalyzer or any other packet capture tool
(based on the underlying platform) is that it is possible that the BGP
keepalives reach the other end in a timely manner, but those keepalives
may be delayed before reaching BGP process itself. Based on the outputs of
the BGP keepalive debug and the Ethanalyzer from the far end device, the
timelines could be matched to conclude where exactly the delay might be
happening that is causing the BGP to flap. It may be the BGP process that
is delaying the keepalive generation, or it may be the other components that
interact with BGP to delay the keepalive processing.
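A minimal sketch of the two steps follows, assuming the control-plane traffic is captured on the inband interface; the capture filter simply matches BGP's TCP port.

! Step 1: keepalive debug written to the debug logfile (on both routers)
NX-1# debug logfile bgp
NX-1# debug bgp keepalives
! Step 2: capture BGP packets reaching the supervisor (on both routers)
NX-1# ethanalyzer local interface inband capture-filter "tcp port 179" limit-captured-frames 0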
Note
MPLS VPN providers should increase the MPLS MTU to at least 1508
(assuming a minimum of 2 labels) or an MPLS MTU of 1516 (to
accommodate up to 4 labels).
Now the question is: why does an MTU mismatch cause BGP sessions to flap?
When the BGP connection is established, the MSS value is negotiated over
the TCP session. When BGP updates are generated, they are packaged into
BGP update messages, which can hold prefixes and header information up to
the maximum capacity of the MSS in bytes. These BGP update messages are
then sent to the remote peer with the do-not-fragment (DF) bit set. If a
device in the path, or even the destination, is not able to accept packets
with the larger MTU, it sends an ICMP error message back to the BGP
speaker. Meanwhile, the destination router waits for a BGP Keepalive or
BGP Update packet to refresh its hold-down timer. After 180 seconds, the
destination router sends a Notification back to the source with a Hold Timer
Expired error message.
Note
When a BGP router sends an update to a BGP neighbor, it does not
send a separate BGP Keepalive; rather, it resets the Keepalive
timer for that neighbor. During the BGP update process, the update
message is treated as a keepalive by the BGP speakers.
NX-1
NX-1# show bgp ipv4 unicast summary
BGP summary information for VRF default, address family IPv4
Unicast
BGP router identifier 192.168.1.1, local AS number 65000
BGP table version is 37, IPv4 Unicast config peers 1, capable
peers 1
4 network entries and 4 paths using 576 bytes of memory
BGP attribute entries [4/576], BGP AS path entries [1/6]
BGP community entries [0/0], BGP clusterlist entries [0/0]
The BGP flap does not occur when a small number of prefixes is
exchanged between the peers, because the BGP packet size stays under 1460
bytes. One symptom of BGP flaps due to MSS/MTU issues is a repetitive
BGP flap that occurs when the Hold Timer expires.
The following are the few possible causes of BGP session flapping due to
MTU mismatch:
The interface MTU on the two peering routers does not match.
The Layer 2 path between the two peering routers does not have
consistent MTU settings.
PMTUD did not calculate the correct MSS for the BGP TCP session.
BGP PMTUD could be failing due to blocked ICMP messages by a
router or a firewall in path.
To verify whether there are MTU mismatch issues in the path, perform an extended
ping test by setting the size of the packet to the outgoing interface MTU
value along with DF bit set. Also, ensure that ICMP messages are not being
blocked in the path to have PMTUD function properly. Ensure that the
MTU values are consistent throughout the network with a proper review of
the configuration.
Perform a ping test to remote peer with the packet size as the MTU of the
interface and do not fragment (df-bit) set as shown in Example 11-17.
Note
Nexus platform adds 28 bytes (20 bytes IP header + 8 bytes ICMP
header) when performing the ping with MTU size. Thus, when the
ping test is performed with DF-bit set, the ping with 1500 size fails. To
successfully test the ping with the interface MTU packet size and df-
bit set, subtract 28 bytes from the MTU value on the interface. In this
case, 1500 − 28 = 1472.
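A minimal sketch of such a test follows, assuming a 1500-byte interface MTU (so a 1472-byte payload) and the peer addresses used earlier in this chapter; the packet-size and df-bit options are assumed to be available on the platform.

NX-1# ping 192.168.4.4 source 192.168.1.1 packet-size 1472 df-bit
! If this ping succeeds but a larger packet-size fails, a smaller MTU
! exists somewhere in the path between the two peers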
BGP Route Processing and Route Propagation
After the BGP peering is established, the BGP peers exchange network prefixes
and path attributes. Unlike an IGP, BGP allows the routing policy to be
different for each peer within an AS. BGP route processing for inbound and
outbound exchange of network prefixes can be understood in a simple way,
as shown in Figure 11-5. When a BGP router receives routes from a peer, it
installs those routes in the BGP table after filtering them through
an inbound policy, if one is configured. If the BGP table contains multiple paths
for the same prefix, a best path is selected, and then the best path is
installed in the routing table. Similarly, when advertising a prefix, only the
best route is advertised to the peer device. If there is an outbound policy,
the prefixes are filtered before being advertised to the remote peer.
NX-4
router bgp 65000
  router-id 192.168.4.4
  log-neighbor-changes
  address-family ipv4 unicast
    network 192.168.4.4/32
    network 192.168.44.44/32
  neighbor 192.168.1.1
    remote-as 65000
    update-source loopback0
    address-family ipv4 unicast
      next-hop-self
NX-4# show ip route 192.168.4.4/32
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
   Network            Next Hop            Metric     LocPrf     Weight Path
*>i192.168.1.1/32     192.168.1.1                       100          0 i
*>l192.168.4.4/32     0.0.0.0                           100      32768 i
  l192.168.44.44/32   0.0.0.0                           100      32768 i
   Network            Next Hop            Metric     LocPrf     Weight Path
*>l192.168.4.4/32     0.0.0.0                           100      32768 i
Redistribution
Redistributing routes into BGP is a common method of populating the BGP
table. Examine the same topology shown in Figure 11-6. On router NX-1,
OSPF is being redistributed into BGP. While redistributing the routes from
OSPF into BGP, the route-map permits the prefixes 192.168.4.4/32 and
192.168.44.44/32, although the routing table only learns 192.168.4.4/32
from NX-4. Example 11-19 demonstrates the redistribution process into
BGP. Notice in the output that the prefix 192.168.4.4/32 has an r flag, which
indicates a redistributed prefix. Also, the redistributed prefix has a question
mark (?) in the AS path list.
   Network            Next Hop            Metric     LocPrf     Weight Path
*>i192.168.2.2/32     192.168.2.2                       100          0 i
*>r192.168.4.4/32     0.0.0.0                 41        100      32768 ?
Note
The redistribution process is the same for other routing protocols,
static routes, and directly connected links, as shown in Example 11-19.
There are a few caveats when performing redistribution for OSPF and IS-IS
as listed:
OSPF: When redistributing OSPF into BGP, the default behavior
includes only routes that are internal to OSPF. The redistribution of
external OSPF routes requires a conditional match on route-type under
route-map.
IS-IS: IS-IS does not include directly connected subnets for any
destination routing protocol. This behavior is overcome by
redistributing the connected networks into BGP.
Example 11-20 displays the various match route-type options available
under the route-map. The route-type options are available for both OSPF
and IS-IS route types.
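The following is a minimal sketch of redistributing OSPF into BGP while also picking up external OSPF routes; the OSPF instance name NXOS matches the earlier examples, but the route-map name and the prefix-list referenced in sequence 20 are hypothetical. The match route-type clause is what pulls in the external routes.

route-map OSPF-TO-BGP permit 10
  match route-type external
route-map OSPF-TO-BGP permit 20
  match ip address prefix-list OSPF-INTERNAL
!
router bgp 65000
  address-family ipv4 unicast
    redistribute ospf NXOS route-map OSPF-TO-BGP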
Route Aggregation
Not all devices in the network are powerful enough to hold all the routes
learned via BGP or other routing protocols. Also, having multiple paths in
the network leads to consumption of more CPU and memory resources. To
overcome this challenge, route aggregation or summarization can be
performed. Route aggregation in BGP is performed using the command
aggregate-address aggregate-prefix/length [advertise-map | as-set |
attribute-map | summary-only | suppress-map]. Table 11-4 describes all
the optional command options available with the aggregate-address
command.
Table 11-4 aggregate-address Command Options
Option Description
NX-2
router bgp 65000
address-family ipv4 unicast
network 192.168.2.2/32
aggregate-address 192.168.0.0/16 summary-only
NX-2# show bgp ipv4 unicast
BGP routing table information for VRF default, address family IPv4
Unicast
BGP table version is 19, local router ID is 192.168.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history,
*-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate,
r-redist,
I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & -
backup
   Network            Next Hop            Metric     LocPrf     Weight Path
*>a192.168.0.0/16     0.0.0.0                           100      32768 i
s>i192.168.1.1/32     192.168.1.1                       100          0 i
s>l192.168.2.2/32     0.0.0.0                           100      32768 i
s>i192.168.4.4/32     192.168.4.4                       100          0 i
   Network            Next Hop            Metric     LocPrf     Weight Path
*>e192.168.0.0/16     10.25.1.2                                       0 65000 i
Default-Information Originate
Not every external route can be redistributed and advertised within the
network. In such instances, the gateway or edge device advertises a default
route to other parts of the network using a routing protocol. To advertise a
default route using BGP, use the command default-information originate
under the neighbor configuration mode. It is important to note that the
command only advertises the default route if the default route is present in
the routing table. If there is no default route present, create a default route
pointing to the Null0 interface.
The best-path algorithm can be influenced to manipulate network traffic patterns for a
specific route by modifying various path attributes on BGP routers.
Changing BGP PAs influences traffic flow into, out of, and around an
autonomous system (AS). The BGP routing policy varies from organization
to organization based upon the manipulation of the BGP PAs. Because
some PAs are transitive and carry over from one AS to another AS, those
changes could affect downstream routing for other service providers, too. Other PAs are
nontransitive and influence only the routing policy within the organization.
Network prefixes are conditionally matched on a variety of factors, such as
AS-Path length, specific ASN, BGP communities, or other attributes.
Examining the topology shown in Figure 11-6, NX-5 and NX-6 advertise
their loopbacks toward AS 65000. When NX-1 receives the loopbacks, it
receives them via both NX-2 and NX-3, but only one of them is chosen as the
best path. The command show bgp afi safi ip-address/length displays both
received paths, including the path that was not chosen as the
best path, as shown in Example 11-22. In this example, the path for
192.168.5.5/32 is initially chosen via NX-2 due to the lowest RID, but when an
inbound policy on NX-3 is defined to set a higher local preference, the path
via NX-3 is chosen as the best.
Advertised path-id 1
Path type: internal, path is valid, is best path
AS-Path: 65001 , path sourced external to AS
192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
Origin IGP, MED not set, localpref 100, weight 0
Advertised path-id 1
Path type: internal, path is valid, is best path
AS-Path: 65001 , path sourced external to AS
192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
Origin IGP, MED not set, localpref 200, weight 0
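The following is a minimal sketch of the inbound policy on NX-3 that would produce the localpref 200 seen above; the route-map name and the EBGP neighbor address are hypothetical.

route-map SET-LP permit 10
  set local-preference 200
!
router bgp 65000
  neighbor 10.35.1.5
    remote-as 65001
    address-family ipv4 unicast
      route-map SET-LP in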
Note
While a prefix is being removed from the BGP RIB (BRIB), the prefix
is marked as deleted and the path is never used for forwarding. After
the update is complete, the BRIB does not show the path/prefix that
was removed.
BGP Multipath
BGP’s default behavior is to install only the best path into the RIB, which
means that only one path for a network prefix is used when forwarding
network traffic to a destination. BGP multipath allows multiple paths to
be presented to the RIB so that both paths can forward traffic to a network
prefix at the same time. BGP multipath is an enhanced form of BGP
multihoming.
Note
It is vital to understand that the primary difference between BGP
multihoming and BGP multipath is how load balancing works. BGP
multipath attempts to distribute the load of the traffic dynamically.
BGP multihoming is distributed somewhat by the nature of the BGP
best path algorithm, but manipulation to the inbound/outbound routing
policies is required to reach a more equally distributed load among the
links.
Note
NX-OS does not support the eiBGP multipath feature at the time of
writing.
Advertised path-id 1
Path type: internal, path is valid, is best path
AS-Path: 65001 , path sourced external to AS
192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
Origin IGP, MED not set, localpref 100, weight 0
   Network            Next Hop            Metric     LocPrf     Weight Path
*>l192.168.1.1/32     0.0.0.0                           100      32768 i
*>i192.168.2.2/32     192.168.2.2                       100          0 i
*>i192.168.3.3/32     192.168.3.3                       100          0 i
*>i192.168.4.4/32     192.168.4.4                       100          0 i
*>i192.168.5.5/32     192.168.2.2                       100          0 65001 i
*|i                   192.168.3.3                       100          0 65001 i
*>i192.168.6.6/32     192.168.2.2                       100          0 65001 i
*|i                   192.168.3.3                       100          0 65001 i
NX-1# show bgp ipv4 unicast 192.168.5.5
BGP routing table information for VRF default, address family IPv4
Unicast
BGP routing table entry for 192.168.5.5/32, version 59
Paths: (2 available, best #1)
Flags: (0x08001a) on xmit-list, is in urib, is best urib route, is
in HW,
Multipath: iBGP
Advertised path-id 1
Path type: internal, path is valid, is best path
AS-Path: 65001 , path sourced external to AS
192.168.2.2 (metric 41) from 192.168.2.2 (192.168.2.2)
Origin IGP, MED not set, localpref 100, weight 0
Path type: internal, path is valid, not best reason: Router Id,
multipath
AS-Path: 65001 , path sourced external to AS
192.168.3.3 (metric 41) from 192.168.3.3 (192.168.3.3)
Origin IGP, MED not set, localpref 100, weight 0
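The second path above is marked multipath because iBGP multipath has been enabled under the address-family. A minimal configuration sketch follows; the path count of 2 is an assumption.

router bgp 65000
  address-family ipv4 unicast
    ! Allow up to two equal iBGP paths to be installed in the URIB
    maximum-paths ibgp 2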
The BGP event-history logs are used to verify the second-best path being
added to the Unicast Routing Information Base (URIB). Use the command
show bgp event-history detail to view the details for both the best path
and the second-best path of a prefix being added to URIB, as shown in
Example 11-24. In Example 11-24, first the best path is selected, which is
via 192.168.2.2, and then another path is added to the URIB, which is
learned via nexthop 192.168.3.3.
Example 11-25 Debugs for BGP Update and Route Installation in BRIB
NX-1# debug logfile bgp
NX-1# debug ip bgp update
NX-1# debug ip bgp brib
NX-1# show debug logfile bgp
! Receiving an update from peer for 192.168.44.44/32
BGP Convergence
BGP convergence depends on various factors. BGP convergence is all about
the speed of the following:
Establishing sessions with a number of peers.
Locally generating all the BGP paths (via network statements, redistribution
of static/connected/IGP routes, and/or from other components for other
address-families; for example, Multicast VPN (MVPN) from multicast,
L2VPN from the L2VPN manager, and so on).
Sending and receiving multiple BGP tables (that is, different BGP
address-families) to and from each peer.
Upon receiving all the paths from peers, performing the best-path calculation
to find the best path and/or multipath, additional paths, and backup paths.
Installing the best path into multiple routing tables, such as the default or VRF
routing tables.
Running the import and export mechanism.
For other address-families, such as L2VPN or multicast, passing the path
calculation results to the lower-layer components.
BGP uses a lot of CPU cycles when processing BGP updates and requires
memory for maintaining BGP peers and routes in the BGP tables. Based on the
role of the BGP router in the network, appropriate hardware should be
chosen. The more memory a router has, the more routes it can support,
much like how a router with a faster CPU supports a larger number of peers.
Note
Because BGP updates rely on TCP, optimizing router resources, such as
memory, and TCP session parameters, such as the maximum segment size
(MSS), path MTU discovery, interface input queues, and TCP window size,
helps improve convergence.
There are various steps that should be followed to verify whether the BGP
has converged and the routes are installed in the BRIB.
If there is traffic loss before BGP has completed its convergence for a
given address-family, verify the routing information in the URIB and the
forwarding information in the FIB. Example 11-27 demonstrates a BGP
route getting refreshed. The command show bgp event-history [event |
detail] is used to validate that the prefix is installed in the BRIB table, and
the command show routing event-history [add-route | modify-route |
delete-route] is used to check whether the route has been installed in the URIB. In the
URIB, verify the timestamp of when the route was downloaded to the
URIB. If the prefix was recently downloaded to the URIB, there might have
been an event that caused the route to get refreshed. Also, the difference
between the time when the prefix was installed in the BRIB and when it was
downloaded to the URIB helps in understanding the convergence time.
IPv4 Unicast:
First bestpath signalled 0.068443 after start
First bestpath completed 0.069397 after start
Convergence to URIB sent 0.082041 after start
Peer convergence after start:
192.168.2.2 (EOR after bestpath)
192.168.3.3 (EOR after bestpath)
192.168.4.4 (EOR after bestpath)
IPv6 Unicast:
First bestpath signalled 0.068467 after start
First bestpath completed 0.069574 after start
Note
If the BGP best-path has not run yet, the problem is likely not related
to BGP on that node.
If the best-path calculation runs before the EOR is received, or if a peer fails to send
the EOR marker, it can lead to traffic loss. In such situations, enable debug for BGP
updates with relevant debug-filters for VRF, address-family, and peer, as
shown in Example 11-29.
Example 11-29 Debug Commands with Filter
From the debug output, check the timestamp in the event log to see
when the most recent EOR was sent to the peer. This also shows how many
routes were advertised to the peer before the EOR was sent. A
premature EOR sent to the peer can also lead to traffic loss if the peer
flushes stale routes early.
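A hedged sketch of such a debug session follows. The debug logfile and BGP update debug commands appear earlier in this chapter; the debug-filter line is an assumed syntax and should be verified against the running NX-OS release.

NX-1# debug logfile bgp
! Assumed debug-filter syntax; restricts debug output to one peer
NX-1# debug-filter bgp neighbor 192.168.2.2
NX-1# debug ip bgp update
NX-1# show debug logfile bgp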
If the route has not been downloaded to the URIB, it needs to be investigated
further because it may not be a problem with BGP. The following
commands can be run to check the activity in the URIB that could explain the
loss:
show routing internal event-history ufdm
show routing internal event-history ufdm-summary
show routing internal event-history recursive
Scaling BGP
BGP is one of the most feature-rich protocols ever developed, providing
ease of routing and control using policies. Although BGP has many built-in
features that allow the protocol to scale very well, these enhancements are often not
utilized properly. This poses various challenges when BGP is deployed in a
scaled environment.
BGP is a heavy protocol because it can use more CPU and memory
resources on a router than most other protocols. Many factors explain why it keeps utilizing more and
more resources. The three major factors for BGP memory consumption are
as follows:
Prefixes
Paths
Attributes
BGP can hold many prefixes, and each prefix consumes some amount of
memory. But when the same prefix is learned via multiple paths, that
information is also maintained in the BGP table. Each path adds to more
memory. Because BGP was designed to give control to each AS to manage
the flow of traffic through various attributes, each prefix can have various
attributes per path. This is expressed as a mathematical function, where N
represents the number of prefixes, M represents the number of paths for a
given prefix, and L represents the attributes attached to a given prefix:
Prefixes: O(N)
Paths: O(M × N)
Attributes: O(L × M × N)
Prefixes
BGP memory consumption becomes huge when BGP is holding a large
number of prefixes or holding the Internet routing table. In most cases, not
all the BGP prefixes are required to be maintained by all the routers
running BGP in the network. To reduce the number of prefixes, take the
following actions:
Aggregation
Filtering
Partial routing table instead of full routing table
With the use of aggregation, multiple specific routes can be aggregated into
one route. But aggregation is challenging when tried on a fully deployed
running network. After the network is up and running, the complete IP
addressing scheme has to be looked at to perform aggregation. Aggregation
is a good option for greenfield deployments. Greenfield deployments
give more control over the IP addressing scheme, which makes it easier to
apply aggregation.
Filtering provides control over the number of prefixes maintained in the
BGP table or advertised to BGP peers. BGP provides filtering based on
prefix, BGP attributes, and communities. One important point to remember
is that complex route filtering, or route filtering applied for a large number
of prefixes, helps reduce the memory, but the router takes a hit on the CPU.
Many deployments do not require all the BGP speakers to maintain a full
BGP routing table. Especially in enterprise and data center deployments,
there is no real need to hold the full Internet routing table. The BGP
speakers can maintain a partial routing table containing only the most
relevant and required prefixes, or just a default route toward the Internet
gateway. Such designs greatly reduce the resources used throughout
the network and increase scalability.
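A minimal sketch of accepting only a default route from an upstream peer follows; the neighbor address, AS numbers, and policy names are hypothetical.

ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
route-map ALLOW-DEFAULT permit 10
  match ip address prefix-list DEFAULT-ONLY
!
router bgp 65000
  neighbor 10.1.1.1
    remote-as 65001
    address-family ipv4 unicast
      ! Accept only the default route from this peer
      route-map ALLOW-DEFAULT in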
Paths
Sometimes the BGP table carries fewer prefixes but still holds more
memory because of multiple paths. A prefix can be learned via multiple
paths, but only the best or multiple best paths are installed in the routing
table. To reduce the memory consumption by BGP due to multiple paths,
the following solutions should be adopted:
Reduce the number of peerings.
Use RRs instead of IBGP full mesh.
Multiple BGP paths are a direct effect of the multiple BGP peerings.
Especially in an IBGP full-mesh environment, the number of BGP sessions
increases exponentially and thus the number of paths. A lot of customers
increase the number of IBGP neighbors to have more redundant paths, but
two paths are sufficient to maintain redundancy. Increasing the number of
peerings can cause scaling issues both from the perspective of the number
of sessions and from the perspective of BGP memory utilization.
It is a well-known fact that IBGP needs to be in full mesh. Figure 11-7
illustrates an IBGP full-mesh topology. In an IBGP full-mesh deployment
of n nodes, there are a total of n*(n−1)/2 IBGP sessions and (n−1) sessions
per BGP speaker. For example, a full mesh of 10 routers requires 45 IBGP
sessions, with 9 sessions per router.
This not only affects the scalability of an individual node or router but the
whole network. To increase the scalability of IBGP network, two design
approaches can be used:
Confederations
Route Reflectors
Note
BGP Confederations and Route Reflectors are discussed in another
section later in this chapter.
Attributes
A BGP route is a “bag” of attributes. Every BGP prefix has certain default
or mandatory attributes that are assigned automatically, such as next-hop or
AS-PATH, or attributes that are configured manually, such as Multi-Exit
Discriminator (MED) and the like, assigned by customers. Each attribute
attached to a prefix adds to the memory utilization. Along with
attributes, communities—both standard and extended—add to increased
memory consumption. To reduce the BGP memory consumption due to
various attributes and communities, the following solutions can be adopted:
Reduce the number of attributes.
Filter standard or extended communities.
Limit local communities.
On NX-OS, use the command show bgp private attr detail to view the
various attributes attached to the BGP prefixes. Example 11-30 displays the
various global BGP attributes on NX-1. These attributes were learned
across various prefixes, including the community attached to the prefix
learned from NX-4.
Neighbor capabilities:
Dynamic capability: advertised (mp, refresh, gr) received (mp,
refresh, gr)
Dynamic capability (old): advertised received
Route refresh capability (new): advertised received
Route refresh capability (old): advertised received
4-Byte AS capability: advertised received
Address family IPv4 Unicast: advertised received
Graceful Restart capability: advertised received
! Output omitted for brevity
Note
When the soft-reconfiguration feature is configured, the BGP route
refresh capability is not used, even though the capability is negotiated.
The soft-reconfiguration configuration controls the processing and
initiation of route refresh.
Note
It is recommended to use soft-reconfiguration inbound only on
EBGP peering whenever it is required to know what the original prefix
attributes are before being filtered or modified by the inbound route-
map. It is not recommended on routers that receive a large number of
prefixes being exchanged, such as the Internet routing table.
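A minimal sketch of enabling it on an EBGP peer follows, reusing the NX-4 peering shown earlier in this chapter; the received-routes view then shows the prefixes as received, before the inbound policy is applied.

router bgp 65000
  neighbor 10.46.1.6
    remote-as 65001
    address-family ipv4 unicast
      soft-reconfiguration inbound
! View prefixes as received, before inbound filtering or modification
NX-4# show bgp ipv4 unicast neighbors 10.46.1.6 received-routes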
The RR and the client peers form a cluster and are not required to be fully
meshed. Because the topology in (b) has an RR along with fully meshed
IBGP client peers, which actually defeats the purpose of having an RR, the BGP
RR reflection behavior should be disabled. The BGP RR client-to-client
reflection is disabled using the command no bgp client-to-client
reflection. This command is required only on the RR and not on the RR
clients. Example 11-33 displays the configuration for disabling BGP client-
to-client reflection.
ORIGINATOR_ID
This optional nontransitive BGP attribute is created by the first route-
reflector and sets the value to the RID of the router that injected/advertised
the route into the AS. If the ORIGINATOR_ID is already populated on an
NLRI, it should not be overwritten.
If a router receives an NLRI with its RID in the Originator attribute, the
NLRI is discarded.
CLUSTER_LIST
This nontransitive BGP attribute is updated by the route-reflector. This
attribute is appended (not overwritten) by the route-reflector with its
cluster-id. By default, this is the BGP identifier. The cluster-id is set with
the BGP configuration command cluster-id.
If a route reflector receives an NLRI with its cluster-id in the Cluster List
attribute, the NLRI is discarded.
Example 11-34 provides a sample prefix output from a route that was
reflected by the route reflector NX-1, as shown in Figure 11-9. Notice that
the originator ID is the advertising router and that the cluster list contains
the route-reflector ID. The cluster list contains the route-reflectors that the
prefix traversed in the order of the last route-reflector that advertised the
route.
Advertised path-id 1
Path type: internal, path is valid, is best path
AS-Path: 65001 , path sourced external to AS
192.168.2.2 (metric 81) from 192.168.1.1 (192.168.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Originator: 192.168.2.2 Cluster list: 192.168.1.1
If a topology contains more than one RR and the RRs are configured
with different cluster IDs, the second RR holds the path from the first RR
and hence consumes more memory and CPU resources. Using either a
single cluster-id or multiple cluster-ids has its own disadvantages:
Different cluster-id: Additional memory and CPU overhead on the RR
Same cluster-id: Fewer redundant paths
If the RR clients are fully meshed within the cluster, the no bgp client-to-
client reflection command can be enabled on the RR.
Maximum Prefixes
By default, a BGP peer holds all the routes advertised by the peering router.
The number of routes can be filtered either inbound on the local router
or outbound on the peering router, but there can still be instances
where the number of routes is more than what a router needs or
can handle.
NX-OS supports the BGP maximum-prefix feature, which allows you to limit
the number of prefixes on a per-peer basis. Generally, this feature is
enabled for EBGP sessions, but it can also be used for IBGP sessions. Although
this feature helps scale the network and protect it from an excess number of
routes, it is very important to understand when to use it. Enable the BGP
maximum-prefix feature after answering the following questions:
How many BGP routes are anticipated from the peer?
What action should be taken if the number of routes is exceeded? Should
the BGP connection be reset, or should a warning message be logged?
To limit the number of prefixes, use the command maximum-prefix
maximum [threshold] [restart restart-interval | warning-only] for each
neighbor. Table 11-6 elaborates each of the fields in the command.
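As a hedged illustration (the peer address and AS numbers follow the chapter topology; the 1000-prefix limit, 80 percent warning threshold, and 5-minute restart interval are assumed values), a neighbor could be capped as follows:
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5 remote-as 65001
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# maximum-prefix 1000 80 restart 5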
BGP Max AS
Various attributes are assigned by default to every BGP prefix. The
attributes attached to a single prefix can grow up to 64 KB in size,
which can cause scaling as well as convergence issues for BGP.
The as-path prepend option is often used to lengthen the AS-PATH
list so that a path with a shorter AS-PATH list is preferred. This operation by
itself does not have much of an impact. But from the perspective of the Internet, a
long AS-PATH list can not only cause convergence issues but can also
open security loopholes, because the AS-PATH list effectively signifies a
router’s position on the Internet.
To limit the maximum AS-PATH length supported in the
network, the maxas-limit command was introduced. With the command
maxas-limit 1-512 in NX-OS, any route with an AS-PATH length higher than
the specified number is discarded.
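A minimal sketch follows (the AS number matches the chapter topology; the limit of 15 is an assumed value). Any route received with more than 15 ASs in its AS-PATH would be discarded:
NX-2(config)# router bgp 65000
NX-2(config-router)# maxas-limit 15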
BGP Route Filtering and Route Policies
BGP, along with being scalable, provides route filtering, traffic
engineering, and traffic load-sharing capabilities. BGP delivers these
functionalities through route policies and route filters. Route filtering is
defined using three methods:
Prefix-lists
Filter-lists
Route-maps
BGP route-maps provide more dynamic capability than prefix-lists and
filter-lists because they not only perform route filtering but also allow
network operators to define policies and set attributes that can be used to
control traffic flow within the network. All these route filtering and route
policy methods are discussed in the following sections.
Example 11-36 displays the BGP table of Nexus switch NX-2 in the
topology shown in Figure 11-9. The NX-2 switch is used as the base to
demonstrate all the filtering techniques shown further in this chapter.
Prefix-List-Based Filtering
As explained in Chapter 10, “Troubleshooting Nexus Route-Maps,” prefix
lists provide another method of identifying networks in a routing protocol.
They identify a specific IP address, network, or network range, and allow
for the selection of multiple networks with a variety of prefix lengths
(subnet masks) by using a prefix match specification.
The prefix-list can be applied directly to a BGP peer and also as a match
statement within the route-map. A prefix-list is configured using the
command ip prefix-list name [seq sequence-number] [permit ip-
address/length | deny ip-address/length] [le length | ge length | eq length].
Examine the same topology shown in Figure 11-6. Example 11-37
illustrates the configuration of BGP inbound and outbound route filtering
using prefix-lists on NX-2. The inbound prefix-list permits 5 networks,
whereas the outbound prefix-list permits only host entries (/32 prefixes)
within the 192.168.0.0/16 subnet. After the prefix-lists are configured, use
the command show bgp afi safi neighbor ip-address to ensure that the
prefix-lists have been attached to the neighbor.
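As a hedged sketch of such prefix-list filtering (the list names Inbound and Outbound and the permitted networks come from this chapter, whereas the sequence numbers are assumed):
NX-2(config)# ip prefix-list Inbound seq 5 permit 100.1.1.0/24
NX-2(config)# ip prefix-list Inbound seq 10 permit 100.1.2.0/24
NX-2(config)# ip prefix-list Inbound seq 15 permit 100.1.3.0/24
NX-2(config)# ip prefix-list Inbound seq 20 permit 100.1.4.0/24
NX-2(config)# ip prefix-list Inbound seq 25 permit 100.1.5.0/24
NX-2(config)# ip prefix-list Outbound seq 5 permit 192.168.0.0/16 ge 32
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5 remote-as 65001
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# prefix-list Inbound in
NX-2(config-router-neighbor-af)# prefix-list Outbound out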
Example 11-38 displays the output of the BGP table after the prefix-lists
have been configured and attached to BGP neighbor 10.25.1.5. Notice that
in this example, on the NX-2 switch, only 5 prefixes are seen from
neighbor 10.25.1.5. On NX-5, all the loopback addresses of the nodes in AS
65000 are advertised apart from 192.168.44.0/24.
Network            Next Hop         Metric    LocPrf    Weight Path
*>e100.1.1.0/24    10.25.1.5                                  0 65001 100 220
*>e100.1.2.0/24    10.25.1.5                                  0 65001 100 220
*>e100.1.3.0/24    10.25.1.5                                  0 65001 100 220
*>e100.1.4.0/24    10.25.1.5                                  0 65001 100 220
*>e100.1.5.0/24    10.25.1.5                                  0 65001 100 220
*>i192.168.1.1/32  192.168.1.1                100             0 i
*>l192.168.2.2/32  0.0.0.0                    100         32768 i
*>i192.168.3.3/32  192.168.3.3                100             0 i
*>i192.168.4.4/32  192.168.4.4                100             0 i
NX-5# show bgp ipv4 unicast neighbors 10.25.1.2 routes
Peer 10.25.1.2 routes for address family IPv4 Unicast:
BGP table version is 1209, local router ID is 192.168.5.5
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

Network            Next Hop         Metric    LocPrf    Weight Path
*>e192.168.1.1/32  10.25.1.2                                  0 65000 i
*>e192.168.2.2/32  10.25.1.2                                  0 65000 i
*>e192.168.3.3/32  10.25.1.2                                  0 65000 i
*>e192.168.4.4/32  10.25.1.2                                  0 65000 i
On the inbound direction on NX-2, use the command show bgp event-
history detail to view the details of the prefixes being matched against the
prefix-list Inbound. Based on the match, the prefixes are either permitted or
denied. If no entry exists for the prefix in the prefix-list, it is dropped by
BGP and will not be part of the BGP table. Example 11-39 displays the
event-history detail output demonstrating how the prefix 100.1.30.0/24 is
rejected by the BGP prefix-list while the prefix 100.1.5.0/24 is permitted.
For the outbound direction, the show bgp event-history detail command
output displays the prefixes in the BGP table being permitted and denied
based on the matching entries in the outbound prefix-list named Outbound.
After the filtering is performed, the prefixes are then advertised to the BGP
peer along with relevant attributes, as shown in Example 11-40.
NX-OS also provides CLI commands to verify policy-based statistics for
prefix-lists. The statistics are available for the policy applied in both the
inbound and outbound directions and show the number of prefixes permitted
and denied in either direction. Use the command show bgp afi safi policy
statistics neighbor ip-address prefix-list [in | out] to view the policy
statistics for a prefix-list applied on a BGP neighbor. The counters of the
policy statistics command
increment every time a BGP neighbor flaps or a soft clear is performed on
the neighbor. Example 11-41 demonstrates the use of a policy statistics
command for BGP peer 10.25.1.5 in both inbound and outbound directions
to understand how many prefixes are being permitted and dropped in both
inbound and outbound directions. In this example, a soft clear is performed
on the outbound direction, and it is seen that the counters increment for the
outbound prefix-list policy statistics by 4 for permitted prefixes and 1 for a
dropped prefix.
Example 11-41 BGP Policy Statistics for Prefix-List
Filter-Lists
BGP filter-lists allow for filtering of prefixes based on AS-Path lists. A
BGP filter-list can be applied in both inbound and outbound directions. A
BGP filter-list is configured using the command filter-list as-path-list-
name [in | out] under the neighbor address-family configuration mode.
Example 11-43 illustrates a sample configuration of a filter-list on the NX-2
switch in the topology referenced in Figure 11-6. In this example, an
inbound filter-list is configured to allow the prefixes that have AS 274 in
the AS_PATH list. The second output of the example shows that the filter-
list is applied in the inbound direction.
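A minimal sketch of such a filter-list configuration follows (the AS-path list name is assumed; the regex follows the double-quote convention noted later in this chapter):
NX-2(config)# ip as-path access-list AS274-LIST permit "_274_"
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5 remote-as 65001
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# filter-list AS274-LIST in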
Note
AS-Path access-list is discussed later in this chapter.
Example 11-44 displays the prefixes in the BGP table received from peer
10.25.1.5 after being filtered by the filter-list. Notice that all the prefixes
shown in the BGP table have AS 274 in their AS_PATH list.
Note
If a BGP peer is configured with the soft-reconfiguration inbound
command, you can also use the command show bgp afi safi neighbor
ip-address received-routes to view the received BGP prefixes.
The easiest way to verify which prefixes are being permitted and denied is
to use the show bgp event-history detail command output, but if the event-
history detail command is not enabled under the router bgp configuration,
you can enable debugs to verify the updates. The debug bgp updates
command can be used to verify both the inbound and the outbound updates.
Example 11-45 demonstrates the use of debug bgp updates to verify which
prefixes are being permitted and which are being denied. The action of
permit or deny is always based on the entries present in the AS-path list.
Similar to policy statistics for prefix-lists, the statistics are also available
for filter-list entries. When executing the command show bgp afi safi
policy statistics neighbor ip-address filter-list [in | out], notice the
relevant AS-path access list referenced as part of the filter-list command
and the number of matches per each entry. The output also displays the
number of accepted and rejected prefixes by the filter-list, as displayed in
Example 11-46.
BGP Route-Maps
BGP uses route-maps to provide route filtering capability and traffic
engineering by setting various attributes to the prefixes that help control
the inbound and outbound traffic. Route-maps typically use some form of
conditional matching so that only certain prefixes are blocked or accepted.
At the simplest level, route-maps can filter networks similar to an AS-Path
filter/prefix-list, but also provide additional capability by adding or
modifying a network attribute. Route-maps are referenced to a specific
route-advertisement or BGP neighbor and require specifying the direction
of the advertisement (inbound/outbound). Route-maps are a critical
component of BGP because they allow for a unique routing policy on a
neighbor-by-neighbor basis.
Example 11-48 illustrates a sample configuration of a multisequence route-
map that is applied to a neighbor. Notice that in this example, route-map
sequence 10 matches a prefix-list to select a certain set of prefixes, and
sequence 20 matches an AS-Path access-list. Note that there is no
sequence 30; the absence of any further entry in the route-map acts as an
implicit deny statement and denies all other prefixes.
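As a hedged sketch of such a multisequence route-map (the route-map, prefix-list, and AS-path list names are assumed; the local-preference values of 200 and 300 match the behavior described for Example 11-49):
NX-2(config)# route-map BGP-IN permit 10
NX-2(config-route-map)# match ip address prefix-list Inbound
NX-2(config-route-map)# set local-preference 200
NX-2(config-route-map)# exit
NX-2(config)# route-map BGP-IN permit 20
NX-2(config-route-map)# match as-path AS274-LIST
NX-2(config-route-map)# set local-preference 300
NX-2(config-route-map)# exit
NX-2(config)# router bgp 65000
NX-2(config-router)# neighbor 10.25.1.5 remote-as 65001
NX-2(config-router-neighbor)# address-family ipv4 unicast
NX-2(config-router-neighbor-af)# route-map BGP-IN in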
Example 11-49 shows the BGP table after inbound route-map filtering.
Notice that the prefixes 100.1.1.0/24 to 100.1.5.0/24 are set with the local
preference of 200, whereas the prefixes that match AS 274 in the AS-path
list are set with the local preference of 300. Because there is no route-map
entry matching sequence 30, all the other prefixes are denied by the
inbound route-map filtering.
The show bgp event-history detail command can be used again to verify
which prefixes are being permitted or denied based on the route-map
policy. Based on the underlying match statements, relevant set actions are
taken (if any). Example 11-50 displays the event-history detail output
demonstrating prefixes being permitted and denied by route-map.
You can also validate the policy statistics for the route-map similar to
prefix-list and filter-list. The command show bgp ipv4 unicast policy
statistics neighbor ip-address route-map [in | out] displays the matching
prefix-list or AS-path access-list or any other attributes under each route-
map sequence and its matching statistics, as shown in Example 11-51.
Note
NX-OS devices require the regex-pattern to be placed within a pair of
double-quotes “”.
Table 11-7 provides a brief list and description of the common regex query
modifiers.
Table 11-7 RegEx Query Modifiers
Modifier                 Description
_ (Underscore)           Matches a space
^ (Caret)                Indicates the start of the string
$ (Dollar Sign)          Indicates the end of the string
[] (Brackets)            Matches a single character or nesting within a range
- (Hyphen)               Indicates a range of numbers in brackets
[^] (Caret in Brackets)  Excludes the characters listed in brackets
() (Parentheses)         Used for nesting of search patterns
| (Pipe)                 Provides or functionality to the query
. (Period)               Matches a single character, including a space
* (Asterisk)             Matches zero or more characters or patterns
+ (Plus Sign)            Matches one or more instances of the character or pattern
? (Question Mark)        Matches one or no instances of the character or pattern
Note
The .^$*+()[]? characters are special control characters that cannot be
used without using the backslash \ escape character. For example, to
match on the * in the output use the \* syntax.
Note
The AS-Path for the prefix 172.16.129.0/24 has the AS 300 twice
nonconsecutively for a specific purpose. This is not seen in real life,
because it indicates a routing loop.
_ Underscore
Query Modifier Function: Matches a space
Scenario: Only display routes that passed through AS 100. The first
assumption is that the syntax show bgp ipv4 unicast regexp “100”,
as shown in Example 11-53, is ideal. However, this regex query includes
the following unwanted ASNs: 1100, 2100, 21003, and 10010.
Example 11-54 uses the underscore (_) to imply a space to the left of the
100, which removes all the unwanted ASNs except 10010.
Example 11-55 provides the final query, using the underscore (_) before
and after the ASN (100), to match only the routes that pass through
AS 100.
^ Caret
Query Modifier Function: Indicates the start of the string.
$ Dollar Sign
Query Modifier Function: Indicates the end of the string.
Scenario: Only display routes that originated from AS 40. Example 11-59
provides the solution using the dollar sign ($) with the regex
pattern “_40$”.
[ ] Brackets
Query Modifier Function: Matches a single character or nesting
within a range.
Scenario: Only display routes with an AS that contains 11 or 14 in it.
The regex filter “1[14]” can be used as shown in Example 11-60.
- Hyphen
Query Modifier Function: Indicates a range of numbers in brackets.
Scenario: Only display routes where the last two digits of the AS are
40, 50, 60, 70, or 80. Example 11-61 uses the regex query “[4-8]0_”.
. Period
Query Modifier Function: Matches a single character, including a
space.
+ Plus Sign
Query Modifier Function: One or more instances of the character or
pattern.
Scenario: Only display routes that contain at least one ‘11’ in the
AS path. The regex pattern is “(11)+”, as shown in Example 11-65.
? Question Mark
Query Modifier Function: Matches one or no instances of the
character or pattern.
Note
The CTRL+V escape sequence must be used before entering the ?.
* Asterisk
Query Modifier Function: Matches zero or more characters or
patterns.
Scenario: Display all routes from any AS. This may seem like a
useless task, but may be a valid requirement when using AS-Path
access lists, which are explained later in this chapter. Example 11-
67 shows the regex query.
BGP Communities
BGP communities provide additional capability for tagging routes and are
considered either well-known or private BGP communities. Private BGP
communities are used for conditional matching for a router’s route-policy,
which could influence routes during inbound or outbound route-policy
processing. There are four well-known communities that affect only
outbound route-advertisement:
No-Advertise: The No_Advertise community (0xFFFFFF02 or
4,294,967,042) specifies that routes with this community should not
be advertised to any BGP peer. The No-Advertise BGP community can
be advertised from an upstream BGP peer or locally with an inbound
BGP policy. In either method, the No-Advertise community is set in
the BGP Loc-RIB table that affects outbound route advertisement.
No-Export: The No_Export community (0xFFFFFF01 or
4,294,967,041) specifies that when a route is received with this
community, the route is not advertised to any EBGP peer. If the router
receiving the No-Export route is a confederation member, the route is
advertised to other sub ASs in the confederation.
Local-AS: The No_Export_SubConfed community (0xFFFFFF03 or
4,294,967,043) known as the Local-AS community specifies that a
route with this community is not advertised outside of the local AS. If
the router receiving a route with the Local-AS community is a
confederation member, the route is advertised only within the sub-AS
(Member-AS) and is not advertised between Member-ASs.
Internet: Advertise this route to the Internet community and all the
routers that belong to it.
The private community value is of the format (as-number:16-bit-number).
Conditionally matching BGP communities allows for selection of routes
based upon the BGP communities within the route’s path attributes so that
selective processing occurs in a route-map.
NX-OS devices do not advertise BGP communities to peers by default.
Communities are enabled on a neighbor-by-neighbor basis with the BGP
address-family configuration command send-community [standard |
extended | both] under the neighbor’s address family configuration.
Standard communities are sent by default, unless the optional extended or
both keywords are used.
Conditionally matching on NX-OS devices requires the creation of a
community list. A community list shares a similar structure to an ACL, is
standard or expanded, and is referenced via number or name. Standard
community lists match either well-known communities or a private
community number (as-number:16-bit-number), whereas Expanded
community lists use regex patterns.
Examine the same topology shown in Figure 11-6. In this topology,
NX-5 assigns a community value of 65001:274 to the prefixes that have
AS 274 in their AS_Path list. Example 11-69 illustrates the configuration
on NX-5 for attaching the community value to these prefixes.
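A minimal sketch of such a configuration follows (the route-map and AS-path list names are assumed; the community value and neighbor addressing follow the chapter topology). Note the send-community command, without which the community would not be propagated to the peer:
NX-5(config)# ip as-path access-list AS274-LIST permit "_274_"
NX-5(config)# route-map SET-COMM permit 10
NX-5(config-route-map)# match as-path AS274-LIST
NX-5(config-route-map)# set community 65001:274
NX-5(config-route-map)# exit
NX-5(config)# route-map SET-COMM permit 20
NX-5(config)# router bgp 65001
NX-5(config-router)# neighbor 10.25.1.2 remote-as 65000
NX-5(config-router-neighbor)# address-family ipv4 unicast
NX-5(config-router-neighbor-af)# send-community
NX-5(config-router-neighbor-af)# route-map SET-COMM out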
Advertised path-id 1
Path type: external, path is valid, is best path
AS-Path: 65001 100 228 274 {300 243} , path sourced external to
AS
10.25.1.5 (metric 0) from 10.25.1.5 (192.168.5.5)
Further Reading
Some of the topics involving validity checks and next-hop resolution are
explained further in the following books:
Halabi, Sam. Internet Routing Architectures. Indianapolis: Cisco Press,
2000.
Zhang, Randy, and Micah Bartell. BGP Design and Implementation.
Indianapolis: Cisco Press, 2003.
White, Russ, Alvaro Retana, and Don Slice. Optimal Routing Design.
Indianapolis: Cisco Press, 2005.
Jain, Vinit, and Brad Edgeworth. Troubleshooting BGP. Indianapolis: Cisco
Press, 2016.
References
Jain, Vinit, and Brad Edgeworth. Troubleshooting BGP. Indianapolis: Cisco
Press, 2016.
Edgeworth, Brad, Aaron Foss, and Ramiro Garza Rios. IP Routing on Cisco
IOS, IOS XE and IOS XR. Indianapolis: Cisco Press, 2014.
Cisco. Cisco NX-OS Software Configuration Guides, www.cisco.com.
Part IV
Troubleshooting High
Availability
Chapter 12
High Availability
Note
Demand mode is not supported on Cisco platforms. In demand mode,
no hello packets are exchanged after the session is established. In this
mode, BFD assumes there is another way to verify connectivity
between the two endpoints. Either host may still send hello packets if
needed, but they are not generally exchanged.
Asynchronous Mode
Asynchronous mode is the primary mode of operation and is mandatory for
BFD to function. In this mode, each system periodically sends BFD control
packets to one another. For example, packets sent by NX-1 have a
source address of NX-1 and a destination address of NX-2, as Figure
12-1 shows.
Figure 12-2 shows the BFD control packets defined by the IETF.
Note
BFD supports keyed SHA-1 authentication on NX-OS beginning with
Release 5.2.
Asynchronous Mode with Echo Function
Asynchronous mode with echo function is designed to test only the
forwarding path, not the host stack on the remote system. It is enabled only
after the BFD session is established. BFD echo packets are sent in such a
way that the other end simply loops them back through its forwarding path.
For example, a packet sent by router NX-1 carries both a source and a
destination address belonging to NX-1 (see Figure 12-3).
NX-1
NX-1(config)# int e4/1
NX-1(config-if)# no ip redirects
NX-1(config-if)# no ipv6 redirects
NX-1(config-if)# bfd interval 300 min_rx 300 multiplier 3
NX-1(config-if)# ip ospf bfd
NX-1(config-if)# no bfd echo
NX-1(config-if)# exit
NX-1(config)# router ospf 100
NX-1(config-router)# bfd
Note
To enable BFD for other routing protocols, refer to the Cisco
documentation for the configuration on different Nexus devices.
When BFD is enabled, a BFD session gets established. Use the command
show bfd neighbors [detail] to verify the status of BFD. The show bfd
neighbors command displays the state of the BFD neighbor, along with the
interface, local and remote discriminator, and Virtual Routing and
Forwarding (VRF) details. The output with the detail keyword displays all
the fields that are part of the BFD control packet, which is useful for
debugging purposes to see whether a mismatch could cause the BFD
session to flap. Ensure that the State bit is set to Up instead of AdminDown.
The output also shows whether the echo function is enabled or disabled.
Example 12-3 displays the output of the command show bfd neighbors
[detail].
As with other features, BFD maintains internal event-history logs that are
useful in debugging state machine issues or BFD flaps. The event-
history for BFD provides various command-line options. To view the BFD
event-history, use the command show system internal bfd event-history
[all | errors | logs | msgs | session [discriminator]]. The all option shows
all the event-history (that is, all the events and error event-history logs).
The errors option shows only the BFD-related errors. The logs option
shows all the events for BFD. The msgs option shows BFD-related
messages, and the session option helps view the logs related to errors, log
messages, and app-events for a particular session.
Example 12-5 displays the BFD event-history logs for a BFD session
hosted on an interface on module 4 with the discriminator 0x41000004.
This example also helps you understand the information exchange and
steps the system goes through in bringing up a BFD session. These are the
steps, listed in sequence:
Step 1. The session begins with an Admin Down state.
Step 2. The BFD client (BFDC) adds a BFD session with the interface and
IP addresses of the devices between which the session will be
established.
Step 3. The BFD component sends an MTS message to the BFDC
component on the line card.
Step 4. BFD sends a received session notification to its clients.
Note
The BFD process runs on the supervisor, whereas the BFDC runs on
the line card.
Example 12-5 BFD Event-History Logs
NX-1# show system internal bfd event-history logs
Example 12-6 displays detailed information about a session using the
command show system internal bfd event-history session discriminator.
The discriminator value comes from the LD (local discriminator) field in
the show bfd neighbor detail output and is expressed in hex, as shown in
Example 12-6, when used with the event-history command. The event-
history session command displays the errors, logs (such as parameters
exchanged and state changes), and app events related to a given BFD
session.
Note
The ACL programming on the hardware is dependent on the
underlying line card hardware and the Nexus platform. The behavior
might differ among Nexus hardware platforms.
To enable the BFD echo function, configure the command bfd echo under
the interface. When the session is configured with the echo function, the
BFD session starts in asynchronous mode using a slow interval of 2
seconds. When the session is up, and if the interval specified by the client
is less than 2 seconds, the echo function gets activated (assuming that the
echo function is enabled on the remote peer as well).
Example 12-9 illustrates the configuration of the BFD echo function
between NX-1 and NX-2 and the changes in the show bfd neighbors detail
command output after the BFD session is established.
Example 12-9 BFD with Echo Function Configuration and Verification
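A minimal sketch of the configuration side (the interface and the OSPF client protocol are assumptions; no ip redirects remains a prerequisite for the echo function):
NX-1(config)# interface Ethernet4/1
NX-1(config-if)# no ip redirects
NX-1(config-if)# bfd echo
NX-1(config-if)# ip ospf bfd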
If a failure occurs, NX-OS logs a syslog message for BFD failure along
with a reason code for the failure and the session discriminator value.
Example 12-10 displays the syslog message of a BFD failure on NX-1.
Notice that, in this case, the reason is 0x2, which indicates “Echo Function
Failed.”
Table 12-2 lists all the BFD failure reason codes, along with their
description.
Note
In case of any BFD failure event, capturing show tech bfd soon after
the BFD flap event is recommended. It is also necessary to capture the
show tech feature output for the relevant feature with which BFD is
associated; for instance, in case of OSPF, this is show tech ospf.
Nexus also supports BFD over L3 port-channels and BFD on SVI interfaces
over L2 port-channels. In both cases, Link Aggregation Control Protocol
(LACP) must be enabled on the port-channel interface. BFD is enabled on
L3 port-channel interfaces using two methods:
BFD per-link
Micro BFD session
To enable BFD per-link, use the command bfd per-link under the port-
channel interface, along with the no ip redirects command. This enables
BFD for the client protocol enabled on that L3 port-channel interface.
When BFD per-link mode is used, BFD creates a session for each link in
the port-channel and provides accumulated or aggregated results to the
client protocol. Example 12-11 demonstrates the per-link BFD
configuration on a port-channel interface (see the sketch after this
paragraph) and its verification using the show bfd neighbors [detail]
command output. Use the command show port-channel summary to
verify the member ports of the port-channel interface.
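A hedged sketch of the per-link configuration (the port-channel number and the OSPF client protocol are assumed values):
NX-1(config)# interface port-channel1
NX-1(config-if)# no ip redirects
NX-1(config-if)# bfd per-link
NX-1(config-if)# ip ospf bfd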
Session state is Up
Local Diag: 0
Registered protocols: ospf
Uptime: 0 days 0 hrs 0 mins 9 secs
Hosting LC: 0, Down reason: None, Reason not-hosted: None
Parent session, please check port channel config for member info
Nexus 9000 also supports BFD on every link aggregation group (LAG)
member interfaces, as defined in RFC 7130. This method is called IETF
Micro BFD session. The echo function is not supported on micro BFD
sessions. The benefit of using micro BFD sessions is that if any member
port goes down, the port is removed from the forwarding table and traffic
disruption is prevented on that member link.
Micro BFD sessions are configured using the commands port-channel bfd
track-member-link and port-channel bfd destination ip-address on an
active L3 port-channel interface. Example 12-12 demonstrates the
configuration of micro BFD session configuration on Nexus 9000 switches
N9k-1 and N9k-2.
N9k-1
N9k-1(config)# interface port-channel2
N9k-1(config-if)# port-channel bfd track-member-link
N9k-1(config-if)# port-channel bfd destination 172.16.0.1
N9k-2
N9k-2(config)# interface port-channel2
N9k-2(config-if)# port-channel bfd track-member-link
N9k-2(config-if)# port-channel bfd destination 172.16.0.0
Session state is Up
Local Diag: 0
Registered protocols: eth_port_channel
Uptime: 0 days 0 hrs 9 mins 56 secs
Hosting LC: 0, Down reason: None, Reason not-hosted: None
Parent session, please check port channel config for member info
Stateful Switchover
Various Nexus platforms (including the Nexus 7000, Nexus 7700, and
Nexus 9500) have support for fabric as well as supervisor redundancy. The
benefit of the hardware-based redundancy is that if the active hardware
(fabric or supervisor card) fails, the standby hardware takes over the role of
active and prevents any kind of traffic and service disruption. In addition,
some of the software-based HA features, such as nonstop routing (NSR),
nonstop forwarding (NSF), and graceful restart (GR), are leveraged only
when a redundant supervisor card is available to synchronize the state to
the standby supervisor and seamlessly take over the role of active
supervisor when the old active supervisor fails.
With redundant hardware, the supervisor cards must stay in active/ha-
standby mode. The supervisor states are verified using the command show
module. This command displays all the supervisor cards, line cards, and
fabric cards present in the chassis. Example 12-14 displays the show
module output on the Nexus 7000 switch. Notice that, in the output, the
supervisor card in slot 1 is in ha-standby state and the one in slot 2 is in
active state.
The HA state is also verified using the command show system redundancy
status. When the standby supervisor is booting up, or after a switchover
event when the former active supervisor moves to a standby role, the
ha-standby state is not reached immediately. The standby supervisor must
synchronize its state with that of the active supervisor. This is achieved
with the system manager (sysmgr) component on the active supervisor. The
sysmgr component initiates a global sync (gsync) of the active supervisor
state to the standby supervisor. During the synchronization process, the
state is shown as HA synchronization in progress. Note that the standby
should not remain in this state for too long, because that can indicate
failures or other issues.
When all the components and states are synchronized between the active
and standby supervisors, the Module-Manager is informed that the standby
supervisor is up. The Module-Manager then informs all the software
components on the active supervisor about the availability of the standby
supervisor and configures them. This event is known as the Standby Sup
Insertion Sequence. Any error encountered during this sequence results in a
reboot of the standby supervisor.
Example 12-15 displays the system redundancy status. An ideal state for
redundancy is the active/standby state. In this example, the standby
supervisor is currently synchronizing its state with the active supervisor in
slot 2.
Note
In case of failure during Standby Sup Insertion Sequence, collect the
following commands to help identify where the failure has occurred:
show logging [nvram]
show module internal exception-log
show system reset-reason
show module internal event-history module slot
On the Nexus 7000 or Nexus 7700 series platform, where virtual device
context (VDC) is supported, the HA state should also be maintained across
all VDCs configured on the system. This is verified using the command
show system redundancy ha status. Example 12-16 verifies the system
redundancy state across all VDCs.
The master System Manager has PID 4967 and UUID 0x1.
Last time System Manager was gracefully shutdown.
The state is SRV_STATE_MASTER_ACTIVE_HOTSTDBY entered at time Thu
Oct 26 13:20:54 2017.
Debugging info:
HA info:
slotid = 2 supid = 0
cardstate = SYSMGR_CARDSTATE_ACTIVE .
cardstate = SYSMGR_CARDSTATE_ACTIVE (hot switchover is configured
enabled).
Configured to use the real platform manager.
Configured to use the real redundancy driver.
Redundancy register: this_sup = RDN_ST_AC, other_sup = RDN_ST_SB.
EOBC device name: veobc.
Remote addresses: MTS - 0x00000101/3 IP - 127.1.1.1
MSYNC done.
Remote MSYNC not done.
Module online notification received.
Local super-state is: SYSMGR_SUPERSTATE_STABLE
Standby super-state is: SYSMGR_SUPERSTATE_STABLE
Swover Reason : SYSMGR_UNKNOWN_SWOVER
Total number of Switchovers: 0
Swover threshold settings: 5 switchovers within 4800 seconds
Switchovers within threshold interval: 0
Last switchover time: 0 seconds after system start time
Cumulative time between last 0 switchovers: 0
Start done received for 1 plugins, Total number of plugins = 1
Statistics:
Message count: 0
Total latency: 0 Max latency: 0
Total exec: 0 Max exec: 0
Autobooting bootflash:/n7000-s2-kickstart.7.3.2.D1.1.bin
bootflash:/n7000-s2-dk9.7.3.2.D1.1.bin...
Filesystem type is ext2fs, partition type 0x83
! Output omitted for brevity
NX-1 SUP-2
NX-1 login: admin
Password:
ISSU
Performing upgrades in any network deployment, especially in a large data
center or enterprise, is unpleasant. In most cases, when a device needs to
be upgraded, services and traffic are shifted to backup or redundant
devices, boot variables are set, and the device is brought down using the
reload command to perform the upgrade. This becomes more challenging
on devices such as the Nexus 7000, with multiple VDCs running on a single
box, acting as individual devices and playing different roles. To overcome
the challenges of upgrades in the network, leverage the ISSU feature.
ISSU is not a new concept. It is available on multiple Cisco Catalyst
platforms, including the 4500 and 6500 switches. ISSU follows the same
concept on Nexus 7000 series devices. The whole ISSU process takes place
in a few simple steps:
Step 1. Upgrade the Basic Input and Output System (BIOS) on supervisors
and line card modules.
Step 2. Bring up the standby supervisor card with a new image.
Step 3. Switch over from the active to the standby supervisor, which is
running on the new image.
Step 4. Bring up the old active supervisor card with the new image.
Step 5. Perform a hitless line card upgrade (one at a time).
Step 6. Upgrade the Connectivity Management Processor (CMP).
Note
Starting with NX-OS Release 5.2(1), multiple line cards are upgraded
simultaneously on Nexus switches, thus reducing the upgrade time when
using ISSU.
Autobooting bootflash:/n7000-s2-kickstart.7.3.2.D1.1.bin
bootflash:/n7000-s2-dk9.7.3.2.D1.1.bin...
Filesystem type is ext2fs, partition type 0x83
Booting kickstart image: bootflash:/n7000-s2-kickstart.7.3.2.D1.1.bin....
...........................................................................
Kickstart digital signature verification Successful
Image verification OK
Note
In case of ISSU failure, it is also important to collect the show tech-support
issu and show tech-support ha outputs before the services are recovered.
Graceful Insertion and Removal
In any network deployment, network engineers must perform hardware
replacements, hardware and software upgrades, or even intrusive
debugging sessions to identify the root cause of a problem. In any of these
instances, engineers do not want to impact any services running on the
network. Usually a maintenance window is scheduled and traffic is diverted
to a backup path or redundant device to minimize the impact on any
services, but this is a tedious task. NX-OS provides the Graceful Insertion
and Removal (GIR) feature, which enables you to put devices in
maintenance mode and perform any of the previously stated activities
without impacting any services. The intent of GIR is to simplify the
isolation of a switch from the network using a single set of commands
instead of having to manually shut interfaces or alter metrics. In other
words, GIR can essentially be called a macro that automates all manual
steps to isolate the switch from the network.
GIR has two modes:
Maintenance mode
Normal mode
In maintenance mode (also known as the Graceful Removal phase), all data
traffic bypasses the node. A parallel path should be available for the GIR to
function properly. If no available parallel path exists, service disruptions to
the network can arise. Maintenance mode is used to perform maintenance-
related activities such as software/hardware upgrades, swaps for bad
hardware, or other disruptive activities on the node. The node then can go
back to normal mode (also known as Graceful Insertion phase).
To understand the functioning of GIR, examine the topology in Figure 12-4.
This topology is a typical spine-leaf topology with two spine nodes and six
leaf nodes. The connectivity between spine and leaf is via OSPF.
Figure 12-4 Typical Spine-Leaf Topology
In this topology, suppose that the spine node Spine1 is set to maintenance
mode for performing a software upgrade. The first step in GIR is to
advertise costly metrics within the routing protocols. Thus, Spine1
advertises the OSPF max-metric to all its OSPF neighbors. When the leaf
nodes receive the max-metric, they alter their forwarding path to push all
the traffic through Spine2. At this point, the OSPF neighborship is still up
between Spine1 and all six leaf nodes (assuming the default Isolate mode,
to be discussed), but no data forwarding is happening via Spine1.
Maintenance mode is supported on Nexus 7000 and 7700 series platforms
starting with Release 7.2.0 and on Nexus 5500/5600 platforms starting with
Release 7.1.0. Maintenance mode is configured using the command system
mode maintenance [shutdown]. When the command system mode
maintenance is configured, GIR is enabled in default mode, also known as
Isolate mode. In this mode, the protocol neighborship is maintained and
traffic is diverted to the backup or parallel path. When the command
system mode maintenance shutdown is configured, the GIR is enabled in
shutdown mode; the protocols go into shutdown state, links are shut down,
and traffic loss can occur. Isolate mode for GIR is recommended over
shutdown mode.
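A minimal sketch of both operations follows (the hostname is assumed); the first command moves the device into maintenance (isolate) mode, and the second returns it to normal mode after the maintenance activity is complete:
NX-1(config)# system mode maintenance
NX-1(config)# no system mode maintenance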
Example 12-21 demonstrates the differences in feature-level configuration
when the device is configured for isolate mode versus shutdown
maintenance mode. In both modes, the command show system mode
shows that the system mode is Maintenance. Before the system goes into
maintenance mode, NX-OS takes a snapshot of the current state of the
device and saves it as the before_maintenance snapshot.
When the system goes into maintenance mode, the processes that were
influenced by maintenance mode change their running state to Isolate or
Shutdown. Example 12-22 displays the different routing protocol processes
and their current state on the system.
When the system is back to normal mode, verify that the services are
normalized, with routes in the Routing Information Base (RIB), VLANs,
and so on. The snapshots taken before and after maintenance help verify the
same with just a single command. The current available snapshots are
verified using the command show snapshots. When both the before and
after maintenance snapshots are available, use the command show
snapshots compare before_maintenance after_maintenance [summary] to
compare the system for any differences. Example 12-24 demonstrates the
comparison of before and after maintenance snapshots.
================================================================================
Feature                                before_maintenance  after_maintenance  changed
================================================================================
basic summary
  # of interfaces                      63                  63
  # of vlans                           1                   1
  # of ipv4 routes vrf default         43                  43
  # of ipv4 paths vrf default          46                  46
  # of ipv4 routes vrf management      9                   9
  # of ipv4 paths vrf management       9                   9
  # of ipv6 routes vrf default         3                   3
  # of ipv6 paths vrf default          3                   3

interfaces
  # of eth interfaces                  60                  60
  # of eth interfaces up               7                   7
  # of eth interfaces down             53                  53
  # of eth interfaces other            0                   0
  # of vlan interfaces                 1                   1
  # of vlan interfaces up              0                   0
  # of vlan interfaces down            1                   1
  # of vlan interfaces other           0                   0
Most production environments have a limit on the duration of the
maintenance window. To set a time limit for the maintenance window,
configure the timeout value for maintenance mode using the command
system mode maintenance timeout time-in-minutes. When the timeout
value is reached, the system automatically rolls back from maintenance
mode to normal mode. Example 12-25 demonstrates configuring the
maintenance timeout to 30 minutes and verifying the timeout value using
the command show maintenance timeout.
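A minimal sketch follows (the hostname is assumed; the 30-minute value matches the example just described):
NX-1(config)# system mode maintenance timeout 30
NX-1(config)# exit
NX-1# show maintenance timeout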
Note
If any issues arise with maintenance mode, collect the command show
tech-support mmode output during or just after the problem is seen.
[Maintenance Mode]
router bgp 100
isolate
router eigrp 100
isolate
router ospf 100
isolate
router isis IS-IS
isolate
[Maintenance Mode]
router bgp 100
isolate
router ospf 100
shutdown
router eigrp 100
shutdown
router isis IS-IS
isolate
interface Ethernet3/1
shutdown
Note
Use the command show running-config mmode to validate all the
configuration settings related to maintenance mode.
Note
To debug maintenance mode, use the command debug mmode logfile.
Enabling this debug also enables logging of the debug logs into a
logfile that is viewed using the command show system internal
mmode logfile. Collecting show tech-support mmode command
output is also recommended, in case of any failures with GIR.
Summary
NX-OS, the OS for data center switches, was built on paradigms of
high availability (HA). This chapter focused on some of the high
availability features that are commonly used on Nexus switches, including
achieving high availability using BFD, which is used with various routing
protocols and features. This chapter detailed verifying the hardware
programming and using event-history logs to troubleshoot any BFD issues.
The following areas should be verified while troubleshooting BFD session
issues:
Ensure that the no ip redirects or no ipv6 redirects command is
enabled on the interface.
Verify the error code, which explains the reason for the BFD failure:
0: No diagnostic
1: Control packet detection timer expired
2: Echo function failed
3: Neighbor signaled session down
4: Forwarding plane reset
5: Path down
6: Concatenated path down
7: Administratively down
8: Reverse concatenated path down
In addition, this chapter covered the system high availability features, such
as SSO and ISSU, which are critical in a production environment.
Performing incremental ISSU upgrades that are nondisruptive is better than
performing upgrades using the reload command.
The chapter also examined Graceful Insertion and Removal (GIR) and
looked at how GIR is used to perform maintenance activities in the network
without requiring too many changes. With GIR, maintenance mode is
enabled in two modes:
Isolate mode
Shutdown mode
Isolate mode is recommended for use with GIR. Finally, this chapter
elaborated on how to create and use custom profiles for maintenance
windows instead of using system-generated profiles.
References
RFC 5880, Bidirectional Forwarding Detection. D. Katz and D. Ward. IETF,
http://tools.ietf.org/html/rfc5880, June 2010.
RFC 5881, Bidirectional Forwarding Detection for IPv4 and IPv6 (Single
Hop). D. Katz and D. Ward. IETF, http://tools.ietf.org/html/rfc5881, June
2010.
RFC 5882, Generic Application of Bidirectional Forwarding Detection. D.
Katz and D. Ward. IETF, http://tools.ietf.org/html/rfc5882, June 2010.
RFC 5883, Bidirectional Forwarding Detection for Multihop Paths. D. Katz
and D. Ward. IETF, http://tools.ietf.org/html/rfc5883, June 2010.
RFC 5884, Bidirectional Forwarding Detection for MPLS Label Switched
Paths. R. Aggarwal, K. Kompella, T. Nadeau, and G. Swallow. IETF,
http://tools.ietf.org/html/rfc5884, June 2010.
Cisco. Cisco NX-OS Software Configuration Guides,
http://www.cisco.com.
Part V
Troubleshooting Multicast
Multicast Fundamentals
Network communication is often described as being one of the following
types:
Unicast (one-to-one)
Broadcast (one-to-all)
Anycast (one-to-nearest-one)
Multicast (one-to-many)
The concept of unicast traffic is simply a single source host sending
packets to a single destination host. Anycast is another type of unicast
traffic, with multiple destination devices sharing the same network layer
address. The traffic originates from a single host with a destination anycast
address. Packets follow unicast routing to reach the nearest anycast host,
where routing metrics determine the nearest device.
Broadcast and multicast both provide a method of one-to-many
communication on a network. What makes multicast communication
different from broadcast communication is that broadcast traffic must be
received and processed by each host that receives it. This typically results
in using system resources to process frames that end up being discarded.
Multicast traffic, in contrast, is processed only by devices that are
interested in receiving the traffic. Multicast traffic is also routable across
Layer 3 (L3) subnet boundaries, whereas broadcast traffic is typically
constrained to the local subnet. Figure 13-1 demonstrates the difference
between broadcast and multicast communication behavior.
Figure 13-1 Multicast and Broadcast Communication
Multicast Terminology
The terminology used to describe the state and behaviors of multicast must
be defined before diving further into concepts. Table 13-1 lists the
multicast terms with their corresponding definition used throughout this
chapter.
Table 13-1 Multicast Terminology
Term                                  Definition
mroute                                An entry in the Multicast Routing Information Base (MRIB). Different types of mroute entries are associated with the source tree or the shared tree.
Incoming interface (IIF)              The interface of a device on which multicast traffic is expected to be received.
Outgoing interface (OIF)              The interface of a device out of which multicast traffic is expected to be transmitted, toward receivers.
Outgoing interface list (OIL)         The OIFs on which traffic is sent out of the device, toward interested receivers for a particular mroute entry.
Group address                         Destination IP address for a multicast group.
Source address                        The unicast address of a multicast source. Also referred to as a sender address.
L2 replication                        The act of duplicating a multicast packet at the branch points along a multicast distribution tree. Replication for multicast traffic at L2 is done without rewriting the source MAC address or decrementing the TTL, and the packets stay inside the same broadcast domain.
L3 replication                        The act of duplicating a multicast packet at the branch points along a multicast distribution tree. Replication for multicast traffic at L3 requires PIM state and multicast routing. The source MAC address is updated and the TTL is decremented by the multicast router.
Reverse Path Forwarding (RPF) check   Compares the IIF for multicast group traffic to the routing table entry for the source IP address or the RP address. Ensures that multicast traffic flows only away from the source.
Multicast distribution tree (MDT)     Multicast traffic flows from the source to all receivers over the MDT. This tree can be shared by all sources (a shared tree), or a separate distribution tree can be built for each source (a source tree). The shared tree can be one-way or bidirectional.
Protocol Independent Multicast (PIM)  Multicast routing protocol that is used to create MDTs.
RP tree (RPT)                         The MDT between the last-hop router (LHR) and the PIM RP. Also referred to as the shared tree.
Shortest-path tree (SPT)              The MDT between the LHR and the first-hop router (FHR) to the source. Typically follows the shortest path as determined by unicast routing metrics. Also known as the source tree.
Divergence point                      The point where the RPT and the SPT diverge toward different upstream devices.
Upstream                              A device that is relatively closer to the source along the MDT.
Downstream                            A device that is relatively closer to the receiver along the MDT.
Sparse mode                           Protocol Independent Multicast sparse mode (PIM SM) relies on explicit joins from a PIM neighbor before sending traffic toward the receiver.
Dense mode                            PIM dense mode (PIM DM) relies on flood-and-prune forwarding behavior. All possible receivers are sent the traffic until a prune is received from uninterested downstream PIM neighbors. NX-OS does not support PIM DM.
Rendezvous point (RP)                 The multicast router that is the root of the PIM SM shared multicast distribution tree.
Join                                  A type of PIM message; more generically, the act of a downstream device requesting traffic for a particular group or source. This can result in an interface being added to the OIL.
Prune                                 A type of PIM message; more generically, the act of a downstream device indicating that traffic for the group or source is no longer requested by a receiver. This can result in the interface being removed from the OIL if no other downstream PIM neighbors are present.
First-hop router (FHR)                The L3 router that is directly adjacent to the multicast source. The FHR performs registration of the source with the PIM RP.
Last-hop router (LHR)                 The L3 router that is directly adjacent to the multicast receiver. The LHR initiates a join to the PIM RP and initiates switchover from the RPT to the SPT.
Intermediate router                   An L3 multicast-enabled router that forwards packets for the MDT.
Note
Figure 13-2 shows both the RP tree and the source tree in the diagram,
for demonstration purposes. This state does not persist in reality
because NX-3 prunes itself from the RP tree and receives the group
traffic from the source tree.
When Virtual Device Contexts (VDC) are used with the Nexus 7000 series,
all of the previously mentioned PI components are unique to the VDC.
Each VDC has its own PIM, IGMP, MRIB, and MFDM processes.
However, in each I/O module, the system resources are shared among the
different VDCs.
Replication
Multicast communication is efficient because a single packet from the
source can be replicated many times as it traverses the MDT toward
receivers located along different branches of the tree. Replication can occur
at L2 when multiple receivers are in the same VLAN on different
interfaces, or at L3 when multiple downstream PIM neighbors have joined
the MDT from different OIFs.
Replication of multicast traffic is handled by specialized hardware, which
is different on each Nexus platform. In the case of a distributed platform
with different I/O modules, egress replication is used (see Figure 13-5).
Module: 3
R-L Class            Config    Allowed        Dropped        Total
+------------------+--------+---------------+---------------+-----------------+
L3 mtu               500       0              0              0
L3 ttl               500       12             0              12
L3 control           10000     0              0              0
L3 glean             100       1              0              1
L3 mcast dirconn     3000      13             0              13
L3 mcast loc-grp     3000      2              0              2
L3 mcast rpf-leak    500       0              0              0
L2 storm-ctrl        Disable
access-list-log      100       0              0              0
copy                 30000     7182002        0              7182002
receive              30000     27874374       0              27874374
L2 port-sec          500       0              0              0
L2 mcast-snoop       10000     34318          0              34318
L2 vpc-low           4000      0              0              0
L2 l2pt              500       0              0              0
L2 vpc-peer-gw       5000      0              0              0
L2 lisp-map-cache    5000      0              0              0
L2 dpss              100       0              0              0
L3 glean-fast        100       0              0              0
L2 otv               100       0              0              0
L2 netflow           48000     0              0              0
L3 auto-config       200       0              0              0
Vxlan-peer-learn     100       0              0              0
L3 mcast dirconn    Packets for which the source is directly connected. These packets are sent to the CPU to generate PIM register messages.
L3 mcast loc-grp    Packets sent to the CPU at the LHR to trigger SPT switchover.
L3 mcast rpf-leak   Packets sent to the CPU to create a PIM assert message.
L2 mcast-snoop      IGMP membership reports, queries, and PIM hello packets punted to the CPU for IGMP snooping.
As with the CoPP policy, disabling any of the HWRLs that are enabled by
default is not advised. In most deployments, no modification to the default
CoPP or HWRL configuration is necessary.
If excessive traffic to the CPU is suspected, incrementing matches or drops
in a particular CoPP class or HWRL provides a hint about what traffic is
arriving. For additional detail, an Ethanalyzer capture can examine the
CPU-bound traffic for troubleshooting purposes.
Static Joins
In general, static joins should not be required when multicast has been
correctly configured. However, this is a useful option for troubleshooting in
certain situations. For example, if a receiver is not available, a static join is
used to build multicast state in the network.
NX-OS offers the ip igmp join-group [group] [source] interface
command, which configures the NX-OS device as a multicast receiver for
the group. Providing the source address is not required unless the join is for
IGMPv3. This command forces NX-OS to issue an IGMP membership
report and join the group as a host. All packets received for the group
address are processed in the control plane of the device. This command can
prevent packets from being replicated to other OIFs and should be used
with caution.
The second option is the ip igmp static-oif [group] [source] interface
command, which statically adds an OIF to an existing mroute entry and
forwards packets to the OIF in hardware. The source option is used only
with IGMPv3. It is important to note that if this command is being added to
a VLAN interface, you must also configure a static IGMP snooping table
entry with the ip igmp snooping static-group [group] [source] interface
[interface name] VLAN configuration command to actually forward
packets.
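A hedged sketch of the second option follows (the group address, VLAN, and interface are assumed values):
NX-1(config)# interface Vlan100
NX-1(config-if)# ip igmp static-oif 239.100.100.100
NX-1(config-if)# exit
NX-1(config)# vlan configuration 100
NX-1(config-vlan-config)# ip igmp snooping static-group 239.100.100.100 interface Ethernet1/1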
IGMP
Hosts use the IGMP protocol to dynamically join and leave a multicast
group through the LHR. With IGMP, a host can join or leave a group at any
time. Without IGMP, a multicast router has no way of knowing when
interested receivers reside on one of its interfaces or when those receivers
are no longer interested in the traffic. It should be obvious that, without
IGMP, the efficiencies in bandwidth and resource utilization in a multicast
network would be severely diminished. Imagine if every multicast router
sent traffic for each group on every interface! For that reason, hosts and
routers must support IGMP if they are configured to support multicast
communication. In the NX-OS implementation of IGMP, a single IGMP
process serves all virtual routing and forwarding (VRF) instances. If Virtual
Device Contexts (VDC) are being used, an IGMP process runs on each
VDC.
IGMPv1 was defined in RFC 1112 and provided a state machine and the
messaging required for hosts to join and leave multicast groups by sending
membership reports to the local router. Finding a device using IGMPv1 in a
modern network is uncommon, but an overview of its operation is provided
for historical purposes so that the differences and evolution in IGMPv2 and
IGMPv3 are easier to understand.
A multicast router configured for IGMPv1 periodically sends query
messages to the All-Hosts address of 224.0.0.1. The host then waits for a
random time interval, within the bounds of a report delay timer, to send a
membership report using the group address as the destination address for
the membership report. The multicast router receives the message
indicating that traffic for a specific group should be sent. When the router
receives the membership report, it knows that a host on the segment is a
current member of the multicast group and starts forwarding the group
traffic onto the segment. A functional reason for using the group address as
the destination of the membership report is so that hosts are aware of the
presence of other receivers for the group on the same network. This allows
a host to suppress its own report message, to reduce the volume of IGMP
traffic on a segment. A multicast router needs to receive only a single
membership report to begin sending traffic onto the segment.
When a host wants to join a new multicast group, it can immediately send a
membership report for the group; it does not have to wait for a query
message from a multicast router. However, when a host wants to leave a
group, IGMPv1 does not provide a way to indicate this to the local
multicast router. The host simply stops responding to queries. If the router
receives no further membership reports, it sends three queries before
pruning off the interface from the OIL and determining that interested
receivers are no longer present.
IGMPv2
Defined in RFC 2236, IGMPv2 provides additional functionality over
IGMPv1. Implementing the new functionality required defining an additional
message type, the leave group message. Figure 13-6 shows the IGMP message format.
Figure 13-6 IGMP Message Format
Note
IP packets carrying IGMP messages have the TTL set to 1 and the
router alert option set in the IP header, to force routers to examine the
packet contents.
IGMPv3
IGMPv3 was specified in RFC 3376. It allows a host to support the
functionality required for Source Specific Multicast (SSM). SSM
allows a receiver to join not only the multicast group address,
but also the source address for a particular group. Applications running on
a multicast receiver host can now request specific sources.
In IGMPv3, the interface state of the host includes a filter mode and source
list. The filter mode can be include or exclude. When the filter mode is
include, traffic is requested only from the sources in the source list. If the
filter mode is exclude, traffic is requested for any source except the ones
present in the source list. The source list is an unordered list of IP unicast
source addresses, which can be combined with the filter mode to
implement source-specific logic. This allows IGMPv3 to signal only the
sources of interest to the receiver in the protocol messages.
Figure 13-7 provides the IGMPv3 membership query message format,
which includes several new fields when compared to the IGMPv2
membership query message, although the message type remains the same
(0x11).
Each group record in the membership report uses the format shown in
Figure 13-9.
Figure 13-9 IGMPv3 Membership Report Group Record Format
IGMP Snooping
Without IGMP snooping, a switch must flood multicast packets to each
port in a VLAN to ensure that every potential group member receives the
traffic. Obviously, bandwidth and processing efficiency are reduced if ports
on the switch do not have an interested receiver attached. IGMP snooping
inspects (or “snoops on”) the higher-layer protocol communication
traversing the switch. Looking into the contents of IGMP messages allows
the switch to learn where multicast routers and interested receivers for a
group are attached. IGMP snooping operates in the control plane by
optimizing and suppressing IGMP messages from hosts, and operates in the
data plane by installing multicast MAC address and port-mapping entries
into the local multicast MAC address table of the switch. The entries
created by IGMP snooping are installed in the same MAC address table as
unicast entries. Despite the fact that different commands are used for
viewing the entries installed by normal unicast learning and IGMP
snooping, they share the same hardware resources provided by the MAC
address table.
An IGMP snooping switch listens for IGMP query messages and PIM hello
messages to determine which ports are connected to mrouters. When a port
is determined to be an mrouter port, it receives all multicast traffic in the
VLAN so that appropriate control plane state on the mrouter is created and
sources are registered with the PIM RP, if applicable. The snooping switch
also forwards IGMP membership reports to the mrouter to initiate the flow
of multicast traffic to group members.
Host ports are discovered by listening for IGMP membership report
messages. The membership reports are evaluated to determine which
groups and sources are being requested, and the appropriate forwarding
entries are added to the multicast MAC address table or IP-based
forwarding table. An IGMP snooping switch should not forward
membership reports to hosts because it results in hosts suppressing their
own membership reports for IGMPv1 and IGMPv2.
If a multicast packet for the Network Control Block 224.0.0.0/24 arrives, it
might need to be flooded on all ports. This is because devices can listen for
groups in this range without sending a membership report for the group,
and suppressing those packets could interrupt control plane protocols.
IGMP snooping is a separate process from the IGMP control plane process
and is enabled by default in NX-OS. No user configuration is required to
have the basic functionality running on the device. NX-OS builds its IGMP
snooping table based on the group IP address instead of the multicast MAC
address for the group. This behavior allows for optimal forwarding even if
the L3 group addresses of multiple groups overlap to the same multicast
group MAC address. The output in Example 13-2 demonstrates how to
verify the IGMP snooping state and lookup mode for a VLAN.
If multicast traffic arrives for a group that a host has not requested via a
membership report message, those packets are forwarded to the mrouter
ports only, by default. This is called optimized multicast flooding in NX-
OS and is shown as enabled by default in Example 13-2. If this feature is
disabled, traffic for an unknown group is flooded to all ports in the VLAN.
Note
Optimized multicast flooding should be disabled in IPv6 networks to
avoid problems related to neighbor discovery (ND) that rely
specifically on multicast communication. This feature is disabled with
the no ip igmp snooping optimised-multicast-flood command in
VLAN configuration mode.
Note
When vPC is configured with IGMP snooping, configuring the same
IGMP parameters on both vPC peers is recommended. IGMP state is
synchronized between vPC peers with Cisco Fabric Services (CFS).
IGMP Verification
IGMP is enabled by default when PIM is enabled on an interface.
Troubleshooting IGMP problems typically involves scenarios in which the
LHR does not have an mroute entry populated by IGMP and the problem
needs to be isolated to the LHR, the L2 infrastructure, or the host itself.
Often IGMP snooping must be verified during this process because it is
enabled by default and therefore plays an important role in delivering the
queries to hosts and delivering the membership report messages to the
mrouter.
In the topology in Figure 13-10, NX-1 is acting as the LHR for receivers in
VLAN 115 and VLAN 116. NX-1 is also the IGMP querier for both
VLANs. NX-2 is an IGMP snooping switch that is not performing any
multicast routing. All L3 devices are configured for PIM ASM, with an
anycast RP address shared between NX-3 and NX-4.
Figure 13-10 IGMP Verification Example Topology
If a receiver is not getting multicast traffic for a group, verify IGMP for
correct state and operation. To begin the investigation, the following
information is required:
Multicast Group Address: 239.215.215.1
IP address of the source: 10.215.1.1
IP address of the receiver: 10.115.1.4
LHR: NX-1
Scope of the problem: The groups, sources, and receivers that are not
functioning
The purpose of IGMP is to inform the LHR that a receiver is interested in
group traffic. At the most basic level, this is communicated through a
membership report message from the receiver and should create a (*, G)
state at the LHR. In most circumstances, checking the mroute at the LHR
for the presence of the (*, G) is enough to verify that at least one
membership report was received. The OIL for the mroute should contain
the interface on which the membership report was received. If this check
passes, typically the troubleshooting follows the MDT to the PIM RP or
source to determine why traffic is not arriving at the receiver.
In the following examples, no actual IGMP problem condition is present
because the (*, G) state exists on NX-1. Instead of troubleshooting a
specific problem, this section reviews the IGMP protocol state and
demonstrates the command output, process events, and methodology used
to verify functionality.
Verification begins from NX-2, which is the IGMP snooping switch
connected to the receiver 10.115.1.4, and works across the L2 network
toward the mrouter NX-1. Example 13-4 contains the output of show ip
igmp snooping vlan 115, which is where the receiver is connected to NX-2.
This output is used to verify that IGMP snooping is enabled and that the
mrouter port is detected.
The Number of Groups field indicates that one group is present. The show
ip igmp snooping groups vlan 115 command is used to obtain additional
detail about the group, as in Example 13-5.
The last reporter is seen using the detail keyword, shown in Example 13-6.
Note
If MAC-based multicast forwarding was configured for VLAN 115,
the multicast MAC table entry can be confirmed with the show
hardware mac address-table [module] [VLAN identifier] command.
There is no software MAC table entry in the output of show mac
address-table multicast [VLAN identifier], which is expected.
NX-2 is configured to use IP-based lookup for IGMP snooping. The show
forwarding distribution ip igmp snooping vlan [VLAN identifier]
command in Example 13-7 is used to find the platform index, which is used
to direct the frames to the correct output interfaces. The platform index is
also known as the Local Target Logic (LTL) index. This command provides
the Multicast Forwarding Distribution Manager (MFDM) entry, which was
discussed in the “NX-OS Multicast Architecture” section of this
chapter.
Member info
------------------
IFIDX      LTL
---------------------------------
Eth3/19    0x0012
Po1        0x0404
Note
If the IFIDX of interest is a port-channel, the physical interface is
found by examining the LTL index of the port-channel. Chapter 5,
“Port-Channels, Virtual Port-Channels, and FabricPath,” demonstrates
the port-channel load balance hash and how to find the port-channel
member link that will be used to transmit the packet.
At this point, the IGMP snooping control plane was verified in addition to
the forwarding plane state for the group with the available show
commands. NX-OS also provides several useful event-history records for
IGMP, as well as other multicast protocols. The event-history output
collects significant events from the process and stores them in a circular
buffer. In most situations, for multicast protocols, the event-history records
provide the same level of detail that is available with process debugs.
The show ip igmp snooping internal event-history vlan command
provides a sequence of IGMP snooping events for VLAN 115 and the group
of interest, 239.215.215.1. Example 13-9 shows the reception of a general
query message from Port-channel 1, as well as the membership report
message received from 10.115.1.4 on Eth3/19.
NX-OS maintains statistics for IGMP snooping at both the global and
interface level. These statistics are viewed with either the show ip igmp
snooping statistics global command or the show ip igmp snooping
statistics vlan [VLAN identifier] command. Example 13-11 shows the
statistics for VLAN 115 on NX-2. The VLAN statistics also include global
statistics, which are useful for confirming how many and what type of
IGMP and PIM messages are being received on a VLAN. If additional
packet-level details are needed, using Ethanalyzer with an appropriate filter
is recommended.
With NX-2 verified, the examination moves to the LHR, NX-1. NX-1 is the
mrouter for VLAN 115 and the IGMP querier. The IGMP state on NX-1 is
verified with the show ip igmp interface vlan 115 command, as in
Example 13-12.
The membership report NX-2 forwarded from the host is received on Port-
channel 1. The query messages and membership reports are viewed in the
show ip igmp internal event-history debugs output in Example 13-13.
When the membership report message is received, NX-1 determines that
state needs to be created.
IGMP must also inform the MRIB so that an appropriate mroute entry is
created. This is seen in the show ip igmp internal event-history igmp-
internal output in Example 13-15. An IGMP update is sent to the MRIB
process buffer through Message and Transactional Services (MTS). Note
that IGMP receives notification from MRIB that the message was
processed and the message buffer gets reclaimed.
When the MRIB process receives the MTS message from IGMP, an mroute
is created for (*, 239.215.215.1/32) and the MFDM is informed. The RPF
toward the PIM RP (10.99.99.99) is then confirmed and added to the entry.
The output of show ip mroute in Example 13-17 confirms that a (*, G)
entry has been created by IGMP and the OIF was also populated by IGMP.
Note
Additional events occur after this point when traffic arrives from the
source, 10.215.1.1. The arrival of data traffic from the RP triggers a
PIM join toward the source and creation of the (S, G) mroute. This is
explained in the “PIM Any Source Multicast” section later in this
chapter.
PIM Multicast
PIM is the multicast routing protocol used to build shared trees and
shortest-path trees that facilitate the distribution of multicast traffic in an
L3 network. As the name suggests, PIM was designed to be protocol
independent. PIM essentially creates a multicast overlay network built
upon the information available from the underlying unicast routing
topology. The term protocol independent is based on the fact that PIM can
use the unicast routing information in the Routing Information Base (RIB)
from any source protocol, such as EIGRP, OSPF, or BGP. The unicast
routing table provides PIM with the relative location of sources,
rendezvous points, and receivers, which is essential to building a loop-free
MDT.
PIM is designed to operate in one of two modes, dense mode or sparse
mode. Dense mode (DM) operates under the assumption that receivers are
densely dispersed through the network. In dense mode, the assumption is
that all PIM neighbors should receive the traffic. In this mode of operation,
multicast traffic is flooded to all downstream neighbors. If the group traffic
is not required, the neighbor prunes itself from the tree. This is referred to
as a push model because traffic is pushed from the root of the tree toward
the leaves, with the assumption that there are many leaves and they are all
interested in receiving the traffic. NX-OS does not support PIM dense
mode because PIM sparse mode offers several advantages and is the most
popular mode deployed in modern data centers.
PIM sparse mode (SM) is based on a pull model. The pull model assumes
that receivers are sparsely dispersed through the network and that it is
therefore more efficient to have traffic forward to only the PIM neighbors
that are explicitly requesting the traffic. PIM sparse mode works well for
the distribution of multicast when receivers are sparsely or densely
populated in the topology. Because of its explicit join behavior, it has
become the preferred mode of deploying multicast.
The role of PIM in the process of distributing multicast traffic from a
source to a receiver is described by the following responsibilities:
Registering multicast sources with the PIM RP (ASM)
Joining an interested receiver to the MDT
Deciding which tree should be joined on behalf of the receiver
If multiple PIM routers exist on the same L3 network, determining
which PIM router will forward traffic
This section of the chapter introduces the PIM protocol and messages PIM
uses to build MDTs and create forwarding state. The different operating
models of PIM SM are examined, including ASM, SSM, and Bi-Directional
PIM (Bidir).
Note
RFC 2362 initially defined PIM as an experimental protocol that was
later made obsolete by RFC 4601. Recently, RFC 4601 was updated by
RFC 7761. The NX-OS implementation of PIM is based on RFC 4601.
Note
This chapter does not cover the PIM messages specific to PIM DM
because NX-OS does not support PIM DM. Interested readers should
review RFC 3973 to learn about the various PIM DM messages.
Note
In theory, the number of group sets can cause a join-prune message to
exceed the maximum IP packet size of 65,535 bytes. In this case, multiple
join-prune messages are used. It is important to ensure that PIM neighbors
have a matching L3 MTU size because a neighbor could send a join-prune
message that is too large for the receiving interface to accommodate.
This results in missing multicast state on the receiving PIM neighbor
and a broken MDT.
PIM Bootstrap Message
The PIM bootstrap message is originated by the Bootstrap Router (BSR)
and provides an RP set that contains group-to-RP mapping information.
The bootstrap message is sent to the ALL-PIM-ROUTERS address of
224.0.0.13 and is forwarded hop by hop throughout the multicast domain.
Upon receiving a bootstrap message, a PIM router processes its contents
and builds a new packet to forward the bootstrap message to all PIM
neighbors per interface. It is possible for a bootstrap message to be
fragmented into multiple Bootstrap Message Fragments (BSMF). Each
fragment uses the same format as the bootstrap message. The PIM
bootstrap message contains the following fields:
Type: The value is 4 for a bootstrap message.
No-Forward Bit: Instruction that the bootstrap message should not be
forwarded.
Fragment Tag: Randomly generated number used to distinguish
BSMFs that belong to the same bootstrap message. Each fragment
carries the same value.
Hash Mask Length: The length, in bits, of the mask to use in the hash
function.
BSR Priority: The priority value of the originating BSR. The value
can be 0 to 255 (higher is preferred).
BSR Address: The address of the bootstrap router for the domain.
Group Address 1 .. n: The group ranges associated with the
candidate-RPs.
RP Count 1 .. n: The number of candidate-RP addresses included in
the entire bootstrap message for the corresponding group range.
Frag RP Count 1 .. m: The number of candidate-RP addresses
included in this fragment of the bootstrap message for the
corresponding group range.
RP Address 1 .. m: The address of the candidate-RP for the
corresponding group range.
RP1 .. m Holdtime: The holdtime, in seconds, for the corresponding
RP.
RP1 .. m Priority: The priority of the corresponding RP and group
address. This field is copied from the candidate-RP advertisement
message. The highest priority is zero and is per RP and per group
address.
version 7.2(2)D1(2)
feature pim
interface Vlan115
ip pim sparse-mode
interface Vlan116
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
After PIM is enabled on an interface, hello packets are sent and PIM
neighbors form if there is another router on the link that is also PIM
enabled.
Note
The hello interval for PIM is configured in milliseconds. The
minimum accepted value is 1000 ms, which is equal to 1 second. If an
interval lower than the default is needed to detect a failed PIM
neighbor, use BFD for PIM instead of a reduced hello interval.
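A minimal sketch of enabling BFD for PIM, assuming the BFD feature is available and reusing an interface from this chapter's topology:

feature bfd
ip pim bfd
!
interface Ethernet3/17
  ip pim sparse-mode
  ip pim bfd-instance

The global ip pim bfd command enables BFD for PIM interfaces; the per-interface ip pim bfd-instance command can override the behavior on an individual interface.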
In the output of Example 13-19, NX-1 has formed PIM neighbors with NX-
3 and NX-4. The output shows whether the neighbor is BiDIR capable and
also provides the priority value of each neighbor, which is used for the DR
election.
If additional detail about the PIM message contents is desired, the packets
can be captured using the Ethanalyzer tool (see Example 13-23). The
packet detail is examined locally using the detail option, or the capture
may be saved for offline analysis with the write option.
Capturing on inband
Frame 1: 64 bytes on wire (512 bits), 64 bytes captured (512 bits)
Encapsulation type: Ethernet (1)
Arrival Time: Oct 29, 2017 00:48:35.186687000 UTC
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1509238115.186687000 seconds
[Time delta from previous captured frame: 0.029364000 seconds]
[Time delta from previous displayed frame: 0.029364000
seconds]
[Time since reference or first frame: 3.751505000 seconds]
Frame Number: 5
Frame Length: 64 bytes (512 bits)
Capture Length: 64 bytes (512 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ip:pim]
<>
Internet Protocol Version 4, Src: 10.1.13.3 (10.1.13.3), Dst:
224.0.0.13 (224.0.0.13)
<>
Protocol Independent Multicast
0010 .... = Version: 2
.... 0000 = Type: Hello (0)
Reserved byte(s): 00
Checksum: 0x3954 [correct]
PIM options: 4
Option 1: Hold Time: 105s
Type: 1
Length: 2
Holdtime: 105s
Option 19: DR Priority: 1
Type: 19
Length: 4
DR Priority: 1
Option 22: Bidir Capable
Type: 22
Length: 0
Option 20: Generation ID: 765622359
Type: 20
Length: 4
Generation ID: 765622359
Note
NX-OS supports PIM neighbor authentication, as well as BFD for PIM
neighbors. Refer to the NX-OS configuration guides for information
on these features.
Note
The SPT switchover is optional in PIM ASM. The ip pim spt-
threshold infinity command is used to force a device to remain on the
RPT.
PIM ASM Configuration
The configuration for PIM ASM is straightforward. Each interface that is
part of the multicast domain is configured with ip pim sparse-mode. This
includes L3 interfaces between routers and any interface where receivers
are connected. It is also considered a best practice to enable ip pim
sparse-mode on the PIM RP Loopback interface for simplicity and
consistency, although this might not be required on some platforms. The
PIM RP address must be configured on every PIM router and must have a
consistent mapping of groups to a particular RP address. NX-OS supports
BSR and Auto-RP for automatically configuring the PIM RP address in the
domain; this is covered in the “PIM RP Configuration” section of this
chapter. Example 13-24 contains the PIM configuration for NX-1, which is
currently acting as the PIM RP. The other PIM routers have a similar
configuration but do not have a Loopback99 interface. Loopback99 is the
interface where the PIM RP address is configured on NX-1. It is possible to
configure multiple PIM RPs in the network and restrict which groups are
mapped to a particular RP with the group-list or prefix-list option, as
sketched after the configuration.
feature pim
interface Vlan1101
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
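As a sketch of the group-list scoping mentioned earlier, a second RP with a hypothetical address could be restricted to the 239.0.0.0/8 range:

ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4
ip pim rp-address 10.88.88.88 group-list 239.0.0.0/8

Because 239.0.0.0/8 is the longer match, groups in that range should map to 10.88.88.88, while all other groups fall back to 10.99.99.99.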
Depending on the scale of the network environment, it might be necessary
to increase the size of the PIM event-history logs when troubleshooting a
problem. The size is increased per event-history with the ip pim event-
history [event type] size [event-history size] configuration command.
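For example, the join-prune event-history could be enlarged with a command along these lines (the available size keywords vary by platform and release):

ip pim event-history join-prune size large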
NX-2 then registers this source with the PIM RP NX-1 (10.99.99.99) by
sending a PIM register message with an encapsulated data packet from the
source. NX-1 receives this register message, as the output of show ip pim
internal event-history null-register in Example 13-26 shows. The first
register message has pktlen 84, which creates the mroute state at the PIM
RP. Subsequent null-register messages that do not have the encapsulated
source packet are only 20 bytes. NX-1 responds to each register message
with a register-stop.
Note
NX-OS can have a separate event-history for receiving encapsulated
data register messages, depending on the version. The command is
show ip pim internal event-history data-register-receive. In older
NX-OS releases, debug ip pim data-register send and debug ip pim
data-register receive are used to debug the PIM registration process.
Because no receivers currently exist in the PIM domain, NX-1 adds an (S,
G) mroute with an empty OIL (see Example 13-27). The IIF is the L3
interface between NX-1 and NX-2 Vlan1101, which is carried over Port-
channel 1. The mroute has the PIM flag to indicate that PIM created this
mroute state.
After adding the mroute entry, NX-1 sends a register-stop message back to
NX-2 (see Example 13-28). NX-2 suppresses its first null register message
because it has just received a register-stop for a recent encapsulated data
register message. After the register-stop, NX-2 starts its Register-
Suppression timer. Just before the timer expires, another null-register is
sent. If the timer expires without a register-stop from the RP, the DR
resumes sending full encapsulated packets.
The source has been successfully registered with the PIM RP. This state
persists until a receiver joins the group, with NX-2 periodically informing
NX-1 via null register messages that the source is still actively sending to
the group address.
A receiver in VLAN 215 connected to NX-4 sends a membership report to
initiate the flow of multicast for the 239.115.115.1 group. When this
message arrives at NX-4, it triggers the creation of a (*, G) mroute entry by
IGMP with an OIL containing VLAN 215 (see Example 13-29). The IIF
Ethernet 3/29 is the interface used to reach the PIM RP address on NX-1.
The mroute entry corresponds to a PIM RPT join being sent from NX-4
toward NX-1 (see Example 13-30).
When NX-1 receives this RPT Join from NX-4, the OIF Ethernet 3/17 is
added to the OIL of the mroute (see Example 13-31).
The receipt of the join triggers the creation of a (*, G) mroute state on NX-
1 and also triggers a join from NX-1 to NX-2 over VLAN 1101 for the
source (see Example 13-32).
The result of this join from NX-1 to NX-2 is that NX-2 adds an OIF of
VLAN 1101 (see Example 13-33).
Traffic now flows from the source, through NX-2 toward NX-1. NX-1
receives the traffic and forwards it through the RPT to NX-4. At NX-4,
traffic is now received on the RPT and the SPT switchover occurs, as seen
in the PIM event-history output in Example 13-34. NX-4 first sends the
SPT join to NX-2 (10.2.23.2) and then prunes itself from the RPT to NX-1
(10.2.13.1).
The resulting mroute state on NX-4 is that the (S, G) was created and the
OIL contains VLAN215. The IIF for the (S, G) points toward NX-2, while
the IIF for the (*, G) points to the PIM RP at NX-1. Example 13-35 shows
the show ip mroute output from NX-4.
NX-2 has an (S, G) mroute with the IIF of VLAN 115 and the OIF of
Ethernet 3/17 that is connected to NX-4. Example 13-36 shows the mroute
state of NX-2.
NX-1 has (*, G) state from NX-4 but no OIF for the (S, G) state. Example
13-37 contains the mroute table of NX-1 after the SPT switchover. The IIF
of the (*, G) is the RP interface of Loopback99, which is the root of the
RPT.
As the previous section demonstrates, the mroute state and the event-
history in NX-OS make it possible to determine whether the problem
involves the RPT or the SPT and to determine which device along the tree
is causing trouble.
The mroute provides the IIF and OIF, dictating which modules need to be
verified. Knowing which modules are involved is important because the
Nexus 7000 series performs egress replication for multicast traffic. With
egress replication, packets arrive on the ingress module and a copy of the
packet is sent to any local receivers on the same I/O module. Another copy
of the packet is directed to the fabric toward the I/O module of the
interfaces in the OIL of the mroute. When the packet arrives at the egress
module, another lookup is done to replicate the packet to the egress
interfaces.
The OIL contains L3 interface Ethernet 3/17, and the IIF is VLAN 115. To
confirm which physical interface the traffic is arriving on in VLAN 115,
the ARP cache and MAC address table entries are checked for the multicast
source. The show ip arp command provides the MAC address of the source
(see Example 13-39).
IP ARP Table
Total number of entries: 1
Address         Age       MAC Address     Interface
10.115.1.4      00:10:53  64a0.e73e.12c2  Vlan115
Now check the MAC address table to confirm which interface packets
should be arriving on from 10.115.1.4. Example 13-40 shows the output of
the MAC address table.
Example 13-40 MAC Address Table Entry for the Multicast Source
NX-2# show mac address-table dynamic vlan 115
! Output omitted for brevity
Note: MAC table entries displayed are getting read from software.
Use the 'hardware-age' keyword to get information related to
'Age'
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen, + - primary entry using vPC Peer-Link, E - EVPN entry
        (T) - True, (F) - False, ~~~ - use 'hardware-age' keyword to retrieve age info

   VLAN/BD   MAC Address      Type      age     Secure NTFY  Ports/SWID.SSID.LID
---------+-----------------+---------+---------+------+-----+-------------------
* 115       64a0.e73e.12c2   dynamic   ~~~       F      F    Eth3/19
It has now been confirmed that packets are coming into NX-2 on Ethernet
3/19 and egressing on Ethernet 3/17 toward NX-4. The next step in the
verification is to check the MFDM entry for the group to ensure that it is
present with the correct IIF and OIL (see Example 13-41).
The MFDM entry looks correct. The remaining steps are performed from
the LC console, which is accessed with the attach module [module
number] command. If the verification is being done in a nondefault VDC, it
is important to use the vdc [vdc number] command to enter the correct
context after logging into the module. After logging into the correct ingress
module, confirm the correct L3LKP ASIC.
Note
Verification can be completed without logging into the I/O module by
using the slot [module number] quoted [LC CLI command] to obtain
output from the module.
In this particular scenario, the ingress port and egress port are using the
same SOC instance (2), and are on the same module. If the module or SOC
instance were different, each SOC on each module would need to be
verified to ensure that the correct information is present.
With the SOC numbers confirmed for the ingress and egress interfaces,
now check the forwarding entry on the I/O module. This entry has the
correct incoming interface of Vlan115 and the correct OIL, which contains
Ethernet 3/17 (see Example 13-43). Verify the outgoing packets counter to
ensure that it is incrementing periodically.
All information so far has the correct IIF and OIF, so the final step is to
check the programming from the SOC (see Example 13-44).
Interpreting the various fields present is best left to Cisco TAC. These fields
represent the pointers to the various table lookups required to replicate the
multicast packet locally, or to the fabric if the egress interface is on a
different module or SOC. Verification of these indexes requires multiple
ELAM captures at the various stages of forwarding lookup and replication.
PIM Bidirectional
PIM BiDIR is another version of PIM SM in which several modifications
to traditional ASM behavior have been made. The differences between PIM
ASM and PIM BiDIR follow:
BiDIR uses bidirectional shared trees, whereas ASM relies on
unidirectional shared and source trees.
BiDIR does not use any (S, G) state. ASM must maintain (S, G) state
for every source sending traffic to a group address.
BiDIR does not need any source registration process, which reduces
processing overhead.
Both ASM and BiDIR must have every group mapped to a rendezvous
point (RP). The RP in BiDIR does not actually do any packet
processing. In BiDIR, the RP address (RPA) is just a route vector that
is used as a reference point for forwarding up or down the shared tree.
BiDIR uses the concept of a Designated Forwarder (DF) that is elected
on every link in the PIM domain.
Because BiDIR does not require any (S, G) state, only a single (*, G)
mroute entry is required to represent a group. This can dramatically reduce
the number of mroute entries in a network with many sources, compared to
ASM. With a reduction of mroute entries, the potential scalability of the
network is higher because any router platform has a finite number of table
entries that can be stored before resources become exhausted. The increase
in scale does come with a trade-off of losing visibility into the traffic of
individual sources because there is no (S, G) state to track them. However,
in very large, many-to-many environments, this downside is outweighed by
the reduction in state and the elimination of the registration process.
BiDIR has important terminology that must be defined before looking
further into how it operates. Table 13-10 provides these definitions.
Table 13-10 PIM BiDIR Terminology
Rendezvous point address (RPA): An address that is used as the root of
the MDT for all groups mapped to it. The RPA must be reachable from all
routers in the PIM domain. The address used for the RPA does not need to
be configured on the interface of any router in the PIM domain.

Rendezvous point link (RPL): The physical link used to reach the RPA.
All packets for groups mapped to the RPA are forwarded out of the RPL.
The RPL is the only interface where a DF election does not occur.

Designated forwarder (DF): A single DF is elected on every link for each
RPA. The DF is elected based on its unicast routing metric to the RPA.
The DF is responsible for sending traffic down the tree to its link and
is also responsible for sending traffic from its link upstream toward
the RPA. In addition, the DF is responsible for sending PIM Join-Prune
messages upstream toward the RPA, based on the state of local receivers
or PIM neighbors.

RPF interface: The interface used to reach an address, based on unicast
routing protocol metrics.

RPF neighbor: The PIM neighbor used to reach an address, based on the
unicast routing protocol metrics. With BiDIR, the RPF neighbor might not
be the router that should receive Join-Prune messages. All Join-Prune
messages should be directed to the elected DF.
PIM neighbors that can understand BiDIR set the BiDIR capable bit in their
PIM hello messages. This is a foundational requirement for BiDIR to
become operational. As the PIM process becomes operational on each
router, the group-to-RP mapping table is populated by either static
configuration or through Auto-RP or BSR. When the RPA(s) are known, the
router determines its unicast routing metric for the RPA(s) and moves to
the next phase, to elect the DF on each interface.
Initially, all routers begin sending PIM DF election messages that carry the
offer subtype. The offer message contains the sending router’s unicast
routing metric to reach the RPA. As these messages are exchanged, all
routers on the link become aware of each other and what each router’s
metric is to the RPA. If a router receives an offer message with a better
metric, it stops sending offer messages, to allow the router with the better
metric to become elected as the DF. However, if the DF election does not
occur, the election process restarts. The result of this initial DF election
should be that all routers except for the one with the best metric stop
sending offer messages. This allows the router with the best metric to
assume the DF role after sending three offers and not receiving additional
offers from any other neighbor. After assuming the DF role, the router
transmits a DF election message with the winner subtype, which tells all
routers on the link which device is the DF and informs them of the winning
metric.
During normal operation, a new router might come online or metrics
toward the RPA could change. This essentially results in offer messages
sent to the current DF. If the current DF still has the best metric to the RPA,
it responds with a winner message. If the received metric is better than the
current DF, the current DF sends a backoff message. The backoff message
tells the challenging router to wait before assuming the DF role so that all
routers on the link have an opportunity to send an offer message. During
this time, the original DF is still acting as the DF. After the new DF is
elected, the old DF transmits a DF election message with the pass subtype,
which hands over the DF responsibility to the new winner. After the DF is
elected, the PIM BiDIR network is ready to begin forwarding multicast
packets bidirectionally using shared trees rooted at the RPA.
Packets arriving from a downstream link are forwarded upstream until they
reach the router with the RPL, which contains the RPA. Because no
registration process occurs and no switchover to an SPT takes place, the
RPA does not need to be on a router. This is initially confusing, but it works
because packets are forwarded out the RPL toward the RPA, and (*, G) state
is built from every FHR connected to a source and from every LHR with an
interested receiver toward the RPA. In other words, with BiDIR, packets do
not have to actually traverse the RP as they do in ASM. The intersecting
branches of the bidirectional (*, G) tree can distribute multicast directly
between source and receiver.
In NX-OS, up to eight BiDIR RPAs are supported per VRF. Redundancy for
the RPA is achieved using a concept referred to as a phantom RP. The term
is used because the RPA is not assigned to any router in the PIM domain.
For example, assume an RPA address of 10.1.1.1. NX-1 could have an
address from the 10.1.1.0/30 subnet configured on its Loopback10 interface,
and NX-3 could have an address from the 10.1.1.0/29 subnet configured on
its Loopback10 interface. All routers in the PIM
domain follow the longest-prefix-match rule in their routing table to prefer
NX-1. If NX-1 failed, NX-3 would then become the preferred path to the
RPL and thus the RP as soon as the unicast routing protocol converges.
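A minimal sketch of this phantom RP addressing, with hypothetical Loopback10 addresses chosen so that neither router owns the RPA itself:

! NX-1 (preferred; advertises the longer /30 prefix)
interface loopback10
  ip address 10.1.1.2/30
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode

! NX-3 (backup; advertises the /29 prefix)
interface loopback10
  ip address 10.1.1.3/29
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode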
The topology in Figure 13-14 demonstrates the configuration and
troubleshooting of PIM BiDIR.
Figure 13-14 PIM BiDIR Topology
BiDIR Configuration
The configuration for PIM BiDIR is similar to the configuration of PIM
ASM. PIM sparse mode must be enabled on all interfaces. The BiDIR
capable bit is set in PIM hello messages by default, so no interface-level
command is required to specifically enable PIM BiDIR. An RP is
designated as a BiDIR RPA when it is configured with the bidir keyword in
the ip pim rp-address [RP address] group-range [groups] bidir
command.
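Following that command form, a sketch mapping the full multicast range to the RPA used later in this section would be:

ip pim rp-address 10.99.99.99 group-range 224.0.0.0/4 bidir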
Example 13-45 shows the phantom RPA configuration that was previously
described. Loopback99 is the RPL, which is configured with a subnet that
contains the RPA. The RPA is not actually configured on any router in the
topology, which is a major difference between PIM BiDIR and PIM ASM.
This RPA is advertised to the PIM domain with OSPF; because you want
OSPF to advertise the link as 10.99.99.96/29, the ip ospf network point-to-
point command is used. This forces OSPF on NX-1 to advertise this as a
stub-link in the type 1 router link-state advertisement (LSA).
feature pim
interface Vlan1101
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
NX-1# show run interface loopback99
! Output omitted for brevity
interface loopback99
ip address 10.99.99.98/29
ip ospf network point-to-point
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
NX-1# show ip pim group-range 239.115.115.1
PIM Group-Range Configuration for VRF "default"
Group-range      Action  Mode   RP-address     Shrd-tree-range  Origin
224.0.0.0/4      -       Bidir  10.99.99.99    -                Static

NX-1# show ip pim rp
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
Note
All other routers in the topology have the same BiDIR-specific
configuration, which is the static RPA with the BiDIR keyword. NX-1
and NX-3 are the only routers configured with an RPL to the RPA.
BiDIR Verification
To understand the mroute state and BiDIR events, verification begins from
NX-4, where a receiver is connected in VLAN 215. Example 13-46 gives
the output of show ip mroute from NX-4, which is the LHR. The (*, G)
mroute was created as a result of the IGMP membership report from the
receiver. Because this is a bidirectional shared tree, notice that the RPF
interface Ethernet 3/29 used to reach the RPA is also included in the OIL
for the mroute.
Because NX-4 is the DF election winner on VLAN 215, it sends a PIM join
for the shared tree to the DF on the RPF interface Ethernet 3/29. The show
ip pim internal event-history join-prune command is used to view these
events (see Example 13-49 for the output).
The next hop in the bidirectional shared tree is NX-1, which is NX-4’s RPF
neighbor to the RPA. The join-prune event-history confirms that the (*, G)
join was received from NX-4 (see Example 13-51).
Example 13-53 gives the output of show ip pim df. Because the RPL is
local to this device, it is the DF winner on all interfaces except for the RPL.
No DF is elected on the RPL in PIM BiDIR.
Example 13-53 PIM DF Status on NX-1
No (S, G) join exists from the RPA toward the source as there would have
been in PIM ASM. In BiDIR, all traffic from the source is forwarded from
NX-2, which is the FHR toward the RPA. Therefore, a join from NX-1 to
NX-2 is not required to pull the traffic to NX-1 across VLAN1101. This
fact highlights one troubleshooting disadvantage of BiDIR. No visibility
from the RPA to the FHR is available about this particular source because
the (S, G) state does not exist.
An ELAM capture can be used on NX-1 to verify that traffic is arriving
from NX-2. Another useful technique is to configure a permit line in an
ACL to match the traffic. Configure the ACL with statistics per-entry,
which provides a counter to verify that traffic has arrived. In the output of
Example 13-54, the ACL named verify was configured to match the source
connected on NX-2. The ACL is applied ingress on VLAN 1101, which is
the interface on which traffic should arrive; a sketch of the ACL
definition follows the configuration.
interface Vlan1101
description L3 to 7009-B-NX-2
no shutdown
mtu 9216
ip access-group verify in
no ip redirects
ip address 10.1.11.1/30
no ipv6 redirects
ip ospf cost 1
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
NX-1# show access-list verify
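The ACL definition itself is not shown in the output; a minimal sketch, assuming the source and group addresses from this chapter's topology, might be:

ip access-list verify
  statistics per-entry
  permit ip 10.115.1.4/32 239.115.115.1/32
  permit ip any any

The first entry counts the multicast flow of interest, and the final permit ensures that no other traffic is dropped by the verification ACL.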
In this exercise, the source is connected to NX-2, so the mroute entry can
be verified to ensure that VLAN 1101 to NX-1 is included in the OIL.
Example 13-55 shows the mroute from NX-2. The mroute entry covers all
groups mapped to the RPA.
Static RP Configuration
Static RP is the simplest mechanism to implement. Each router in the
domain is configured with a PIM RP address, as shown in Example 13-56.
feature pim
ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
The simplicity has drawbacks, however. Any change to the group mapping
requires the network operator to update the configuration on each router. In
addition, a single static PIM RP could become a scalability bottleneck as
hundreds or thousands of sources are being registered. If the network is
small in scale, or if a single PIM RP address is being used for all groups, a
static RP could be a good option.
Note
If a static RP is configured and a dynamic RP-to-group mapping is
received, the router uses the dynamically learned address if it is more
specific. If the group mask length is equal, the higher IP address is
used. The override keyword forces a static RP to win over Auto-RP or
BSR.
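For instance, forcing the static mapping to always win might look like this sketch:

ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4 override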
Auto-RP Configuration and Verification
Auto-RP uses the concept of candidate RPs and candidate mapping agents.
Candidate RPs send their configured multicast group ranges in RP-
announce messages that are multicast to 224.0.1.39. Mapping agents listen
for the RP-announce messages and collect the RP-to-group mapping data
into a local table. After resolving any conflict in the mapping, the list is
passed to the network using RP-discovery messages that are sent to
multicast address 224.0.1.40. Routers in the network are configured to
listen for the RP-discovery messages sent by the elected mapping agent.
Upon receiving the RP-discovery message, each router in the PIM domain
updates its local RP-to-group mapping table.
Multiple mapping agents could exist in the network, so a deterministic
method is needed to determine which mapping agent routers should listen
to. Routers in the network use the mapping agent with the highest IP
address to populate their group-to-RP mapping tables. See Figure 13-15 for
the topology used here to discuss the operation and verification of Auto-RP.
feature pim
ip pim auto-rp rp-candidate loopback99 group-list 224.0.0.0/4
ip pim auto-rp listen forward
interface Vlan1101
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
The group range can be configured for additional granularity using the
group-list, prefix-list, or route-map options.
Note
The interface used as an Auto-RP candidate-RP or mapping agent
must be configured with ip pim sparse-mode.
Example 13-58 shows the Auto-RP mapping agent configuration from NX-
4. This configuration results in NX-4 sending RP-discovery messages with
a TTL of 16. In the output of show ip pim rp, because NX-4 is the current
mapping agent, a timer is displayed to indicate when the next RP-discovery
message will be sent.
feature pim
ip pim auto-rp mapping-agent loopback0 scope 16
ip pim ssm range 232.0.0.0/8
ip pim auto-rp listen forward
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
NX-4# show ip pim rp
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP RPA: 10.2.2.3*, next Discovery message in: 00:00:29
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
RP: 10.3.3.3, (0),
uptime: 01:18:01 priority: 0,
RP-source: 10.3.3.3 (A),
group ranges:
239.0.0.0/8 , expires: 00:02:37 (A)
RP: 10.99.99.99, (0),
uptime: 01:20:27 priority: 0,
RP-source: 10.99.99.99 (A),
group ranges:
224.0.0.0/4 , expires: 00:02:36 (A)
Note
Do not use an anycast IP address for the mapping agent address. This
could result in frequent refreshing of the RP mapping in the network.
feature pim
ip pim auto-rp rp-candidate loopback1 group-list 239.0.0.0/8
ip pim auto-rp listen forward
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback1
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
NX-3# show ip pim rp
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP RPA: 10.2.2.3, uptime: 01:21:50, expires: 00:02:49
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
RP: 10.3.3.3*, (0), uptime: 01:16:28, expires: 00:02:49 (A),
priority: 0, RP-source: 10.2.2.3 (A), (local), group ranges:
239.0.0.0/8
RP: 10.99.99.99, (0), uptime: 01:18:18, expires: 00:02:49,
priority: 0, RP-source: 10.2.2.3 (A), group ranges:
224.0.0.0/4
Finally, NX-2 is configured to simply act as an Auto-RP listener and
forwarder. Example 13-60 shows the configuration, which allows NX-2 to
receive and forward the Auto-RP messages from NX-4 and NX-3.
feature pim
ip pim auto-rp listen forward
interface Vlan115
ip pim sparse-mode
interface Vlan116
ip pim sparse-mode
interface Vlan1101
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
NX-2# show ip pim rp
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP RPA: 10.2.2.3, uptime: 00:07:29, expires: 00:02:25
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
RP: 10.3.3.3, (0),
uptime: 00:00:34 priority: 0,
RP-source: 10.2.2.3 (A),
group ranges:
239.0.0.0/8 , expires: 00:02:25 (A)
RP: 10.99.99.99, (0),
uptime: 00:02:59 priority: 0,
RP-source: 10.2.2.3 (A),
group ranges:
224.0.0.0/4 , expires: 00:02:25 (A)
Because the Auto-RP messages are bound by their configured TTL scope,
care must be taken to ensure that all RP-announce messages can reach all
mapping agents in the network. It is also important to ensure that the scope
of the RP-discovery messages is large enough for all routers in the PIM
domain to receive the messages. If multiple mapping agents exist and the
TTL is misconfigured, it is possible to have inconsistent RP-to-group
mapping throughout the PIM domain, depending on the proximity to the
mapping agent.
NX-OS provides a useful event-history for troubleshooting Auto-RP
message problems. The show ip pim internal event-history rp output is
provided from NX-4 in Example 13-61. The output is verbose, but it shows
that NX-4 elects itself as the mapping agent. An Auto-RP discovery
message is then sent out of each PIM-enabled interface. This output also
shows that Auto-RP messages are subject to passing an RPF check. If the
check fails, the message is discarded. Finally, an RP-announce message is
received from NX-3, resulting in the installation of a new PIM RP-to-group
mapping.
When multiple candidate-RPs advertise the same group range, BSR uses the
hash function defined in RFC 4601 to select the RP for a group. For a
group address G, hash mask M, and candidate-RP address C(i), the hash
value is computed as

Value(G,M,C(i)) = (1103515245 * ((1103515245 * (G&M) + 12345) XOR C(i)) + 12345) mod 2^31

The candidate-RP with the highest hash value for the group is selected,
with ties broken by the highest RP address.
feature pim
ip pim bsr rp-candidate loopback99 group-list 224.0.0.0/4 priority 0
ip pim ssm range 232.0.0.0/8
ip pim bsr listen forward
interface Vlan1101
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
feature pim
ip pim bsr bsr-candidate loopback0
ip pim bsr listen forward
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
NX-4# show ip pim rp
PIM RP Status Information for VRF "default"
BSR: 10.2.2.3*, next Bootstrap message in: 00:00:53,
priority: 64, hash-length: 30
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
RP: 10.3.3.3, (0),
uptime: 06:30:36 priority: 0,
RP-source: 10.3.3.3 (B),
group ranges:
239.0.0.0/8 , expires: 00:02:11 (B)
RP: 10.99.99.99, (0),
uptime: 06:30:07 priority: 0,
RP-source: 10.99.99.99 (B),
group ranges:
224.0.0.0/4 , expires: 00:02:28 (B)
Example 13-64 shows the configuration of NX-3, which is configured to be
both a C-RP for 239.0.0.0/8 and a C-BSR. NX-3 has a lower C-BSR address
than NX-4, so it does not send any bootstrap messages after losing the BSR
election.
feature pim
ip pim bsr bsr-candidate loopback0
ip pim bsr rp-candidate loopback1 group-list 239.0.0.0/8
ip pim bsr listen forward
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback1
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
NX-3# show ip pim rp
PIM RP Status Information for VRF "default"
BSR: 10.2.2.3, uptime: 07:05:30, expires: 00:02:05,
priority: 64, hash-length: 30
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
RP: 10.3.3.3*, (0), uptime: 00:00:04, expires: 00:02:25,
priority: 0, RP-source: 10.2.2.3 (B), group ranges:
239.0.0.0/8
RP: 10.99.99.99, (0), uptime: 06:59:41, expires: 00:02:25,
priority: 0, RP-source: 10.2.2.3 (B), group ranges:
224.0.0.0/4
The final router to review is NX-2, which is acting only as a BSR listener
and forwarder. In this configuration, NX-2 receives the bootstrap message
from NX-4 and inspects its contents. It then selects the RP-to-group
mapping for each group range and installs the entry in the local RP cache.
Note that NX-4, NX-3, and NX-1 are BSR clients as well, but they are also
acting as C-RPs or C-BSRs. Example 13-65 shows the configuration and
RP mapping from NX-2.
feature pim
interface Vlan115
ip pim sparse-mode
interface Vlan116
ip pim sparse-mode
interface Vlan1101
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
NX-2# show ip pim rp
PIM RP Status Information for VRF "default"
BSR: 10.2.2.3, uptime: 07:11:35, expires: 00:01:39,
priority: 64, hash-length: 30
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None
RP: 10.3.3.3, (0),
uptime: 07:06:15 priority: 0,
RP-source: 10.2.2.3 (B),
group ranges:
239.0.0.0/8 , expires: 00:01:59 (B)
RP: 10.99.99.99, (0),
uptime: 07:05:47 priority: 0,
RP-source: 10.2.2.3 (B),
group ranges:
224.0.0.0/4 , expires: 00:01:59 (B)
Running both Auto-RP and BSR in the same PIM domain is not supported.
Both Auto-RP and BSR are capable of providing dynamic and redundant RP
mapping to the network. If third-party vendor devices are also participating
in the PIM domain, BSR is the IETF standard choice and allows for
multivendor interoperability.
Anycast-RP Configuration and Verification
Redundancy is always a factor in modern network design. In a multicast
network, no single device is more important to the network overall than the
PIM RP. The previous section discussed Auto-RP and BSR, which provide
redundancy in exchange for additional complexity in the election processes
and the distribution of multicast group–to–RP mapping information in the
network.
Fortunately, another approach is available for administrators who favor the
simplicity of a static PIM RP but also desire RP redundancy. Anycast RP
configuration involves multiple PIM routers sharing a single common IP
address. The IP address is configured on a Loopback interface using a /32
mask. Each router that is configured with the anycast address advertises the
connected host address into the network’s chosen routing protocol. Each
router in the PIM domain is configured to use the anycast address as the
RP. When an FHR needs to register a source, the network’s unicast routing
protocol automatically routes the PIM message to the closest device
configured with the anycast address. This allows many devices to share the
load of PIM register messages and provides redundancy in the case of an
RP failure.
Obviously, intentionally configuring the same IP address on multiple
devices should be done with care. For example, any routing protocol or
management functions that could mistakenly use the anycast Loopback
address as a router-id or source address should be configured to always use
a different interface. With those caveats addressed, using an anycast
address is perfectly safe, and this is a popular option in large and
multiregional multicast networks.
Two methods are available for configuring anycast RP functionality:
1. Anycast RP with Multicast Source Discovery Protocol (MSDP)
2. PIM Anycast RP as specified in RFC 4610
This section examines both options.
Anycast RP with MSDP
The MSDP protocol defines a way for PIM RPs to advertise the knowledge
of registered, active sources to each other. MSDP was initially designed to
connect multiple independent PIM domains, each using its own PIM RP.
However, the protocol was also chosen as an integral part of
the Anycast RP specification in RFC 3446.
MSDP allows each PIM RP configured with the Anycast RP address to act
independently, while still sharing active source information with all other
Anycast RPs in the domain. For example, in the topology in Figure 13-17,
an FHR can register a source for a multicast group with Anycast RP NX-3,
and then a receiver can join that group through Anycast RP NX-4. After
traffic is received through the RPT, normal PIM SPT switchover behavior
occurs on the LHR.
Anycast RP with MSDP requires that each Anycast RP have an MSDP peer
with every other Anycast RP. The MSDP peer session is established over
Transmission Control Protocol (TCP) port 639. When the TCP session is
established, MSDP can send keepalive and source-active (SA) messages
between peers, encoded in a TLV format.
When an Anycast RP learns of a new source, it uses the SA message to
inform all its MSDP peers about that source. The SA message contains the
following information:
Unicast address of the multicast source
Multicast group address
IP address of the PIM RP (originator-id)
When the peer receives the MSDP SA, it subjects the message to an RPF
check, which compares the IP address of the PIM RP in the SA message to
the MSDP peer address. This address must be a unique IP address on each
MSDP peer and cannot be an anycast address. NX-OS provides the ip msdp
originator-id [address] command to configure the originating RP address
that gets used in the SA message.
Note
Other considerations for the MSDP SA message RPF check are not
relevant to the MSDP example used in this chapter. Section 10 of RFC
3618 gives the full explanation of the MSDP SA message RPF check.
If the SA message is accepted, it is sent to all other MSDP peers except the
one from which the SA message was received. A concept called a mesh
group can be configured to reduce the SA message flooding when many
anycast RPs are configured with MSDP peering. A mesh group is a set of
MSDP peers, each of which has a peering with every other member of the
group. Therefore, any SA message received from a mesh group peer does not
need to be forwarded to any peers in the mesh group because all peers
should have received the same message from the originator.
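Mesh group membership is configured per peer. A sketch with a hypothetical group name, using NX-4's Loopback0 address as the peer:

ip msdp mesh-group 10.2.2.3 ANYCAST-MESH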
MSDP supports the use of SA filters, which can be used to enforce specific
design parameters through message filtering. SA filters are configured with
the ip msdp sa-policy [peer address] [route-map | prefix-list] command. It
is also possible to limit the total number of SA messages from a peer with
the ip msdp sa-limit [peer address] [number of SAs] command.
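A sketch based on those command forms, with a hypothetical route-map name and limit (some releases also require an in or out direction keyword on the sa-policy command):

ip msdp sa-policy 10.2.2.3 MSDP-SA-FILTER in
ip msdp sa-limit 10.2.2.3 500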
The example network in Figure 13-17 was configured with anycast RPs and
MSDP between NX-3 and NX-4. NX-3 and NX-4 are both configured with
the Anycast RP address of 10.99.99.99 on their Loopback99 interfaces. The
Loopback0 interface on NX-3 and NX-4 is used to establish the MSDP
peering. NX-1 and NX-2 are statically configured to use the anycast RP
address of 10.99.99.99.
The output of Example 13-69 shows the configuration for anycast RP with
MSDP from NX-3. As with PIM, before MSDP can be configured, the
feature must be enabled with the feature msdp command. The originator-
id and the MSDP connect source are both using the unique IP address
configured on interface Loopback0, while the PIM RP is configured to use
the anycast IP address of Loopback99. The MSDP peer address is the
Loopback0 interface of NX-4.
feature pim
ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
NX-3# show run msdp
! Output omitted for brevity
!Command: show running-config msdp
feature msdp
ip msdp originator-id loopback0
ip msdp peer 10.2.2.3 connect-source loopback0
NX-3# show run interface lo0 ; show run interface lo99
! Output omitted for brevity
interface loopback0
ip address 10.2.1.3/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
interface loopback99
ip address 10.99.99.99/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
Example 13-70 shows the corresponding configuration from NX-4. The
addressing mirrors NX-3: Loopback0 is 10.2.2.3, and the MSDP peer is the
Loopback0 address of NX-3, 10.2.1.3.
feature pim
ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
NX-4# show run msdp
! Output omitted for brevity
!Command: show running-config msdp
feature msdp
ip msdp originator-id loopback0
ip msdp peer 10.2.1.3 connect-source loopback0
NX-4# show run interface lo0 ; show run interface lo99
! Output omitted for brevity
interface loopback0
ip address 10.2.2.3/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
!Command: show running-config interface loopback99
interface loopback99
ip address 10.99.99.99/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
After the configuration is applied, NX-3 and NX-4 establish the MSDP
peering session between their Loopback0 interfaces using TCP port 639.
The MSDP peering status can be confirmed with the show ip msdp peer
command (see Example 13-71). The output provides an overview of the
MSDP peer status and how long the peer has been established. It also lists
any configured SA policy filters or limits and provides counters for the
number of MSDP messages exchanged with the peer.
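Beyond show ip msdp peer, a few additional commands are useful when
verifying MSDP operation; they are listed here for reference without their
output:
NX-3# show ip msdp summary
NX-3# show ip msdp peer 10.2.2.3
NX-3# show ip msdp sa-cache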
PIM Anycast RP
RFC 4610 specifies PIM anycast RP. The design goal of PIM anycast RP is
to remove the dependency on MSDP and to achieve anycast RP
functionality using only the PIM protocol. The benefit of this approach is
that the end-to-end process has one fewer control plane protocol and one
less point of failure or misconfiguration.
PIM anycast RP relies on the PIM register and register-stop messages
between the anycast RPs to achieve the same functionality that MSDP
provided previously. PIM anycast is designed around the following
requirements:
Each anycast RP is configured with the same anycast RP address.
Each anycast RP also has a unique address to use for PIM messages
between the anycast RPs.
Every anycast RP is configured with the addresses of all the other
anycast RPs.
The example network in Figure 13-18 helps in understanding PIM anycast
RP configuration and troubleshooting.
Figure 13-18 PIM Anycast RP
feature pim
ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
ip pim anycast-rp 10.99.99.99 10.1.1.1
ip pim anycast-rp 10.99.99.99 10.2.1.3
ip pim anycast-rp 10.99.99.99 10.2.2.3
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface loopback99
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
The same debugging methodology used for the PIM source registration
process can be applied to the PIM anycast RP set. The show ip pim
internal event-history null-register and show ip pim internal event-
history data-header-register outputs provide a record of the messages
being exchanged between the anycast RP set and any FHRs that are
sending register messages to the device.
Example 13-75 shows the event-history output from NX-4. The null
register message from 10.115.1.254 is from NX-2, which is the FHR. After
adding the mroute entry, NX-4 forwards the register message to the other
members of the anycast RP set and then receives a register stop message in
response.
Example 13-75 PIM Null Register Event-History on NX-4
All examples in the PIM anycast RP section of this book used a static PIM
RP configuration. Using PIM anycast RP in combination with Auto-RP or
BSR is fully supported and provides dynamic group-to-RP mapping along
with the benefits of anycast RP.
Note
SSM can natively join a source in another PIM domain because the
source address is known to the receiver. PIM ASM and BiDIR require
the use of additional protocols and configuration to enable
interdomain multicast to function.
SSM Configuration
The configuration for PIM SSM requires ip pim sparse-mode to be
configured on each interface participating in multicast forwarding. There is
no PIM RP to define, but any interface connected to a receiver must be
configured with ip igmp version 3. The ip pim ssm range command
defaults to the IANA-reserved range of 232.0.0.0/8. Configuring a different
range of addresses is supported, but care must be taken to keep the range
consistent throughout the PIM domain. Otherwise, forwarding breaks
because a misconfigured router assumes the group is an ASM group for
which it has no valid PIM RP-to-group mapping.
The ip igmp ssm-translate [group] [source] command is used to translate
an IGMPv1 or IGMPv2 membership report that does not contain a source
address to an IGMPv3-compatible state entry. This is not required if all
hosts attached to the interface support IGMPv3.
Example 13-76 shows the output of the complete SSM configuration for
NX-2.
feature pim
ip pim ssm range 232.0.0.0/8
interface Vlan115
ip pim sparse-mode
interface Vlan116
ip pim sparse-mode
interface Vlan1101
ip pim sparse-mode
interface Ethernet3/17
ip pim sparse-mode
interface Ethernet3/18
ip pim sparse-mode
ip igmp ssm-translate 232.1.1.1/32 10.215.1.1
NX-2# show run interface vlan115
interface Vlan115
no shutdown
no ip redirects
ip address 10.115.1.254/24
ip ospf passive-interface
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
ip igmp version 3
feature pim
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
interface Vlan215
no shutdown
no ip redirects
ip address 10.215.1.253/24
ip ospf passive-interface
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
ip igmp version 3
NX-1 and NX-3 are configured in a similar way. Because they do not play a
role in forwarding traffic in this example, the configuration is not shown.
SSM Verification
To verify the SPT used in SSM, it is best to begin at the LHR where the
receiver is attached. If the receiver sent an IGMPv3 membership report, an
(S, G) state is present on the LHR. If this entry is missing, check the host
for the proper configuration. SSM requires that the host have knowledge of
the source address, and it works correctly only when the host knows which
source to join, or when a correct translation is configured when the receiver
is not using IGMPv3.
If any doubt arises about whether the host is sending a correct membership
report, perform an Ethanalyzer capture on the LHR. In addition, the output
of show ip igmp groups and show ip igmp snooping groups can be used
to confirm that the interface has received a valid membership report.
Example 13-78 shows this output from NX-4. Because this is IGMPv3 and
NX-OS uses an IP-based table, both the source and group information are
present.
The PIM Join is received on NX-2, and the OIL of the mroute entry is
updated to include Ethernet 3/17, which is directly connected with NX-4.
Example 13-81 gives the event-history for PIM join-prune and the mroute
entry from NX-2.
Note
Although multicast source and receiver traffic is supported over vPC,
an L3 PIM neighbor from the vPC peers to a vPC-connected multicast
router is not yet supported.
vPC-Connected Source
The example network topology in Figure 13-20 illustrates the configuration
and verification of a vPC-connected multicast source.
Figure 13-20 vPC-Connected Source Topology
In Figure 13-20, the multicast sources are 10.215.1.1 in VLAN 215 and
10.216.1.1 in VLAN 216 for group 239.215.215.1. Both sources are
attached to L2 switch NX-6, which uses its local hash algorithm to choose a
member link to forward the traffic to. NX-3 and NX-4 are vPC peers and
act as FHRs for VLAN 215 and VLAN 216, which are trunked across the
vPC with NX-6.
The receiver is attached to VLAN 115 on NX-2, which is acting as the
LHR. The network was configured with a static PIM anycast RP of
10.99.99.99, which is Loopback 99 on NX-1 and NX-2.
When vPC is configured, no special configuration commands are required
for vPC and multicast to work together. Multicast forwarding is integrated
into the operation of vPC by default and is enabled automatically. CFS
handles IGMP synchronization, and PIM does not require the user to enable
any vPC-specific configuration beyond enabling ip pim sparse-mode on
the vPC VLAN interfaces.
Example 13-82 shows the PIM and vPC configuration for NX-4.
feature pim
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
feature vpc
vpc domain 2
peer-switch
peer-keepalive destination 10.33.33.1 source 10.33.33.2 vrf peerKA
peer-gateway
interface port-channel1
vpc peer-link
interface port-channel2
vpc 2
Example 13-83 shows the PIM and vPC configuration on the vPC peer NX-
3.
feature pim
ip pim rp-address 10.99.99.99 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
interface Vlan215
ip pim sparse-mode
interface Vlan216
ip pim sparse-mode
interface Vlan303
ip pim sparse-mode
interface loopback0
ip pim sparse-mode
interface Ethernet3/28
ip pim sparse-mode
interface Ethernet3/29
ip pim sparse-mode
feature vpc
vpc domain 2
peer-switch
peer-keepalive destination 10.33.33.2 source 10.33.33.1 vrf peerKA
peer-gateway
interface port-channel1
vpc peer-link
interface port-channel2
vpc 2
After implementing the configuration, the next step is to verify that PIM
and IGMP are operational on the vPC peers. The output of show ip pim
interface from NX-4 indicates that VLAN 215 is a vPC VLAN (see
Example 13-84). Note that NX-3 (10.215.1.254) is the PIM DR and handles
registration of the source with the PIM RP. PIM neighbor verification on
NX-3 and NX-4 for the non-vPC interfaces and for NX-1 and NX-2 is
identical to the previous examples shown in the PIM ASM section of this
chapter.
Identifying which device is acting as the PIM DR for the VLAN of interest
is important because this device is responsible for registering the source
with the RP, as with traditional PIM ASM. What differs in vPC for source
registration is the interface on which the DR receives the packets from the
source. Packets can arrive either directly on the vPC member link or from
the peer link. Packets are forwarded on the peer link because it is
programmed in IGMP snooping as an mrouter port (see Example 13-86).
Example 13-87 Multicast vPC Source MROUTE Entry on NX-3 and NX-4
NX-4# show ip mroute
! Output omitted for brevity
When the (S, G) mroutes are created on NX-3 and NX-4, both devices
realize that the sources are directly connected. Both devices then determine
the forwarder for each source. In this example, the sources are vPC
connected, which makes the forwarding state for both sources Win-force
(forwarding). The result of the forwarding election is found in the output of
show ip pim internal vpc rpf-source (see Example 13-88). This output
indicates which vPC peer is responsible for forwarding packets from a
particular source address. In this case, both are equal; because the source is
directly attached through vPC, both NX-3 and NX-4 are allowed to forward
packets in response to receiving a PIM join or IGMP membership report
message.
Example 13-88 PIM vPC RPF-Source Cache Table on NX-3 and NX-4
NX-4# show ip pim internal vpc rpf-source
! Output omitted for brevity
Source: 10.215.1.1
Pref/Metric: 0/0
Ref count: 1
In MRIB: yes
Is (*,G) rpf: no
Source role: primary
Forwarding state: Win-force (forwarding)
MRIB Forwarding state: forwarding
Source: 10.216.1.1
Pref/Metric: 0/0
Ref count: 1
In MRIB: yes
Is (*,G) rpf: no
Source role: primary
Forwarding state: Win-force (forwarding)
MRIB Forwarding state: forwarding
NX-3# show ip pim internal vpc rpf-source
! Output omitted for brevity
PIM vPC RPF-Source Cache for Context "default" - Chassis Role Secondary
Source: 10.215.1.1
Pref/Metric: 0/0
Ref count: 1
In MRIB: yes
Is (*,G) rpf: no
Source role: secondary
Forwarding state: Win-force (forwarding)
MRIB Forwarding state: forwarding
Source: 10.216.1.1
Pref/Metric: 0/0
Ref count: 1
In MRIB: yes
Is (*,G) rpf: no
Source role: secondary
Forwarding state: Win-force (forwarding)
MRIB Forwarding state: forwarding
Note
Historical vPC RPF-Source Cache creation events can be viewed in the
output of show ip pim internal event-history vpc.
NX-3 is the PIM DR for both VLAN 215 and VLAN 216 and is responsible
for registering the sources with the PIM RP (NX-1 and NX-2). NX-3 sends
PIM register messages to NX-1, as shown in the output of show ip pim
internal event-history null-register in Example 13-89. Because NX-1 is
part of an anycast RP set, it then forwards the register message to NX-2 and
sends a register-stop message to NX-3. At this point, both vPC peers have
an (S, G) for both sources, and both anycast RPs have an (S, G) state.
After the source has been registered with the RP, the receiver in VLAN 115
sends an IGMP membership report requesting all sources for group
239.215.215.1, which arrives at NX-2. NX-2 joins the RPT and then
initiates switchover to the SPT after the first packet arrives. NX-2 has two
equal-cost routes to reach the sources (see Example 13-90), and it chooses to
join 10.215.1.1 through NX-3 and 10.216.1.1 through NX-4. NX-OS is
enabled for multipath multicast by default, which means it could send a
PIM join on either valid RPF interface toward the source when joining the
SPT.
Example 13-90 Unicast Routes from NX-2 for VLAN 215 and VLAN 216
NX-2# show ip route 10.215.1.0
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
Example 13-91 PIM SPT Joins from NX-2 for vPC-Connected Sources
Example 13-92 MROUTE Entries from NX-3 and NX-4 after SPT Join
Example 13-93 MROUTE Entries from NX-3 and NX-4 after IGMP Join
NX-3# show ip mroute
! Output omitted for brevity
(*, 239.215.215.1/32), uptime: 00:00:05, igmp pim ip
Incoming interface: Ethernet3/29, RPF nbr: 10.1.13.1
Outgoing interface list: (count: 1)
Vlan216, uptime: 00:00:05, igmp
We now have a (*, G) entry because the IGMP membership report was
received, and both (S, G) mroutes now contain VLAN 216 in the OIL. In
this scenario, packets are hashed by NX-6 from the source 10.215.1.1 to
NX-3. While the traffic is being received at NX-3, the following events
occur:
NX-3 forwards the packets across the peer link in VLAN 215.
NX-3 replicates the traffic and multicast-routes the packets from
VLAN 215 to VLAN 216, based on its mroute entry.
NX-3 sends packets toward the receiver in VLAN 216 on Port-channel
2 (vPC).
NX-4 receives the packets from NX-3 in VLAN 215 from the peer
link. NX-4 forwards the packets to any non-vPC receivers but does not
forward the packets out a vPC VLAN.
The (RPF) flag on the (10.216.1.1, 239.215.215.1) mroute entry signifies
that a source and receiver are in the same VLAN.
vPC-Connected Receiver
The same topology used to verify a vPC-connected source is reused to
understand how a vPC-connected receiver works. Although the location of
the source and receivers changed, the rest of the topology remains the same
(see Figure 13-21).
Figure 13-21 vPC-Connected Receiver Topology
Note
IGMP control plane packet activity is seen in the output of show ip
igmp snooping internal event-history vpc.
PIM joins are sent toward the RP from both NX-3 and NX-4, which can be
seen in the show ip pim internal event-history join-prune output of
Example 13-97.
Upon receiving the (*, G) join messages from NX-3 and NX-4, the mroute
entry on NX-2 is updated to include the Ethernet 3/17 and Ethernet 3/18
interfaces to NX-3 and NX-4 in the OIL. Traffic then is sent out on the
RPT.
As the traffic arrives on the RPT at NX-3 and NX-4, the source address of
the group traffic becomes known, which triggers the creation of the (S, G)
mroute entry. NX-3 and NX-4 then determine which device will act as the
forwarder for this source using CFS. The communication for the forwarder
election is viewed in the output of show ip pim internal event-history
vpc. Because both NX-3 and NX-4 have equal metrics and route preference
to the source, a tie occurs. However, because NX-4 is the vPC primary, it
wins over NX-3 and acts as the forwarder for 10.115.1.4.
After the election results are obtained, an entry is created in the vPC RPF-
Source cache, which is seen with the show ip pim internal vpc rpf-source
command. Example 13-98 contains the PIM vPC forwarding election
output from NX-4 and NX-3.
NX-4# show ip pim internal vpc rpf-source
PIM vPC RPF-Source Cache for Context "default" - Chassis Role Primary
Source: 10.115.1.4
Pref/Metric: 110/44
Ref count: 1
In MRIB: yes
Is (*,G) rpf: no
Source role: primary
Forwarding state: Tie (forwarding)
MRIB Forwarding state: forwarding
NX-3# show ip pim internal vpc rpf-source
PIM vPC RPF-Source Cache for Context "default" - Chassis Role Secondary
Source: 10.115.1.4
Pref/Metric: 110/44
Ref count: 1
In MRIB: yes
Is (*,G) rpf: no
Source role: secondary
Forwarding state: Tie (not forwarding)
MRIB Forwarding state: not forwarding
For this election process to work correctly, PIM must be registered with the
vPC manager process. This is indicated in the highlighted output of
Example 13-99.
With ip pim pre-build-spt, both NX-3 and NX-4 initiate (S, G) joins
toward NX-2 following the RPF path toward the source. However, because
NX-3 is not the forwarder, it simply discards the packets it receives on the
SPT. NX-4 forwards packets toward the vPC receiver and across the peer
link to NX-3.
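The ip pim pre-build-spt behavior referenced here is enabled with a single
global configuration command on each vPC peer; a minimal sketch:
feature pim
! Join the SPT on both vPC peers, even on the non-forwarder
ip pim pre-build-spt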
Example 13-100 shows the (S, G) mroute state and resulting PIM SPT joins
from NX-3 and NX-4. Only NX-4 has an OIL containing VLAN 215 for the
(S, G) mroute entry.
More detail about the mroute state is seen in the output of the show
routing ip multicast source-tree detail command. This command
provides additional information that can be used for verification. The
output confirms that NX-4 is the RPF-Source Forwarder for this (S, G)
entry (see Example 13-101). NX-3 has the same OIL, but its status is set to
inactive, which indicates that it is not forwarding.
Ethanalyzer Examples
Various troubleshooting steps in this chapter have relied on the NX-OS
Ethanalyzer facility to capture control plane protocol messages. Table
13-11 provides examples of Ethanalyzer protocol message captures for the
purposes of troubleshooting. In general, when performing an Ethanalyzer
capture, you must decide whether the packets should be displayed in the
session, decoded in the session, or written to a local file for offline
analysis. The basic syntax of the command is ethanalyzer local interface
[inband] capture-filter [filter-string in quotes] write [location:filename].
Many variations of the command exist, depending on which options are
desired.
Table 13-11 Example Ethanalyzer Captures
What Is Being Captured                                              Ethanalyzer Capture Filter
Packets that are PIM and to/from host 10.2.23.3                     "pim && host 10.2.23.3"
Unicast PIM packets such as register or candidate RP advertisement  "pim && not host 224.0.0.13"
MSDP messages from 10.1.1.1                                         "src host 10.1.1.1 && tcp port 639"
IGMP general query                                                  "igmp && host 224.0.0.1"
IGMP group-specific query or report message                         "igmp && host 239.115.115.1"
IGMP leave message                                                  "igmp && host 224.0.0.2"
Multicast data packets sent to the supervisor from 10.115.1.4       "src host 10.115.1.4 && dst host 239.115.115.1"
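Tying these filters back to the syntax described before Table 13-11, a
capture might be written to bootflash for offline analysis as in the
following sketch; the filename is hypothetical:
NX-1# ethanalyzer local interface inband capture-filter "pim && host 10.2.23.3" write bootflash:pim-capture.pcap
Alternatively, decoding a limited number of frames directly in the session
avoids creating a file:
NX-1# ethanalyzer local interface inband capture-filter "igmp" limit-captured-frames 10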
References
RFC 1112, Host Extensions for IP Multicasting, S. Deering. IETF,
https://tools.ietf.org/html/rfc1112, August 1989.
RFC 2236, Internet Group Management Protocol, Version 2, W. Fenner.
IETF, https://tools.ietf.org/html/rfc2236, November 1997.
RFC 3376, Internet Group Management Protocol, Version 3, B. Cain, S.
Deering, I. Kouvelas, et al. IETF, https://www.ietf.org/rfc/rfc3376.txt,
October 2002.
RFC 3446, Anycast Rendezvous Point (RP) Mechanism Using Protocol
Independent Multicast (PIM) and Multicast Source Discovery Protocol
(MSDP). D. Kim, D. Meyer, H. Kilmer, D. Farinacci. IETF,
https://www.ietf.org/rfc/rfc3446.txt, January 2003.
RFC 3618, Multicast Source Discovery Protocol (MSDP). B. Fenner, D.
Meyer. IETF, https://www.ietf.org/rfc/rfc3618.txt, October 2003.
RFC 4541, Considerations for Internet Group Management Protocol
(IGMP) and Multicast Listener Discovery (MLD) Snooping Switches. M.
Christensen, K. Kimball, F. Solensky. IETF,
https://www.ietf.org/rfc/rfc4541.txt, May 2006.
RFC 4601, Protocol Independent Multicast–Sparse Mode (PIM-SM):
Protocol Specification (Revised). B. Fenner, M. Handley, H. Holbrook, I.
Kouvelas. IETF, https://www.ietf.org/rfc/rfc4601.txt, August 2006.
RFC 4607, Source-Specific Multicast for IP. H. Holbrook, B. Cain. IETF,
https://www.ietf.org/rfc/rfc4607.txt, August 2006.
RFC 4610, Anycast-RP Using Protocol Independent Multicast (PIM). D.
Farinacci, Y. Cai. IETF, https://www.ietf.org/rfc/rfc4610.txt, August 2006.
RFC 5015, Bidirectional Protocol Independent Multicast (BIDIR-PIM). M.
Handley, I. Kouvelas, T. Speakman, L. Vicisano. IETF,
https://www.ietf.org/rfc/rfc5015.txt, October 2007.
RFC 5059, Bootstrap Router (BSR) Mechanism for Protocol Independent
Multicast (PIM). N. Bhaskar, A. Gall, J. Lingard, S. Venaas. IETF,
https://www.ietf.org/rfc/rfc5059.txt, January 2008.
RFC 5771, IANA Guidelines for IPv4 Multicast Address Assignments. M.
Cotton, L. Vegoda, D. Meyer. IETF, https://tools.ietf.org/rfc/rfc5771.txt,
March 2010.
RFC 6166, A Registry for PIM Message Types. S. Venaas. IETF,
https://tools.ietf.org/rfc/rfc6166.txt, April 2011.
Cisco NX-OS Software Configuration Guides. http://www.cisco.com.
Doyle, Jeff, and Jennifer DeHaven Carroll. Routing TCP/IP, Volume II
(Indianapolis: Cisco Press, 2001).
Edgeworth, Brad, Aaron Foss, and Ramiro Garza Rios. IP Routing on Cisco
IOS, IOS XE and IOS XR (Indianapolis: Cisco Press, 2014).
Esau, Matt. “Troubleshooting NXOS Multicast” (Cisco Live: San
Francisco, 2014).
Fuller, Ron, David Jansen, and Matthew McPherson. NX-OS and Cisco
Nexus Switching (Indianapolis: Cisco Press, 2013).
IPv4 Multicast Address Space Registry, Stig Venaas,
http://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml,
October 2017.
Loveless, Josh, Ray Blair, and Arvind Durai. IP Multicast, Volume I: Cisco
IP Multicast Networking (Indianapolis: Cisco Press, 2016).
Part VI
Troubleshooting Nexus
Tunneling
Chapter 14
Troubleshooting Overlay
Transport Virtualization
(OTV)
VLANs are aggregated into a distribution switch and then fed into a
dedicated OTV VDC through an L2 trunk. Any traffic in a VLAN that needs
to reach the remote data center is switched to the OTV VDC, where it gets
encapsulated by the edge device. The packet then traverses the routed VDC
as an L3 IP packet and gets routed toward the remote OTV edge device for
decapsulation. Traffic that requires L3 routing is fed from the L2
distribution to a routing VDC. The routing VDC typically runs a First Hop
Redundancy Protocol (FHRP), such as Hot Standby Router Protocol (HSRP)
or Virtual Router Redundancy Protocol (VRRP), to provide a default-gateway
address to the hosts in the attached VLANs and to perform inter-VLAN
routing.
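As an illustration, a minimal HSRP configuration in the routing VDC might
resemble the following sketch; the VLAN, addresses, and group number are
hypothetical:
feature hsrp
feature interface-vlan
interface Vlan100
  no shutdown
  ip address 10.100.1.2/24
  ! Hosts in VLAN 100 use the virtual address 10.100.1.1 as their gateway
  hsrp 100
    ip 10.100.1.1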
Note
Configuring multiple VDCs may require the installation of additional
licenses, depending on the requirements of the deployment and the
number of VDCs.
OTV Terminology
An OTV network topology example is shown in Figure 14-1. There are two
data center sites connected by an L3 routed network that is enabled for IP
multicast. The L3 routed network must provide IP connectivity between the
OTV edge devices (ED) for OTV to function correctly. The placement of the
ED is flexible as long as the OTV ED receives L2 frames for the VLANs
that require extension across OTV. Usually the OTV ED is connected at the
L2 and L3 boundary.
Data center 1 contains redundant OTV VDCs NX-2 and NX-4, which are
the edge devices. NX-1 and NX-3 perform the routing and L2 VLAN
aggregation and connect the access switch to the OTV VDC internal
interface. The OTV join interface is a Layer 3 interface connected to the
routing VDC. Data center 2 is configured as a mirror of Data center 1;
however, the port-channel 3 interface is used as the OTV internal interface
instead of the OTV join interface as in Data center 1. VLANs 100–110 are
being extended with OTV between the data centers across the overlay.
The OTV terminology introduced in Figure 14-1 is explained in Table 14-1.
Table 14-1 OTV Terminology
Internal Interface: Interface on the OTV edge device that connects to the
local site. This interface provides a traditional L2 interface from the ED to
the internal network, and MAC addresses are learned as traffic is received.
The internal interface is an L2 trunk that carries the VLANs being extended
by OTV.
Join Interface: Interface on the OTV edge device that connects to the L3
routed network and is used to source OTV encapsulated traffic. It can be a
Loopback, L3 point-to-point interface, or L3 Port-channel interface.
Subinterfaces may also be used. Multiple overlays can use the same join
interface.
Overlay Interface: Interface on the OTV ED. The overlay interface is used
to dynamically encapsulate the L2 traffic for an extended VLAN in an IP
packet for transport to a remote OTV site. Multiple overlay interfaces are
supported on an edge device.
Site VLAN: A VLAN that exists in the local site and connects the OTV
edge devices at L2. The site VLAN is used to discover other edge devices
in the local site and allows them to form an adjacency. After the adjacency
is formed, the Authoritative Edge Device (AED) for each VLAN is elected.
The site VLAN should be dedicated for OTV and not extended across the
overlay. The site VLAN should be the same VLAN number at all OTV
sites.
Site Identifier: The site-id must be the same for all edge devices that are
part of the same site. Values range from 0x1 to 0xffffffff. The site-id is
advertised in IS-IS packets, and it allows edge devices to identify which
edge devices belong to the same site. Edge devices form an adjacency on
the overlay as well as on the site VLAN (dual adjacency). This allows the
adjacency between edge devices in a site to be maintained even if the site
VLAN adjacency gets broken due to a connectivity problem. The overlay
interface will not come up until a site identifier is configured.
Site Adjacency: Formed across the site VLAN between OTV edge devices
that are part of the same site. If an IS-IS Hello is received from an OTV ED
on the site VLAN with a different site-id than the local router, the overlay
is disabled. This is done to prevent a loop between the OTV internal
interface and the overlay. This behavior is why it is recommended to make
the OTV site VLAN the same at each site.
Deploying OTV
The configuration of the OTV edge device consists of the OTV internal
interface, the join interface, and the overlay virtual interface. Before
attempting to configure OTV, the capabilities of the transport network must
be understood, and it must be correctly configured to support the OTV
deployment model.
OTV Deployment Models
There are two OTV deployment models available, depending on the
capabilities of the transport network.
Multicast Enabled Transport: The control plane is encapsulated in
IP multicast packets. Allows for dynamic neighbor discovery by
having each OTV ED join the multicast control-group through the
transport. A single multicast packet is sent by the OTV ED, which gets
replicated along the multicast tree in the transport to each remote OTV
ED.
Adjacency Server Mode: Neighbors must be manually configured for
the overlay interface. Unicast control plane packets are created for
each individual neighbor and routed through the transport.
The OTV deployment model should be decided during the planning phase,
after verifying the capabilities of the transport network. If multicast is
supported in the transport, it is recommended to use the multicast
deployment model. If no multicast support is available in the transport
network, use the adjacency server model.
The transport network must provide IP routed connectivity for unicast and
multicast communication between the OTV EDs. The unicast connectivity
requirements are achieved with any L3 routing protocol. If the OTV ED
does not form a dynamic routing adjacency with the data center, it must be
configured with static routes to reach the join interfaces of the other OTV
EDs.
Multicast routing in the transport must be configured to support Protocol
Independent Multicast (PIM). An Any Source Multicast (ASM) group is
used for the OTV control-group, and a range of PIM Source Specific
Multicast (SSM) groups are used for OTV data-groups. IGMPv3 should be
enabled on the join interface of the OTV ED.
Note
It is recommended to deploy PIM Rendezvous Point (RP) redundancy
in the transport network for resiliency.
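The transport-side multicast configuration might resemble the following
sketch on a transport router; the RP address and interface are hypothetical,
while the group ranges align with the OTV examples later in this chapter:
feature pim
! ASM rendezvous point covering the OTV control-group
ip pim rp-address 10.0.0.1 group-list 224.0.0.0/4
! Default SSM range carries the OTV data-groups
ip pim ssm range 232.0.0.0/8
interface Ethernet1/1
  description To OTV ED join interface (hypothetical)
  ip pim sparse-mode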
OTV Site VLAN
Each OTV site should be configured with an OTV site VLAN. The site
VLAN should be trunked from the data center L2 switched network to the
OTV internal interface of each OTV ED. Although not required, it is
recommended to use the same VLAN at each OTV site in case the site
VLAN is accidentally leaked between OTV sites.
With the deployment model determined and the OTV VDC created with the
TRANSPORT_SERVICES_PKG license installed, the following steps are
used to enable OTV functionality. The examples that follow are based on a
multicast-enabled transport.
OTV Configuration
Before any OTV configuration is entered, the feature must be enabled with
the feature otv command. Example 14-1 shows the configuration
associated with the OTV internal interface, which is the L2 trunk port that
participates in traditional switching with the existing data center network.
The VLANs to be extended over OTV are VLAN 100–110. The site VLAN
for both data centers is VLAN 10, which is being trunked over the OTV
internal interface, along with VLANs 100–110.
vlan 1,10,100-110
interface Ethernet3/5
description To NX-1 3/19, OTV internal interface
switchport
switchport mode trunk
mtu 9216
no shutdown
interface port-channel3
description To NX-1 Po3, OTV Join interface
mtu 9216
ip address 10.1.12.1/24
ip router ospf 1 area 0.0.0.0
ip igmp version 3
interface Ethernet3/7
description To NX-1 Eth3/22, OTV Join interface
mtu 9216
channel-group 3 mode active
no shutdown
interface Ethernet3/8
description To NX-1 Eth3/23, OTV Join interface
mtu 9216
channel-group 3 mode active
no shutdown
Note
OTV encapsulation increases the size of L2 frames as they are
transported across the IP transport network. The considerations for
OTV MTU are further discussed later in this chapter.
With the OTV internal interface and join interface configured, the logical
interface referred to as the overlay interface can now be configured and
bound to the join interface. The overlay interface is used to dynamically
encapsulate VLAN traffic between OTV sites. The number assigned to the
overlay interface must be the same on all OTV EDs participating in the
overlay. It is possible for multiple overlay interfaces to exist on the same
OTV ED, but the VLANs extended on each overlay must not overlap.
The OTV site VLAN is used to form a site adjacency with any other OTV
EDs located in the same site. Even for a single OTV ED site, the site VLAN
must be configured for the overlay interface to come up. Although not
required, it is recommended that the same site VLAN be configured at each
OTV site. This is to allow OTV to detect if OTV sites become merged,
either on purpose or in error. The site VLAN should not be included in the
OTV extended VLAN list. The site identifier should be configured to the
same value for all OTV EDs that belong to the same site. The otv join-
interface [interface] command is used to bind the overlay interface to the
join interface. The join interface is used to send and receive the OTV
multicast control plane messaging used to form adjacencies and learn MAC
addresses from other OTV EDs.
Because this configuration utilizes a multicast-capable transport network,
the otv control-group [group number] command is used to declare which IP
PIM ASM group will be used for the OTV control plane group. The control
plane group will carry OTV control plane traffic such as IS-IS hellos across
the transport and allow the OTV EDs to communicate. The group number
should match on all OTV EDs and must be multicast routed in the transport
network. Each OTV ED acts as both a source and receiver for this multicast
group.
The otv data-group [group number] is used to configure which Source
Specific Multicast (SSM) groups are used to carry multicast data traffic
across the overlay. This group is used to transport multicast traffic within a
VLAN across the OTV overlay between sites. The number of multicast
groups included in the data-group is a balance between optimization and
scalability. If a single group is used, all OTV EDs receive all multicast
traffic on the overlay, even if there is no receiver at the site. If a large
number of groups is defined, multicast traffic can be forwarded optimally,
but the number of groups present in the transport network could become a
scalability concern. Presently, 256 multicast data groups are supported for
OTV.
After the configuration is completed, the Overlay0 interface must be
brought up with the no shutdown command. OTV adjacencies then form
between the OTV EDs, provided
the underlay network has been properly configured for both unicast and
multicast routing. Example 14-3 contains the configuration for interface
Overlay0 on NX-2 as well as the site-VLAN and site-identifier
configurations.
otv site-vlan 10
interface Overlay0
description Site A
otv join-interface port-channel3
otv control-group 239.12.12.12
otv data-group 232.1.1.0/24
otv extend-vlan 100-110
no shutdown
otv site-identifier 0x1
Note
If multihoming is planned for the deployment, it is recommended to
first enable a single OTV ED at each site. After the OTV functionality
has been verified, the second OTV ED can be enabled. This phased
approach is recommended to allow for simplified troubleshooting if a
problem occurs.
Note
The behavior of forming Dual Adjacencies on the site VLAN and the
overlay began with NX-OS release 5.2(1). Prior to this, OTV EDs in a
site only formed site adjacencies.
The IS-IS protocol used by OTV does not require any user configuration for
basic functionality. When OTV is configured, the IS-IS process is enabled
and configured automatically. Adjacencies form provided that the
underlying transport is functional and the configured parameters for the
overlay are compatible between OTV EDs.
The IS-IS control plane is fundamental to the operation of OTV. It provides
the mechanism to discover both local and remote OTV EDs, form
adjacencies, and exchange MAC address reachability between sites. MAC
address advertisements are learned through the IS-IS control plane. An SPF
calculation is performed, and then the OTV MAC routing table is populated
based on the result. When investigating a MAC address reachability issue,
the advertisement is tracked through the OTV control plane to ensure that
the ED has the correct information from all IS-IS neighbors. If a host-to-
host reachability problem exists across the overlay, it is recommended to
begin the investigation with a validation of the control plane configuration
and operational state before moving into the data plane.
Overlay-Interface Overlay0 :
The output of the show otv site command, as shown in Example 14-6, is
used to verify the site adjacency. The adjacency with NX-4 is in the Full
state, which indicates that both the overlay and site adjacencies are
functional (Dual Adjacency).
------------------------------------------------------------------
-------------
NX-4 64a0.e73e.12c2 Full 13:50:52 Yes
Examples 14-5 and 14-6 show a different adjacency uptime for the site and
overlay adjacencies because these are independent IS-IS interfaces, and the
adjacencies form independently of each other. The site-id for an IS-IS
neighbor is found in the output of show otv internal adjacency, as shown
in Example 14-7. This provides information about which OTV EDs are part
of the same site.
Overlay-Interface Overlay0 :
System-ID Dest Addr Adj-State TM_State Adj-State inAS Site-ID Version
64a0.e73e.12c2
10.1.22.1 default default UP UP 0000.0000.0001*
HW-St: Default N backup (null)
64a0.e73e.12c4
10.2.43.1 default default UP UP 0000.0000.0002*
HW-St: Default N backup (null)
6c9c.ed4d.d944
10.2.34.1 default default UP UP 0000.0000.0002*
HW-St: Default N backup (null)
Note
OTV has several event-history logs that are useful for troubleshooting.
The show otv isis internal event-history adjacency command is used
to review recent adjacency changes.
------------------------------------------------------------------
-------------
Interface Status IP Address Encap type MTU
------------------------------------------------------------------
-------------
Tunnel16384 up -- GRE/IP 9178
Tunnel16385 up -- GRE/IP 9178
Tunnel16386 up -- GRE/IP 9178
When the OTV Adjacencies are established, the AED role is determined for
each VLAN that is extended across the overlay using a hash function. The
OTV IS-IS system-id is used along with the VLAN identifier to determine
the AED role for each VLAN based on an ordinal value. The device with
the lower system-id becomes AED for the even-numbered VLANs, and the
device with the higher system-id becomes AED for the odd numbered
VLANs.
The show otv vlan command from NX-2 is shown in Example 14-10. The
VLAN state column lists the current state as Active or Inactive. An Active
state indicates this OTV ED is the AED for that VLAN and is responsible
for forwarding packets across the overlay and advertising MAC address
reachability for the VLAN. This is an important piece of information to
know when troubleshooting to ensure the correct device is being
investigated for a particular VLAN.
Legend:
(NA) - Non AED, (VD) - Vlan Disabled, (OD) - Overlay Down
(DH) - Delete Holddown, (HW) - HW: State Down
(NFC) - Not Forward Capable
The presence of a (*, G) from IGMP for a group indicates that at minimum
an IGMP join message was received by the router, and there is at least one
interested receiver on that interface. A PIM join message is sent toward the
PIM RP from the last hop router, and the (*, G) join state should be present
along the multicast tree to the PIM RP. When a data packet for the group is
received on the shared tree by the last hop router, in this case NX-1, a PIM
(S, G) join message is sent toward the source. This messaging forms what
is called the source tree, which is built to the first-hop router connected to
the source. The source tree remains in place as long as the receiver is still
interested in the group.
Example 14-12 shows how to verify the receipt of traffic with the show ip
mroute summary command, which provides packet counters and bit-rate
values for each source.
Example 14-12 Verify the Current Bit-Rate of the OTV Control-Group
Because IS-IS adjacency failures for the overlay are often caused by
multicast packet delivery problems in the transport, it is important to
understand what the multicast state on each router is indicating. The
multicast role of each transport router must also be understood to provide
context to the multicast routing table state. For example, is the device a
first-hop router (FHR), PIM RP, transit router, or last-hop router (LHR)? In
the network example, NX-1 is a PIM LHR, FHR, and RP for the control-
group.
If NX-1 had no multicast state for the OTV control-group, it indicates that
the IGMP join has not been received from NX-2. Because NX-1 is also a
PIM RP for this group, it also indicates that none of the sources have been
registered. If a (*, G) was present, but no (S, G), it indicates that the IGMP
join was received from NX-2, but multicast data traffic from NX-4, NX-6,
or NX-8 was not received by NX-1; therefore, the switchover to the source
tree did not happen. At that point, troubleshooting moves toward the source
and first-hop routers until the cause of the multicast problem is identified.
Note
Multicast troubleshooting is covered in Chapter 13, “Troubleshooting
Multicast.”
The site adjacency is formed across the site VLAN. There must be
connectivity between the OTV EDs' internal interfaces across the data
center network for the IS-IS adjacency to form successfully. Example 14-13
contains the output of show otv site where the site adjacency is down, as
indicated by the Partial state, because the overlay adjacency with NX-4 is
UP.
Overlay-Interface Overlay0 :
Hostname System-ID Dest Addr Up Time State
NX-4 64a0.e73e.12c2 10.1.22.1 00:01:57 UP
NX-8 64a0.e73e.12c4 10.2.43.1 00:01:57 UP
NX-6 6c9c.ed4d.d944 10.2.34.1 00:02:09 UP
The show otv isis site output confirms that the adjacency was lost on the
site VLAN as shown in Example 14-14.
BFD: Disabled
The IS-IS adjacency being down indicates that IS-IS hellos (IIH Packets)
are not being exchanged properly on the site VLAN. The transmission and
receipt of IIH packets are recorded in the output of show otv isis internal
event-history iih. Example 14-15 confirms that IIH packets are being sent,
but none are being received across the site VLAN.
This event-history log confirms that the IIH packets are created, and the
process is sending them out to the site VLAN. The same event-history can
be checked on NX-4 to verify if the IIH packets are received. The output
from NX-4 is shown in Example 14-16, which indicates the IIH packets are
being sent, but none are received from NX-2.
The output in Example 14-15 and Example 14-16 confirms that both NX-2
and NX-4 are sending IS-IS IIH hellos to the site VLAN, but neither side is
receiving packets from the other OTV ED. At this point of the
investigation, troubleshooting should follow the VLAN across the L2 data
center infrastructure to confirm the VLAN is properly configured and
trunked between NX-2 and NX-4. In this case, a problem was identified on
NX-3 where the site VLAN, VLAN 10, was not being trunked across the
vPC peer-link. This resulted in a Bridge Assurance inconsistency problem
over the peer-link, as shown in the output of Example 14-17.
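The correction applied on NX-3 resembles the following sketch, assuming
the vPC peer-link is port-channel1; the interface number is hypothetical:
interface port-channel1
  vpc peer-link
  switchport mode trunk
  ! Add the OTV site VLAN to the allowed list on the peer-link trunk
  switchport trunk allowed vlan add 10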
After correcting the trunked VLAN configuration of the vPC peer-link, the
OTV site adjacency came up on the site VLAN, and the dual adjacency
state was returned to FULL. The adjacency transitions are viewed in the
output of show otv isis internal event-history adjacency as shown in
Example 14-18.
The LSP lifetime shows that the LSPs were refreshed only seconds ago; the
lifetime counts down from 1200 to zero, so a value near 1200 indicates a
newly refreshed LSP. Issuing the command a few times may also show the
Seq Number field incrementing, which indicates that the LSP is being
updated by the originating IS-IS neighbor with changed information. This
could cause OTV MAC routes to be refreshed and reinstalled as the SPF
algorithm executes constantly. LSPs may refresh and get updated as part of
normal IS-IS operation, but in this case the updates are happening
constantly, which is abnormal in a steady state.
To investigate the problem, check the LSP contents for changes over time.
To understand which OTV ED is advertising which LSP, check the
hostname to system-id mapping. The Hostname TLV provides a way to
dynamically learn the system-id to hostname mapping for a neighbor. To
identify which IS-IS database entries belong to which neighbors, use the
show otv isis hostname command, as shown in Example 14-20. The
asterisk (*) indicates the local system-id.
To determine what information is changing in the LSP, use the NX-OS diff
utility. As shown in Example 14-22, the diff utility reveals that the
Sequence Number is updated, and the LSP Lifetime has refreshed again to
1198. The changing LSP contents are related to HSRP MAC addresses in
several VLANs extended by OTV.
It is now apparent what is changing in the LSPs and why the lifetime is
continually resetting to 1200. The metric is changing from zero to one.
The next step is to further investigate the problem at the remote AED that
is originating the MAC advertisements across the overlay. In this particular
case, the problem is caused by an incorrect configuration: the HSRP MAC
addresses are being advertised across the overlay through OTV. The HSRP
MAC should be blocked using the First Hop Redundancy Protocol (FHRP)
localization filter, as described later in this chapter, but instead it was
advertised across the overlay, resulting in the observed instability.
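Although the complete FHRP localization configuration is described later
in this chapter, a commonly documented sketch that blocks the HSRP
version 1 virtual MAC range from OTV IS-IS advertisement follows; the
mac-list and route-map names are arbitrary:
! Deny the HSRPv1 virtual MAC range, permit everything else
mac-list HSRP-VMAC-DENY seq 10 deny 0000.0c07.ac00 ffff.ffff.ff00
mac-list HSRP-VMAC-DENY seq 20 permit 0000.0000.0000 0000.0000.0000
route-map HSRP-FILTER permit 10
  match mac-list HSRP-VMAC-DENY
otv-isis default
  vpn Overlay0
    redistribute filter route-map HSRP-FILTER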
The previous example demonstrated a problem with the receipt of a MAC
advertisement from a remote OTV ED. If a problem existed with MAC
addresses not being advertised out to other OTV EDs from the local AED,
the first step is to verify that OTV is passing the MAC addresses into IS-IS
for advertisement. The show otv isis mac redistribute route command
shown in Example 14-25 is used to verify that MAC addresses were passed
to IS-IS for advertisement to other OTV EDs.
Example 14-25 MAC Address Redistribution into OTV IS-IS
0101-64a0.e73e.12c1, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0101-6c9c.ed4d.d941, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0101-c464.135c.6600, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0103-64a0.e73e.12c1, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0103-6c9c.ed4d.d941, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0105-64a0.e73e.12c1, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0105-6c9c.ed4d.d941, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0107-64a0.e73e.12c1, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0109-64a0.e73e.12c1, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
0109-6c9c.ed4d.d941, all
Advertised into L1, metric 1 LSP-ID 6c9c.ed4d.d942.00-00
The integrity of the IS-IS LSP is a critical requirement for the reliability
and stability of the OTV control plane. Packet corruption problems or loss
in the transport can affect both OTV IS-IS adjacencies as well as the
advertisement of LSPs. Separate IS-IS statistics are available for the
overlay and site VLAN, as shown in Examples 14-26 and 14-27, which
provide valuable clues when troubleshooting an adjacency or LSP issue.
SPF calculations: 0
LSPs sourced: 2
LSPs refreshed: 13
LSPs purged: 0
otv site-vlan 10
key chain OTV-CHAIN
key 0
key-string 7 073c046f7c2c2d
interface Overlay0
description Site A
otv isis authentication-type md5
otv isis authentication key-chain OTV-CHAIN
otv join-interface port-channel3
otv control-group 239.12.12.12
otv data-group 232.1.1.0/24
otv extend-vlan 100-110
no shutdown
otv-isis default
otv site-identifier 0x1
OTV IS-IS authentication is enabled as verified with the show otv isis
interface overlay [overlay-number] output in Example 14-29.
The output of show otv adjacency and show otv site varies depending on
which adjacencies are down. The authentication configuration is applied
only to the overlay interface, so it is possible the site adjacency is up even
if one OTV ED at a site has authentication misconfigured for the overlay.
Example 14-31 shows that the overlay adjacency is down, but the site
adjacency is still valid. In this scenario, the state is shown as Partial.
interface Overlay0
otv join-interface port-channel3
otv extend-vlan 100-110
otv use-adjacency-server 10.1.12.1 unicast-only
no shutdown
otv site-identifier 0x1
Example 14-33 shows the configuration for NX-2, which is now acting as
the adjacency server. When configuring an OTV ED in adjacency server
mode, the otv control-group [multicast group] and otv data-group
[multicast-group] configuration on each OTV ED shown in the previous
examples must be removed. The otv use-adjacency-server [IP address]
command is then configured to enable OTV adjacency server mode, and the
otv adjacency-server unicast-only command specifies that NX-2 will be
the adjacency server. The join interface and internal interface configurations
remain unchanged from the previous examples in this chapter.
interface port-channel3
description 7009A-Main-OTV Join
mtu 9216
ip address 10.1.12.1/24
ip router ospf 1 area 0.0.0.0
ip igmp version 3
interface Overlay0
description Site A
otv join-interface port-channel3
otv extend-vlan 100-110
otv use-adjacency-server 10.1.12.1 unicast-only
otv adjacency-server unicast-only
no shutdown
otv site-identifier 0x1
Dynamically advertising a list of known OTV EDs saves the user from
having to configure every OTV ED with all other OTV ED addresses to
establish adjacencies. The process of registration with the adjacency server
and advertisement of the OTV Neighbor List is shown in Figure 14-4. The
site adjacency is still present but not shown in the figure for clarity.
Figure 14-4 OTV EDs Register with the Adjacency Server
After the OTV Neighbor List (oNL) is built, it is advertised to each OTV
ED from the adjacency server as shown in Figure 14-5.
Figure 14-5 OTV Adjacency Server Advertises the Neighbor List
Each OTV ED then establishes IS-IS adjacencies with all other OTV EDs.
Updates are sent with OTV encapsulation in IP unicast packets from each
OTV ED. Each OTV ED must replicate its message to all other neighbors.
This step is shown in Figure 14-6.
Figure 14-6 OTV IS-IS Hellos in Adjacency Server Mode
Example 14-34 contains the output of show otv adjacency from NX-4.
After receiving the OTV Neighbor List from the adjacency Server, IS-IS
adjacencies are formed with all other OTV EDs.
Overlay-Interface Overlay0 :
Hostname System-ID Dest Addr Up Time State
NX-8 64a0.e73e.12c4 10.2.43.1 00:20:35 UP
NX-2 6c9c.ed4d.d942 10.1.12.1 00:20:35 UP
NX-6 6c9c.ed4d.d944 10.2.34.1 00:20:35 UP
An OTV IS-IS site adjacency is still formed across the site VLAN, as
shown in the output of show otv site in Example 14-35.
------------------------------------------------------------------
-------------
NX-2 6c9c.ed4d.d942 Full 00:42:04 Yes
When a frame arrives on the internal interface, a series of lookups
determines how to rewrite the packet for transport across the overlay. The
original payload, ethertype, source MAC address, and destination MAC
address are copied into the new OTV Encapsulated frame. The 802.1Q
header is removed, and an OTV SHIM header is inserted. The SHIM header
contains information about the VLAN and the overlay it belongs to. This
field in OTV 1.0 is actually an MPLS-in-GRE encapsulation, where the
MPLS label is used to derive the VLAN. The value of the MPLS label is
equal to 32 + VLAN identifier. For this example, VLAN 101 is
encapsulated as MPLS label 133. The outer IP header is added, which
contains the source IP address of the local OTV ED and the destination IP
address of the remote OTV ED.
Control plane IS-IS frames are encapsulated in a similar manner between
OTV EDs across the overlay and also carry the same 42 bytes of OTV
Overhead. The MPLS label used for IS-IS control plane frames is the
reserved label 1, which is the Router Alert label.
Note
If a packet capture is taken in the transport, OTV 1.0 encapsulation is
decoded as MPLS Pseudowire with no control-word using analysis
tools, such as Wireshark. Unfortunately, at the time of this writing,
Wireshark is not able to decode all the IS-IS PDUs used by OTV.
Ethernet Frames arriving from the OTV internal interface have the original
payload, ethertype, 802.1Q header, source MAC address, and destination
MAC address copied into the new OTV 2.5 Encapsulated frame. The OTV
2.5 encapsulation uses the same packet format as Virtual Extensible LAN
(VxLAN), which is detailed in RFC 7348.
The OTV SHIM header contains information about the Instance and
Overlay. The instance is the table identifier that should be used at the
destination OTV ED to lookup the destination, and the overlay identifier is
used by the control plane packets to identify packets belonging to a specific
overlay. A control plane packet has the VxLAN Network ID (VNI) bit set to
False (zero), while an encapsulated data frame has this value set to True
(one). The UDP header contains a variable source port and destination port
of 8472.
Fragmentation of OTV frames containing data packets becomes a concern
if the transport MTU is not at least 1550 bytes with OTV 2.5, or 1542 bytes
with OTV 1.0. This is based on the assumption that a host in the data center
has an interface MTU of 1500 bytes and attempts to send full MTU sized
frames. When the OTV encapsulation is added, the packet no longer fits
into the available MTU size.
The minimum transport MTU requirement for control plane packets is
either 1442 for multicast transport, or 1450 for unicast transport in
adjacency server mode. OTV sets the Don’t Fragment bit in the outer IP
header to ensure that no OTV control plane or data plane packets become
fragmented in the transport network. If MTU restrictions exist, it could
result in OTV IS-IS adjacencies not forming, or the loss of frames for data
traffic when the encapsulated frame size exceeds the transport MTU.
Note
The OTV encapsulation format must be the same between all sites
(GRE or UDP) and is configured with the global configuration
command otv encapsulation-format ip [gre | udp].
Note
If multiple OTV EDs exist at a site, only the AED forwards packets
onto the overlay, including ARP request and replies. The AED is also
responsible for advertising MAC address reachability to other OTV
EDs through the IS-IS control plane.
Note
The ARP-ND-Cache is enabled by default. In environments with heavy
ARP activity, it may cause high CPU utilization on the OTV ED or CoPP
drops because the supervisor CPU must handle the ARP traffic to create the
cache entries.
Broadcasts
Broadcast frames received by an OTV ED on the internal interface are
forwarded across the overlay by the AED for the extended VLAN.
Broadcast frames, such as ARP requests, are encapsulated into an L3
multicast packet in which the source address is the local OTV ED's join
interface and the group is the OTV control-group address. The multicast
packet is sent to the transport, where it gets replicated to each remote OTV
ED that has joined the control-group.
When using a multicast-enabled transport, OTV allows for the
configuration of a dedicated otv broadcast-group, as shown in Example
14-38. This allows the operator to separate the OTV control-group from the
broadcast group for easier troubleshooting and to allow different handling
of the packets based on group address. For example, a different PIM
rendezvous point could be defined for each group, or a different Quality of
Service (QoS) treatment could be applied to the control-group and
broadcast-group in the transport.
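A minimal sketch of the broadcast-group configuration follows, assuming a
hypothetical group address of 239.13.13.13 alongside the control-group
used throughout this chapter:
interface Overlay0
  otv control-group 239.12.12.12
  ! Dedicated group for broadcast frames (hypothetical address)
  otv broadcast-group 239.13.13.13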
Silent hosts whose MAC addresses are never learned or advertised by OTV
can be accommodated with the otv flood mac [mac-address] vlan [vlan-id]
command, which statically floods traffic destined to that MAC address
across the overlay:
feature otv
otv site-identifier 0x1
otv flood mac C464.135C.6600 vlan 101
The result of adding this command is a static OTV route entry for the
VLAN, which causes traffic to flow across the overlay, as shown in
Example 14-40.
Traffic from Host A is first sent to the L2 switch where it has an 802.1Q
VLAN tag added for VLAN 103. The frames follow the MAC address table
entries at the L2 switch across the trunk port to reach NX-2 on the OTV
internal interface Ethernet3/5. When the packets arrive at NX-2, it
performs a MAC address table lookup in the VLAN to determine how to
reach Host C’s MAC address 442b.03ec.cb00. The MAC address table of
NX-2 is shown in Example 14-41.
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O -
Overlay MAC
age - seconds since last seen,+ - primary entry using vPC
Peer-Link, E - EVPN entry
(T) - True, (F) - False , ~~~ - use 'hardware-age' keyword to
retrieve age info
VLAN/BD MAC Address Type age Secure NTFY
Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+-------
-----------
* 103 0000.0c07.ac67 dynamic ~~~ F F Eth3/5
O 103 442b.03ec.cb00 dynamic - F F Overlay0
* 103 64a0.e73e.12c1 dynamic ~~~ F F Eth3/5
O 103 64a0.e73e.12c3 dynamic - F F Overlay0
O 103 6c9c.ed4d.d943 dynamic - F F Overlay0
* 103 c464.135c.6600 dynamic ~~~ F F Eth3/5
The MAC address table indicates that Host C’s MAC is reachable across
the overlay, which means that the OTV MAC Routing table (ORIB) should
be used to obtain the IP next-hop and encapsulation details. The ORIB
indicates how to reach the remote OTV ED that advertised the MAC
address to NX-2 via IS-IS, which is NX-6 in this example.
Note
If multiple OTV EDs exist at a site, ensure the data path is being
followed to the AED for the VLAN. This is verified with the show otv
vlan command. Under normal conditions the MAC forwarding entries
across the L2 network should lead to the AED’s internal interface.
NX-2 is the AED for VLAN 103, as shown in Example 14-42.
Legend:
(NA) - Non AED, (VD) - Vlan Disabled, (OD) - Overlay Down
(DH) - Delete Holddown, (HW) - HW: State Down
(NFC) - Not Forward Capable
VLAN Auth. Edge Device Vlan State Overlay
---- ----------------------------------- ---------------------
- -------
After verifying the AED state for VLAN 103 to ensure you are looking at
the correct device, check the ORIB to determine which remote OTV ED
will receive the encapsulated frame from NX-2. The ORIB for NX-2 is
shown in Example 14-43.
Recall that the ORIB data is populated by the IS-IS LSP received from NX-
6, which indicates MAC address 442b.03ec.cb00 is an attached host. This is
confirmed by obtaining the system-id of NX-6 in show otv adjacency, and
then finding the correct LSP in the output of show otv isis database detail.
At the AED originating the advertisement, the redistribution from the local
MAC table into OTV IS-IS is verified on NX-6 using the show otv isis
redistribute route command, which is shown in Example 14-44.
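Taken together, the sequence of checks just described might look like the following sketch (device prompts taken from this example; output omitted):
NX-2# show otv adjacency
NX-2# show otv isis database detail
NX-6# show otv isis redistribute route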
At this point, it has been confirmed that NX-6 is the correct remote OTV
ED to receive frames with a destination MAC address of 442b.03ec.cb00 in
VLAN 103. The next step in delivering the packet to Host C is for NX-2 to
rewrite the packet to impose the OTV header and send the encapsulated
frame into the transport network from the join interface.
OTV uses either UDP or GRE encapsulation, and in this example the
default GRE encapsulation is being used. There is a point-to-point tunnel
created dynamically for each remote OTV ED that has formed an adjacency
with the local OTV ED. These tunnels are viewed with show tunnel
internal implicit otv detail, as shown in Example 14-45.
Note
The verification of the packet rewrite details in hardware varies
depending on the type of forwarding engine present in the line card.
Verify the adjacencies, MAC address table, ORIB, and tunnel state
before suspecting a hardware programming problem. If connectivity
fails despite correct control plane programming, and MAC addresses
are learned, engage the Cisco TAC for support.
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O -
Overlay MAC
age - seconds since last seen,+ - primary entry using vPC
Peer-Link, E - EVPN entry
(T) - True, (F) - False , ~~~ - use 'hardware-age' keyword to
retrieve age info
VLAN/BD MAC Address Type age Secure NTFY
Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+-------
-----------
* 103 0000.0c07.ac67 dynamic ~~~ F F Po3
* 103 442b.03ec.cb00 dynamic ~~~ F F Po3
O 103 64a0.e73e.12c1 dynamic - F F Overlay0
* 103 64a0.e73e.12c3 dynamic ~~~ F F Po3
* 103 6c9c.ed4d.d943 dynamic ~~~ F F Po3
O 103 c464.135c.6600 dynamic - F F Overlay0
The frame exits Port-channel 3 on the L2 trunk with a VLAN tag of 103.
The L2 switch in data center 2 receives the frame and performs a MAC
address table lookup to find the port where Host C is connected and
delivers the frame to its destination.
Note
Troubleshooting unicast data traffic when using the adjacency server
mode follows the same methodology used for a multicast enabled
transport. The difference is that any control plane messages are
exchanged between OTV EDs using a unicast encapsulation method
and replicated by the advertising OTV ED to all adjacent OTV EDs.
The host-to-host data traffic is still MAC-in-IP unicast encapsulated
from source OTV ED to the destination OTV ED.
OTV Multicast Traffic with a Multicast Enabled Transport
OTV provides support for multicast traffic to be forwarded across the
overlay in a seamless manner. The source and receiver hosts do not need to
modify their behavior to exchange L2 multicast traffic across an OTV
network between sites.
In a traditional L2 switched network, the receiver host sends an Internet
Group Management Protocol (IGMP) membership report to indicate
interest in receiving the traffic. The L2 switch is typically enabled for
IGMP snooping, which listens for these membership reports to optimize
flooding of multicast traffic to only the ports where there are interested
receivers.
IGMP snooping must also learn where multicast routers (mrouters) are
connected. Any multicast traffic must be forwarded to an mrouter so that
interested receivers on other L3 networks can receive it. The mrouter is
also responsible for registering the source with a rendezvous point if PIM
ASM is being used. IGMP snooping discovers mrouters by listening for
Protocol Independent Multicast (PIM) hello messages, which indicate an
L3 capable mrouter is present on that port. The L2 forwarding table is then
updated to send all multicast group traffic to the mrouter, as well as any
interested receivers. OTV EDs use a dummy PIM Hello message to draw
multicast traffic and IGMP membership reports to the OTV ED’s internal
interface.
OTV maintains its own mroute table for multicast forwarding just as it
maintains an OTV routing table for unicast forwarding. There are three
types of OTV mroute entries, which are described as VLAN, Source, and
Group. The purpose of each type is detailed in Table 14-2.
Table 14-2 OTV MROUTE Types
Type Definition
The OTV IS-IS control plane protocol is utilized to allow hosts to send and
receive multicast traffic within an extended VLAN between sites without
the need to send IGMP messages across the overlay. Figure 14-11 shows a
simple OTV topology where Host A is a multicast source for group
239.100.100.100, and Host C is a multicast receiver. Both Host A and Host
C belong to VLAN 103.
The delivery group must be coordinated with the L3 transport to ensure that
PIM SSM is supported and that the correct range of groups are defined for
use as SSM groups. Each OTV ED is configured with the same range of otv
data-groups, and each OTV ED can be a source for the SSM group. Remote
OTV EDs join the SSM group in the transport to receive multicast frames
from a particular OTV ED acting as source. The signaling of which SSM
group to use is accomplished with IS-IS advertisements between OTV EDs
to allow for discovery of active sources and receivers at each site.
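The pieces that must agree are sketched below. The data-group range shown is hypothetical; the same range must be permitted as SSM in the transport:
! On each OTV ED
interface Overlay0
  otv data-group 232.1.1.0/28
! On the transport routers
feature pim
ip pim ssm range 232.0.0.0/8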
The site group is the multicast group that is being transported across the
overlay using the delivery group. In Figure 14-11, the site group is
239.100.100.100 sourced by Host A and received by Host C. Essentially,
OTV is using a multicast-in-multicast OTV encapsulation scheme to send
the site group across the overlay using the delivery group in the transport
network.
Troubleshooting is simplified by splitting the end-to-end packet delivery
mechanism into two distinct layers of focus: the site group and the delivery
group. At the source end, the site group troubleshooting is focused on
ensuring that multicast data frames from the source are arriving at the
internal interface of the AED for the VLAN. At the receiving site, site
group troubleshooting must verify that a receiver has expressed interest in
the group by sending an IGMP membership report. IGMP snooping must
have the correct ports to reach the receivers from the OTV AED's internal
interface, through any L2 switches in the path. In the transport network, the
delivery group must be functional so that any OTV ED acting as a source
host successfully sends the multicast-in-multicast OTV traffic into the
transport for replication and delivery to the correct OTV ED receivers.
For multicast sent by Host A to be successfully received by Host C, some
prerequisite steps must occur. The OTV AED’s internal interface must be
seen by the L2 switch as an mrouter port. This is required so that any IGMP
membership reports from a receiver are sent to the AED, and any multicast
traffic is also flooded to the AED’s OTV internal interface. To achieve this,
OTV sends a dummy PIM hello with a source IP address of 0.0.0.0 on the
internal interface for each VLAN extended by OTV. The purpose is not to
form a PIM neighbor on the VLAN, but to force the detection of an mrouter
port by any attached L2 switch, as depicted in Figure 14-12.
Type: IP (0x0800)
Internet Protocol Version 4, Src: 0.0.0.0 (0.0.0.0),Dst:
224.0.0.13 (224.0.0.13)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector
6; ECN: 0x00:
Not-ECT (Not ECN-Capable Transport))
1100 00.. = Differentiated Services Codepoint: Class Selector
6 (0x30)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not
ECN-Capable
Transport) (0x00)
Total Length: 50
Identification: 0xa51f (42271)
Flags: 0x00
0... .... = Reserved bit: Not set
.0.. .... = Don’t fragment: Not set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 1
Protocol: PIM (103)
Header checksum: 0x3379 [correct]
[Good: True]
[Bad: False]
Source: 0.0.0.0 (0.0.0.0)
Destination: 224.0.0.13 (224.0.0.13)
Protocol Independent Multicast
0010 .... = Version: 2
.... 0000 = Type: Hello (0)
Reserved byte(s): 00
Checksum: 0x572f [correct]
PIM options: 4
Option 1: Hold Time: 0s (goodbye)
Type: 1
Length: 2
Holdtime: 0s (goodbye)
Option 19: DR Priority: 0
Type: 19
Length: 4
DR Priority: 0
Option 22: Bidir Capable
Type: 22
Length: 0
Option 20: Generation ID: 2882395322
Type: 20
Length: 4
Generation ID: 2882395322
Example 14-50 shows the IGMP snooping status of the L2 switch in Data
Center 2 after receiving the PIM dummy hello packets on VLAN 103 from
NX-6.
Note
At this point only Host C joined the multicast group, and there are no
sources actively sending to the group.
NX-2 installs an OTV mroute entry in response to receiving the IS-IS GM-
Update from NX-6, as shown in Example 14-53. The OIF on NX-2 is the
overlay interface. The r indicates the receiver is across the overlay.
Note
The show otv isis internal event-history mcast command is useful
for troubleshooting the IS-IS control plane for OTV multicast and the
advertisement of groups and sources for a particular VLAN.
NX-6 updates this information into its OTV mroute table, as shown in
Example 14-57. The s indicates the source is located across the overlay.
The show otv data-group command is used to verify the site group and
delivery group information for NX-2 and NX-6, as shown in Example 14-
58. This should match what is present in the output of show otv mroute.
OTV EDs act as source hosts and receiver hosts for the delivery groups
used on the transport network. An IGMPv3 membership report from the
join interface is sent to the transport to allow the OTV ED to start receiving
packets from the delivery group (10.1.12.1, 232.1.1.0).
Verification in the transport is done based on the PIM SSM delivery group
information obtained from the OTV EDs. Each AED’s join interface is a
source for the delivery group. The AED joins only delivery group sources
that are required based on the OTV mroute table and the information
received through the IS-IS control plane. This mechanism allows OTV to
optimize the multicast traffic in the transport so that only the needed data
is received by each OTV ED. The use of PIM SSM allows specific source
addresses to be joined for each delivery group.
Example 14-59 shows the mroute table of a transport router. In this output
10.1.12.1 is NX-2’s OTV join interface, which is a source for the delivery
group 232.1.1.0/32. The incoming interface should match the routing table
path toward the source to pass the Reverse Path Forwarding (RPF) check.
Interface Ethernet3/30 is the OIF and is connected to the OTV join
interface of NX-6.
Note
Multicast troubleshooting in the transport network between OTV ED
sources and receivers follows standard multicast troubleshooting for
the delivery group. The fact that OTV has encapsulated the site group
within a multicast delivery group does not change the troubleshooting
methodology in the transport. The OTV EDs are source and receiver
hosts for the delivery group from the perspective of the transport
network.
In this example, Host A and Host C are both members of VLAN 103. Host
A is sending traffic to the site group 239.100.100.100, and Host C sends an
IGMP membership report message to the Data Center 2 L2 switch. The L2
switch forwards the membership report to NX-6 because it is an mrouter
port in IGMP snooping. The same PIM dummy hello packet mechanism is
used on the OTV internal interface, just as with a multicast enabled
transport. The arrival of the IGMP membership report on NX-6 triggers an
OTV mroute to be created, as shown in Example 14-60, with the internal
interface Port-channel 3 as an OIF.
Note
There is a PIM enabled router present on VLAN 103, as indicated in
Example 14-61 by the (*, *) entry.
Because IGMP packets are not forwarded across the overlay, the IS-IS
messages used to signal an interested receiver are counted as IGMP proxy-
reports. Example 14-62 shows the IGMP snooping statistics of NX-6,
which indicate the proxy-report being originated through IS-IS. The IGMP
proxy-report mechanism is not specific to OTV adjacency server mode.
Following the path from receiver to the source in Data Center 1, the IS-IS
database is verified on NX-2. This is done to confirm that the overlay is
added as an OIF for the OTV mroute. Example 14-63 contains the GM-LSP
received from NX-6 on NX-2.
The IGMP Snooping table on NX-2 confirms that the overlay is included in
the port list, as shown in Example 14-64.
The OTV mroute on NX-2 contains the (V, *, G) entry, which is populated
as a result of receiving the IS-IS GM-LSP from NX-6. This message
indicates Host C is an interested receiver in Data Center 2 and that NX-2
should add the overlay as an OIF for the group. The OTV mroute table from
NX-2 is shown in Example 14-65. The r indicates the receiver is reachable
across the overlay. The (V, S, G) entry is also present, which indicates Host
A is actively sending traffic to the site group 239.100.100.100.
Note
The OTV mroute table lists an OIF of NX-6 installed by OTV. This is
a result of the OTV Unicast encapsulation used in adjacency server
mode. The delivery group has values of all zeros for the group address.
This information is populated with a valid delivery group when
multicast transport is being used.
NX-2 encapsulates the site group packets in an OTV unicast packet with a
destination address of NX-6’s join interface. The OTV unicast packets
traverse the transport network until they arrive at NX-6. When the packets
arrive at NX-6 on the OTV join interface, the outer OTV unicast
encapsulation is removed. The next lookup is done on the inner multicast
packet, which results in an OIF for the mroute installed by IGMP on the
OTV internal interface. Example 14-66 shows the OTV mroute table of
NX-6. The site group multicast packet leaves on Po3 toward the L2 switch
in Data Center 2 and ultimately reaches Host C.
With adjacency server mode, the source is not advertised to the other OTV
EDs by NX-2. This is because there is no delivery group used across the
transport for remote OTV EDs to join. NX-2 only needs to know that there
is an interested receiver across the overlay and which OTV ED has the
receiver. The join interface of that OTV ED is used as the destination
address of the multicast-in-unicast OTV packet across the transport. The
actual encapsulation of the site group multicast frame is done using the
OTV unicast point-to-point dynamic tunnel, as shown in Example 14-67.
HSRP localization keeps first-hop gateway traffic local to each site so that
hosts use their local HSRP gateway rather than hairpinning across the
overlay. The following configuration filters HSRP packets and ARP replies
for the HSRP virtual MAC addresses at the site boundary:
ip access-list ALL_IPs
10 permit ip any any
ipv6 access-list ALL_IPv6s
10 permit ipv6 any any
mac access-list ALL_MACs
10 permit any any
ip access-list HSRP_IP
10 permit udp any 224.0.0.2/32 eq 1985
20 permit udp any 224.0.0.102/32 eq 1985
ipv6 access-list HSRP_IPV6
10 permit udp any ff02::66/128
mac access-list HSRP_VMAC
10 permit 0000.0c07.ac00 0000.0000.00ff any
20 permit 0000.0c9f.f000 0000.0000.0fff any
30 permit 0005.73a0.0000 0000.0000.0fff any
arp access-list HSRP_VMAC_ARP
10 deny ip any mac 0000.0c07.ac00 ffff.ffff.ff00
20 deny ip any mac 0000.0c9f.f000 ffff.ffff.f000
30 deny ip any mac 0005.73a0.0000 ffff.ffff.f000
40 permit ip any mac any
vlan access-map HSRP_Localization 10
match mac address HSRP_VMAC
match ip address HSRP_IP
match ipv6 address HSRP_IPV6
action drop
vlan access-map HSRP_Localization 20
match mac address ALL_MACs
match ip address ALL_IPs
match ipv6 address ALL_IPv6s
action forward
vlan filter HSRP_Localization vlan-list 100-110
service dhcp
otv-isis default
vpn Overlay0
redistribute filter route-map OTV_HSRP_filter
otv site-identifier 0x1
ip arp inspection filter HSRP_VMAC_ARP vlan 100-110
Multihoming
A multihomed site in OTV refers to a site where two or more OTV EDs are
configured to extend the same range of VLANs. Because OTV does not
forward STP BPDUs across the overlay, L2 loops would form without the
election of an AED.
When multiple OTV EDs exist at a site, the AED election runs using the
OTV IS-IS system-id and VLAN identifier. This is done by using a hash
function where the result is an ordinal value of zero or one. The ordinal
value is used to assign the AED role for each extended VLAN to one of the
forwarding capable OTV EDs at the site.
When two OTV EDs are present, the device with the lower system-id is the
AED for the even-numbered VLANs, and the higher system-id is the AED
for the odd-numbered VLANs. The AED is responsible for advertising
MAC addresses and forwarding traffic for an extended VLAN across the
overlay.
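The resulting role assignment is verified per VLAN, as in this sketch (prompt hypothetical; output omitted):
NX-2# show otv site
NX-2# show otv vlan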
Beginning in NX-OS 5.2(1) the dual site adjacency concept is used. This
allows OTV EDs with the same site identifier to communicate across the
overlay as well as across the site VLAN, which greatly reduces the chance
of one OTV ED being isolated and creating a dual active condition. In
addition, the overlay interface of an OTV ED is disabled until a site
identifier is configured, which ensures that OTV is able to detect any
mismatch in site identifiers. If a device becomes non-AED capable, it
proactively notifies the other OTV ED at the site so it can take over the role
of AED for all VLANs.
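A sketch of the site-level configuration shared by both OTV EDs at a site follows; the VLAN and identifier values are hypothetical and must match on both devices:
otv site-vlan 10
otv site-identifier 0x1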
Note
For more information on LISP, refer to http://lisp.cisco.com.
VLAN Translation
In some networks, a VLAN configured at an OTV site may need to
communicate with a VLAN at another site that is using a different VLAN
numbering scheme. There are two solutions to this problem:
VLAN mapping on the overlay interface
VLAN mapping on an L2 Trunk port
VLAN mapping on the overlay interface is not supported with Nexus 7000
F3 or M3 series modules. If VLAN mapping is required with F3 or M3
modules, VLAN mapping on the OTV internal interface, which is an L2
trunk, must be used.
Example 14-69 demonstrates the configuration of VLAN mapping on the
overlay interface. VLAN 200 is extended across the overlay. The local
VLAN 200 is mapped to VLAN 300 at the other OTV site.
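A minimal sketch of the mapping just described, using the overlay-interface mapping command (applicable only where this method is supported):
interface Overlay0
  otv vlan mapping 200 to 300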
When tunnel depolarization is in use, secondary IP addresses on the join
interfaces create additional (secondary) OTV adjacencies. The status of the
secondary OTV adjacencies is seen with the show otv adjacency detail
command, as shown in Example 14-72.
Overlay-Interface Overlay0 :
Hostname System-ID Dest Addr Up Time State
NX-4 64a0.e73e.12c2 10.1.22.1 00:03:07 UP
Secondary src/dest: 10.1.12.4 10.1.22.1 UP
HW-St: Default
NX-8 64a0.e73e.12c4 10.2.43.1 00:03:07 UP
Secondary src/dest: 10.1.12.4 10.2.43.1 UP
HW-St: Default
NX-6 6c9c.ed4d.d944 10.2.34.1 00:03:06 UP
Secondary src/dest: 10.1.12.4 10.2.34.1 UP
HW-St: Default
Note
OTV tunnel depolarization is enabled by default. It is disabled with
the otv depolarization disable global configuration command.
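Depolarization relies on one or more secondary IP addresses configured on the join interface; these provide the additional source addresses seen in the secondary adjacencies shown earlier. A hypothetical sketch:
interface Ethernet3/1
  ip address 10.1.12.1/24
  ip address 10.1.12.4/24 secondary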
Note
OTV UDP encapsulation is supported starting in NX-OS release
7.2(0)D1(1) for F3 and M3 modules.
OTV Fast Failure Detection
OTV’s dual adjacency implementation forms an adjacency on the site
VLAN as well as across the overlay for OTV EDs, which have a common
site identifier. When an OTV ED becomes unreachable or goes down, the
other OTV ED at the site must take over the AED role for all VLANs.
Detecting this failure condition quickly minimizes traffic loss during the
transition.
The site VLAN IS-IS adjacency can be configured to use Bidirectional
Forwarding Detection (BFD) on the site VLAN to detect IS-IS neighbor
loss. This is useful to detect any type of connectivity failure on the site
VLAN. Example 14-73 shows the configuration required to enable BFD on
the site VLAN.
otv site-vlan 10
otv isis bfd
interface Vlan10
no shutdown
bfd interval 250 min_rx 250 multiplier 3
no ip redirects
ip address 10.111.111.1/30
The status of BFD on the site VLAN is verified with the show otv isis site
command, as shown in Example 14-74. Any BFD neighbor is also present
in the output of the show bfd neighbors command.
For the overlay adjacency, the presence of a route to reach the peer OTV
ED’s join interface can be tracked to detect a reachability problem that
eventually causes the IS-IS neighbor to go down. Example 14-75 shows the
configuration to enable next-hop adjacency tracking for the overlay
adjacency of OTV EDs, which use the same site identifier.
otv-isis default
track-adjacency-nexthop
vpn Overlay0
redistribute filter route-map OTV_HSRP_filter
Example 14-76 contains the output of show otv isis track-adjacency-
nexthop, which verifies the feature is enabled and tracking next-hop
reachability of NX-4.
Summary
OTV was introduced in this chapter as an efficient and flexible way to
extend L2 VLANs to multiple sites across a routed transport network. The
concepts of MAC routing and the election of an AED were explained as an
efficient way to solve the challenges presented by other DCI solutions
without relying on STP. The examples and end-to-end walk-through for the
control plane, unicast traffic, and multicast traffic provided in this chapter
can be used as a basis for troubleshooting the various types of connectivity
problems that may be observed in a production network environment.
References
Fuller, Ron, David Jansen, and Matthew McPherson. NX-OS and Cisco
Nexus Switching. Indianapolis: Cisco Press, 2013.
Krattiger, Lukas. “Overlay Transport Virtualization” (presented at Cisco
Live, Las Vegas 2016).
Schmidt, Carlo. “Advanced OTV—Configure, Verify and Troubleshoot
OTV in Your Network” (presented at Cisco Live, San Francisco 2014).
draft-hasmit-otv-04 Overlay Transport Virtualization, H. Grover, D. Rao,
D. Farinacci, V. Moreno, IETF, https://tools.ietf.org/html/draft-hasmit-otv-
04, February 2013.
draft-drao-isis-otv-00 IS-IS Extensions to Support OTV, D. Rao, A.
Banerjee, H. Grover, IETF, https://tools.ietf.org/html/draft-drao-isis-otv-
00, March 2011.
RFC 6165, Extensions to IS-IS for Layer-2 Systems. A. Banerjee, D. Ward.
IETF, https://tools.ietf.org/html/rfc6165, April 2011.
RFC 7348. Virtual eXtensible Local Area Network (VXLAN): A
Framework for Overlaying Virtualized L2 Networks over L3 Networks. M.
Mahalingam et al. IETF, https://tools.ietf.org/html/rfc7348, August 2014.
Cisco. Cisco Nexus Platform Configuration Guides, http://www.cisco.com.
Wireshark. Network Protocol Analyzer, www.wireshark.org/.
Part VII
Network Programmability
Chapter 15
Programmability and
Automation
Note
For more details on Open NX-OS architecture, refer to the book
Programmability and Automation with Cisco Open NX-OS, at
Cisco.com.
Bash Shell
Bourne-Again Shell (Bash) is a modern UNIX shell, a successor of the
Bourne shell. It provides a rich feature set and built-in capability to interact
with the low-level components of the underlying operating system. The
Bash shell is currently available on the Nexus 9000, Nexus 3000, and
Nexus 3500 series platforms. The Bash shell provides shell access to the
underlying Linux operating system, which has additional capabilities that
the standard NX-OS CLI does not provide. To enable the Bash shell on
Nexus 9000 switches, enable the feature bash-shell configuration
command. A single Bash command can then be executed from the NX-OS
CLI with run bash command. Users can also move into shell mode by
using the NX-OS CLI command run bash and then execute the relevant
Bash commands from the Bash shell.
Example 15-1 illustrates how to enable the bash-shell feature and use the
Bash shell command pwd to display the current working directory. To
check whether the bash-shell feature is enabled, use the command show
bash-shell. Example 15-1 also demonstrates various basic commands on
the Bash shell. The Bash command id -a is used to verify the current user,
as well as Group and Group ID information. You can also use echo
commands to print various messages based on the script requirements.
Example 15-1 Enabling the bash-shell Feature and Using Bash Commands
N9k-1(config)# feature bash-shell
N9k-1# run bash
bash-4.2$ id -a
uid=2002(admin) gid=503(network-admin) groups=503(network-admin)
bash-4.2$
bash-4.2$ echo "First Example on " `uname -n` " using bash-shell " $BASH_VERSION
First Example on N9k-1 using bash-shell 4.2.10(1)-release
Note
It is recommended that you become familiar with the UNIX/Linux
bash shell commands for this section.
On NX-OS, only users with the network-admin, vdc-admin, and dev-ops
roles can use the Bash shell. Other users are restricted from using Bash
unless it is specifically allowed in their role. To validate which roles are
permitted to use the Bash shell, use the command show role [name role-
name]. Example 15-2 displays the permissions for the network-admin and
dev-ops user roles.
Role: network-admin
Description: Predefined network admin role has access to all
commands
on the switch
----------------------------------------------------------------
---
Rule Perm Type Scope Entity
----------------------------------------------------------------
---
1 permit read-write
Role: dev-ops
Description: Predefined system role for devops access. This role
cannot be modified.
----------------------------------------------------------------
---
Rule Perm Type Scope Entity
----------------------------------------------------------------
---
6 permit command conf t ;
username *
5 permit command attach module
*
4 permit command slot
*
3 permit command bcm module
*
2 permit command run bash
*
1 permit command python *
Example 15-4 Installing and Removing RPM Packages from the Bash Shell
bash-4.2$ sudo yum list installed | grep n9000
base-files.n9000 3.0.14-
r74.2 installed
bfd.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
bgp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
container-tracker.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
core.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
eigrp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
eth.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
fcoe.lib32_n9000 2.0.0-
7.0.3.IFD6.1 installed
isis.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
lacp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
linecard2.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
lldp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
ntp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
nxos-ssh.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
ospf.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
perf-cisco.n9000_gdb 3.12-
r0 installed
platform.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
rip.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
shadow-securetty.n9000_gdb 4.1.4.3-
r1 installed
snmp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
svi.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
sysvinit-inittab.n9000_gdb 2.88dsf-
r14 installed
tacacs.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
task-nxos-base.n9000_gdb 1.0-
r0 installed
telemetry.lib32_n9000 2.2.1-
7.0.3.I6.1 installed
tor.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
vtp.lib32_n9000 2.0.0-
7.0.3.I6.1 installed
bash-4.2$ sudo yum -y install bfd
Loaded plugins: downloadonly, importpubkey, localrpmDB,
patchaction, patching,
protect-packages
groups-repo | 1.1
kB 00:00 ...
localdb | 951
B 00:00 ...
patching | 951
B 00:00 ...
thirdparty | 951
B 00:00 ...
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package bfd.lib32_n9000 0:2.0.0-7.0.3.I6.1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==================================================================
==============
Package Arch Version Repository
Size
Installing:
bfd lib32_n9000 2.0.0-7.0.3.I6.1 groups-
repo 483 k
Transaction Summary
==================================================================
==============
Install 1 Package
Complete!
bash-4.2$ sudo yum -y remove bfd
Dependencies Resolved
==================================================================
==============
Package Arch Version Repository
Size
==================================================================
==============
Removing:
bfd lib32_n9000 2.0.0-7.0.3.I6.1 @groups-
repo 1.8 M
Transaction Summary
==================================================================
==============
Remove 1 Package
Removed:
bfd.lib32_n9000 0:2.0.0-
7.0.3.I6.1
Complete!
Guest Shell
The network paradigm has moved from hardware, software, and
management network elements to extensible network elements. The built-
in Python and Bash execution environments enable network operators to
execute custom scripts in NX-OS environments using the Cisco-supplied
APIs and classes to interact with some of the major NX-OS components.
However, in some scenarios, network operators want to integrate third-
party applications and host the application on NX-OS. To meet those needs,
NX-OS provides a third-party application hosting framework that enables
users to host their applications in a dedicated Linux user space
environment. Network operators must use the Cisco Application
Development Toolkit (ADT) to cross-compile their software and package it
with a Linux root file system into a Cisco Open Virtual Appliance (OVA)
package. These OVAs are then deployed on the NX-OS network element
using the application hosting feature.
NX-OS software introduces the NX-OS Guest shell feature on the Nexus
9000 and Nexus 3000 series switches. The Guest shell is an open source
and secure Linux environment for rapid third-party software development
and deployment. The guestshell feature leverages the benefits of the Python
and Bash execution environments and the NX-OS application hosting
framework.
The Guest shell is enabled by default on Nexus 9000 and Nexus 3000. You
can explicitly enable or destroy the guestshell feature on NX-OS. Table 15-
1 describes the various guest shell commands.
Table 15-1 Guest Shell Feature Commands
Command                        Description
guestshell enable              Installs and enables the Guest shell service. When this command is enabled, you can enter the Guest shell using the guestshell command.
guestshell disable             Disables the Guest shell service. Access to the Guest shell is then also disabled.
guestshell destroy             Deactivates and uninstalls the current Guest shell. All system resources associated with the Guest shell return to the system.
guestshell reboot              Deactivates and reactivates the current Guest shell.
guestshell run command-line    Executes a program inside the Guest shell, returns the output, and exits the Guest shell.
guestshell sync                Deactivates the current active Guest shell, syncs its root file system contents to the standby RP, and then reactivates the Guest shell on the active RP.
guestshell upgrade             Deactivates and upgrades the current Guest shell using the OVA that is embedded within the booted system image. Upon successful upgrade, the Guest shell is reactivated.
guestshell resize              Allows modification of the default or existing parameters related to the Guest shell, such as CPU, memory, and root file system parameters.
When the Guest shell is up and running, you can check the details using the
command show guestshell detail. This command displays the path of the
OVA file, the status of the Guest shell service, resource reservations, and
the file system information of the Guest shell. Example 15-5 displays the
detailed information of the Guest shell on a Nexus 9000 switch.
Attached devices
Type Name Alias
---------------------------------------------
Disk _rootfs
Disk /cisco/core
Serial/shell
Serial/aux
Serial/Syslog serial2
Serial/Trace serial3
If the Guest shell does not come up, check the log for any error messages
using the show logging logfile command. To troubleshoot issues with the
Guest shell, use the command show virtual-service [list] to view both the
status of the Guest shell and the resources the Guest shell is using. Example
15-6 displays the virtual service list and the resources being utilized by the
current Guest shell on the Nexus 9000 switch.
Example 15-6 Virtual Service List and Resource Utilization
Note
If you cannot resolve the Guest shell problem, collect the output of
show virtual-service tech-support and contact the Cisco Technical
Assistance Center (TAC) for further investigation.
Python
With the networking industry’s push toward software-defined networking
(SDN), multiple doors have opened for integrating scripting and
programming languages with network devices. Python has gained industry-
wide acceptance as the programming language of choice. Python is a
powerful and easy-to-learn programming language that provides efficient
high-level data structures and object-oriented features. These features make
it an ideal language for rapid application development on most platforms.
Python integration is available on most Nexus platforms and does not
require the installation of any special license. The interactive Python
interpreter is invoked from the CLI on Nexus platforms by typing the
python command. On Nexus 9000 and Nexus 3000 platforms, Python can
also be used through the Guest shell. After executing the python command,
the user is placed directly into the Python interpreter. Example 15-7
demonstrates the use of the Python interpreter from both the CLI and the
guest shell.
Example 15-7 Python Interpreter from CLI and the Guest Shell
N9k-1# python
Python 2.7.5 (default, Nov 5 2016, 04:39:52)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>> print "Hello World...!!!"
Hello World...!!!
N9k-1# guestshell
[admin@guestshell ~]$ python
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>> print "Hello Again..!!!"
Hello Again..!!!
Note
Readers are advised to become familiar with the Python programming
language. This chapter does not focus on writing specific Python
programs, however; instead, it focuses on how to use Python on Nexus
platforms.
In addition to the standard Python libraries, NX-OS provides the Cisco and
CLI libraries, which you can import into your Python script to perform
Cisco-specific functions on the Nexus switch. The Cisco library provides
access to Cisco Nexus components. The CLI library provides the capability
to execute commands from the Nexus CLI and return the result. Example
15-8 displays the package contents of both the Cisco and CLI libraries on
NX-OS.
NAME
cisco
FILE
/usr/lib64/python2.7/site-packages/cisco/__init__.py
PACKAGE CONTENTS
acl
bgp
buffer_depth_monitor
check_port_discards
cisco_secret
dohost
feature
history
interface
ipaddress
key
line_parser
mac_address_table
nxapi
nxcli
ospf
routemap
section_parser
ssh
system
tacacs
transfer
vlan
vrf
NAME
cli
FILE
/usr/lib64/python2.7/site-packages/cli.py
FUNCTIONS
cli(cmd)
clid(cmd)
clip(cmd)
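As a brief sketch of how these differ (cli() returns the raw command output as a string, clid() returns JSON-formatted output for commands that support structured data, and clip() prints the output directly), consider the following; the commands passed in are arbitrary examples:
from cli import cli, clid, clip
import json

raw = cli('show version')                # raw text output, returned as a string
data = json.loads(clid('show version'))  # structured output, parsed from JSON
clip('show clock')                       # output is printed directly to the terminal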
The output that follows was produced by a short script saved as test.py in
the Guest shell. A minimal version of the script, assuming the
TABLE_interface/ROW_interface JSON structure that NX-OS returns for
show interface brief, uses clid() to retrieve the command output and prints
each interface reported as up:
import sys
from cli import *
import json

# Retrieve 'show interface brief' as structured JSON output
data = json.loads(clid('show interface brief'))
# Print the name of each interface whose state is reported as up
for row in data['TABLE_interface']['ROW_interface']:
    if row.get('state') == 'up':
        print row['interface']
sys.exit(0)
[admin@guestshell ~]$ python test.py
mgmt0
Ethernet1/4
Ethernet1/5
Ethernet1/13
Ethernet1/14
Ethernet1/15
Ethernet1/16
Ethernet1/19
Ethernet1/32
Ethernet1/37
Ethernet2/1
port-channel10
port-channel101
port-channel600
loopback0
loopback5
loopback100
Vlan100
Vlan200
Vlan300
NX-SDK
The NX-OS software development kit (NX-SDK) is a C++ plug-in library
that allows custom, native applications to access NX-OS functions and
infrastructure. Using the NX-SDK, you can create custom CLI commands,
syslog messages, event handlers, and error handlers. An example of using
this functionality would be creating your custom application to register
with the route manager to receive routing updates from the routing
information base (RIB) and then taking some action based on the presence
of the route. Three primary requirements must be met for using NX-SDK:
Docker
Linux environment (Ubuntu 14.04 or higher, Centos 6.7 or higher)
Cisco SDK (optional)
Note
NX-SDK can also be integrated with Python. Thus, Cisco SDK is not
required for Python applications.
The NX-SDK must be installed before it can be used in the development
environment. The installation steps follow:
Step 1. Pull a docker image from
https://hub.docker.com/r/dockercisco/nxsdk
Step 2. Set the environment variables as follows for a 32-bit environment:
1. export ENXOS_SDK_ROOT=/enxos-sdk
2. cd $ENXOS_SDK_ROOT
3. source environment-setup-x86-linux
Step 3. Clone NX-SDK toolkit from GitHub.
1. git clone https://github.com/CiscoDevNet/NX-SDK.git
Explore the API after forking the NX-SDK from GitHub and use it to create
custom application packages to be installed on the Nexus switch.
Note
When creating custom applications, refer to the documentation and
custom sample application code available as part of the NX-SDK.
Once the applications are built, use the rpm_gen.py Python script to
automatically generate the RPM package. The script is present in the /NX-
SDK/scripts directory. When the RPM package is built, the RPM package
can be copied to the Nexus Switch in the bootflash: directory, where the
package is then installed on the Nexus switch for further use. Example 15-
11 demonstrates the installation steps for an RPM package on the Nexus
9000 switch. This example demonstrates the sample RPM package named
customCliApp that is available as part of the NX-SDK kit. To start a custom
application, first enable feature nxsdk. Then add the custom application as
a service using the command nxsdk service-name app-name. You can
check the status of the application using the command show nxsdk
internal service.
Inactive Packages:
customCliApp-1.0-1.0.0.x86_64
Active Packages:
customCliApp-1.0-1.0.0.x86_64
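Assuming the package is installed and active, starting and checking the application might look like the following sketch (prompts hypothetical; application name from the sample package):
NX-1(config)# feature nxsdk
NX-1(config)# nxsdk service-name customCliApp
NX-1# show nxsdk internal service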
Note
In Example 15-11, the RPM package is installed using the Virtual shell
(VSH). The RPM package can also be installed from the Bash shell.
Note
In addition to the event history logs, you can collect the output of
show tech nxsdk if custom applications are failing to install or are not
working.
NX-API
NX-OS provides an API known as the NX-API that enables you to interact
with the switch using a standard request/response language. The traditional
CLI was designed for human-to-switch interaction. Requests are made by
typing a CLI command and receiving a response from the switch in the
form of output to the client terminal. This response data is unstructured
and requires the human operator to evaluate the output line by line to find
the interesting piece of information in the output. Operators that use the
traditional CLI interface to automate tasks through scripting are forced to
follow the same data interpretation method by screen-scraping the output
for the interesting data. This is not only inefficient, but also cumbersome
because it requires output iteration and specific text matching through
regular expressions.
The benefit of using the NX-API is the capability to send requests and
receive responses that are optimized for machine-to-machine
communication. In other words, when communicating through the NX-API,
the request and response are formatted as structured data. The response
received from the NX-API is provided as either Extensible Markup
Language (XML) or JavaScript Object Notation (JSON). This is much more
efficient and less error prone than parsing the entire human-readable CLI
output for only a small percentage of interesting data. NX-API is used to
obtain output from show commands, as well as to add or remove
configuration, thus streamlining and automating operations and
management in a large-scale network.
Communication between the client and NX-API running on the switch uses
the Transmission Control Protocol (TCP) and can be either Hypertext Transfer
Protocol (HTTP) or Hypertext Transfer Protocol Secure (HTTPS),
depending on the requirements. NX-API uses HTTP basic authentication.
Requests must carry the username and password in the HTTP header. After
a successful authentication, NX-API provides a session-based cookie using
the name nxapi_auth. That session cookie should be included in subsequent
NX-API requests. The privilege of the user is checked to confirm that the
request is being made by a user with a valid username and password on the
switch who also has the proper authorization for the commands being
executed through the NX-API.
After successful authentication, you can start sending requests. The NX-
API request object is either in JSON-RPC or a Cisco proprietary format.
Table 15-2 describes the fields present in the JSON-RPC request object.
Figure 15-1 shows an example JSON-RPC request object used to query the
switch for its configured switch name.
Figure 15-1 JSON-RPC Request Object
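A minimal client sketch that sends such a JSON-RPC request over HTTPS follows. The switch address and credentials are hypothetical, and the Python requests library is assumed to be available on the client:
import json
import requests

url = 'https://192.168.1.201/ins'  # NX-API endpoint on the switch
payload = [{
    "jsonrpc": "2.0",
    "method": "cli",
    "params": {"cmd": "show hostname", "version": 1},
    "id": 1,
}]
headers = {'content-type': 'application/json-rpc'}

# HTTP basic authentication; the response sets the nxapi_auth session cookie
response = requests.post(url, data=json.dumps(payload), headers=headers,
                         auth=('admin', 'password'), verify=False)
print(response.json())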
The second type of request object is the Cisco proprietary format, which is
either XML or JSON. Table 15-3 provides a description for the fields used
in the Cisco proprietary request object.
Table 15-3 Cisco Proprietary Request Object Fields
Field Description
The request object is sent to the switch on the configured HTTP (TCP port
80) or HTTPS (TCP port 443) port. The received request object is validated
by the web server and the appropriate software object is provided with the
request. The response object is then sent back from the switch in either
JSON-RPC or Cisco proprietary formats to the client. Table 15-4 provides
the field descriptions of the JSON-RPC response object.
Table 15-4 JSON-RPC Response Object Fields
Field Description
result   Included only for successful requests. The value of this field
         contains the requested CLI output.
error    Included only on an errored request. The error object contains the
         following fields: "code" (an integer error code specified by the
         JSON-RPC specification), "message" (a human-readable string that
         corresponds to the error code), and "data" (an optional structure
         that contains other useful information for the user).
id       Contains the same value as the id field in the corresponding
         request object. If a problem occurred while parsing the id field in
         the request, this value is null.
Table 15-5 describes the fields included in the Cisco proprietary response
object.
Table 15-5 Cisco Proprietary Response Object Fields
Field Description
sid Session ID of the current response (valid only for chunked output).
code The error code of the command execution. Standard HTTP error
codes are used.
msg The error message associated with the error code.
NX-API is enabled with the feature nxapi configuration command:
NX-2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
NX-2(config)# feature nxapi
When the NX-API feature is enabled, you may authenticate and begin
sending requests to the appropriate HTTP or HTTPS port. NX-OS also
provides a sandbox environment for testing the functions of the API; this is
accessed by using a standard web browser and connecting through HTTP to
the switch management address.
Troubleshooting problems related to NX-API typically involve the TCP
connection used to deliver the request and response messages between the
switch and the client. The NX-OS ethanalyzer capture tool is used to
troubleshoot connection issues from the client and confirm that the TCP
three-way handshake is completed (see Example 15-14).
NX-2# ethanalyzer local interface mgmt
Capturing on mgmt0
192.168.1.50 -> 192.168.1.201 TCP 52018 > https [SYN] Seq=0
Win=65535 Len=0
MSS=1460 WS=5 TSV=568065210 TSER=0
192.168.1.201 -> 192.168.1.50 TCP https > 52018 [SYN, ACK] Seq=0
Ack=1
Win=16768 Len=0 MSS=1460 TSV=264852 TSER=568065210
192.168.1.50 -> 192.168.1.201 TCP 52018 > https [ACK] Seq=1 Ack=1
Win=65535
Len=0 TSV=568065211 TSER=264852
192.168.1.50 -> 192.168.1.201 SSL Client Hello
192.168.1.201 -> 192.168.1.50 TLSv1.2 Server Hello, Certificate,
Server Key
Exchange, Server Hello Done
192.168.1.50 -> 192.168.1.201 TCP 52018 > https [ACK] Seq=518
Ack=1294
Win=65535 Len=0 TSV=568065232 TSER=264852
192.168.1.50 -> 192.168.1.201 TLSv1.2 Client Key Exchange, Change
Cipher Spec,
Hello Request, Hello Request
192.168.1.201 -> 192.168.1.50 TLSv1.2 Encrypted Handshake Message,
Change
Cipher Spec, Encrypted Handshake Message
After confirming that the TCP session from the client is established,
additional information about the NX-API communication with the client is
found with the show nxapi-server logs command. The server logs in
Example 15-15 show the connection attempt, as well as the details of the
request that was received. The execution of the CLI command is also
shown in the log file, which is helpful in identifying why a particular batch
of commands is failing. Finally, the response object sent to the client is also
provided.
Note
Any activity from the NX-API is logged in the switch accounting log
just like in the traditional CLI. The username associated with the NX-
API is listed in the accounting log as nginx.
In addition to the NX-API server logs, NX-OS has a detailed show tech
nxapi command that provides the server logs in addition to the nginx web
server logs from the Linux process.
Summary
Automation and programmability are the defining building blocks for the
future of networking. Open NX-OS was conceived to meet the future needs
of SDN and the desire for users to natively execute third-party applications
directly on Nexus switches. Open NX-OS provides the architecture that
allows network operators and developers to create and deploy custom
applications on their network devices. Integration of the powerful Bash
shell and Guest shell has made it easy to create scripts for automating tasks
on Nexus switches. This chapter covered in detail how you can leverage the
Bash shell and the Guest shell to deploy third-party applications.
Integration of Python with NX-OS enables you to create dynamic
applications that enhance the functionality and manageability of Nexus
switches. In addition to Python support, Cisco provides the NX-SDK, which
supports building applications in both the C++ and Python languages and
compiling them as RPM packages. Finally, this chapter covered NX-API, an
API that enables users to interact with the Nexus switch using standard
request/response language.
References
Programmability and Automation with Cisco Open NX-OS:
https://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/nexus9000
/sw/open_nxos/programmability/guide/Programmability_Open_NX-OS.pdf
NX-SDK:
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7
-x/programmability/guide/b_Cisco_Nexus_9000_Series_NX-
OS_Programmability_Guide_7x/b_Cisco_Nexus_9000_Series_NX-
OS_Programmability_Guide_7x_chapter_011010.pdf
Index
Symbols
* (asterisk) in RegEx, 683
[] (brackets) in RegEx, 680
^ (caret) in RegEx, 679
[^] (caret in brackets) in RegEx, 681
, (comma) utility, 41
$ (dollar sign) in RegEx, 679–680
- (hyphen) in RegEx, 680–681
() (parentheses) in RegEx, 681–682
. (period) in RegEx, 682
| (pipe) in RegEx, 681–682
+ (plus sign) in RegEx, 682
? (question mark) in RegEx, 683
_ (underscore) in RegEx, 677–678
(*, G) join from NX-4 and NX-3 example, 865
802.1D standards, 219–220
A
access ports, 203–204
accounting log, 45–46, 91
ACL Manager, 570–576
ACLs (access control lists), 569–570
ACL Manager, 570–576
for BFD in hardware example, 700–702
BGP network selection, 577
formats example, 571–572
IGP network selection, 576–577
to match traffic on NX-1 example, 810
for permitting BGP traffic example, 613
programming and statistics for DAI example, 346–348
statistics example, 572–573
verifying, 613–615
action-on-failure for on-demand diagnostic tests example, 107
activating maintenance mode with custom profiles example, 730–731
active interfaces, verifying, 402–403
active query in EIGRP, 441–442
Active state, 604
Active/Standby redundancy mode, 29–34
AD (administrative distance), 600
address assignment (IPv6), 357–362
DHCPv6 relay agent, 357–359
DHCPv6 relay LDRA, 360–362
address families, 598–599
adjacency internal forwarding trace example, 162
adjacency manager clients example, 165
adjacency server mode in OTV, 907–912, 932–937
adjacency verification in OTV, 888–898
advanced verification of EIGRP neighbors example, 423
advertising community value example, 685–686
AFI (address-family identifier), 598–599
aggregate-address command, 634–635
allowed VLANs, 206
AM (Adjacency Manager), 160–175
anycast RP, configuring and verifying, 830–841
anycast traffic, 734
architecture of NX-OS, 8–9
feature manager, 14–16
file systems, 19–25
kernel, 9
line card microcode, 17–19
Messages and Transactional Services (MTS), 11–12
multicast architecture, 741–743
CLI commands, 743
CPU protection, 745–747
implementation, 747–750
replication, 744–745
Persistent Storage Services (PSS), 13–14
system manager (sysmgr), 9–11
area settings mismatches
in IS-IS, 539–541
in OSPF, 473–474
areas
in IS-IS, 508–509
in OSPF, 453
ARP (Address Resolution Protocol), 160–175
ACL configuration and verification, 348–349
dynamic ARP inspection (DAI), 345–349
entry for multicast source example, 796
event history example, 163–164
event-history logs and buffer size example, 92
ND-Cache event-history example, 916–917
in OTV, 915–917
synchronization in vPC, 291–292
table example, 162
ARP-ND-Cache, 915–917
ASM (any source multicast), 785–787
configuring, 787–788
event-history and MROUTE state verification, 789–795
platform verification, 795–799
verifying, 788–789
ASN (autonomous system number), 597–598
ASN mismatch, 412–413
AS-Path access lists, 684
assert message (PIM), 778–779
asterisk (*) in RegEx, 683
asynchronous mode in BFD, 691–692
asynchronous mode with echo function in BFD, 693
attach module CLI usage from supervisor example, 18–19
attribute modifications for route-maps, 586
attributes (BGP), 637
authentication
in EIGRP, 416–419
in FabricPath, 302
in IS-IS, 544–546
on overlay interface, 905–907
in OSPF, 478–482
automation, 949–950. See also programmability
Open NX-OS, 950–951
shells and scripting, 951
bash shell, 951–957
Guest shell, 957–960
Python, 960–964
AS (autonomous system), 597
autorecovery (vPC), 289
auto-RP
configuration on NX-3 example, 817–818
configuring and verifying, 813–820
event-history on NX-4 example, 819–820
listener configuration on NX-2 example, 818–819
mapping agent configuration on NX-4 example, 815–816
B
backup Layer 3 routing in vPC, 292–293
bad BGP updates, 622–623
baseline configuration
EIGRP (Enhanced Interior Gateway Protocol), 399–401
IS-IS (Intermediate System-to-Intermediate System), 518–520
OSPF (Open Shortest Path First), 456–458
bash shell, 51, 951–957
best path calculation in BGP, 636–639
BFD (bidirectional forwarding detection), 689–691, 944–945
asynchronous mode, 691–692
asynchronous mode with echo function, 693
configuring and verifying sessions, 693–707
control packet fields, 691–692
with echo function configuration and verification example, 702–703
event-history logs example, 696–697
failure log example, 703
failure reason codes, 703
feature status example, 695
for OTV IS-IS on site VLAN example, 944–945
over port-channel example, 706–707
over port-channel (micro session configuration) example, 706
over port-channel per-link configuration example, 704–705
session-based event-history example, 697–699
transition history logs example, 699–700
bfd per-link command, 704–705
BGP (Border Gateway Protocol), 597–598
address families, 598–599
attributes detail example, 652–653
best path calculation, 636–639
best path selection example, 638–639
configuration and verification, 605–609
convergence, 646–649
event-history example, 674–675
event-history for inbound prefixes example, 666
event-history for outbound prefixes example, 667
filter-lists example, 670, 672–673
flaps due to MSS issue example, 628
and IGP redistribution example, 633–634
IPv6 peer troubleshooting, 621–622
keepalive debugs example, 619
logs collection, 687
loop prevention, 599–600
message sent and OutQ example, 625
messages
KEEPALIVE, 602
NOTIFICATION, 602
OPEN, 601–602
types of, 601
UPDATE, 602
multipath, 640–643
neighbor states, 602–603
Active, 604
Connect, 603–604
Established, 605
Idle, 603
OpenConfirm, 604
OpenSent, 604
network selection, 577
path attributes (PA), 599
peer flapping troubleshooting, 622
bad BGP updates, 622–623
Hold Timer expired, 623–624
Keepalive generation, 624–626
MTU mismatches, 626–630
peering down troubleshooting, 609–610
ACL and firewall verification, 613–615
configuration verification, 610–611
debug logfiles, 618–619
notifications, 619–621
OPEN message errors, 617–618
reachability and packet loss verification, 611–613
TCP session verification, 615–617
policy statistics for prefix-list example, 667–668
policy statistics for route-map example, 675
regex queries
for AS _100 example, 678
for AS _100_ example, 678
with AS 40 example, 680
for AS 100 example, 678
for AS 300 example, 679
with asterisk example, 683
with brackets example, 680
with caret example, 679
with caret in brackets example, 681
with dollar sign example, 680
with hyphen example, 681
with parentheses example, 682
with period example, 682
with plus sign example, 682
with question mark example, 683
route advertisement, 631
with aggregation, 634–635
with default-information originate command, 636
with network statement, 631–633
with redistribution, 633–634
route filtering and route policies, 662–663
communities, 684–686
with filter lists, 669–673
looking glass and route servers, 687
AS-Path access lists, 684
with prefix lists, 663–669
regular expressions, 676–683
with route-maps, 673–676
route processing, 630–631
route propagation, 630–631
route refresh capability example, 656
route-map configuration example, 673–674
router ID (RID), 601
scaling, 649–650
maxas-limit command, 662
maximum-prefixes, 659–661
with route reflectors, 657–659
soft reconfiguration inbound versus route refresh, 654–657
with templates, 653–654
tuning memory consumption, 650–653
sessions, 600–601
table for regex queries example, 677
table on NX-2 example, 662–663
table output after prefix-list configuration example, 665
table output with route-map filtering example, 674
table with filter-list applied example, 670–671
template configuration example, 654
update generation process, 643–646
wrong peer AS notification message example, 617
BiDIR (Bidirectional), 799–803
configuring, 803–804
terminology, 800
verifying, 805–811
blocked switch ports
identification, 225–227
modifying location, 229–232
bloggerd, 47
bootstrap message (PIM), 777–778
bootup diagnostics, 98–99
Bourne-Again Shell (Bash), 951–957
BPDU (Bridge Protocol Data Unit), 220
filter, 244–245
guard, 243–244
guard configuration example, 243
brackets ([]) in RegEx, 680
BRIB and URIB route installation example, 648
bridge assurance, 250–252
configuration example, 250
engaging example, 251
brief review of MST status example, 237–238
broadcast domains, 198. See also VLANs (virtual LANs)
broadcast optimization in OTV, 877
broadcast traffic
multicast traffic versus, 734–735
in OTV, 917–918
BSR (bootstrap router), configuring and verifying, 820–830
on NX-1 example, 822–823
on NX-2 example, 826–827
on NX-3 example, 825–826
on NX-4 example, 824–825
buffered logging, 88–89
C
candidate RP advertisement message (PIM), 779
capture filters in Ethanalyzer, 65–67
capturing
debug in logfile on NX-OS example, 90
LACP packets with Ethanalyzer example, 265
packets. See packet capture
caret (^) in RegEx, 679
caret in brackets ([^]) in RegEx, 681
CD (collision domain), 197–198
cd command, 20
changing
LACP port priority example, 269
MST interface cost example, 240
MST interface priority example, 241
OSPF reference bandwidth on R1 and R2 example, 503
spanning tree protocol system priority example, 228–229
checking
for feature manager errors example, 16
feature manager state for feature example, 15
IS-IS metric configuration example, 555
Cisco and CLI Python libraries on NX-OS example, 961–962
Cisco proprietary request object fields, 969–970
Cisco proprietary response object fields, 971
classic metrics
on all Nexus switches example, 436
versus wide metrics
in EIGRP, 433–439
on NX-1 example, 435
clear bgp command, 654–657
clear ip mroute command, 748
CLI, 39–44
collecting show tech-support to investigate OSPF problem example, 45
comma (,) utility, 41
commands
access port configuration, 203
aggregate-address, 634–635
bash shell, 951–957
bfd per-link, 704–705
clear bgp, 654–657
clear ip mroute, 748
CLI, 39–44
conditional matching options, 583–584
configure maintenance profile, 728–730
debug bgp keepalives, 618–619
debug bgp packets, 623
debug bgp updates, 671–672
debug ip bgp brib, 643–645
debug ip bgp update, 643–645
debug ip eigrp packets, 405–406
debug ip ospf, 464
debug ip pim data-register receive, 790
debug ip pim data-register send, 790
debug isis, 529–530
debug mmode logfile, 731
debug sockets tcp pcb, 156–157
default-information originate, 636
ethanalyzer local interface, 65
ethanalyzer local read, 68
feature bfd, 693
feature netflow, 74
feature nxapi, 972
file system commands
dir bootflash:, 21
dir logflash:, 24
list of, 20
show file logflash:, 24–25
Guest shell, 957–960
IGMP snooping configuration parameters, 758–761
install all, 719
install all kickstart, 714–718
maxas-limit, 662
maximum-prefix, 659–661
for multicast traffic, 743
no configure maintenance profile, 728–730
no system mode maintenance, 724–725
python, 50, 960–961
redirection, 39
run bash, 51
show accounting log, 45–46
show bfd neighbors, 694–695, 704–705
show bfd neighbors detail, 702–703
show bgp, 606–607, 638–639
show bgp convergence detail, 648–649
show bgp event-history, 647–648
show bgp event-history detail, 642–643, 646, 665–667, 674–675
show bgp ipv4 unicast policy statistics neighbor, 675
show bgp policy statistics neighbor filter-list, 672
show bgp policy statistics neighbor prefix-list, 667–668
show bgp private attr detail, 652–653
show bgp process, 607–609
show cli list, 42–43
show cli syntax, 43
show clock, 82
show copp diff profile, 188
show cores, 29
show cores vdc-all, 108
show diagnostic bootup level, 99
show diagnostic content module, 101–103
show diagnostic ondemand setting, 106–107
show diagnostic result module, 103–105
show event manager policy internal, 85–86
show event manager system-policy, 84–85
show fabricpath conflict all, 310
show fabricpath isis adjacency, 304–305
show fabricpath isis interface, 303–304
show fabricpath isis topology, 306
show fabricpath isis vlan-range, 305–306
show fabricpath route, 307
show fabricpath switch-id, 303, 315
show fabricpath unicast routes vdc, 308–309
show fex, 126–128
show forwarding distribution ip igmp snooping vlan, 765
show forwarding distribution ip multicast route group, 797
show forwarding internal trace v4-adj-history, 162
show forwarding internal trace v4-pfx-history, 172–173
show forwarding ipv4 adjacency, 162–163
show forwarding ipv4 route, 173–174
show forwarding route, 173–174
show glbp, 386–388
show glbp brief, 386–388
show guestshell detail, 958–959
show hardware, 98
show hardware capacity interface, 113
show hardware flow, 76–77
show hardware internal cpu-mac eobc stats, 118–119
show hardware internal cpu-mac inband counters, 123
show hardware internal cpu-mac inband events, 122–123
show hardware internal cpu-mac inband stats, 119–122
show hardware internal dev-port-map, 797–798
show hardware internal errors, 114, 124
show hardware internal forwarding asic rate-limiter, 184–185
show hardware internal forwarding instance, 309
show hardware internal forwarding rate-limiter usage, 182–184
show hardware internal statistics module pktflow dropped, 116–118
show hardware mac address-table, 764
show hardware rate-limiter, 745–746
show hardware rate-limiters, 181–182
show hsrp brief, 373–374
show hsrp detail, 373–374
show hsrp group detail, 377–378
show incompatibility-all system, 713–714
show interface, 110–112, 193, 194, 203–204
show interface counters errors, 112–113
show interface port-channel, 261–262
show interface trunk, 204–205
show interface vlan 10 private-vlan mapping, 216
show ip access-list, 572–573
show ip adjacency, 165–166
show ip arp, 161–162, 796
show ip arp inspection statistics vlan, 345–346
show ip arp internal event-history, 163–164
show ip arp internal event-history event, 92
show ip dhcp relay, 337–338
show ip dhcp relay statistics, 337–338
show ip dhcp snooping, 342
show ip dhcp snooping binding, 342–343
show ip eigrp, 404
show ip eigrp interface, 402, 415–416
show ip eigrp neighbor detail, 410–411
show ip eigrp topology, 395, 398
show ip eigrp traffic, 405
show ip igmp groups, 845–846
show ip igmp interface, 853–854
show ip igmp interface vlan, 768–769
show ip igmp internal event-history debugs, 769
show ip igmp internal event-history igmp-internal, 769–770
show ip igmp route, 769
show ip igmp snooping groups, 845–846
show ip igmp snooping groups vlan, 764
show ip igmp snooping internal event-history vlan, 766
show ip igmp snooping mrouter, 854–855
show ip igmp snooping otv groups, 935
show ip igmp snooping statistics, 864–865
show ip igmp snooping statistics global, 767
show ip igmp snooping statistics vlan, 767–768, 934–935
show ip igmp snooping vlan, 757, 763–764
show ip interface, 374
show ip mroute, 770–771, 794–795, 892–893, 932
show ip mroute summary, 894
show ip msdp internal event-history route, 837–838
show ip msdp internal event-history tcp, 837–838
show ip msdp peer, 835–836
show ip ospf, 461
show ip ospf event-history, 464–465
show ip ospf interface, 461, 475–476
show ip ospf internal event-history adjacency, 47
show ip ospf internal event-history rib, 169–170
show ip ospf internal txlist urib, 169
show ip ospf neighbors, 458–459
show ip ospf traffic, 463
show ip pim df, 805–806, 809
show ip pim group-range, 829–830
show ip pim interface, 782–783, 852–853
show ip pim internal event-history bidir, 806
show ip pim internal event-history data-header-register, 840–841
show ip pim internal event-history data-register-receive, 790
show ip pim internal event-history hello, 783–784
show ip pim internal event-history join-prune, 792–793, 806–807, 808, 846–847, 858, 865
show ip pim internal event-history null-register, 790, 791, 840–841, 857
show ip pim internal event-history rp, 819–820, 827–828
show ip pim internal event-history vpc, 857, 865–867
show ip pim internal vpc rpf-source, 856–857, 866–867
show ip pim neighbor, 781
show ip pim rp, 814–819, 822–827
show ip pim statistics, 783, 828–829
show ip prefix-list, 580–581
show ip route, 171, 419–421
show ip sla configuration, 324
show ip sla statistics, 323
show ip traffic, 154–156, 611–612
show ip verify source interface, 349–350
show ipv6 dhcp guard policy, 369–370
show ipv6 dhcp relay statistics, 358–359
show ipv6 icmp vaddr, 378–379
show ipv6 interface, 378–379
show ipv6 nd, 355–356
show ipv6 nd raguard policy, 364
show ipv6 neighbor, 354
show ipv6 snooping policies, 369–370
show isis, 525–526
show isis adjacency, 520–523
show isis database, 558–560
show isis event-history, 530–531
show isis interface, 523–525, 526–527
show isis traffic, 528–529
show key chain, 417, 546
show lacp counters, 262–263
show lacp internal info interface, 263–264
show lacp neighbor, 264
show lacp system-identifier, 264
show logging log, 88
show logging logfile, 959
show logging onboard internal kernel, 148
show logging onboard module 10 status, 23
show mac address-table, 198–199
show mac address-table dynamic vlan, 796, 919–920, 923
show mac address-table multicast, 764
show mac address-table vlan, 305–306
show maintenance profile, 727–728
show maintenance timeout, 726
show module, 96–98, 708
show monitor session, 56–57
show ntp peer-status, 82
show ntp statistics, 83
show nxapi-server logs, 973–975
show nxsdk internal event-history, 967
show nxsdk internal service, 965–966
show otv adjacency, 889, 906–907, 910
show otv arp-nd-cache, 916
show otv data-group, 931
show otv internal adjacency, 890
show otv internal event-history arp-nd, 916–917
show otv isis database, 899
show otv isis database detail, 900–902
show otv isis hostname, 899
show otv isis interface overlay, 906
show otv isis internal event-history adjacency, 898
show otv isis internal event-history iih, 896–897
show otv isis internal event-history spf-leaf, 902–903
show otv isis ip redistribute mroute, 930, 934
show otv isis mac redistribute route, 903–904
show otv isis redistribute route, 921–922
show otv isis site, 895–896
show otv isis site statistics, 904–905
show otv isis traffic overlay0, 904, 906
show otv mroute, 928, 929
show otv mroute detail, 929–930, 931, 933
show otv overlay, 888
show otv route, 902, 923
show otv route vlan, 921
show otv site, 889–890, 895, 911–912
show otv vlan, 891–892, 920
show policy-map interface, 114
show policy-map interface control-plane, 189–190
show policy-map system type network-qos, 194–195
show port-channel compatibility-parameters, 272
show port-channel load-balance, 273–274
show port-channel summary, 260–261, 272, 704–705
show port-channel traffic, 273
show processes log pid, 29
show processes log vdc-all, 109–110
show queueing interface, 114
show queuing interface, 193, 194
show routing clients, 167–168
show routing event-history, 647–648
show routing internal event-history msgs, 169–170
show routing ip multicast event-history rib, 770
show routing ip multicast source-tree detail, 868–869
show routing memory statistics, 171
show run aclmgr, 572
show run all | include glean, 161
show run copp all, 186
show run netflow, 76
show run otv, 908–909, 917–918
show run pim, 781
show run sflow, 79
show run vdc, 137
show running-config, 45
show running-config copp, 188–189
show running-config diff, 43–44
show running-config mmode, 730
show running-config sla sender, 324
show sflow, 79–80
show sflow statistics, 80
show snapshots, 725–726
show sockets client detail, 157–158
show sockets connection tcp, 615–616
show sockets connection tcp detail, 157
show sockets internal event-history events, 616–617
show sockets statistics all, 159
show spanning-tree, 225–227, 237–238, 281–282
show spanning-tree inconsistentports, 246, 252
show spanning-tree interface, 227
show spanning-tree mst, 238–239
show spanning-tree mst configuration, 237
show spanning-tree mst interface, 239–240
show spanning-tree root, 222–224, 225
show spanning-tree vlan, 897–898
show system inband queuing statistics, 150
show system internal access-list input entries detail, 190
show system internal access-list input statistics, 340–341, 348–349, 359, 367–368, 700–702
show system internal access-list interface, 339–340, 367–368, 700–702
show system internal access-list interface e4/2 input statistics module 4, 573–574
show system internal aclmgr access-lists policies, 574–575
show system internal aclmgr ppf node, 575–576
show system internal adjmgr client, 164–165
show system internal adjmgr internal event-history events, 167
show system internal bfd event-history, 695–699
show system internal bfd transition-history, 699–700
show system internal copp info, 191–192
show system internal eltm info interface, 195
show system internal ethpm info interface, 175–178, 195
show system internal fabricpath switch-id event-history errors, 310
show system internal feature-mgr feature action, 16
show system internal feature-mgr feature bfd current status, 695
show system internal feature-mgr feature state, 15
show system internal fex info fport, 128–130
show system internal fex info sat port, 128
show system internal flash, 13–14, 24, 88–89
show system internal forwarding adjacency entry, 173–174
show system internal forwarding route, 173–174
show system internal forwarding table, 350
show system internal mmode logfile, 731
show system internal mts buffer summary, 145–146
show system internal mts buffers detail, 146–147
show system internal mts event-history errors, 148
show system internal mts sup sap description, 146–147
show system internal mts sup sap sap-id, 11–12
show system internal mts sup sap stats, 147–148
show system internal pixm info ltl, 765
show system internal pktmgr client, 151–152
show system internal pktmgr interface, 152–153
show system internal pktmgr stats, 153
show system internal port-client event-history port, 179
show system internal port-client link-event, 178–179
show system internal qos queueing stats interface, 114–115
show system internal rpm as-path-access-list, 672–673
show system internal rpm clients, 588–589
show system internal rpm event-history rsw, 588, 672–673
show system internal rpm ip-prefix-list, 589, 668–669
show system internal sal info database vlan, 350
show system internal sflow info, 80
show system internal sup opcodes, 147
show system internal sysmgr gsync-pending, 32
show system internal sysmgr service, 10
show system internal sysmgr service all, 10, 11, 146
show system internal sysmgr service dependency srvname, 142–143
show system internal sysmgr state, 31–32, 710–711
show system internal ufdm event-history debugs, 171–172
show system internal vpcm info interface, 318–320
show system mode, 720–722
show system redundancy ha status, 709
show system redundancy status, 29–30, 708–709
show system reset-reason, 29, 110
show tech adjmgr, 167
show tech arp, 167
show tech bfd, 704
show tech bgp, 687
show tech dhcp, 362
show tech ethpm, 179
show tech glbp, 390
show tech hsrp, 379
show tech netstack, 617, 687
show tech nxapi, 975
show tech nxsdk, 967
show tech routing ipv4 unicast, 687
show tech rpm, 687
show tech track, 334
show tech vpc, 294
show tech vrrp, 385
show tech vrrpv3, 385
show tech-support, 44–45, 320, 749–750
show tech-support detail, 124, 141
show tech-support eem, 87
show tech-support eltm, 195
show tech-support ethpm, 130, 195
show tech-support fabricpath, 310
show tech-support fex, 130
show tech-support ha, 719
show tech-support issu, 719
show tech-support mmode, 731
show tech-support netflow, 78
show tech-support netstack, 160
show tech-support pktmgr, 160
show tech-support sflow, 80
show tech-support vdc, 141
show tunnel internal implicit otv brief, 890–891
show tunnel internal implicit otv detail, 922, 937
show tunnel internal implicit otv tunnel_num, 891
show udld, 247–248
show udld internal event-history errors, 248–249
show vdc detail, 137–138
show vdc internal event-history, 140–141
show vdc membership, 139–140
show vdc resource detail, 138–139
show vdc resource template, 131–132
show virtual-service, 959–960
show virtual-service tech-support, 960
show vlan, 201–202, 214
show vlan private-vlan, 210–211
show vpc, 280–281, 284–285, 314–315
show vpc consistency-parameters, 285–286
show vpc consistency-parameters vlan, 286–287
show vpc consistency-parameters vpc, 287
show vpc orphan-ports, 288
show vpc peer-keepalive, 282–283
show vrrp, 380–381
show vrrp statistics, 381–382
show vrrpv3, 383–384
show vrrpv3 statistics, 384–385
soft-reconfiguration inbound, 654–657
source, 963
system maintenance mode always-use-custom-profile, 728–730
system mode maintenance, 720–722
system mode maintenance dont-generate-profile, 730–731
system mode maintenance on-reload reset-reason, 726–727
system mode maintenance timeout, 726
system switchover, 711–712
test packet-tracer, 71–72
communities in BGP, 684–686
community PVLANs, 207, 212–215
comparing before and after maintenance snapshots example, 725–726
complex matching route-maps example, 585
conditional matching, 569
with ACLs, 569–570
ACL Manager, 570–576
BGP network selection, 577
IGP network selection, 576–577
with prefix lists, 580–581
with prefix matching, 578–579
route-maps, 582–584
command options, 583–584
complex matching, 585–586
multiple match conditions, 584–585
configuration checkpoints, 48–49
configuration rollbacks, 48–49
configure maintenance profile command, 728–730
configuring
ARP ACLs, 348–349
ASM (any source multicast), 787–788
AS-path access list, 684
auto-RP configuration on NX-3, 817–818
auto-RP listener configuration on NX-2, 818–819
auto-RP mapping agent configuration on NX-4, 815–816
BFD (bidirectional forwarding detection)
with echo function, 702–703
for OSPF example, 694
over port-channel per-link, 704–705
sessions, 693–707
BGP (Border Gateway Protocol), 605–609
route-map, 673–674
table output after prefix-list, 665
template, 654
BiDIR (Bidirectional PIM), 803–804
BPDU guard, 243
bridge assurance, 250
BSR (bootstrap router)
on NX-1, 822–823
on NX-2, 826–827
on NX-3, 825–826
on NX-4, 824–825
console logging example, 88
CoPP NetFlow, 78
custom maintenance profiles example, 728–730
DAI (dynamic ARP inspection), 345–346
DHCP relay, 336–337
DHCP snooping, 342
DHCPv6 guard, 369–370
dynamic ARP inspection, 346
EEM, 85–86
EIGRP (Enhanced Interior Gateway Protocol)
baseline configuration, 399–401
with custom K values, 414
with modified hello timer, 416
with passive interfaces, 404–405
stub configuration, 424
error recovery service, 244
ERSPAN, 59
FabricPath, 300–302
FEX (Fabric Extender), 126
FHRP localization configuration on NX-2, 938–939
filtering SPAN traffic, 57
GLBP (Gateway Load-Balancing Protocol), 386
HSRP (Hot Standby Router Protocol), 372–373
HSRPv6, 377
IP SLA ICMP echo probe, 323
IP SLA TCP connect probe, 328
IP source guard, 350
IPv6 RA guard, 364
IPv6 snooping, 367
IS-IS (Intermediate System-to-Intermediate System)
baseline configuration, 518–520
L2 route-leaking, 564–565
metric transition mode, 555
with passive interfaces, 528
routing and topology table after static metric configuration, 552–553
jumbo MTU system, 193
L1 route propagation example, 560
L2 and L3 rate-limiter and exception, 184–185
LACP fast and verifying LACP speed state example, 270
Layer 3 routing over vPC example, 294
loop guard, 246
with maximum hops example, 425
maximum links example, 267
minimum number of port-channel member interfaces example, 265–266
MST (Multiple Spanning-Tree Protocol), 236–237
multicast vPC
on NX-3, 851–852
on NX-4, 850–851
NetFlow, 73–77
flow exporter definition, 75–76
flow monitor and interface, 76
flow monitor definition, 76–77
flow record definition, 74–75
sampler and interface, 78
NTP, 81–82
NX-1 redistribution, 431, 488, 567
NX-1 to redistribute 172.16.1.0/24 into OSPF, 489–490
NX-2 redistribution, 587
NX-2’s PBR, 592–593
NX-3 anycast RP with MSDP, 832–833
NX-4 anycast RP with MSDP, 834–835
NX-API feature configuration, 972
NX-OS BGP, 606
on-reload reset-reason, 726–727
OSPF (Open Shortest Path First)
baseline configuration, 456–458
to ignore interface MTU example, 470
network types example, 476
with passive interfaces, 462–463
OTV (Overlay Transport Virtualization), 882–885
adjacency server on NX-2, 908–909
ED adjacency server mode on NX-4, 908
internal interface, 882
IS-IS authentication example, 905
join interface, 883
next-hop adjacency tracking example, 946
overlay interface, 885
packet tracer, 71–72
PIM (Protocol Independent Multicast)
anycast RP on NX-4, 840
ASM on NX-1, 788
auto-RP candidate-RP on NX-1, 814–815
BiDIR on NX-1, 803–804
sparse mode on interface example, 781
SSM on NX-2, 843–844
SSM on NX-4, 844–845
static RP on NX-3, 812
PIM RP, 811–812
anycast RP, 830–841
Auto-RP, 813–820
BSR (bootstrap router), 820–830
static RP, 812–813
port down upon MAC move notification example, 242–243
port-channels, 259–260
promiscuous PVLAN SVI example, 216
route-maps, 586
sample distribute list configuration, 427
sample MST configuration on NX-1, 236–237
sample offset list configuration, 428
scale factor configuration, 190, 191–192
scheduler job example, 50
sFlow, 79
SPAN (Switched Port Analyzer), 55–56
SPAN-on-drop, 61
SPAN-on-latency, 61
SSM (source specific multicast), 843–845
syslog logging, 90
trunk port, 204
UDLD, 247
unicast RPF, 351–352
URPF (Unicast Reverse Path Forwarding), 351–352
VDC (Virtual Device Contexts), 133–134
virtual link, 484
vPC (virtual port-channel), 278–280
autorecovery example, 289
peer-gateway example, 291
vPC+, 311–314
vPC-connected receiver, 861–869
vPC-connected source, 849–861
VRRP (Virtual Router Redundancy Protocol), 380
VRRPv3 migration, 382
confirming
BFD neighbor on site VLAN example, 945
IS-IS interfaces, 523–526
OBFL is enabled on module example, 23
OSPF interfaces, 460–461
redundancy and synchronization state example, 31–32
confusing EIGRP ASN configuration example, 412
Connect state, 603–604
consistency checkers, 49–50
vPC, 283–287
console logging, 88
control plane (OTV), 885–886
adjacency server mode, 907–912
adjacency verification, 888–898
authentication, 905–907
CoPP, 912–913
IS-IS topology table, 898–905
multicast mode, 887–888
convergence in BGP, 646–649
convergence problems, 439–441
active query, 441–442
stuck in active (SIA) queries, 443–446
CoPP (control plane policing), 179–192
classes, 745
NetFlow configuration and verification example, 78
strict policy on Nexus example, 186–188
copy command, 20
core interfaces (FabricPath), verifying, 303–304
corrupt BGP update message example, 623
count or wc utility usage example, 40
count utility, 40
CPU protection, 745–747
creating and debugging bash shell scripts example, 953–954
CSMA/CD (Carrier Sense Multiple Access/Collision Detect), 197
custom maintenance profiles, 727–731
D
DAI (dynamic ARP inspection), 345–349
ACL programming, 346–348
ARP ACLs, 348–349
configuring and verifying, 345–346
data plane (OTV)
ARP resolution and ARP-ND-Cache, 915–917
broadcasts, 917–918
encapsulation, 913–915
multicast traffic with multicast enabled transport, 924–932
multicast traffic with unicast transport, 932–937
selective unicast flooding, 918–919
unicast traffic with multicast enabled transport, 919–924
Dead Interval Time, 476–478
debug bgp keepalives command, 618–619
debug bgp packets command, 623
debug bgp updates command, 671–672
debug bgp updates output example, 671–672
debug commands with filter example, 649
debug filters, 47–48
debug ip bgp brib command, 643–645
debug ip bgp update command, 643–645
debug ip eigrp packets command, 405–406
debug ip ospf command, 464
debug ip pim data-register receive command, 790
debug ip pim data-register send command, 790
debug isis command, 529–530
debug log file and debug filter example, 47–48
debug logfiles, 47–48, 90, 618–619
debug mmode logfile command, 731
debug sockets tcp pcb command, 156–157
debugs for BGP update and route installation in BRIB example, 644–645
decimal format, converting to dot-decimal, 473
dedicated OTV broadcast group example, 917–918
default FA in OSPF type-5 LSA example, 490
default-information originate command, 636
delete command, 20
dense mode (DM), 771–772
dependencies in feature manager, 14
deployment models for OTV, 881
deployment of community PVLANs on NX-1 example, 213
deployment of isolated PVLAN on NX-1 example, 209–210
detailed VLAN 115 IGMP snooping group membership example, 764
detecting inconsistent port state example, 251
determining current supervisor redundancy state example, 29–30
determining the SoC instances on module 3 of NX-2 example, 797–798
DF election message (PIM), 779–780
DHCP (Dynamic Host Configuration Protocol)
relay configuration example, 337
snooping ACL programming example, 343–345
snooping binding database example, 343
snooping configuration and validation example, 342
DHCP relay, 335–341
ACL verification, 339–341
configuring, 336–337
verifying, 337–338
DHCP snooping, 341–345
ACL programming, 343–345
binding database, 342–343
configuring, 342
DHCPv6
guard configuration and policy verification example, 369–370
relay ACL line card statistics example, 359
relay statistics example, 358–359
DHCPv6 Guard, 368–370
DHCPv6 relay agent, 357–359
DHCPv6 relay LDRA, 360–362
diagnostic tests. See GOLD (Generic Online Diagnostic) tests
diff utility, 40
different OSPF areas on Ethernet1/1 interfaces example, 472
different OSPF hello timers example, 477
dir bootflash: command, 21
dir command, 20
dir logflash: command, 24
DIS (Designated Intermediate System), 516–517, 543–544
disabling BGP client-to-client reflection example, 658
discontiguous networks in OSPF, 482–485
display filters in Ethanalyzer, 65–67
displaying
active EIGRP interfaces example, 402
EIGRP neighbors example, 401
IS-IS neighbors example, 521
IS-IS neighbors with summary and detail keywords example, 521–522
OSPF neighbors example, 459
distribute list, 426–427
dollar sign ($) in RegEx, 679–680
domains (vPC), 275–276, 280–282
dot-decimal format, converting decimal to, 473
drop threshold for syslog logging example, 190–191
DRs (Designated Routers), 452, 474–476
dummy PIM hello captured in Ethanalyzer example, 926–927
duplicate multicast packets, 870
duplicate router-ID example, 471
duplicate router-ID in OSPF, 485–487
duplicate system-ID example, 539
duplicate System-ID in IS-IS, 546–549
dynamic ARP inspection configuration and verification example, 346
dynamic tunnel encapsulation
for multicast traffic example, 937
for NX-6 example, 922
E
EBGP (external BGP), 600, 640–643
echo command, 951–952
EEM (Embedded Event Manager), 47, 50, 83–87, 107, 964
configuration and verification example, 85–86
system policy example, 84–85
with TCL script example, 86
egrep utility, 41–42
egress multicast replication, 744–745
EIGRP (Enhanced Interior Gateway Protocol), 393–394
adjacency dropping due to retry limit example, 410
adjacency failure due to holding timer example, 415
configuring
baseline configuration, 399–401
with custom K values example, 414
with modified hello timer example, 416
with passive interfaces example, 404–405
convergence problems, 439–441
active query, 441–442
stuck in active (SIA) queries, 443–446
interface level authentication example, 418
neighbor adjacency troubleshooting, 401–402
ASN mismatch, 412–413
authentication, 416–419
connectivity with primary subnet, 409–412
Hello and hold timers, 414–416
K values mismatch, 413–414
passive interfaces, 403–405
verifying active interfaces, 402–403
verifying EIGRP packets, 405–409
packet debugs example, 406
packet types, 399
path attributes for 10.1.1.0/24 example, 428–429
path metric calculation, 396–398
path selection and missing routes troubleshooting, 419–421
classic metrics versus wide metrics, 433–439
distribute list, 426–427
hop counts, 424–425
interface-based settings, 430
load balancing, 421
offset lists, 427–430
redistribution, 430–432
stub routers, 421–424
process level authentication example, 419
reference topology, 394
route-maps, 587
stub configuration example, 424
terminology, 394
topology for 10.1.1.0/24 network example, 440–441
topology for specific prefix example, 398
topology table, 395–396
traffic counters with SIA queries and replies example, 444–445
traffic statistics example, 405
ELAM (embedded logic analyzer module), 19
email utility, 42
Empty echo, 249
emulated switches
in FabricPath, 310–311
verifying, 315
enabling
authentication on FP ports example, 302
bash-shell feature and using bash commands example, 952
BFD feature example, 693
FabricPath feature example, 301
FP core ports, FP VLAN, and CE edge ports example, 301
MAC address lookup mode example, 757
NetFlow, 74
vPC ARP synchronization example, 292
encapsulation in OTV data plane, 913–915
encrypted authentication in OSPF, 480–482
entering bash shell example, 51
EOBC status and error counters example, 119
EPLD (electronic programmable logic device), 26
error recovery service configuration and demonstration example, 244
ERSPAN (Encapsulated Remote SPAN), 57–60
configuring, 59
session verification, 59–60
Established state, 605
Ethanalyzer, 63–71
capture and display filters, 65–67
capture example, 68
capture of client connection example, 973
capture of IGMP messages on NX-2 example, 767
GLBP (Gateway Load-Balancing Protocol) and, 388–390
HSRP (Hot Standby Router Protocol) and, 375–376
for HSRPv6, 379
IPv6 Neighbor Discovery, 354–355
multicast traffic examples, 871
write and read example, 69–70
ethanalyzer local interface command, 65
ethanalyzer local read command, 68
EtherChannels. See port-channels
Ethernet protocol, 197
EthPM (Ethernet Port Manager), 175–179
event history logs, 16, 46–47, 92, 749–750, 789–795
ARP (Address Resolution Protocol)
buffer size example, 92
ND-Cache event-history example, 916–917
auto-RP on NX-4 example, 819–820
BFD (bidirectional forwarding detection), 696–697
session-based event-history example, 697–699
BGP (Border Gateway Protocol), 674–675
for inbound prefixes example, 666
multipath example, 643
for outbound prefixes example, 667
update generation example, 646
BiDIR join-prune
on NX-1, 808
on NX-4, 807
BiDIR on NX-4 example, 806
for hello messages example, 784
hello packet visibility from IS-IS, 530–531
IGMP (Internet Group Management Protocol)
internal events example, 770
snooping VLAN event-history example, 766
IS-IS (Intermediate System-to-Intermediate System), event-history indicates different areas example, 540
and MROUTE state verification, 789–795, 799
MSDP on NX-4, 837–838
null register on NX-4 example, 841
NX-1 and NX-2 example, 536–537
NX-1 IGMP debugs example, 769
NX-1 IS-IS adjacency with MTU mismatch example, 538
NX-1 OSPF adjacency with MTU mismatch example, 469
NX-2 OTV IS-IS IIH example, 896
NX-4 OTV IS-IS IIH example, 897
OSPF (Open Shortest Path First), with mismatched area flags example, 473
OTV (Overlay Transport Virtualization)
IS-IS adjacency event-history example, 898
IS-IS SPF event-history example, 903
for RP from NX-4 with BSR example, 827–828
RPM (Route Policy Manager)
client for prefix-lists example, 668–669
viewing, 588
spanning tree protocol, viewing, 234
SSM join-prune
on NX-2, 847
on NX-4, 847
UDLD example, 248–249
examining
accounting log example, 45–46
interface MTU example, 538
interface’s MTU example, 470
MTS queue for SAP example, 12
NX-2’s L2 detailed LSPDB example, 559–560
exclude utility, 42
executing
command with multiple arguments example, 41
consistency checker example, 49
external OSPF path selection for type-1 networks example, 497
external routes
on NX-2 example, 432
in OSPF, 495–499
F
FabricPath. See also vPC+
advantages of, 294–296
authentication, 302
configuring, 300–302
devices, 310
emulated switches, 310–311
packet forwarding, 297–300
terminology, 296–297
topology information example, 306
verifying, 303–310
core interfaces, 303–304
IS-IS adjacency, 304–305
software table in hardware, 308–309
switch-IDs, 303, 310
topologies, 306
in URIB, 307
VLANs (virtual LANs), 305–306
failure detection in OTV, 944–946. See also BFD (bidirectional forwarding detection)
feature bash-shell command, 951–952
feature bfd command, 693
feature dependency hierarchy, 142–143
feature manager, 14–16
feature netflow command, 74
feature nxapi command, 972
feature sets, installing, 15
FEX (Fabric Extender), 2–3, 124–130
configuring, 126
detail example, 127–128
internal information example, 128–130
jumbo MTU settings, 193–194
verifying, 126–128
FHRP (First-Hop Redundancy Protocol), 370
GLBP (Gateway Load-Balancing Protocol), 385–390
configuring, 386
Ethanalyzer and, 388–390
HSRP (Hot Standby Router Protocol), 370–379
ARP table population, 375
configuring, 372–373
Ethanalyzer and, 375–376
HSRPv6, 376–379
multicast group, 374
verifying, 373–374
version comparison, 371
localization, 938–939
VRRP (Virtual Router Redundancy Protocol), 380–385
configuring, 380
statistics, 381–382
verifying, 380–381
VRRPv3, 382–385
FHS (First-Hop Security), 362–370
attacks and mitigation techniques, 363
DHCPv6 Guard, 368–370
IPv6 snooping, 365–368
RA Guard, 363–364
file systems, 19–25
commands
dir bootflash:, 21
dir logflash:, 24
list of, 20
show file logflash:, 24–25
flash file system, 21–22
logflash, 23–25
onboard failure logging (OBFL), 22–23
filter lists, 669–673
filtering routes
in BGP, 662–663
AS-Path access lists, 684
communities, 684–686
with filter lists, 669–673
looking glass and route servers, 687
with prefix lists, 663–669
regular expressions, 676–683
with route-maps, 673–676
in OSPF, 487
filtering traffic
Ethanalyzer capture and display filters, 65–67
multicast traffic, 748–749
SPAN (Switched Port Analyzer), 57
firewalls, verifying, 613–615
flapping peer issues. See peer flapping (BGP) troubleshooting
flash file system, 21–22
flow exporter definition, 75–76
flow monitor definition, 76–77
flow record definition, 74–75
FNF (Flexible NetFlow), 72–73
Forward Delay, 220
forwarding addresses in OSPF, 488–494
forwarding loops
BPDU filter, 244–245
BPDU guard, 243–244
detecting and remediating, 241–242
MAC address notifications, 242–243
unidirectional links, 245
bridge assurance, 250–252
loop guard, 245–246
UDLD (unidirectional link detection), 246–250
FSM (Finite State Machine), 602–603
G
GIR (Graceful Insertion and Removal), 719–727
GLBP (Gateway Load-Balancing Protocol), 385–390
configuring, 386
Ethanalyzer and, 388–390
global EIGRP authentication, 418–419
GOLD (Generic Online Diagnostic) tests, 98
bootup diagnostics, 98–99
diagnostic test results example, 103–105
EEM (Embedded Event Manager), 107
runtime diagnostics, 100–107
graceful consistency checkers, 284
graceful convergence (LACP), 270
granular verification of EIGRP packets with ACL example, 409
granular view of MST topology example, 239
Guest shell, 957–960
guest shell details example, 959
gunzip command, 20
gzip command, 20
H
hardware crashes, 108–110
hardware forwarding verification on module 3 example, 799
hardware interface resources and drops example, 113
hardware internal errors example, 124
hardware rate-limiters for glean traffic example, 161, 167
hardware troubleshooting, 95–98
GOLD (Generic Online Diagnostic) tests, 98
bootup diagnostics, 98–99
EEM (Embedded Event Manager), 107
runtime diagnostics, 100–107
health checks, 108
hardware and process crashes, 108–110
interface errors and drops, 110–115
packet loss, 110
platform-specific drops, 116–124
health checks, 108
hardware and process crashes, 108–110
interface errors and drops, 110–115
packet loss, 110
platform-specific drops, 116–124
hello message (PIM), 775
Hello packets
in IS-IS, 513–514
authentication, 544–546
visibility, 530–531
in OSPF, 450–451
visibility, 465
Hello Time, 220, 476–478
Hello timers
in EIGRP, 414–416
in OSPF, 476–478
high availability. See also BFD (bidirectional forwarding detection); FHRP (First-Hop Redundancy Protocol); vPC (virtual port-channel)
custom maintenance profiles, 727–731
GIR (Graceful Insertion and Removal), 719–727
ISSU (in-service software upgrade), 713–719
stateful switchover (SSO), 707–712
VDC policies, 133
high-availability infrastructure, 28–29
in-service software upgrade (ISSU), 34–35
supervisor redundancy, 29–34
historical information of FIB route example, 172–173
history
of Nexus platforms, 1–2
of NX-OS, 1–2
HM (health-monitoring) diagnostic tests, 100–105
Hold Timer expired, 623–624
hold timers in EIGRP, 414–416
hop counts, 424–425
HSRP (Hot Standby Router Protocol), 278, 370–379
ARP table population, 375
configuring, 372–373
Ethanalyzer and, 375–376
multicast group, 374
verifying, 373–374
version comparison, 371
HSRPv6, 376–379
configuration example, 377
group detail example, 378
virtual address verification example, 379
HWRL (hardware rate limiters), 179–192, 745–747
hyphen (-) in RegEx, 680–681
I
IANA (Internet Assigned Numbers Authority), 597
iBGP (internal BGP), 600
multipath, 640–643
ICMP echo probes, 322–324
id -a command, 951–952
identifying
active EIGRP interfaces example, 403
EIGRP example AS, 413
if passive IS-IS is configured for a level example, 526–527
if passive OSPF interfaces are configured example, 461
matching sequence for specific prefix pattern example, 580–581
member link for specific network traffic example, 274
root ports example, 223–224
root ports on NX-4 and NX-5 example, 224–225
Idle state, 603
IEEE 802.1D standards, 219–220
IGMP (Internet Group Management Protocol). See also vPC (virtual port-channel)
created MROUTE entry on NX-1 example, 769, 771
event-history of internal events example, 770
IGMPv1, 750
IGMPv2, 751–752
IGMPv3, 752–756
state on NX-3 example, 863–864
state on NX-4 example, 862–863
verifying, 761–771
IGMP snooping, 756–761
MFDM entry example, 765
OTV groups on NX-2 example, 935
statistics on NX-4 example, 864–865
status for VLAN 115 example, 763–764
VLAN event-history example, 766
IGMPv1, 750
IGMPv2, 751–752
IGMPv3, 752–756, 846
IGP (Interior Gateway Protocol), 576–577
IIH (IS-IS Hello) packets, 513–514, 544–546
in-band management (VDC), 134–136
in-band Netstack KLM statistics example, 150, 152
include utility, 42
incompatible OSPF timers example, 477
incomplete configuration of route-maps, 586
indication of EIGRP K values mismatch example, 414
ingress routing optimization, 940–941
initializing VDC (Virtual Device Contexts), 134–136
instability in OTV MAC routing table example, 902
install all command, 719
install all kickstart command, 714–718
installing
custom RPM package example, 965–966
feature sets, 15
NX-SDK, 965
and removing RPM packages from bash shell example, 955–957
inter-area routes in OSPF, 495
interfaces. See also passive interfaces
EIGRP
authentication, 418
settings, 430
error counters example, 113
errors and drops, 110–115
FabricPath, verifying, 303–304
IS-IS
confirming, 523–526
link costs, 549–553
OSPF
area number mismatches, 471–473
confirming, 460–461
link costs, 500–504
PIM, verifying, 780–785
PktMgr statistics example, 153
port-channels
consistency, 271–272
establishment troubleshooting, 272
priority. See port priority
queueing statistics example, 114–115
status
object tracking for, 330
reflecting UDLD error example, 248
STP cost, 221–222
internal flash directories example, 88–89
internal interfaces (OTV), configuring, 882
inter-router communication
in IS-IS, 511
in OSPF, 450
intra-area routes in OSPF, 494
I/O module MFIB verification on module 3 example, 798
IP SLA (Service Level Agreement), 321–322
ICMP echo probes, 322–324
object tracking, 331
statistics example, 323
TCP connect probes, 328–329
UDP echo probes, 324–325
UDP jitter probes, 325–327
IPFIB process, 171–175
IPSG (IP Source Guard), 349–350
IPv4 services, 335
DHCP relay, 335–341
ACL verification, 339–341
configuring, 336–337
verifying, 337–338
DHCP snooping, 341–345
ACL programming, 343–345
binding database, 342–343
configuring, 342
dynamic ARP inspection (DAI), 345–349
ACL programming, 346–348
ARP ACLs, 348–349
configuring and verifying, 345–346
IP Source Guard (IPSG), 349–350
Unicast Reverse Path Forwarding (URPF), 351–352
IPv6 services, 352
address assignment, 357–362
DHCPv6 relay agent, 357–359
DHCPv6 relay LDRA, 360–362
First-Hop Security (FHS), 362–370
attacks and mitigation techniques, 363
DHCPv6 Guard, 368–370
IPv6 snooping, 365–368
RA Guard, 363–364
Neighbor Discovery (ND), 352–356
Ethanalyzer capture example, 355
interface information example, 355–356
peer troubleshooting, 621–622
RA guard configuration example, 364
snooping, 365–368
IS-IS (Intermediate System-to-Intermediate System), 507
areas, 508–509
configuration with passive interfaces example, 528
database for area 49.1234 example, 563
database with L2 route leaking example, 565–566
DIS (Designated Intermediate System), 516–517
event-history indicates different areas example, 540
hello debugs example, 529–530
hierarchy in, 507–508
IIH packets, 513–514
interface verification example, 523–525
inter-router communication, 511
L2 route-leaking configuration example, 564–565
LSPs (link state packets), 515–516
MAC addresses, 512–513
metric transition mode configuration and verification example, 555
mismatch of interface types example, 543–544
missing routes troubleshooting
duplicate System-ID, 546–549
interface link costs, 549–553
L1 to L2 route propagations, 556–561
metric calculation, 553–556
redistribution, 566–567
suboptimal routing, 562–566
neighbor adjacency troubleshooting
area settings mismatches, 539–541
baseline configuration, 518–520
checking adjacency capabilities, 541–543
confirming interfaces, 523–526
DIS requirements, 543–544
IIH authentication, 544–546
MTU requirements, 537–539
passive interfaces, 526–528
primary subnets, 535–537
unique System-ID, 539
verifying neighbors, 520–523
verifying packets, 528–535
NET addressing, 509–510
OSPF, compared, 508
OTV control plane, 885–886
adjacency server mode, 907–912
adjacency verification, 888–898
authentication, 905–907
CoPP, 912–913
IS-IS topology table, 898–905
multicast mode, 887–888
packet types, 511–512
path selection troubleshooting, definitions and processing order, 517–518
protocol verification example, 525–526
routing and topology table after static metric configuration example,
552–553
TLVs, 512
topology for area 49.1234 example, 563
topology table with mismatched metric types example, 554–555
traffic statistics example, 529
verifying adjacency in FabricPath, 304–305
isolate and shutdown maintenance mode example, 721–722
isolated PVLANs, 207, 208–212
ISSU (in-service software upgrade), 34–35, 713–719
J
join interfaces (OTV), configuring, 883
join-prune message (PIM), 776–777
json utility, 42
JSON-RPC request object fields, 968–969
JSON-RPC response object fields, 970–971
jumbo MTU system configuration example, 193
K
K values mismatch, 413–414
Keepalive generation, 624–626
KEEPALIVE message, 602
kernel, 9
L
L1 adjacency is affected by L1 IIH authentication on NX-1 example, 545
L1 IIH authentication on NX-1 example, 545
L2 and L3 rate-limiter and exception configuration example, 184–185
LACP (link-aggregation control packets), 256–258
advanced configuration options, 265–268
interface establishment troubleshooting, 272
port-channel configuration, 259–260
system priority, 268–271
verifying, 262–265
LACP fast, 269–270
last utility, 40–41
Layer 2 communications
multicast addresses, 738–739
overview, 197–199
troubleshooting flowchart, 253
Layer 2 overlay. See OTV (Overlay Transport Virtualization)
Layer 3 routing
backup routing in vPC, 292–293
multicast addresses, 739–741
over vPC, 293–294
LDRA (Lightweight DHCPv6 Relay Agent), 360–362
license manager, 15
licensing, 28
line card interop limitations, 141–142
line card microcode, 17–19
listing files on standby supervisor example, 22
load balancing, 421
Local Bridge Identifier, 220
locate UUID for service name example, 11
logflash, 23–25
logging, 87–90
accounting log, 91
BGP logs collection, 687
buffered logging, 88–89
console logging, 88
debug logfiles, 90
event history logs. See event history logs
levels, 87
syslog server, 90
long-lived software releases, 26
looking glass servers, 687
loop guard, 245–246
loop prevention
with BGP, 599–600
in route reflectors, 658–659
loop-free topologies. See STP (Spanning Tree Protocol)
LSAs (link state advertisements), 453–456
LSPs (link state packets), 515–516
M
MAC addresses
address table example, 316
in FabricPath, 305–306
host C example, 919–920
host C on NX-6 example, 923
in IS-IS, 512–513
multicast source example, 796
for multicast traffic, 738–739
preventing forwarding loops, 242–243
redistribution into OTV IS-IS example, 903–904, 921–922
viewing, 198–199
in vPC+, 315–316
maintenance mode (GIR), 719–724
maintenance mode timeout settings example, 726
maintenance profiles, 727–731
maintenance software releases, 25
major software releases, 25
manageability, 950
match route-map command options example, 634
Max Age, 220
maxas-limit command, 662
maximum-prefixes in BGP, 659–661
MD5 authentication in OSPF, 480–482
member interfaces (port-channels), consistency, 271–272
member links (vPC), 277
messages
BGP (Border Gateway Protocol)
KEEPALIVE, 602
NOTIFICATION, 602
OPEN, 601–602
types of, 601
UPDATE, 602
PIM (Protocol Independent Multicast)
assert message, 778–779
bootstrap message, 777–778
candidate RP advertisement message, 779
DF election message, 779–780
hello message, 775
join-prune message, 776–777
register message, 775–776
register-stop message, 776
types of, 773–774
metric calculation
for common LAN interface speeds example, 433
for EIGRP paths, 396–398
in IS-IS, 553–556
MFDM verification on NX-2 example, 797
minor software releases, 25
mismatched OSPF hello timers example, 478
missing path of only one route example, 426
missing routes troubleshooting
EIGRP (Enhanced Interior Gateway Protocol), 419–421
classic metrics versus wide metrics, 433–439
distribute list, 426–427
hop counts, 424–425
interface-based settings, 430
load balancing, 421
offset lists, 427–430
redistribution, 430–432
stub routers, 421–424
IS-IS (Intermediate System-to-Intermediate System)
duplicate System-ID, 546–549
interface link costs, 549–553
L1 to L2 route propagations, 556–561
metric calculation, 553–556
redistribution, 566–567
suboptimal routing, 562–566
OSPF (Open Shortest Path First)
discontiguous networks, 482–485
duplicate router-ID, 485–487
filtering routes, 487
forwarding addresses, 488–494
redistribution, 487–488
mkdir command, 20
modification of spanning tree protocol port cost example, 231–232
move command, 20
MRIB creating (*, G) state example, 770
MROUTE entries
clearing, 748
from NX-3 and NX-4 after IGMP join example, 860
from NX-3 and NX-4 after SPT join example, 859
MROUTE state
on NX-1 after SPT switchover example, 794–795
on NX-1 with no receivers example, 791
on NX-2 after SPT switchover example, 794
on NX-2 with Active Source example, 790
on NX-4 after SPT switchover example, 794
on NX-4 with receiver example, 792
MROUTE types, 924
MROUTE verification, 789–795
on NX-2 example, 795
in transport network example, 932
MSDP (Multicast Source Discovery Protocol), 831–838
event-history on NX-4 example, 837–838
peer status on NX-4 example, 835–836
SA state and MROUTE status on NX-3 example, 836–837
MST (Multiple Spanning-Tree Protocol), 236
configuring, 236–237
tuning, 240–241
verifying, 237–240
MTS (Messages and Transactional Services), 11–12, 144–148
message stuck in queue example, 146
OBFL logs example, 148
SAP statistics example, 147–148
MTU mismatches, 626–630
MTU requirements
in IS-IS, 537–539
in OSPF, 469–470
MTU settings, 192–195
MTU verification
under ELTM process example, 195
under ethpm process example, 195
multicast enabled transport
multicast traffic with, 924–932
unicast traffic with, 919–924
multicast mode in OTV, 887–888
multicast source tree detail on NX-4 and NX-3 example, 869
multicast traffic, 733–735
Ethanalyzer examples, 871
IGMP. See IGMP (Internet Group Management Protocol)
Layer 2 addresses, 738–739
Layer 3 addresses, 739–741
with multicast enabled transport, 924–932
NX-OS architecture, 741–743
CLI commands, 743
CPU protection, 745–747
implementation, 747–750
replication, 744–745
PIM. See PIM (Protocol Independent Multicast)
terminology, 735–738
with unicast transport, 932–937
vPC (virtual port-channel), 848–849
duplicate packets, 870
receiver configuration and verification, 861–869
reserved VLAN, 870
source configuration and verification, 849–861
multicast vPC
configuring
on NX-3, 851–852
on NX-4, 850–851
IGMP interface on NX-4 example, 853–854
PIM interface on NX-4 example, 852–853
source MROUTE entry on NX-3 and NX-4 example, 855
source registration from NX-3 example, 857
multihoming in OTV, 939–940
multipath (BGP), 640–643
multiple match options route-map example, 585
multiple match variables route-map example, 584
multiple subnets in VLANs, 203
N
naming conventions for software releases, 25–27
native VLANs, 206
ND (Neighbor Discovery), 352–356
neighbor adjacency troubleshooting
EIGRP (Enhanced Interior Gateway Protocol), 401–402
ASN mismatch, 412–413
authentication, 416–419
connectivity with primary subnet, 409–412
Hello and hold timers, 414–416
K values mismatch, 413–414
passive interfaces, 403–405
verifying active interfaces, 402–403
verifying EIGRP packets, 405–409
IS-IS (Intermediate System-to-Intermediate System)
area settings mismatches, 539–541
baseline configuration, 518–520
checking adjacency capabilities, 541–543
confirming interfaces, 523–526
DIS requirements, 543–544
IIH authentication, 544–546
MTU requirements, 537–539
passive interfaces, 526–528
primary subnets, 535–537
unique System-ID, 539
verifying neighbors, 520–523
verifying packets, 528–535
OSPF (Open Shortest Path First)
area settings mismatches, 473–474
authentication, 478–482
baseline configuration, 456–458
confirming interfaces, 460–461
connectivity with primary subnet, 468
DR requirements, 474–476
interface area number mismatches, 471–473
MTU requirements, 469–470
passive interfaces, 461–463
timers, 476–478
unique router-ID, 471
verifying neighbors, 458–460
verifying packets, 463–467
neighbor states
in BGP, 602–603
Active, 604
Connect, 603–604
Established, 605
Idle, 603
OpenConfirm, 604
OpenSent, 604
in OSPF, 451–452
neighbors (PIM), verifying, 780–785
NET addressing in IS-IS, 509–510
NetFlow, 72–73
configuring, 73–77
flow exporter definition, 75–76
flow monitor definition, 76–77
flow record definition, 74–75
sampling, 77–78
statistics, 77
Netstack, 148–160
socket accounting example, 159
socket client details example, 158
network automation, 950
network broadcasts, 198
network communications, Layer 2
overview, 197–199
troubleshooting flowchart, 253
network hubs, 198
network QoS policy verification example, 195
network sniffing, 53–57
Ethanalyzer, 63–71
packet tracer, 71–72
SPAN (Switched Port Analyzer), 54–57
configuring, 55–56
ERSPAN, 57–60
filtering traffic, 57
SPAN-on-Drop, 61–62
SPAN-on-Latency (SOL), 60–61
verifying, 56
network statement BGP route advertisement, 631–633
network switches, 198
network types in OSPF, 474
network-admin and dev-ops user role permissions example, 953
next-hop adjacency tracking, 946
Nexus 2000 series, 2–3
Nexus 3000 series, 3–4
Nexus 5000 series, 4
Nexus 6000 series, 4–5
Nexus 7000 series, 5–6
hardware rate limiters example, 746
in-band events example, 123
in-band status example, 120–122
packet flow drop counters example, 116–118
Nexus 9000 series, 6–7
in-band status example, 120–122
Nexus core files example, 108
Nexus in-band counters example, 123
Nexus interface details and capabilities example, 111–112
Nexus platforms
history of, 1–2
Nexus 2000 series, 2–3
Nexus 3000 series, 3–4
Nexus 5000 series, 4
Nexus 6000 series, 4–5
Nexus 7000 series, 5–6
Nexus 9000 series, 6–7
Nexus process crash example, 109–110
no configure maintenance profile command, 728–730
no system mode maintenance command, 724–725
no-more utility, 42
normal traffic flow to NX-6’s loopback 0 interface example, 593
NOTIFICATION message, 602
notifications in BGP, 619–621
NTP (Network Time Protocol), 81–83
configuring, 81–82
statistics, 83
NX-1 and NX-2 detect bad subnet mask example, 468
NX-1 and NX-2 event-history example, 536–537
NX-1 and NX-2 routing table for adjacency example, 412
NX-1 and NX-3’s routing table example, 564
NX-1 configuration to redistribute 172.16.1.0/24 into OSPF example, 489–490
NX-1 detects NX-2 as neighbor example, 410
NX-1 does not detect NX-2 example, 537
NX-1 external OSPF path selection for type-2 network example, 498–499
NX-1 IGMP debugs event-history example, 769
NX-1 IGMP interface VLAN 115 state example, 768–769
NX-1 IS-IS adjacency event-history with MTU mismatch example, 538
NX-1 OSPF adjacency event-history with MTU mismatch example, 469
NX-1 redistribution configuration example, 431, 488, 567
NX-1 stuck in INIT state with NX-2 example, 535
NX-1’s routing table example, 420
NX-1’s routing table with missing NX-4’s 10.4.4.0/24 network example, 547
NX-1’s routing table with missing NX-4’s loopback interface example, 485–486
NX-1’s spanning tree protocol information example, 226
NX-2 and NX-4’s routing table after L1 route propagation example, 561
NX-2 OTV IS-IS IIH event-history example, 896
NX-2 redistribution configuration example, 587
NX-2 VLAN 115 IGMP snooping statistics example, 767–768
NX-2’s LSPDB example, 558
NX-2’s PBR configuration example, 592–593
NX-3 anycast RP with MSDP configuration example, 832–833
NX-3 external OSPF path selection for type-2 network example, 499
NX-3’s LSP after enabling route propagation example, 561
NX-4 anycast RP with MSDP configuration example, 834–835
NX-4 OTV IS-IS IIH event-history example, 897
NX-6 detected as MROUTER port by IGMP snooping example, 928
NX-API, 968–975
Cisco proprietary request object fields, 969–970
Cisco proprietary response object fields, 971
feature configuration example, 972
JSON-RPC request object fields, 968–969
JSON-RPC response object fields, 970–971
server logs example, 973–975
NX-OS
architecture of, 8–9
feature manager, 14–16
file systems, 19–25
kernel, 9
line card microcode, 17–19
Messages and Transactional Services (MTS), 11–12
Persistent Storage Services (PSS), 13–14
system manager (sysmgr), 9–11
BGP (Border Gateway Protocol)
configuration example, 606
peering verification example, 607
process example, 608–609
table output example, 607
component logging level example, 89
detection of forwarding loop example, 242
high-availability infrastructure, 28–29
in-service software upgrade (ISSU), 34–35
supervisor redundancy, 29–34
history of, 1–2
licensing, 28
management and operations
accounting log, 45–46
bash shell, 51
CLI, 39–44
configuration checkpoint and rollback, 48–49
consistency checkers, 49–50
debug filters and debug log files, 47–48
event history logs, 46–47
python interpreter, 50
scheduler, 50
technical support files, 44–45
multicast architecture, 741–743
CLI commands, 743
CPU protection, 745–747
implementation, 747–750
replication, 744–745
pillars of, 1–2, 8
Python interpreter example, 50
Software Maintenance Upgrades (SMUs), 27–28
software releases, 25–27
system component troubleshooting, 142–143
ARP and Adjacency Manager, 160–175
EthPM and Port-Client, 175–179
HWRL, CoPP, system QoS, 179–192
MTS (Messages and Transactional Services), 144–148
MTU settings, 192–195
Netstack and Packet Manager, 148–160
virtualization
Virtual Device Contexts (VDCs), 35–37
virtual port channels (vPC), 37–39
Virtual Routing and Forwarding (VRF), 37
NX-SDK, 964–967
event history example, 967
O
OBFL (onboard failure logging), 22–23
object tracking, 329
for interface status, 330
for route status, 330–331
with static routes, 334
for track-list state, 332–333
offline diagnostics, 107
offset list configuration example, 428
offset lists, 427–430
on-demand diagnostics, 105–107
on-reload reset-reason configuration and verification example, 726–727
OPEN message, 601–602, 617–618
Open NX-OS, 950–951
OpenConfirm state, 604
OpenSent state, 604
ORIB entry for host C on NX-6 example, 923
orphan ports (vPC), 288
OSPF (Open Shortest Path First), 449
adjacency failure example, 475
areas, 453
configuration with passive interfaces example, 462–463
Designated Routers (DRs), 452
encrypted authentication example, 480–481
event-history with mismatched area flags example, 473
hello and packet debugs example, 464
Hello packets, 450–451
interface output example, 461
interface output in brief format example, 460
inter-router communication, 450
IS-IS, compared, 508
LSAs (link state advertisements), 453–456
missing routes troubleshooting
discontiguous networks, 482–485
duplicate router-ID, 485–487
filtering routes, 487
forwarding addresses, 488–494
redistribution, 487–488
neighbor adjacency troubleshooting
area settings mismatches, 473–474
authentication, 478–482
baseline configuration, 456–458
confirming interfaces, 460–461
connectivity with primary subnet, 468
DR requirements, 474–476
interface area number mismatches, 471–473
MTU requirements, 469–470
passive interfaces, 461–463
timers, 476–478
unique router-ID, 471
verifying neighbors, 458–460
verifying packets, 463–467
neighbor states, 451–452
neighbors stuck in EXSTART neighbor state example, 469
network types, 474
path selection troubleshooting, 494
external routes, 495–499
inter-area routes, 495
interface link costs, 500–504
intermixed RFC 1583 and RFC 2328 devices, 499–500
intra-area routes, 494
plaintext authentication example, 479
route distribution to URIB example, 169
routing table example, 456
traffic statistics example, 463
OTV (Overlay Transport Virtualization), 875–877
(V, *, G) MROUTE detail on NX-6 example, 933
(V, S, G) MROUTE detail on NX-2 example, 929–930
(V, S, G) MROUTE detail on NX-6 example, 931
adjacencies with secondary IP address example, 943–944
adjacency server configuration on NX-2 example, 908–909
adjacency server mode dual adjacency example, 911–912
adjacency server mode IS-IS neighbors example, 910
advanced features
fast failure detection, 944–946
FHRP localization, 938–939
ingress routing optimization, 940–941
multihoming, 939–940
tunnel depolarization, 942–944
VLAN mapping, 941–942
configuring, 882–885
control plane, 885–886
adjacency server mode, 907–912
adjacency verification, 888–898
authentication, 905–907
CoPP, 912–913
IS-IS topology table, 898–905
multicast mode, 887–888
data plane
ARP resolution and ARP-ND-Cache, 915–917
broadcasts, 917–918
encapsulation, 913–915
multicast traffic with multicast enabled transport, 924–932
multicast traffic with unicast transport, 932–937
selective unicast flooding, 918–919
unicast traffic with multicast enabled transport, 919–924
deployment models, 881
dynamic unicast tunnels example, 891
ED adjacency server mode configuration on NX-4 example, 908
flood control and broadcast optimization, 877
IGMP proxy reports example, 934–935
internal interface configuration example, 882
IS-IS (Intermediate System-to-Intermediate System)
adjacencies on overlay example, 889
adjacency event-history example, 898
authentication error statistics example, 906
authentication parameters example, 906
database detail example, 900–901
database example, 899
dynamic hostname example, 899
LSP updating frequently example, 901–902
MGROUP database detail on NX-2 example, 935
MGROUP database on NX-2 example, 928–929
overlay traffic statistics example, 904
site adjacency example, 889–890
site-VLAN statistics example, 904–905
SPF event-history example, 903
join interface configuration example, 883
MGROUP database detail on NX-6 example, 930
MROUTE
detail on NX-2 example, 936
detail on NX-6 example, 936–937
entry on NX-2 example, 929
redistributed into IS-IS on NX-6 example, 934
redistribution into OTV IS-IS example, 930
state on NX-6 example, 928
overlay interface configuration example, 885
overlay IS-IS adjacency down example, 907
partial adjacency example, 895
routing table with selective unicast flooding example, 918–919
site VLAN, 882
SSM data-groups example, 925
supported platforms, 878
terminology, 878–880
out-of-band management (VDC), 134–136
output of RR reflected prefix example, 659
overlay interfaces (OTV)
configuring, 885
IS-IS authentication on, 905–907
verifying, 888–898
P
PA (path attributes), 599
packet capture, 53–57
Ethanalyzer, 63–71
packet tracer, 71–72
SPAN (Switched Port Analyzer), 54–57
configuring, 55–56
ERSPAN, 57–60
filtering traffic, 57
SPAN-on-Drop, 61–62
SPAN-on-Latency (SOL), 60–61
verifying, 56
packet loss
reasons for, 110
interface errors and drops, 110–115
platform-specific drops, 116–124
verifying, 611–613
Packet Manager (PktMgr), 148–160
packet processing filter (PPF), 575–576
packet tracer, 71–72
packets. See also messages
EIGRP (Enhanced Interior Gateway Protocol)
types of, 399
verifying, 405–409
FabricPath, 297–300
IS-IS (Intermediate System-to-Intermediate System)
IIH, 513–514, 544–546
LSPs, 515–516
types of, 511–512
verifying, 528–535
LACP. See LACP (Link Aggregation Control Protocol)
OSPF (Open Shortest Path First)
types of, 450
verifying, 463–467
parentheses () in RegEx, 681–682
partial configuration of route-maps, 586
passive interfaces
in EIGRP, 403–405
in IS-IS, 526–528
in OSPF, 461–463
path changed for 10.1.1.0/24 route example, 427
path check after L2 route leaking example, 566
path metric calculation in EIGRP, 396–398
path modification on NX-6 example, 429–430
path selection troubleshooting
EIGRP (Enhanced Interior Gateway Protocol), 419–421
classic metrics versus wide metrics, 433–439
distribute list, 426–427
hop counts, 424–425
interface-based settings, 430
load balancing, 421
offset lists, 427–430
redistribution, 430–432
stub routers, 421–424
IS-IS (Intermediate System-to-Intermediate System), 517–518
OSPF (Open Shortest Path First), 494
external routes, 495–499
inter-area routes, 495
interface link costs, 500–504
intermixed RFC 1583 and RFC 2328 devices, 499–500
intra-area routes, 494
Path-MTU-Discovery (PMTUD), 626–627
PBR (policy-based routing), 591–594
peer flapping (BGP) troubleshooting, 622
bad BGP updates, 622–623
Hold Timer expired, 623–624
Keepalive generation, 624–626
MTU mismatches, 626–630
peer link (vPC), 277
peer-gateway (vPC), 289–291
peering down (BGP) troubleshooting, 609–610
ACL and firewall verification, 613–615
configuration verification, 610–611
debug logfiles, 618–619
notifications, 619–621
OPEN message errors, 617–618
reachability and packet loss verification, 611–613
TCP session verification, 615–617
peer-keepalive link (vPC), 276–277, 282–283
period (.) in RegEx, 682
Persistent Storage Services (PSS), 13–14
pillars of NX-OS, 1–2, 8
PIM (Protocol Independent Multicast), 771–772
(S, G) join events and MROUTE state example, 868
anycast RP configuration on NX-4 example, 840
ASM (any source multicast), 785–787
configuring, 787–788
event-history and MROUTE state verification, 789–795
platform verification, 795–799
verifying, 788–789
auto-RP candidate-RP configuration on NX-1 example, 814–815
BiDIR, 799–803
configuring, 803–804
DF status on NX-4 example, 805–806
event-history on NX-4 example, 806
interface counters on NX-4 example, 807–808
join-prune event-history on NX-1 example, 808
join-prune event-history on NX-4 example, 807
MROUTE entry on NX-1 example, 809
MROUTE entry on NX-2 example, 811
MROUTE entry on NX-4 example, 805
terminology, 800
verifying, 805–811
DF status on NX-1 example, 809
Ethanalyzer capture of PIM hello message example, 784–785
event-history for hello messages example, 784
event-history for RP from NX-4 with BSR example, 827–828
global statistics example, 783
group-to-RP mapping information from NX-2 example, 830
interface and neighbor verification, 780–785
interface parameters on NX-1 example, 782–783
join received from NX-1 on NX-2 example, 793
join sent from NX-1 to NX-2 example, 793
message types
assert message, 778–779
bootstrap message, 777–778
candidate RP advertisement message, 779
DF election message, 779–780
hello message, 775
join-prune message, 776–777
list of, 773–774
register message, 775–776
register-stop message, 776
neighbors on NX-1 example, 781
null register event-history on NX-4 example, 841
RP configuration, 811–812
anycast RP, 830–841
Auto-RP, 813–820
BSR (bootstrap router), 820–830
static RP, 812–813
RPT join from NX-4 to NX-1 example, 792
RPT join received on NX-1 example, 792
SPT joins from NX-2 for vPC-connected sources example, 858
SSM (source specific multicast), 841–843
configuring, 843–845
verifying, 845–848
static RP on NX-3 configuration example, 812
statistics on NX-4 with BSR example, 828–829
trees, 772–773
vPC (virtual port-channel)
forwarder election on NX-3 and NX-4 example, 866–867
RPF-source cache table on NX-3 and NX-4 example, 856–857
status on NX-4 example, 867
ping test and show ip traffic command output example, 612
ping with DF-bit set example, 629
ping with source interface as loopback example, 611
pipe (|) in RegEx, 681–682
PktMgr (Packet Manager), 148–160
plaintext authentication in OSPF, 478–480
platform FIB verification example, 173–174, 176–178
platform-specific drops, 116–124
plus sign (+) in RegEx, 682
PMTUD (Path-MTU-Discovery), 626–627
port priority
LACP, 268–269
modifying, 232–233
port-channels, 255–258. See also vPC (virtual port-channel)
advanced LACP options, 265–268
advantages of, 255–256
configuring, 259–260
LACP in, 256–258
interface establishment troubleshooting, 272
system priority, 268–271
verifying packets, 262–265
member interface consistency, 271–272
traffic load-balancing troubleshooting, 272–274
verifying status, 260–262
Port-Client, 175–179
portfast, 232–235
PPF (packet processing filter), 575–576
prefix advertisement using network command example, 632–633
prefix lists, 580–581, 663–669
prefix matching, 578–579
prefix-list-based route filtering example, 664
primary subnets
EIGRP connectivity, 409–412
IS-IS connectivity, 535–537
OSPF connectivity, 468
process crashes, 108–110
programmability, 950. See also automation; shells and scripting
NX-API, 968–975
NX-SDK, 964–967
Open NX-OS, 950–951
promiscuous PVLANs, 207
community PVLANs and, 212–215
isolated PVLANs and, 208–212
on SVI, 215–217
PSS (Persistent Storage Services), 13–14
PVLANs (private VLANs), 207–208
communication capability between hosts, 208
community PVLANs, 212–215
isolated PVLANs, 208–212
promiscuous PVLANs on SVI, 215–217
trunking between switches, 217–218
PVST (Per-VLAN Spanning Tree), 220
PVST+ (Per-VLAN Spanning Tree Plus), 220
pwd command, 20, 951–952
Python, 960–964
with EEM example, 87
interpreter from CLI and guest shell example, 961
invoking from EEM applet example, 964
printing all interfaces in UP state example, 963–964
python command, 50, 960–961
python interpreter, 50
Q
query modifiers. See RegEx (regular expressions)
question mark (?) in RegEx, 683
queue names (MTS), 146
R
R1 routing table with GRE tunnel example, 139–140
R1’s and NX-2’s IS-IS routing table entries example, 554
R1’s and NX-3’s IS-IS topology table with default metric example, 551
R1’s routing table with 1 gigabit link shutdown example, 502
R1’s routing table with default interface metrics bandwidth example, 550
R1’s routing table with default OSPF auto-cost bandwidth example, 502
RA Guard, 363–364
rate-limiter usage example, 183–184
reachability, verifying, 611–613
redirection, 39
redistribution
in BGP, 633–634
in EIGRP, 430–432
in IS-IS, 566–567
in OSPF, 487–488
redundancy switchover example, 711–712
RegEx (regular expressions), 676–683
asterisk (*), 683
brackets ([]), 680
caret (^), 679
caret in brackets ([^]), 681
dollar sign ($), 679–680
hyphen (-), 680–681
list of, 676
parentheses (), 681–682
period (.), 682
pipe (|), 681–682
plus sign (+), 682
question mark (?), 683
underscore (_), 677–678
register message (PIM), 775–776, 790
register-stop message (PIM), 776, 791
replication, 744–745
reserved VLAN, 870
resolved and unresolved adjacencies example, 165–166
resource templates (VDC), 131–132
restoring connectivity by allowing BPDUs to process example, 252
reviewing OSPF adjacency event history example, 47
RFC 1583 devices, 499–500
RFC 2328 devices, 499–500
RID (router ID)
in BGP, 601
in OSPF, 471, 485–487
rmdir command, 20
Root Bridge Identifier, 220
root bridges, 219
election, 222–224
placement, 228–229
root guard, 229
Root Path Cost, 220
root ports
identification, 224–225
modifying location, 229–232
route advertisement in BGP, 631
with aggregation, 634–635
with default-information originate command, 636
with network statement, 631–633
with redistribution, 633–634
route aggregation in BGP, 634–635
route filtering
in BGP, 662–663
communities, 684–686
with filter lists, 669–673
looking glass and route servers, 687
AS-Path access lists, 684
with prefix lists, 663–669
regular expressions, 676–683
with route-maps, 673–676
in OSPF, 487
route leaking in IS-IS, 564–566
route policies in BGP, 662–663
communities, 684–686
with filter lists, 669–673
looking glass and route servers, 687
AS-Path access lists, 684
with prefix lists, 663–669
regular expressions, 676–683
with route-maps, 673–676
route processing in BGP, 630–631
route propagation in BGP, 630–631
route reflectors in BGP, 657–659
route refresh in BGP, 654–657
route servers, 687
route status, object tracking for, 330–331
route-maps
attribute modifications (set actions), 586
in BGP, 673–676
conditional matching, 582–584
command options, 583–584
complex matching, 585–586
multiple match conditions, 584–585
explained, 581–582
partial configuration, 586
PBR (policy-based routing), 591–594
RPM (Route Policy Manager), 586–590
routing loop because of intermixed OSPF devices example, 500
routing protocol and URIB updates example, 170
routing protocol states during maintenance mode example, 722–724
routing tables
with impact example, 422
of NX-1, NX-2, NX-3, and NX-4 example, 557
of NX-1 and NX-6 example, 424–425
of NX-2 and NX-4 example, 486, 548
RP configuration (PIM), 811–812
anycast RP, 830–841
Auto-RP, 813–820
BSR (bootstrap router), 820–830
static RP, 812–813
RPM (Route Policy Manager), 586–590, 668–669
RSTP (Rapid Spanning Tree Protocol), 220–221
blocked switch port identification, 225–227
interface STP cost, 221–222
root bridge election, 222–224
root port identification, 224–225
tuning, 228–235
port priority, 232–233
root bridge placement, 228–229
root guard, 229
root port and blocked switch port locations, 229–232
topology changes and portfast, 232–235
verifying VLANs on trunk links, 227
run bash command, 51, 951–952
runtime diagnostics, 100–107
S
SAFI (subsequent address-family identifier), 598–599
SAL database info and FIB verification for IPSG example, 350
sampling
with NetFlow, 77–78
with sFlow, 78–80
SAP (service access points), 11, 147
scale factor configuration example, 190, 191–192
scaling BGP (Border Gateway Protocol), 649–650
maxas-limit command, 662
maximum-prefixes, 659–661
with route reflectors, 657–659
soft reconfiguration inbound versus route refresh, 654–657
with templates, 653–654
tuning memory consumption, 650–653
scheduler, 50
scripting. See shells and scripting
secondary IP address to avoid polarization example, 943
section utility, 42
selective unicast flooding, 918–919
sessions (BGP), 600–601
set actions for route-maps, 586
setting static IS-IS metric on R1 and R2 example, 552
sFlow, 78–80
configuring, 79
statistics, 80
shells and scripting, 951
bash shell, 951–957
Guest shell, 957–960
Python, 960–964
short-lived software releases, 26
show accounting log command, 45–46
show bash-shell command, 951–952
show bfd neighbors command, 694–695, 704–705
show bfd neighbors detail command, 702–703
show bgp command, 606–607, 638–639
show bgp convergence detail command, 648–649
show bgp convergence detail command output example, 648–649
show bgp event-history command, 647–648
show bgp event-history detail command, 642–643, 646, 665–667, 674–675
show bgp ipv4 unicast policy statistics neighbor command, 675
show bgp policy statistics neighbor filter-list command, 672
show bgp policy statistics neighbor prefix-list command, 667–668
show bgp private attr detail command, 652–653
show bgp process command, 607–609
show cli list command, 42–43
show cli list command example, 42–43
show cli syntax command, 43
show cli syntax command example, 43
show clock command, 82
show command output redirection example, 40
show copp diff profile command, 188
show cores command, 29
show cores vdc-all command, 108
show diagnostic bootup level command, 99
show diagnostic content module command, 101–103
show diagnostic content module command output example, 102–103
show diagnostic ondemand setting command, 106–107
show diagnostic result module command, 103–105
show event manager policy internal command, 85–86
show event manager system-policy command, 84–85
show fabricpath conflict all command, 310
show fabricpath isis adjacency command, 304–305
show fabricpath isis interface command, 303–304
show fabricpath isis topology command, 306
show fabricpath isis vlan-range command, 305–306
show fabricpath route command, 307
show fabricpath switch-id command, 303, 315
show fabricpath switch-id command output example, 303
show fabricpath unicast routes vdc command, 308–309
show fex command, 126–128
show file command, 20
show file logflash: command, 24–25
show forwarding distribution ip igmp snooping vlan command, 765
show forwarding distribution ip multicast route group command, 797
show forwarding internal trace v4-adj-history command, 162
show forwarding internal trace v4-pfx-history command, 172–173
show forwarding ipv4 adjacency command, 162–163
show forwarding ipv4 route command, 173–174
show forwarding route command, 173–174
show glbp and show glbp brief command output example, 387–388
show glbp brief command, 386–388
show glbp command, 386–388
show guestshell detail command, 958–959
show hardware capacity interface command, 113
show hardware command, 98
show hardware flow command, 76–77
show hardware internal cpu-mac eobc stats command, 118–119
show hardware internal cpu-mac inband counters command, 123
show hardware internal cpu-mac inband events command, 122–123
show hardware internal cpu-mac inband stats command, 119–122
show hardware internal dev-port-map command, 797–798
show hardware internal errors command, 114, 124
show hardware internal forwarding asic rate-limiter command, 184–185
show hardware internal forwarding instance command, 309
show hardware internal forwarding rate-limiter usage command, 182–184
show hardware internal statistics module pktflow dropped command, 116–118
show hardware mac address-table command, 764
show hardware rate-limiter command, 745–746
show hardware rate-limiters command, 181–182
show hsrp brief command, 373–374
show hsrp detail command, 373–374
show hsrp group detail command, 377–378
show incompatibility-all system command, 713–714
show interface command, 110–112, 193, 194, 203–204
show interface counters errors command, 112–113
show interface port-channel command, 261–262
show interface trunk command, 204–205
show interface trunk command output example, 205
show interface vlan 10 private-vlan mapping command, 216
show ip access-list command, 572–573
show ip adjacency command, 165–166
show ip arp command, 161–162, 796
show ip arp inspection statistics vlan command, 345–346
show ip arp internal event-history command, 163–164
show ip arp internal event-history event command, 92
show ip dhcp relay command, 337–338
show ip dhcp relay statistics command, 337–338
show ip dhcp snooping binding command, 342–343
show ip dhcp snooping command, 342
show ip eigrp command, 404
show ip eigrp interface command, 402, 415–416
show ip eigrp neighbor detail command, 410–411
show ip eigrp topology command, 395, 398
show ip eigrp traffic command, 405
show ip igmp groups command, 845–846
show ip igmp interface command, 853–854
show ip igmp interface vlan command, 768–769
show ip igmp internal event-history debugs command, 769
show ip igmp internal event-history igmp-internal command, 769–770
show ip igmp route command, 769
show ip igmp snooping groups command, 845–846
show ip igmp snooping groups vlan command, 764
show ip igmp snooping internal event-history vlan command, 766
show ip igmp snooping mrouter command, 854–855
show ip igmp snooping otv groups command, 935
show ip igmp snooping statistics command, 864–865
show ip igmp snooping statistics global command, 767
show ip igmp snooping statistics vlan command, 767–768, 934–935
show ip igmp snooping vlan command, 757, 763–764
show ip interface command, 374
show ip mroute command, 770–771, 794–795, 892–893, 932
show ip mroute summary command, 894
show ip msdp internal event-history route command, 837–838
show ip msdp internal event-history tcp command, 837–838
show ip msdp peer command, 835–836
show ip ospf command, 461
show ip ospf event-history command, 464–465
show ip ospf interface command, 461, 475–476
show ip ospf internal event-history adjacency command, 47
show ip ospf internal event-history rib command, 169–170
show ip ospf internal txlist urib command, 169
show ip ospf neighbors command, 458–459
show ip ospf traffic command, 463
show ip pim df command, 805–806, 809
show ip pim group-range command, 829–830
show ip pim interface command, 782–783, 852–853
show ip pim internal event-history bidir command, 806
show ip pim internal event-history data-header-register command, 840–841
show ip pim internal event-history data-register-receive command, 790
show ip pim internal event-history hello command, 783–784
show ip pim internal event-history join-prune command, 792–793, 806–807, 808, 846–847, 858, 865
show ip pim internal event-history null-register command, 790, 791, 840–841, 857
show ip pim internal event-history rp command, 819–820, 827–828
show ip pim internal event-history vpc command, 857, 865–867
show ip pim internal vpc rpf-source command, 856–857, 866–867
show ip pim neighbor command, 781
show ip pim rp command, 814–819, 822–827
show ip pim statistics command, 783, 828–829
show ip prefix-list command, 580–581
show ip route command, 171, 419–421
show ip sla configuration command, 324
show ip sla statistics command, 323
show ip traffic command, 154–156, 611–612
show ip verify source interface command, 349–350
show ipv6 dhcp guard policy command, 369–370
show ipv6 dhcp relay statistics command, 358–359
show ipv6 icmp vaddr command, 378–379
show ipv6 interface command, 378–379
show ipv6 nd command, 355–356
show ipv6 nd raguard policy command, 364
show ipv6 neighbor command, 354
show ipv6 snooping policies command, 369–370
show isis adjacency command, 520–523
show isis command, 525–526
show isis database command, 558–560
show isis event-history command, 530–531
show isis interface command, 523–525, 526–527
show isis traffic command, 528–529
show key chain command, 417, 546
show lacp counters command, 262–263
show lacp internal info interface command, 263–264
show lacp neighbor command, 264
show lacp system-identifier command, 264
show logging log command, 88
show logging logfile command, 959
show logging onboard internal kernel command, 148
show logging onboard module 10 status command, 23
show mac address-table command, 198–199
show mac address-table dynamic vlan command, 796, 919–920, 923
show mac address-table multicast command, 764
show mac address-table vlan command, 305–306
show maintenance profile command, 727–728
show maintenance timeout command, 726
show module command, 96–98, 708
show module command output example, 96–97, 708
show monitor session command, 56–57
show ntp peer-status command, 82
show ntp statistics command, 83
show nxapi-server logs command, 973–975
show nxsdk internal event-history command, 967
show nxsdk internal service command, 965–966
show otv adjacency command, 889, 906–907, 910
show otv arp-nd-cache command, 916
show otv data-group command, 931
show otv internal adjacency command, 890
show otv internal event-history arp-nd command, 916–917
show otv isis database command, 899
show otv isis database detail command, 900–902
show otv isis hostname command, 899
show otv isis interface overlay command, 906
show otv isis internal event-history adjacency command, 898
show otv isis internal event-history iih command, 896–897
show otv isis internal event-history spf-leaf command, 902–903
show otv isis ip redistribute mroute command, 930, 934
show otv isis mac redistribute route command, 903–904
show otv isis redistribute route command, 921–922
show otv isis site command, 895–896
show otv isis site statistics command, 904–905
show otv isis traffic overlay0 command, 904, 906
show otv mroute command, 928, 929
show otv mroute detail command, 929–930, 931, 933
show otv overlay command, 888
show otv route command, 902, 923
show otv route vlan command, 921
show otv site command, 889–890, 895, 911–912
show otv vlan command, 891–892, 920
show policy-map interface command, 114
show policy-map interface control-plane command, 189–190
show policy-map interface control-plane output example, 189–190
show policy-map system type network-qos command, 194–195
show port-channel compatibility-parameters command, 272
show port-channel load-balance command, 273–274
show port-channel summary command, 260–261, 272, 704–705
show port-channel traffic command, 273
show processes log pid command, 29
show processes log vdc-all command, 109–110
show queueing interface command, 114
show queuing interface command, 193, 194
show role command, 952
show routing clients command, 167–168
show routing event-history command, 647–648
show routing internal event-history msgs command, 169–170
show routing ip multicast event-history rib command, 770
show routing ip multicast source-tree detail command, 868–869
show routing memory statistics command, 171
show run aclmgr command, 572
show run all | include glean command, 161
show run copp all command, 186
show run netflow command, 76
show run otv command, 908–909, 917–918
show run pim command, 781
show run sflow command, 79
show run vdc command, 137
show running-config command, 45
show running-config copp command, 188–189
show running-config diff command, 43–44
show running-config diff example, 43–44
show running-config mmode command, 730
show running-config sla sender command, 324
show sflow command, 79–80
show sflow command output example, 80
show sflow statistics command, 80
show snapshots command, 725–726
show sockets client detail command, 157–158
show sockets connection tcp command, 615–616
show sockets connection tcp detail command, 157
show sockets internal event-history events command, 616–617
show sockets internal event-history events command example, 617
show sockets statistics all command, 159
show spanning-tree command, 225–227, 237–238, 281–282
show spanning-tree inconsistentports command, 246, 252
show spanning-tree interface command, 227
show spanning-tree mst command, 238–239
show spanning-tree mst configuration command, 237
show spanning-tree mst interface command, 239–240
show spanning-tree root command, 222–224, 225
show spanning-tree vlan command, 897–898
show system inband queuing statistics command, 150
show system internal access-list input entries detail command, 190
show system internal access-list input statistics command, 340–341, 348–349, 359, 367–368, 700–702
show system internal access-list interface command, 339–340, 367–368, 700–702
show system internal access-list interface e4/2 input statistics module 4 command, 573–574
show system internal aclmgr access-lists policies command, 574–575
show system internal aclmgr ppf node command, 575–576
show system internal adjmgr client command, 164–165
show system internal adjmgr internal event-history events command, 167
show system internal bfd event-history command, 695–699
show system internal bfd transition-history command, 699–700
show system internal copp info command, 191–192
show system internal eltm info interface command, 195
show system internal ethpm info interface command, 175–178, 195
show system internal fabricpath switch-id event-history errors command, 310
show system internal feature-mgr feature action command, 16
show system internal feature-mgr feature bfd current status command, 695
show system internal feature-mgr feature state command, 15
show system internal fex info fport command, 128–130
show system internal fex info sat port command, 128
show system internal flash command, 13–14, 24, 88–89
show system internal forwarding adjacency entry command, 173–174
show system internal forwarding route command, 173–174
show system internal forwarding table command, 350
show system internal mmode logfile command, 731
show system internal mts buffer summary command, 145–146
show system internal mts buffers detail command, 146–147
show system internal mts event-history errors command, 148
show system internal mts sup sap description command, 146–147
show system internal mts sup sap sap-id command, 11–12
show system internal mts sup sap stats command, 147–148
show system internal pixm info ltl command, 765
show system internal pktmgr client command, 151–152
show system internal pktmgr interface command, 152–153
show system internal pktmgr stats command, 153
show system internal port-client event-history port command, 179
show system internal port-client link-event command, 178–179
show system internal qos queueing stats interface command, 114–115
show system internal rpm as-path-access-list command, 672–673
show system internal rpm clients command, 588–589
show system internal rpm event-history rsw command, 588, 672–673
show system internal rpm ip-prefix-list command, 589, 668–669
show system internal sal info database vlan command, 350
show system internal sflow info command, 80
show system internal sup opcodes command, 147
show system internal sysmgr gsync-pending command, 32
show system internal sysmgr service all command, 10, 11, 146
show system internal sysmgr service all command example, 10
show system internal sysmgr service command, 10
show system internal sysmgr service command example, 10
show system internal sysmgr service dependency srvname command, 142–143
show system internal sysmgr state command, 31–32, 710–711
show system internal ufdm event-history debugs command, 171–172
show system internal vpcm info interface command, 318–320
show system mode command, 720–722
show system redundancy ha status command, 709
show system redundancy status command, 29–30, 708–709
show system reset-reason command, 29, 110
show tech adjmgr command, 167
show tech arp command, 167
show tech bfd command, 704
show tech bgp command, 687
show tech dhcp command, 362
show tech ethpm command, 179
show tech glbp command, 390
show tech hsrp command, 379
show tech netstack command, 617, 687
show tech nxapi command, 975
show tech nxsdk command, 967
show tech routing ipv4 unicast command, 687
show tech rpm command, 687
show tech track command, 334
show tech vpc command, 294
show tech vrrp command, 385
show tech vrrpv3 command, 385
show tech-support command, 51, 320, 749–750
show tech-support detail command, 124, 141
show tech-support eem command, 87
show tech-support eltm command, 195
show tech-support ethpm command, 130, 195
show tech-support fabricpath command, 310
show tech-support fex command, 130
show tech-support ha command, 719
show tech-support issu command, 719
show tech-support mmode command, 731
show tech-support netflow command, 78
show tech-support netstack command, 160
show tech-support pktmgr command, 160
show tech-support sflow command, 80
show tech-support vdc command, 141
show tunnel internal implicit otv brief command, 890–891
show tunnel internal implicit otv detail command, 922, 937
show tunnel internal implicit otv tunnel_num command, 891
show udld command, 247–248
show udld internal event-history errors command, 248–249
show vdc detail command, 137–138
show vdc detail command output example, 137–138
show vdc internal event-history command, 140–141
show vdc membership command, 139–140
show vdc resource detail command, 138–139
show vdc resource detail command output example, 138–139
show vdc resource template command, 131–132
show virtual-service command, 959–960
show virtual-service tech-support command, 960
show vlan command, 201–202, 214
show vlan command example, 201–202
show vlan private-vlan command, 210–211
show vpc command, 280–281, 284–285, 314–315
show vpc consistency-parameters command, 285–286
show vpc consistency-parameters command example, 285–286
show vpc consistency-parameters vlan command, 286–287
show vpc consistency-parameters vlan command example, 286–287
show vpc consistency-parameters vpc command, 287
show vpc consistency-parameters vpc vpc-id command example, 287
show vpc orphan-ports command, 288
show vpc peer-keepalive command, 282–283
show vrrp command, 380–381
show vrrp statistics command, 381–382
show vrrpv3 command, 383–384
show vrrpv3 statistics command, 384–385
SIA (stuck in active) queries in EIGRP, 443–446
SIA timers output example, 444, 446
site VLAN for OTV, 882
SM (sparse mode), 772
SMUs (Software Maintenance Upgrades), 27–28
sniffing. See network sniffing
soft reconfiguration inbound in BGP, 654–657
software releases, 25–27
SOL (SPAN-on-Latency), 60–61
source command, 963
SPAN (Switched Port Analyzer), 54–57
configuring, 55–56
ERSPAN, 57–60
filtering traffic, 57
SPAN-on-Drop, 61–62
SPAN-on-Latency (SOL), 60–61
verifying, 56
SPAN-on-Drop, 61–62
SPT switchover on NX-4 example, 793
SSM (source specific multicast), 841–843
configuring, 843–845
verifying, 845–848
SSO (stateful switchover), 707–712
stateful restarts, 29
stateless restarts, 29
static joins, 748
static routes, object tracking with, 334
static RP, configuring, 812–813
status of overlay example, 888
STP (Spanning Tree Protocol), 218–219
forwarding loops
BPDU filter, 244–245
BPDU guard, 243–244
detecting and remediating, 241–242
MAC address notifications, 242–243
unidirectional links, 245–252
IEEE 802.1D standards, 219–220
MST (Multiple Spanning-Tree Protocol), 236
configuring, 236–237
tuning, 240–241
verifying, 237–240
port states, 219
port types, 219
portfast enablement example, 235
RSTP (Rapid Spanning Tree Protocol), 220–221
blocked switch port identification, 225–227
interface STP cost, 221–222
root bridge election, 222–224
root port identification, 224–225
tuning, 228–235
verifying VLANs on trunk links, 227
terminology, 219–220
stub routers, 421–424
subnets in VLANs, 203. See also primary subnets
suboptimal path selection example, 562
suboptimal routing in IS-IS, 562–566
supervisor redundancy, 29–34
suspend individual (LACP), 271
suspending vPC orphan port during vPC failure example, 288
SVI (switched virtual interface), promiscuous PVLANs on, 215–217
switching from maintenance mode to normal mode example, 724–725
syslog
configuring, 90
with LSAs with duplicate RIDs example, 486, 487
with LSPs with duplicate system IDs example, 547
with neighbors configured, 472
server, 90
triggered loop guard example, 246
sysmgr (system manager), 9–11
system component troubleshooting, 142–143
ARP and Adjacency Manager, 160–175
EthPM and Port-Client, 175–179
HWRL, CoPP, system QoS, 179–192
MTS (Message and Transaction Service), 144–148
MTU settings, 192–195
Netstack and Packet Manager, 148–160
system maintenance mode always-use-custom-profile command, 728–730
system manager state information example, 710–711
system mode maintenance command, 720–722
system mode maintenance dont-generate-profile command, 730–731
system mode maintenance on-reload reset-reason command, 726–727
system mode maintenance timeout command, 726
system priority (LACP), 268–271
system QoS (quality of service), 179–192
system redundancy HA status example, 709
system redundancy state example, 709
system switchover command, 711–712
System-ID in IS-IS, 539, 546–549
T
tar append command, 20
tar create command, 20
tar extract command, 20
TCAM (ternary content addressable memory), 573–574
TCN (topology change notification), 232–235
TCP connect probes, 328–329
TCP sessions, verifying, 615–617
TCP socket connections example, 615
TCP socket creation and Netstack example, 157
TCPUDP component (Netstack), 156–160
technical support files, 44–45
telnet to port 179 usage example, 616
templates in BGP, 653–654
test packet-tracer command, 71–72
threshold for track list object example, 333
timers in OSPF, 476–478
TLVs (type, length, value) tuples, 512
in IIH, 514
in LSPs, 516
topologies
after SIA replies example, 445
EIGRP topology table, 395–396
IS-IS topology table, 898–905
verifying in FabricPath, 306
track object with static routes example, 334
track-list state, object tracking for, 332–333
traffic load-balancing (port-channels) troubleshooting, 272–274
trees in PIM, 772–773
trunk ports, 204–205
allowed VLANs, 206
configuring and verifying, 204
native VLANs, 206
PVLANs and, 217–218
verifying VLANs on, 227
tuning
BGP memory consumption, 650–653
MST (Multiple Spanning-Tree Protocol), 240–241
RSTP (Rapid Spanning Tree Protocol), 228–235
port priority, 232–233
root bridge placement, 228–229
root guard, 229
root port and blocked switch port locations, 229–232
topology changes and portfast, 232–235
tunnel depolarization, 942–944
Tx-Rx loop, 249–250
Type 1 vPC consistency-checker errors, 283–284
Type 2 vPC consistency-checker errors, 284
Type-1 external OSPF routes, 496–497
Type-2 external OSPF routes, 497–499
U
UDLD (unidirectional link detection), 246–250
configuring, 247
empty echo detection example, 249
event-history example, 248–249
UDP echo probes, 324–325
UDP jitter probes, 325–327
UFDM process, 171–175
UFDM route distribution to IPFIB and acknowledgment example, 172
underscore (_) in RegEx, 677–678
unicast flooding, 198
with multicast enabled transport, 919–924
in OTV, 877
selective unicast flooding, 918–919
unicast forwarding components, 167
unicast routes from NX-2 for VLAN 215 and VLAN 216 example, 858
unicast RPF configuration and verification example, 351–352
unicast traffic, 734
unicast transport, multicast traffic with, 932–937
unidirectional links, 245
bridge assurance, 250–252
loop guard, 245–246
UDLD (unidirectional link detection), 246–250
unique router-ID in OSPF, 471
unique System-ID in IS-IS, 539
update generation process in BGP, 643–646
UPDATE message, 602
URIB (Unicast Routing Information Base), 167–171
clients, 168
route installation, 647–648
verifying FabricPath, 307
verifying vPC+, 316–317
URPF (Unicast Reverse Path Forwarding), 351–352
UUID (Universally Unique Identifier), 9
V
VDC (Virtual Device Contexts), 35–37, 130–131
configuring, 133–134
initializing, 134–136
internal event history logs example, 140–141
management, 137–142
out-of-band and in-band management, 137
resource templates, 131–132
verifying
access port mode example, 203–204
access-list counters
in hardware example, 574–575
in TCAM example, 573–574
ACLs (access control lists)
on line card for DHCP relay example, 339–340
statistics on line card for DHCP relay example, 340–341
active interfaces, 402–403
AED for VLAN 103 example, 920
anycast RP, 830–841
ARP ACLs, 348–349
ARP ND-Cache example, 916
ASM (any source multicast), 788–789
Auto-RP, 813–820
BFD (bidirectional forwarding detection)
with echo function, 702–703
neighbors example, 694–695
sessions, 693–707
BGP (Border Gateway Protocol), 605–609
ACLs and firewalls, 613–615
configuration, 610–611
reachability and packet loss, 611–613
TCP sessions, 615–617
BiDIR (Bidirectional), 805–811
BPDU filter example, 245
BSR (bootstrap router), 820–830
community PVLAN configuration example, 214
configuration incompatibilities example, 713–714
connectivity
after virtual link example, 484–485
between primary subnets example, 411
with promiscuous PVLAN SVI example, 216–217
between PVLANs example, 214–215
contents of logflash: directory example, 24
CoPP (control plane policing)
EIGRP example, 407–408
IS-IS example, 532
NetFlow, 78
OSPF example, 465–466
current bit-rate of OTV control-group example, 894
DAI (dynamic ARP inspection), 345–346
detailed dynamic tunnel parameters example, 891
DHCP relay, 337–338
DHCPv6 guard configuration and policy, 369–370
EEM (Embedded Event Manager), 85–86
EIGRP (Enhanced Interior Gateway Protocol)
hello and hold timers example, 415–416
neighbors, 423
packets, 405–409
emulated switch-IDs example, 315
ERSPAN session, 59–60
FabricPath, 303–310
core interfaces, 303–304
IS-IS adjacency, 304–305
software table in hardware, 308–309
switch-IDs, 303, 310
topologies, 306
in URIB, 307, 309
VLANs (virtual LANs), 305–306
FEX (Fabric Extender), 126–128
filtering SPAN traffic, 57
forwarding adjacency example, 163
FP core interfaces example, 303–304
FP MAC information in vPCM example, 318–320
hardware forwarding on module 3, 799
hardware rate-limiters on N7k and N9k switches example, 181–182
hardware statistics for IPv6 snooping example, 367–368
HSRP (Hot Standby Routing Protocol), 373–374
HSRPv6 virtual address, 379
IGMP (Internet Group Management Protocol), 761–771
IGMP snooping example, 757
IGMPv3 on NX-4, 846
ingress L3 unicast flow drops example, 62
interface’s OSPF network type example, 475–476
I/O module MFIB on module 3, 798
IOS devices after NX-OS metric transition mode example, 556
IS-IS (Intermediate System-to-Intermediate System)
adjacency example, 305
interface, 523–525
interface level type example, 542
metric transition mode, 555
neighbors, 520–523
packets, 528–535
process level type example, 541
protocol, 525–526
system IDs example, 549
isolated PVLANs
communications example, 211–212
configuration example, 210–211
keychains example, 417
LACP (Link Aggregation Control Protocol), 262–265
LACP speed state, 270
Layer 3 routing over vPC, 294
local and remote FP routes in URIB example, 316–317
maintenance and normal profile configurations example, 727–728
maximum links, 267
MFDM on NX-2, 797
missing 172.16.1.0/24 network example, 493–494
MROUTE, 789–795
MROUTE in transport network, 932
MROUTE on NX-2, 795
MST (Multiple Spanning-Tree Protocol), 237, 240
MTU
under ELTM process, 195
under ethpm process, 195
multicast routing for OTV control-group example, 893
NET addressing example, 541
network QoS policy, 195
new path after new reference OSPF bandwidth is configured on R1 and R2 example, 503–504
no services pending synchronization example, 32, 34
NX-OS BGP peering, 607
on-reload reset-reason, 726–727
optimal routing example, 493
ORIB entry for host C example, 921
OSPF (Open Shortest Path First)
area settings example, 474
encrypted authentication example, 481
neighbors, 458–460
packets, 463–467
packets using Ethanalyzer example, 467
packets with ACL example, 467
plaintext authentication example, 479
OTV (Overlay Transport Virtualization)
IS-IS adjacencies, 888–898
next-hop adjacency tracking example, 946
site adjacency example, 896
packet tracer, 71–72
PBR-based traffic example, 593
PIM ASM platform, 795–799
PIM interfaces and neighbors, 780–785
platform FIB, 173–174, 176–178
platform LTL index example, 765
port priority impact on spanning tree protocol topology example, 232–233
port-channel status, 260–262
PPF database example, 575–576
promiscuous PVLAN SVI mapping example, 216
PVLAN switchport type example, 211
redistributed networks example, 567
remote area routes
on NX-1 and NX-4 example, 483
on NX-2 and NX-3 example, 482–483
RFC1583 compatibility example, 500
root and blocking ports for VLAN example, 226–227
SAL database info and FIB for IPSG, 350
site group to delivery group mapping example, 931
site-ID of OTV IS-IS neighbor example, 890
site-VLAN spanning-tree example, 897–898
size and location of PSS in flash file system example, 13–14
software table in hardware for FP route example, 308–309
SPAN (Switched Port Analyzer), 56
spanning tree protocol root bridge example, 223
SSM (source specific multicast), 845–848
state and available space for logflash: example, 24
suboptimal routing example, 491
sysmgr state on standby supervisor example, 33
total path cost example, 230–231
trunk port, 204
UDLD switch port status example, 247–248
URPF (Unicast Reverse Path Forwarding), 351–352
VLANs on trunk links, 227
vPC (virtual port-channel)
autorecovery, 289
autorecovery example, 289
consistency-checker, 283–287
domain status, 280–282
peer-gateway, 291
peer-gateway example, 291
peer-keepalive link, 282–283
vPC+, 314–320
emulated switches, 315
MAC addresses, 315–316
show vpc command, 314–315
in URIB, 316–317
in vPCM, 318–320
vPC-connected receiver, 861–869
vPC-connected source, 849–861
VRRP (Virtual Router Redundancy Protocol), 380–381
which OTV ED is AED example, 892
viewing
access port configuration command example, 203
and changing LACP system priority example, 268
contents of specific file in logflash: example, 24–25
CoPP policy and creating custom CoPP policy example, 189
debug information for redistribution example, 590
detailed version of spanning-tree state example, 234
EIGRP (Enhanced Interior Gateway Protocol)
authentication on interfaces example, 417
passive interfaces example, 404
retry values for neighbors example, 410–411
routes on NX-1 example, 420–421
IIH authentication example, 545–546
inconsistent ports example, 252
inconsistent spanning tree protocol ports example, 246
interface specific MST settings example, 240
keychain passwords example, 481, 546
LACP (Link Aggregation Control Protocol)
neighbor information example, 264
packet counters example, 263
time stamps for transmissions on interface example, 263–264
MAC addresses on Nexus switch example, 199
nondefault OSPF forwarding address example, 492
number of classic and wide EIGRP neighbors example, 438
number of RPM clients per protocol example, 588–589
OSPF (Open Shortest Path First)
password for simple authentication example, 480
RID example, 471
port-channels
hash algorithm example, 273
interface status example, 262
summary status example, 260
RPM (Route Policy Manager)
event-history example, 588
prefix-lists from RPM perspective example, 589
STP (Spanning Tree Protocol)
behavior changes with vPC example, 281–282
event-history example, 234
port priority example, 232
spanning tree protocol type of ports with bridge assurance example, 250–251
traffic load on member interfaces example, 273
VLANs (virtual LANs)
allowed on trunk link example, 206
participating with spanning tree protocol on interface example, 227
vPC (virtual port-channel)
orphan ports example, 288
peer-keepalive status example, 282
status example, 280–281
virtual link configuration example, 484
virtual service list and resource utilization example, 960
virtualization
Virtual Device Contexts (VDCs), 35–37
virtual port channels (vPC), 37–39
Virtual Routing and Forwarding (VRF), 37
VLANs (virtual LANs), 200–201
access ports, 203–204
creating, 201–203
IGMP snooping group membership example, 764
loop-free topologies. See STP (Spanning Tree Protocol)
mapping
on L2 trunk example, 942
in OTV, 941–942
on overlay interface example, 942
multiple subnets in, 203
PVLANs (private VLANs), 207–208
communication capability between hosts, 208
community PVLANs, 212–215
isolated PVLANs, 208–212
promiscuous PVLANs on SVI, 215–217
trunking between switches, 217–218
reserved VLAN, 870
site VLAN for OTV, 882
trunk ports, 204–205
allowed VLANs, 206
native VLANs, 206
verifying
in FabricPath, 305–306
on trunk links, 227
vPC (virtual port-channel), 37–39, 274–275
ARP synchronization, 291–292
autorecovery, 289
backup Layer 3 routing, 292–293
configuring, 278–280
domains, 275–276
IGMP snooping state on NX-4 example, 854–855
Layer 3 routing, 293–294
member links, 277
multicast traffic, 848–849
duplicate packets, 870
receiver configuration and verification, 861–869
reserved VLAN, 870
source configuration and verification, 849–861
operational behavior, 277–278
orphan ports, 288
peer link, 277
peer-gateway, 289–291
peer-keepalive link, 276–277
status with consistency checker error example, 284–285
topology, 275–276
verifying
consistency-checker, 283–287
domain status, 280–282
peer-keepalive link, 282–283
vPC+
configuring, 311–314
verifying, 314–320
emulated switches, 315
MAC addresses, 315–316
show vpc command, 314–315
in URIB, 316–317
in vPCM, 318–320
vPCM (vPC Manager), verifying vPC+, 318–320
VRF (Virtual Routing and Forwarding), 37
VRRP (Virtual Router Redundancy Protocol), 380–385
configuring, 380
state and detail information example, 381
statistics, 381–382
verifying, 380–381
VRRPv3, 382–385
VRRPv3, 382–385
W
wc utility, 40
well-known multicast addresses, 741
wide metrics
versus classic metrics in EIGRP, 433–439
on NX-1, NX-2, and NX-3 example, 437–438
on NX-1, NX-2, NX-3, and NX-6 example, 438–439
on NX-1 and NX-2 example, 436–437
X
xml utility, 42
Y
yum command, 954
Code Snippets
Many titles include programming code or configuration examples. To
optimize the presentation of these elements, view the eBook in single-
column, landscape mode and adjust the font size to the smallest setting. In
addition to presenting code and configurations in the reflowable text
format, we have included images of the code that mimic the presentation
found in the print book; therefore, where the reflowable format may
compromise the presentation of the code listing, you will see a “Click here
to view code image” link. Click the link to view the print-fidelity code
image. To return to the previous page viewed, click the Back button on
your device or app.