Cisco Catalyst 6500 Series Switches

Virtual Switching System Quad-Supervisor Stateful Switchover - Delivering Maximum Uptime with Simplicity White Paper

What You Will Learn

This paper is intended for network design engineers and network operators looking to understand the new Cisco® Virtual Switching System Quad-Supervisor Stateful Switchover technology and how it enhances the VSS to provide increased application uptime with simplified network designs. The paper begins with a brief description of the benefits of the VSS itself and then explains how VS4O technology, enabling redundant in-chassis supervisor modules, enhances these benefits.

Following the benefits discussion, the paper provides a more technical description of the VS4O architecture and operations. Finally, the paper provides an explanation of how to migrate to a VS4O configuration and an overview of the software upgrade process.

VS4O is available for the Cisco Catalyst® 6500 Virtual Switching System configured with the Cisco Catalyst 6500 Series Supervisor Engine 2T beginning with Cisco IOS® Software Release 15.1(1)SY1.

Introduction

Network availability demands on today’s enterprise network infrastructures are higher than ever before due to a number of business and technological trends. From the business standpoint, many enterprises are looking for ways to become more efficient, consolidate assets, and lower operating expenses. The network infrastructure is an obvious place to apply new technologies and capabilities in order to consolidate and enhance services while lowering costs.

Recent examples of network infrastructure integrations include:

Voice and data networks

IP-enabled security devices

Building climate and control systems

Medical devices

Many other industry-specific control systems

Integrating these disparate systems into a single IP-enabled infrastructure is creating opportunities for businesses to reduce costs and enhance services.

On the technology front, a proliferation of real-time applications, such as voice and video, is demanding very fast convergence, on the order of subsecond recovery. Network designs must therefore evolve to provide higher availability levels with subsecond convergence.

The VSS was developed to address these increasing demands on network availability with three main benefits:

Simplified network designs

Faster convergence

Increased bandwidth

Simplified Network Designs

The VSS technology takes two physical chassis and creates an active-standby control plane[1] relationship across the two chassis while creating an active-active data plane[2]. In other words, a single management entity and configuration file exists because of the active-standby control plane; however, both chassis can forward data. This creates what appears in the network as a single device with 1+1 chassis redundancy.

Because the VSS appears as a single logical entity, the network design itself is also simplified. With a single logical device instead of two, there is no longer a need to use First Hop Redundancy Protocols such as the Hot Standby Router Protocol (HSRP). A single logical entity also reduces the number of Layer 3 peering relationships and therefore lowers the processing cycles associated with the routing protocols. Similarly, IP multicast routing is simplified because there is no longer a need for two separate routers to negotiate a designated IP multicast forwarder.

Faster Convergence

With the VSS, network designs can utilize multichassis EtherChannel or Layer 3 equal cost multipath (ECMP) routes when connecting to the VSS. The result is faster, hardware-accelerated convergence during failure scenarios, including link, line card, and node/chassis failures.

Increased System Bandwidth

Utilizing multichassis EtherChannel designs eliminates Layer 2 loops in the network; therefore, this also eliminates spanning tree blocking ports, allowing all links to forward traffic for increased network bandwidth.

What Is VS4O?

The VS4O technology enhances the VSS with support for redundant supervisor modules within a single chassis, referred to as in-chassis redundant supervisor module support.[3] With VS4O the VSS is configured with two supervisor modules per chassis. The second supervisor module within the chassis can be described as an in-chassis standby supervisor (ICS). Within each local chassis the two supervisor modules use Stateful Switchover (SSO) technology to establish an “SSO active” and “SSO standby hot” control plane redundancy relationship. In the unlikely event that the active supervisor should fail, the standby hot supervisor will immediately transition to the active role. This transition occurs in a subsecond manner from the data plane forwarding perspective; in fact, the local chassis remains operational, keeping all links active. In this case we can describe the redundancy relationships as 1:1 within the VSS chassis, followed by 1+1 across the VSS chassis.

Primary Benefits of VS4O

Supporting an in-chassis standby supervisor increases the resiliency of the individual chassis and improves the performance of the VSS during a supervisor failure event.

The main benefits are:

Increased chassis resiliency

VSS maintenance of 100% bandwidth even after single supervisor failure

Automated and deterministic recovery of a failed supervisor

New staggered mode software upgrade process, which reduces the link downtime associated with the individual line cards during an enhanced fast software upgrade (EFSU)[4]

Adding redundant components is the most direct method to increase device resiliency for standalone devices. Adding redundant power supplies, fans, and redundant supervisor modules adds to the overall predicted availability rating of the chassis.

The second benefit to a VS4O configuration is that the VSS maintains 100 percent of the VSS switching capacity even after the failure of a supervisor module. This is a significant improvement versus configurations with a single supervisor module per chassis, in which case the local chassis and its line cards are not operational following the failure of a supervisor module.

The supervisor recovery and replacement procedure is a more automated process in a VS4O configuration. The recovery process is based on SSO technology and occurs in a subsecond manner. The replacement procedure is simplified as well. Beginning with Cisco IOS Software Release 15.1(1)SY1,[5] a VSS mode in-chassis redundant supervisor module is fully supported. One simply needs to insert the second supervisor into the chassis. More details on the boot process for the second supervisor are provided in the migrations section of this paper.

For software upgrades in a VS4O configuration, the new staggered mode operation will take advantage of the ICS supervisor and upgrade the software of the ICS supervisor separately. This allows for the line cards to be restarted with minimal downtime. More details on this process are provided in the software upgrade section of this paper.

SSO Technology Review

SSO technology is essentially a group of Cisco IOS Software processes that provide for supervisor module redundancy. In a standalone chassis,[6] for example, the redundant supervisor module operates in a 1:1 active-standby hot redundancy mode, providing redundancy for both the control plane and data plane. The Cisco IOS Software SSO processes build and maintain the redundancy relationship between supervisor modules; the supervisor modules use a protocol to arbitrate which becomes the active and which becomes the standby hot. The active supervisor will provide the control plane functionality for the overall device, creating a single logical entity. The active supervisor synchronizes certain primary Cisco IOS Software processes and data structures with the standby supervisor module. The active will continue to synchronize the primary information so that in the event the active supervisor should fail, the standby can assume the active role immediately.[7]

SSO Evolves to Support VSS

SSO technology later evolved to support the VSS, where supervisor engine redundancy is established across two chassis. The VSS uses a dedicated physical link, called the Virtual Switch Link (VSL), between the two chassis to synchronize the supervisor modules in each chassis. Note that in the initial implementations of VSS, only a single supervisor per chassis is supported[8]. With the VSS, supervisor modules also establish an SSO active and SSO standby hot redundancy mode for the control plane of the entire VSS, thus creating a single logical entity. However, unlike in a standalone chassis, the VSS standby supervisor module will perform data plane switching functions for its local chassis. In the data plane context, the VSS operates in a 1+1 redundancy mode, meaning the aggregate performance is additive and utilizes both supervisors.

SSO Evolves to Support VS4O

VS4O evolves the SSO technology yet again to support redundant supervisor modules within the physical chassis as well as across two chassis in a VSS. VS4O enables the supervisor module to maintain two different redundancy relationships: one primary redundancy relationship, which is always across chassis and is maintained for the overall VSS, and a secondary redundancy relationship maintained within the local chassis. VS4O in effect provides 1+1 redundancy across two chassis and also 1:1 redundancy within the local chassis.

VS4O: How It Works

VS4O uses multiple redundancy domains to support the in-chassis redundant supervisor module in VSS mode. A redundancy domain is effectively an active and standby SSO redundancy mode relationship between two entities. Figure 1 shows the new in-chassis redundancy domain existing along with the default redundancy domain that exists across the VSS chassis.

Figure 1. VS4O Redundancy Domains

The default redundancy domain is the domain established for the overall VSS itself. The default redundancy domain’s active and standby supervisor modules are designed to always exist across chassis. Alternatively, the new in-chassis redundancy domain exists only within a single chassis. Each VSS chassis establishes its own in-chassis redundancy domain, which is independent of the other chassis.
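
The relationship between the two kinds of redundancy domain can be sketched in a few lines of Python. This is only an illustrative model: the class names, field names, and the steady-state role assignment (switch 1 slot 1 as the VSS active) are assumptions made for the example and do not correspond to any Cisco IOS data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supervisor:
    switch: int  # VSS chassis number (1 or 2)
    slot: int    # supervisor slot within that chassis


@dataclass(frozen=True)
class RedundancyDomain:
    name: str
    active: Supervisor
    standby: Supervisor

    def spans_chassis(self) -> bool:
        """True when the active and standby sit in different chassis."""
        return self.active.switch != self.standby.switch


# Assumed steady state: the supervisor in slot 1 of each chassis is the ICA.
default_domain = RedundancyDomain("default", Supervisor(1, 1), Supervisor(2, 1))
in_chassis_sw1 = RedundancyDomain("in-chassis switch 1", Supervisor(1, 1), Supervisor(1, 2))
in_chassis_sw2 = RedundancyDomain("in-chassis switch 2", Supervisor(2, 1), Supervisor(2, 2))

for domain in (default_domain, in_chassis_sw1, in_chassis_sw2):
    print(f"{domain.name}: spans chassis = {domain.spans_chassis()}")
```

In this model the default domain always spans the two chassis, while each in-chassis domain is confined to one chassis and is independent of the other.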

Initialization Process for VS4O

When a VSS-configured chassis powers up with two supervisor modules installed, each supervisor module starts its bootup process. Very early during this initialization process, the supervisor determines two primary factors: first, that it is a VSS configuration, and second, that an in-chassis redundant supervisor module is installed in the same chassis. (See Table 1.)

Table 1. Excerpts from System Console Messages Indicating VSS Mode Detected and Then Role Resolution Identified

Supervisor Module Switch 1 Slot 1:

Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
Image text-base: 0x04100144, data-base: 0x0C000000
System detected Virtual Switch configuration...
Interface TenGigabitEthernet 1/1/4 is member of PortChannel 1
Interface TenGigabitEthernet 1/1/5 is member of PortChannel 1
Interface TenGigabitEthernet 1/2/4 is member of PortChannel 1
Interface TenGigabitEthernet 1/2/5 is member of PortChannel 1
Initializing as Virtual Switch ACTIVE processor

Supervisor Module Switch 1 Slot 2:

Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
Image text-base: 0x04100144, data-base: 0x0C000000
System detected Virtual Switch configuration...
Interface TenGigabitEthernet 1/1/4 is member of PortChannel 1
Interface TenGigabitEthernet 1/1/5 is member of PortChannel 1
Interface TenGigabitEthernet 1/2/4 is member of PortChannel 1
Interface TenGigabitEthernet 1/2/5 is member of PortChannel 1
*Jun 17 22:56:20.011: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
Firmware compiled 06-Mar-13 08:26 by integ Build [25856]
*Jun 17 22:56:20.011: %PFREDUN-6-STANDBY: Initializing as STANDBY processor for this switch
*Jun 17 22:58:06.815: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.

Both supervisor modules will then arbitrate and determine which supervisor module will become the in-chassis active (ICA), and the other will become the in-chassis standby (ICS). During a normal bootup sequence, the supervisor module in the lowest slot number will become the ICA. If a supervisor module is inserted after a previous supervisor module has already established itself as the ICA, then the second supervisor module will assume the ICS role.

Next, the ICA will continue the initialization process by attempting to establish communication over the VSL and negotiate a redundancy role in the default redundancy domain. Only the ICA supervisor from each chassis will participate in the default redundancy domain. If for some reason the ICA supervisor is not able to communicate with the ICA of the peer chassis after a specified period of time, it will progress to the active role for the default redundancy domain and the overall VSS.

The supervisor module that becomes the SSO active in the default redundancy domain will then continue the bootup process and start to initialize the line cards and execute the startup configuration file.
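
The role resolution described above can be summarized in a short Python sketch. The function names and return values are hypothetical, and the sketch covers only the rules stated in this section: the lowest slot wins during a normal bootup, a late-inserted supervisor becomes the ICS, and an ICA that cannot reach its peer over the VSL becomes the VSS active. How two reachable ICAs negotiate when neither is yet active is not described here, so the sketch simply reports that case as "negotiate".

```python
from typing import Optional, Set, Tuple


def in_chassis_roles(present_slots: Set[int],
                     established_ica: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Return (ica_slot, ics_slot) for one chassis."""
    if not present_slots:
        raise ValueError("no supervisor modules present")
    if established_ica is not None:
        # A supervisor inserted after an ICA is established assumes the ICS role.
        if established_ica not in present_slots:
            raise ValueError("established ICA is not present")
        ica = established_ica
    else:
        # Normal bootup: the supervisor in the lowest slot becomes the ICA.
        ica = min(present_slots)
    others = sorted(present_slots - {ica})
    ics = others[0] if others else None
    return ica, ics


def default_domain_role(peer_reachable: bool, peer_is_active: bool) -> str:
    """Role of a chassis ICA in the default (cross-chassis) redundancy domain."""
    if not peer_reachable:
        # No communication with the peer ICA within the timeout: become active.
        return "active"
    if peer_is_active:
        return "standby"
    return "negotiate"  # outcome depends on negotiation not covered in this section


print(in_chassis_roles({1, 2}))                      # simultaneous boot
print(in_chassis_roles({1, 2}, established_ica=2))   # slot 1 inserted later
print(default_domain_role(peer_reachable=False, peer_is_active=False))
```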

Supervisor Switchover Events

The failure of a supervisor module is a rare event but is still a possibility. In a full VS4O configuration, the most critical of the four possible supervisor module failure events would be the failure of the VSS active supervisor, which is the active supervisor in the default redundancy domain. The successive failure of the VSS active supervisor is examined next.

Figure 1 shows the supervisor modules with their redundancy roles and their respective redundancy domains. If the VSS active supervisor (switch1/slot1) fails, then two separate recovery processes occur immediately after the failure is detected. The overall VSS control plane is maintained in the default redundancy domain. The default redundancy domain is always established across the VSS chassis; in this case, the VSS standby module (switch2/slot1) will progress from the standby role to the active role for the VSS default redundancy domain.

A second separate recovery process occurs within switch1: the ICS supervisor module for switch1 (switch1/slot2) will detect the failure of the supervisor in switch1/slot1 and then progress to become the ICA for switch1. It will also then begin to participate in the default redundancy domain and establish communication with the peer in the default redundancy domain. At this point the new VSS active in the default redundancy domain is switch2/slot1. Because the supervisor in switch2/slot1 was already participating in the default redundancy domain and has transitioned to the VSS active role in the default redundancy domain, the supervisor in switch1/slot2 will become the standby for the default redundancy domain.

Important point: The VSS control plane is made redundant by the active and standby supervisor modules in the default redundancy domain. The default redundancy domain is always established across the two VSS chassis. Therefore, the failure or removal of the VSS active supervisor will result in the transition to a new VSS active supervisor in the peer chassis.

To illustrate the cross-chassis recovery process further, Figure 2 shows the resulting Z pattern of active supervisor module locations after successive failure events.

Figure 2. Z Pattern Switchover
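
The Z pattern can be reproduced with a small Python simulation of the two recovery processes. The function and variable names are hypothetical, and the sketch assumes that each failed supervisor recovers and rejoins its chassis as the ICS before the next failure occurs.

```python
from typing import Dict, List, Tuple

Sup = Tuple[int, int]  # (switch number, slot number)


def fail_vss_active(active: Sup, standby: Sup,
                    ics: Dict[int, int]) -> Tuple[Sup, Sup, Dict[int, int]]:
    """Return (active, standby, ics) after the VSS active supervisor fails."""
    failed_switch, failed_slot = active
    # Default redundancy domain: the VSS standby in the peer chassis
    # transitions to VSS active.
    new_active = standby
    # In-chassis domain: the ICS of the failed chassis becomes its ICA and
    # joins the default redundancy domain as the VSS standby.
    new_standby = (failed_switch, ics[failed_switch])
    # Assumption: the failed supervisor recovers and rejoins as the ICS.
    new_ics = dict(ics)
    new_ics[failed_switch] = failed_slot
    return new_active, new_standby, new_ics


active, standby = (1, 1), (2, 1)
ics = {1: 2, 2: 2}  # slot of the ICS in each chassis
history: List[Sup] = [active]
for _ in range(3):
    active, standby, ics = fail_vss_active(active, standby, ics)
    history.append(active)

print(history)
```

Under these assumptions, the VSS active role moves from switch 1 slot 1 to switch 2 slot 1, then to switch 1 slot 2, and finally to switch 2 slot 2, alternating chassis on every failure.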

Configuring and Monitoring VS4O

From a configuration standpoint, no additional configuration commands are required to enable the VS4O implementation. The only requirement is that the supervisor modules are running a software image version capable of supporting the VS4O redundancy modes. When a second supervisor is inserted into an operational VSS, the newly inserted supervisor module automatically detects the VSS configuration and negotiates to an ICS redundancy role.

As with redundant supervisor operations in a standalone chassis, the SSO active supervisor module automatically synchronizes the startup-config and running-config files with the SSO standby hot; this is part of SSO operation. Apart from the configuration files, the software image files and any other files located on the supervisor module file systems must be managed manually. In other words, one uses CLI commands such as copy and delete to move files between the various supervisor module file systems.

The VS4O implementation creates a few new file system names based on the active/standby relationship, or alternatively one can reference the switch number/slot number. The new switch number/slot number naming convention allows one to reference a file system without needing to consider the redundancy state of the VSS. (See Table 2.)

Table 2. Subset of New File System Names Supported with VS4O Implementation; disk0: File System Used for Consistency

File System Name     Description
disk0:               Removable compact flash on VSS active supervisor
ics-disk0:           Removable compact flash on ICS supervisor of VSS active chassis
slavedisk0:          Removable compact flash on VSS standby supervisor
slave-ics-disk0:     Removable compact flash on ICS supervisor of VSS standby chassis
sw1-slot1-disk0:     Removable compact flash on switch 1 slot 1 supervisor
sw1-slot2-disk0:     Removable compact flash on switch 1 slot 2 supervisor
sw2-slot1-disk0:     Removable compact flash on switch 2 slot 1 supervisor
sw2-slot2-disk0:     Removable compact flash on switch 2 slot 2 supervisor
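
Because image files must be copied manually, the switch number/slot number names lend themselves to simple scripting. The following Python sketch builds those names and generates one copy command per target supervisor. The helper names and the image file name used in the example are hypothetical, and the generated strings are illustrative only; verify the exact copy syntax against the configuration guide before use.

```python
from typing import Iterable, List, Tuple


def slot_filesystem(switch: int, slot: int, device: str = "disk0") -> str:
    """Build a switch/slot-based file system name such as 'sw1-slot2-disk0:'."""
    if switch not in (1, 2):
        raise ValueError("a VSS has switch numbers 1 and 2")
    if slot < 1:
        raise ValueError("slot numbers start at 1")
    return f"sw{switch}-slot{slot}-{device}:"


def staging_commands(image: str, source_fs: str,
                     targets: Iterable[Tuple[int, int]]) -> List[str]:
    """Return one 'copy <source> <destination>' line per target supervisor."""
    return [
        f"copy {source_fs}{image} {slot_filesystem(sw, sl)}{image}"
        for sw, sl in targets
    ]


# Hypothetical image name; the targets are the three non-active supervisors.
for line in staging_commands("example-image.bin", "disk0:", [(1, 2), (2, 1), (2, 2)]):
    print(line)
```

Addressing file systems by switch and slot keeps such a script valid regardless of which supervisor currently holds the VSS active role.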

For viewing and monitoring the state of the individual supervisor modules, certain VSS-related CLI commands can be used to view the redundancy roles and states. An example of the show switch virtual redundancy command is shown in Figure 3. Notice that the output of this command will always list the current active supervisor module first; therefore, the order in which the modules are displayed will vary as the redundancy modes change on the supervisors.

Figure 3. Show Switch Virtual Redundancy Command Output
VSS01#show switch virtual redundancy
My Switch Id = 1
Peer Switch Id = 2
Last switchover reason = none
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Switch 1 Slot 1 Processor Information:
-----------------------------------------------
Current Software state = ACTIVE
Uptime in current state = 16 hours, 7 minutes
Image Version = Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
BOOT = bootdisk:v2,12;bootdisk:s2t54-advipservicesk9-mz.SPA.151-1.SY1,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = ACTIVE
Switch 1 Slot 2 Processor Information:
-----------------------------------------------
Current Software state = STANDBY HOT (CHASSIS)
Uptime in current state = 16 hours, 6 minutes
Image Version = Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
BOOT = bootdisk:v2,12;bootdisk:s2t54-advipservicesk9-mz.SPA.151-1.SY1,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = STANDBY
Switch 2 Slot 1 Processor Information:
-----------------------------------------------
Current Software state = STANDBY HOT (switchover target)
Uptime in current state = 16 hours, 4 minutes
Image Version = Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
BOOT = bootdisk:v2,12;bootdisk:s2t54-advipservicesk9-mz.SPA.151-1.SY1,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = STANDBY
Switch 2 Slot 2 Processor Information:
-----------------------------------------------
Current Software state = STANDBY HOT (CHASSIS)
Uptime in current state = 16 hours, 4 minutes
Image Version = Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
BOOT = bootdisk:v2,12;bootdisk:s2t54-advipservicesk9-mz.SPA.151-1.SY1,12;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = STANDBY
VSS01#
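
For monitoring scripts, output in this format can be parsed into per-supervisor dictionaries. The Python sketch below is a minimal example that assumes the section header and "key = value" layout shown in Figure 3; the function name is hypothetical and the embedded sample is an abbreviated excerpt of that figure.

```python
import re
from typing import Dict, Tuple

SAMPLE = """\
Switch 1 Slot 1 Processor Information:
-----------------------------------------------
Current Software state = ACTIVE
Fabric State = ACTIVE
Control Plane State = ACTIVE
Switch 1 Slot 2 Processor Information:
-----------------------------------------------
Current Software state = STANDBY HOT (CHASSIS)
Fabric State = ACTIVE
Control Plane State = STANDBY
"""

HEADER = re.compile(r"^Switch (\d+) Slot (\d+) Processor Information:$")


def parse_redundancy(text: str) -> Dict[Tuple[int, int], Dict[str, str]]:
    """Map (switch, slot) to the key/value pairs listed under its header."""
    result: Dict[Tuple[int, int], Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)))
            result[current] = {}
        elif current is not None and " = " in line:
            # Split on the first " = " only, so values may contain "=".
            key, _, value = line.partition(" = ")
            result[current][key.strip()] = value.strip()
    return result


states = parse_redundancy(SAMPLE)
for (switch, slot), fields in states.items():
    print(switch, slot, fields["Current Software state"])
```

Lines with an empty value, such as "CONFIG_FILE =", are skipped by this sketch, and global lines that precede the first per-supervisor header are ignored.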

Control Plane and Data Plane Convergence (VS4O Supervisor Switchover)

To understand the effect of a supervisor switchover event on data plane traffic, let us first define and characterize data plane traffic versus control plane traffic. Data plane traffic is the traffic being switched through the VSS at either Layer 2 or Layer 3. In other words, data plane traffic is destined for an L2 or L3 address existing somewhere in the network, but not to an address on the VSS itself. Control plane traffic would be traffic destined to a Layer 2 or Layer 3 address existing on the VSS itself. Other types of control plane traffic would be broadcast, multicast, and unknown unicast traffic; all of these types of traffic will be forwarded in some way to the CPU for specific processing. Typically, data plane traffic will be the vast majority of all traffic passing through the VSS. Data plane traffic is almost always completely switched using the hardware switching functionality of the Cisco Catalyst 6500.

The specific effects a supervisor switchover event has on user data traffic or network control traffic depend on multiple variables, but in general, user data traffic is either not affected at all or affected only by a subsecond interruption.

Effects on Control Plane Traffic Forwarding

Control plane traffic can be interrupted for longer periods of time after a switchover event because the newly active supervisor module performs higher priority tasks as it transitions to the active role. In some cases the interruption can be as long as 8 seconds. Therefore, in an SSO or VS4O configuration, it is recommended to keep network control protocol keep-alive and hello messages at their default settings, or at least at an expiration time of not less than 8 seconds, so that these timers do not expire.

There are some exceptions for control plane protocols that have been engineered to be “SSO aware,” meaning the protocol has been developed with an infrastructure capable of maintaining state across switchover events even with fast/aggressive timers[9]. Bidirectional Forwarding Detection (BFD) is an example of a control plane protocol that is SSO aware and can support hold times less than 2 seconds. For guidance on the minimum supported timer values in an SSO configuration such as VSS, VS4O, or even in a standalone switch with redundant supervisor modules, consult the latest Cisco IOS Software configuration guide.

Therefore, from the network design standpoint, the question relative to a supervisor module failure event is: Do you want SSO and Nonstop Forwarding (NSF) technologies (used in VSS and redundant supervisor configurations) to maintain the network topology and not cause the network control protocols to converge? If yes, then keep the control plane protocol timers at default settings or at least configure the hold time values long enough that they will survive the switchover event. In contrast, if you would rather have the network control protocols detect the supervisor switchover event and converge the network as quickly as possible, then consider using fast/aggressive timers.
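
The timer guidance above reduces to a simple comparison, sketched here in Python. The function name and the example hold times are hypothetical. The 8-second figure is the worst-case interruption cited in this paper, and treating every SSO-aware protocol as surviving the switchover is a simplifying assumption; the minimum supported values still need to be checked in the configuration guide.

```python
# Longest control plane interruption cited in this paper for a switchover.
MAX_INTERRUPTION_S = 8.0


def survives_switchover(hold_time_s: float, sso_aware: bool = False) -> bool:
    """Return True if a neighbor relationship should stay up across a switchover."""
    # Simplifying assumption: an SSO-aware protocol maintains state even with
    # fast timers, so its hold time is not compared against the interruption.
    if sso_aware:
        return True
    return hold_time_s >= MAX_INTERRUPTION_S


# Hypothetical hold times in seconds: (hold time, protocol is SSO aware).
sessions = {
    "default timers": (40.0, False),
    "aggressive timers": (3.0, False),
    "SSO-aware with fast timers": (1.5, True),
}

for name, (hold, aware) in sessions.items():
    verdict = "rides through" if survives_switchover(hold, aware) else "reconverges"
    print(f"{name}: {verdict}")
```

A session that "reconverges" is not necessarily misconfigured; as discussed above, aggressive timers are a valid choice when fast protocol-level detection is preferred over topology preservation.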

For further discussion and reference on this topic, consult the Cisco validated design guides and smart business architecture documents for reference on specific timer tuning options in a VSS environment.

Reference URL: http://www.cisco.com/en/US/partner/netsol/ns742/networking_solutions_program_category_home.html.

Effects on Data Plane Traffic Forwarding

In the case of VSS with VS4O, a supervisor module failure or switchover event will cause a subsecond interruption in data plane traffic for any traffic flows passing through the failing supervisor module’s switch fabric; this includes the supervisor module’s physical interfaces. The term “switch fabric” describes both the 2-terabit crossbar fabric channels and the Policy Feature Card v4. The 2-terabit crossbar is used to switch data between the line cards, and the Policy Feature Card v4 serves as the centralized lookup and forwarding engine for all line cards in the chassis or as the master forwarding engine for the line cards equipped with distributed forwarding cards (DFCs).

After the ICS detects the failure of the ICA, the ICS will start to transition its role to the new ICA. Part of this process includes signaling the line cards to start using the switch fabric channels on the newly active supervisor module. This time period between the failure of the ICA and the time the line cards start to use the switch fabric channels on the new active supervisor is when both data plane and control plane traffic can be interrupted. (See Figure 4.)

Figure 4. Data Path Before and After Active Supervisor Module Switchover Event

The switchover time associated with the line card depends primarily on the notification method used by the supervisor module to signal the switchover event and also on how fast the line card can initialize the new fabric channels. Newer line cards support a hardware-based signaling and notification method, as well as the hot-sync standby fabric channel mechanism in which the standby supervisor fabric channels are fully initialized and ready to start switching traffic after the notification occurs. Table 3 lists the different line cards and their supported switchover mechanisms.

Table 3. Line Card Switchover Capabilities

Line Card Model Number   Hot Sync Standby Fabric   Fast HW Notification
6900 Series              Yes                       Yes: less than 50 ms
6800 Series 10G          Yes                       Yes: less than 50 ms
6700 Series 10G          Yes                       Yes: less than 50 ms
6704-10G                 Yes                       No
6800 Series 1G           Yes                       No
6700 Series 1G           Yes                       No
Classic                  N/A                       No
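
When auditing a chassis, the capabilities in Table 3 can be encoded as a lookup so that line cards without fast hardware notification are easy to flag. The Python sketch below is illustrative: the type and function names are hypothetical, and the data is transcribed from Table 3 only.

```python
from typing import Dict, List, NamedTuple, Optional


class Switchover(NamedTuple):
    hot_sync_standby_fabric: Optional[bool]  # None where Table 3 says N/A
    fast_hw_notification: bool


LINE_CARDS: Dict[str, Switchover] = {
    "6900 Series": Switchover(True, True),
    "6800 Series 10G": Switchover(True, True),
    "6700 Series 10G": Switchover(True, True),
    "6704-10G": Switchover(True, False),
    "6800 Series 1G": Switchover(True, False),
    "6700 Series 1G": Switchover(True, False),
    "Classic": Switchover(None, False),
}


def lacks_fast_notification(installed: List[str]) -> List[str]:
    """Return the installed line cards that lack fast hardware notification."""
    return [card for card in installed if not LINE_CARDS[card].fast_hw_notification]


print(lacks_fast_notification(["6900 Series", "6704-10G", "Classic"]))
```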

Migrating to VS4O

Migrating an existing VSS to a VS4O configuration is a straightforward process. The main requirements are that the VSS is running the 15.1(1)SY1 or newer Cisco IOS Software version, and that the newly inserted redundant supervisor module is configured to boot the same image as the ICA supervisor. The redundant supervisor can simply be inserted into the operational VSS chassis; the newly inserted supervisor will be automatically provisioned for VSS (VSS switch ID configured to match the ICA) and will negotiate itself to the ICS redundancy mode. There is no effect on data plane switching when the second supervisor is inserted.

Primary steps in migrating to VS4O:

1. Upgrade existing VSS to version 15.1(1)SY1 or newer.

2. Prepare the first ICS supervisor module to boot the same image version as the active VSS.

3. Insert the redundant supervisor module into the chassis (it does not matter into which VSS chassis the redundant supervisor is inserted first, the VSS active or VSS standby).

4. Establish console connection for each supervisor module in the VSS.

5. Verify the newly inserted supervisor boots as the ICS.

6. Configure and connect the new ICS supervisor’s 10Gb uplink ports into the VSL (optional step, but recommended).

Upgrading the existing VSS to version 15.1(1)SY1 or newer needs to be performed prior to inserting the ICS supervisor in order to support VS4O. Previous versions of Cisco IOS Software do not support an ICS in VSS mode. If a second supervisor module is inserted into an operational VSS that is running a pre-15.1(1)SY1 version of software, then, depending upon the configuration, the second module will either post an error during the initialization phase and drop to rommon mode or post an error during the initialization phase and perform a reload. However, there will be no negative effects on the existing VSS operations. (See Table 4.)

Table 4. Expected Behavior of Supervisor When Inserting as ICS Supervisor Module with Different Version of Software Running on ICA

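Each entry lists the ICS condition, followed by the expected behavior for each of the two ICA cases.

ICS: Standby in VSS mode running VS4O-capable image (15.1(1)SY1 or newer)
  Active Supervisor in VSS Mode: Running Image Supporting VS4O (15.1(1)SY1 or Newer): Boots as VS4O in-chassis SSO standby hot
  Active Supervisor in VSS Mode: Running Image Not Supporting VS4O (15.1(1)SY or Previous): ICS will boot to RPR-mode standby cold

ICS: Standby in standalone mode running VS4O-capable image (15.1(1)SY1 or newer)
  Active Supervisor in VSS Mode: Running Image Supporting VS4O (15.1(1)SY1 or Newer): ICS detects ICA in VSS mode and automatically sets switch number, then resets and boots as VS4O in-chassis SSO standby hot
  Active Supervisor in VSS Mode: Running Image Not Supporting VS4O (15.1(1)SY or Previous): ICS boots and detects ICA in VSS mode, sets switch_number variable and reset to rommon, boot ICS again to SY1, and ICS goes RPR-mode standby cold

ICS: Standby booting with 15.1(1)SY or older image in a standalone default config
  Active Supervisor in VSS Mode: Running Image Supporting VS4O (15.1(1)SY1 or Newer): ICS will start to boot Cisco IOS Software and recognize it is in an unsupported ICS config and drop to rommon
  Active Supervisor in VSS Mode: Running Image Not Supporting VS4O (15.1(1)SY or Previous): Standby attempts to boot as standalone ICS, will time out waiting on active, then reload

ICS: Standby booting with 15.1(1)SY or older image in a VSS config
  Active Supervisor in VSS Mode: Running Image Supporting VS4O (15.1(1)SY1 or Newer): ICS will start to boot Cisco IOS Software and recognize it is in an unsupported ICS config and drop to rommon
  Active Supervisor in VSS Mode: Running Image Not Supporting VS4O (15.1(1)SY or Previous): ICS will start to boot Cisco IOS Software and recognize it is in an unsupported ICS config and drop to rommon

ICS: ICS configured with config-register 0x2102
  Active Supervisor in VSS Mode: Running Image Supporting VS4O (15.1(1)SY1 or Newer): ICS boots to rommon
  Active Supervisor in VSS Mode: Running Image Not Supporting VS4O (15.1(1)SY or Previous): ICS boots to rommon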

One option may be to use a spare chassis to prepare the second supervisor to be inserted into the VSS. When using the spare chassis method, one can configure the supervisor to boot the 15.1(1)SY1 image or whatever image is being run in the operational VSS. This software version must be at least the 15.1(1)SY1 version or newer in order to support VS4O. It is not necessary to convert the supervisor module to VSS mode or configure the VSS switch number.

If it is not possible to use a spare chassis to prepare the redundant supervisor module, then one can insert the redundant supervisor module and either use the console connection to interrupt the boot process and force the supervisor to rommon or simply allow the supervisor to abort the Cisco IOS Software initialization and drop to rommon. Neither of these two approaches will cause an interruption in data plane switching of the VSS.

After the redundant supervisor is in rommon mode, one can boot the supervisor using the correct software version, located on the external file system. An example of the CLI to boot the supervisor from the external file system is provided in Figure 5.

Figure 5. Example of CLI Used to Boot Second Supervisor from Rommon
rommon 1 > dir bootdisk:
Digitally Signed Release Software with key version A
Initializing ATA monitor library...
Directory of bootdisk:
3 33554432 -rw- sea_log.dat
2051 33554432 -rw- sea_console.dat
10053 0 drw- call-home
10054 9201 -rw- DTcfgvss02
10055 98141144 -rw- s2t54-advipservicesk9-mz.SPA.151-1.SY1
rommon 2 > boot bootdisk:s2t54-advipservicesk9-mz.SPA.151-1.SY1
Digitally Signed Release Software with key version A
Initializing ATA monitor library...
bootdisk:s2t54-advipservicesk9-mz.SPA.151-1.SY1: Digitally Signed Release Software with key version A
Self extracting the image... [OK]
Self decompressing the image: #################################################################################
#################################################################################
#################################################################################
#################################################################################
#################################################################################
#################################################################################
###########
Cisco IOS Software, s2t54 Software (s2t54-ADVIPSERVICESK9-M), Version 15.1(1)SY1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 26-Mar-13 19:08 by prod_rel_team
Image text-base: 0x04100144, data-base: 0x0C000000
System detected Virtual Switch configuration...
Interface TenGigabitEthernet 2/1/4 is member of PortChannel 2
Interface TenGigabitEthernet 2/1/5 is member of PortChannel 2
*Apr 17 20:56:30.911: %SYS-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.
Firmware compiled 06-Mar-13 08:26 by integ Build [25856]
*Apr 17 20:56:30.911: %PFREDUN-6-STANDBY: Initializing as STANDBY processor for this switch
*Apr 17 20:56:32.191: %SYS-SW2-2_STBY-3-LOGGER_FLUSHING: System pausing to ensure console debugging output.

The first time the ICS is inserted into the VSS chassis, the ICA will detect that the new supervisor does not have the VSS configuration variables set. The ICA will configure the ICS with the appropriate VSS variables, including the VSS switch ID, and then reload the ICS. This is a one-time step; upon any subsequent reloads, the supervisor module will have the relevant VSS information to boot directly as the ICA or ICS.

After the ICS is fully booted, the status of the various supervisor modules can be observed using the CLI and SNMP methods.

Configuring the Virtual Switch Link in VS4O

The VSL is the dedicated port channel between the two VSS chassis that is used for passing control plane and data plane traffic between the chassis. There are a number of variables to consider when determining the number of interfaces and which interfaces to use to build the VSL. Some of these considerations, such as VSL bandwidth sizing, link diversification, and QoS mechanisms, apply to all VSS designs and are beyond the scope of this paper, which is dedicated to VS4O. These topics are discussed in the existing documentation and white papers, including design guides[10].

One consideration specific to VS4O is the EFSU process, which provides for a simplified software upgrade process, allowing for a test phase of the new software version and automatic rollback capabilities. The details of the EFSU process are discussed in the next section of this paper, but VS4O also supports a new staggered mode upgrade as part of the EFSU process.

The staggered mode is unique to a VS4O configuration and requires a direct physical VSL link between the VSS active supervisor and both supervisor modules in the peer VSS chassis. In other words, at a minimum, all four 10 Gigabit Ethernet uplink ports on the supervisor modules in each chassis must be used for the VSL. There must also be a direct link between each supervisor module and both supervisor modules in the peer chassis, which means swapping or crisscrossing the port 4s or port 5s of the supervisor modules. Figure 6 illustrates “swapping the 5s” for the VSL.

Figure 6. VSL Swapping the 5s

Figure 7 shows the CLI output of the show switch virtual link detail command, indicating the VSL links and their connected peers using the “swap 5s” configuration.

Figure 7. CLI Output from show switch virtual link detail Command
VSS01#show switch virtual link detail
VSL Status: UP
VSL Uptime: 21 hours, 45 minutes
VSL SCP Ping: Pass
VSL ICC Ping: Pass
VSL Control Link: Te1/2/4
VSL Encryption: Configured Mode - Off, Operational Mode - Off
LMP summary
Link info: Configured: 4 Operational: 4
Peer Peer Peer Peer Timer(s)running
Interface Flag State Flag MAC Switch Interface (Time remaining)
--------------------------------------------------------------------------------
Te1/1/4 vfsp operational vfsp 0013.5f1c.0680 2 Te2/1/4 T4(152ms)
T5(59.95s)
Te1/1/5 vfsp operational vfsp 0013.5f1c.0680 2 Te2/2/5 T4(152ms)
T5(59.95s)
Te1/2/4 vfsp operational vfsp 0013.5f1c.0680 2 Te2/2/4 T4(152ms)
T5(59.95s)
Te1/2/5 vfsp operational vfsp 0013.5f1c.0680 2 Te2/1/5 T4(152ms)
T5(59.98s)
Flags: v - Valid flag set f - Bi-directional flag set
s - Negotiation flag set p - Peer detected flag set
Timers: T4 - Hello Tx Timer T5 - Hello Rx Timer

Therefore, the recommended VSL configuration for VS4O is to use, at a minimum, all four 10 Gigabit Ethernet supervisor uplink interfaces per chassis. Other considerations might apply and might mean that additional interfaces are used as part of the VSL.

If for some reason it is not possible or feasible to use all four 10Gb ports per chassis for the VSL port channel, then a minimum of two ports is recommended, with one interface per supervisor module[11]. In this case the EFSU staggered mode upgrade will not be allowed; an error message will be posted to the console if the upgrade is attempted, indicating the minimum VSL interfaces are not configured. The user can, however, configure EFSU in tandem mode, where the upgrade will proceed by upgrading one chassis at a time. The tandem mode is also the only method supported for the VSS using a single supervisor per chassis configuration. The staggered mode upgrade process is discussed in more detail in the software upgrades section of this paper.
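
The VSL requirements above can be checked against the output of show switch virtual link detail. The following Python sketch is illustrative only. It assumes the LMP summary row layout shown in Figure 7 and interface names of the form Te<switch>/<slot>/<port>, and it assumes the supervisors sit in slots 1 and 2 as in that figure. The function names and the three result strings are hypothetical, and the decision rule is a simplified reading of this section: a full mesh between supervisors allows staggered mode, at least two operational links allow tandem mode, and a single link does not support EFSU.

```python
import re

# One LMP summary row, e.g.
# "Te1/1/5 vfsp operational vfsp 0013.5f1c.0680 2 Te2/2/5 T4(152ms)"
_ROW_RE = re.compile(
    r"^Te\d+/(\d+)/\d+\s+\S+\s+(\S+)\s+\S+\s+\S+\s+\d+\s+Te\d+/(\d+)/\d+"
)


def operational_links(output):
    """Return a list of (local_slot, peer_slot) for each operational VSL link."""
    links = []
    for line in output.splitlines():
        m = _ROW_RE.match(line.strip())
        if m and m.group(2) == "operational":
            links.append((int(m.group(1)), int(m.group(3))))
    return links


def efsu_mode(links, sup_slots=(1, 2)):
    """Classify the VSL for EFSU; sup_slots is an assumption taken from Figure 7."""
    pairs = set(links)
    # Staggered mode: every local supervisor reaches every peer supervisor.
    if all((l, p) in pairs for l in sup_slots for p in sup_slots):
        return "staggered"
    # Tandem mode: simplified here to "at least two operational links".
    if len(links) >= 2:
        return "tandem"
    return "unsupported"
```

Applied to the four rows in Figure 7, `operational_links` yields the slot pairs (1, 1), (1, 2), (2, 2), and (2, 1), which `efsu_mode` classifies as "staggered".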

Software Upgrades with VS4O

Upgrading the Cisco IOS Software image version on the VS4O is supported primarily via two methods: either by using a manual configuration change to the boot variable followed by a full system reload or by using the EFSU process. With the EFSU process, the upgrade is performed using the Cisco IOS Software In-Service Software Upgrade (ISSU) infrastructure, in which the VS4O chassis are upgraded individually.

In VSS the EFSU process is most beneficial in network designs where devices are dual-attached to the VSS - in other words, where the network devices are connected to both chassis via a multichassis EtherChannel or via redundant L3 paths. In this way, whenever the individual chassis is being upgraded (which includes a chassis reload), the user data traffic converges to the surviving chassis using the inherent redundancy of EtherChannel or SSO/NSF in the case of redundant L3 paths.

VS4O Staggered Mode Upgrade

New and unique to the VS4O is the staggered mode for the EFSU process. The staggered mode uses the ICS during the EFSU process by upgrading the ICS ahead of the line cards so that when the line cards are upgraded, the reload time is reduced significantly. Using the staggered mode can reduce the effective link downtime for the line cards by an average of 3-5 minutes, depending upon the configuration.

VS4O Staggered Mode Upgrade Process

The EFSU staggered mode process uses the same CLI as with previous EFSU-based upgrades. The CLI is based on the Cisco IOS Software ISSU infrastructure and uses the same steps, loadversion through commitversion, as illustrated in Figure 8.

Figure 8. ISSU Process

The EFSU staggered mode process adds two automated procedures to the ISSU process: the upgrade of the ICS in the loadversion stage and the upgrade of the ICS in the commitversion stage. These procedures are automated in that no additional CLI input is required.

Loadversion with Staggered Mode

The ISSU process begins with the loadversion stage. To describe the upgrade process, we will use the diagram of the VS4O configuration in Figure 9.

Figure 9. VS4O and Supervisor Redundancy States Prior to Starting EFSU Process

When the user enters the ISSU loadversion command, the process will begin with an upgrade of the ICS supervisor module in the VSS standby chassis (sw2slot2). After the ICS is fully booted to the new version, the process will continue automatically to reload the ICA of the VSS standby chassis (sw2slot1), which is also the standby supervisor in the VSS default redundancy domain. Note that the sw2slot1 supervisor module is not upgraded to the new version of software at this point; it is simply reloaded. Because this supervisor is the ICA for switch 2, the reload will also force the line cards in switch 2 to reload. However, when the line cards start to boot, they will associate to the new ICA (sw2slot2) supervisor running the new version of software. Because the new ICA is already operational, the effective downtime of the line cards is reduced to just the line card reload time, not the supervisor module + line card reload time.

After the VSS has completed the upgrade of the line cards of the VSS standby in switch 2, the system will provide a console message indicating the loadversion procedure is complete and will prompt the user to run the ISSU runversion command. (See Figures 10 and 11.)

Figure 10. Console Message Prompting for Runversion
VSS01#
*Apr 18 17:33:44.366: %ISSU_PROCESS-SW1-6-LOADVERSION_INFO: Loadversion has completed. Please issue the 'issu runversion' command after all modules come online.

Figure 11. VS4O with Supervisor Redundancy States After Loadversion Process Completes

Runversion with Staggered Mode

At the runversion stage, the intention is to move the VSS active role to the supervisor running the new software version, in this case the VSS standby (switch2slot2). This is accomplished by reloading the VSS active supervisor (switch1slot1). When the VSS active (switch1slot1) reloads, two different switchover events occur. The VSS default redundancy domain will transition the active role from switch1slot1 to switch2slot2, and the in-chassis redundancy domain in switch 1 will transition the ICA role to switch1slot2. Subsequently, the switch1slot2 supervisor will also start to participate in the default redundancy domain and become the VSS standby. (See Figure 12.)

Figure 12. ISSU Runversion Process

What is significant in the staggered mode runversion process is that during these switchover events, none of the line cards will reload, compared to the tandem mode upgrade, in which the VSS active chassis (including line cards) will reload during the runversion process. With staggered mode, because the switch 1 ICA supervisor and ICS supervisor are both running the same version of software, the line cards do not need to reload. They will experience an SSO switchover event, but this primarily involves the line cards transitioning intermodule traffic to the new ICA switch fabric. The line cards in switch 1 experience a data plane convergence event, which typically occurs in less than 200ms.

Acceptversion with Staggered Mode

The optional ISSU acceptversion stage is still available as in the tandem mode upgrade. The acceptversion command stops the rollback timer, which is automatically started at the runversion command. By stopping the rollback timer, the user can stay in the runversion state indefinitely. The intent is to allow the user to verify the functionality of the new version of software. Because the runversion stage keeps the standby chassis (switch 1 in this example) running the original version of software, the upgrade process can be easily aborted and the system reverted back to the original software version.

The ISSU abortversion command is the single command needed to revert the system back to the original software version and end the ISSU upgrade process. The ISSU abortversion command will force the supervisor module in switch 2 to reload and boot the original version of the software.
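
The relationship between the ISSU stages and the rollback timer can be summarized as a small state machine. The Python sketch below is a toy model for illustration, not Cisco code. The class and method names are hypothetical. Two behaviors are assumptions rather than statements from this paper: that abortversion is also accepted during the loadversion stage, and that expiry of the rollback timer triggers the same revert as abortversion.

```python
class IssuSession:
    """Toy model of the ISSU stages named in this paper."""

    def __init__(self):
        self.stage = "init"
        self.rollback_timer_running = False

    def _require(self, *stages):
        # Reject commands issued out of order.
        if self.stage not in stages:
            raise RuntimeError(f"not allowed in stage {self.stage!r}")

    def loadversion(self):
        self._require("init")
        self.stage = "loadversion"

    def runversion(self):
        self._require("loadversion")
        self.stage = "runversion"
        # The rollback timer starts automatically at runversion.
        self.rollback_timer_running = True

    def acceptversion(self):
        # Optional: stops the timer so the system can stay in the
        # runversion state indefinitely while the new version is verified.
        self._require("runversion")
        self.rollback_timer_running = False

    def abortversion(self):
        # Assumption: abort is allowed in loadversion as well as runversion.
        self._require("loadversion", "runversion")
        self.stage = "init"
        self.rollback_timer_running = False

    def commitversion(self):
        self._require("runversion")
        self.stage = "committed"
        self.rollback_timer_running = False

    def timer_expired(self):
        # Assumption: an expired rollback timer reverts like abortversion.
        if self.rollback_timer_running:
            self.abortversion()
```

In this model, calling `timer_expired()` after `acceptversion()` has no effect, which mirrors the purpose of the acceptversion stage described above.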

Commitversion with Staggered Mode

The final stage in the upgrade process is the ISSU commitversion stage. When this stage begins, the VSS active is already running the new software version. The objective here is to finalize the upgrade process and upgrade the remaining components. The commitversion stage is initiated with the ISSU commitversion command, in which the first of two automated steps begins by setting the boot variable and reloading the ICS of the VSS standby, in this example, the supervisor module in switch1slot1. (See Figure 13.)

Figure 13. Commitversion Step 1

After the ICS of the VSS standby chassis is finished reloading to the new version of software, the second step in the commitversion stage begins automatically without any user intervention. Now the VSS standby ICA, switch1slot2, is configured with the new boot variable and reloaded. This will also force the line cards in switch 1 to reload. Because the switch 1 ICS, switch1slot1, is already running the new version, the line card reload time is minimized, and the line cards are initialized and controlled by the new ICA, switch1slot1.

At the same time, the upgrade of the ICS in switch 2 also occurs. Because the supervisor module is in the ICS state, it does not have any effect on the line cards for switch 2. (See Figure 14.)

Figure 14. Commitversion Step 2
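
The role and version changes across the loadversion, runversion, and commitversion stages can be traced with a short simulation. The Python sketch below is a simplified model of the sequence described in this section, not a representation of the actual software. The data layout and function names are hypothetical, supervisors are named in the sw<switch>slot<slot> style used in the loadversion discussion, and each reload is treated as an instantaneous role swap.

```python
from dataclasses import dataclass

OLD, NEW = "old", "new"


@dataclass
class Sup:
    chassis: int
    role: str            # in-chassis role: "ICA" or "ICS"
    version: str = OLD


def initial_state():
    """State before the EFSU process starts (as described for Figure 9)."""
    return {
        "sups": {
            "sw1slot1": Sup(1, "ICA"), "sw1slot2": Sup(1, "ICS"),
            "sw2slot1": Sup(2, "ICA"), "sw2slot2": Sup(2, "ICS"),
        },
        "vss_active": "sw1slot1",
        "vss_standby": "sw2slot1",
        "linecard_reloads": {1: 0, 2: 0},
    }


def find(state, chassis, role):
    return next(n for n, s in state["sups"].items()
                if s.chassis == chassis and s.role == role)


def swap_roles(state, chassis):
    """Reloading the ICA promotes the ICS; the reloaded supervisor returns as ICS."""
    for s in state["sups"].values():
        if s.chassis == chassis:
            s.role = "ICS" if s.role == "ICA" else "ICA"


def loadversion(state):
    sups = state["sups"]
    chassis = sups[state["vss_standby"]].chassis
    ics = find(state, chassis, "ICS")
    sups[ics].version = NEW                  # ICS is upgraded first
    swap_roles(state, chassis)               # old ICA reloads, not upgraded
    state["linecard_reloads"][chassis] += 1  # line cards join the new ICA
    state["vss_standby"] = ics


def runversion(state):
    chassis = state["sups"][state["vss_active"]].chassis
    new_active = state["vss_standby"]
    swap_roles(state, chassis)               # SSO switchover, no line card reload
    state["vss_active"] = new_active
    state["vss_standby"] = find(state, chassis, "ICA")


def commitversion(state):
    sups = state["sups"]
    old_ica = state["vss_standby"]
    chassis = sups[old_ica].chassis
    ics = find(state, chassis, "ICS")
    sups[ics].version = NEW                  # step 1: standby-chassis ICS
    swap_roles(state, chassis)               # step 2: standby ICA reloads
    sups[old_ica].version = NEW
    state["linecard_reloads"][chassis] += 1
    state["vss_standby"] = ics
    # In parallel, the ICS in the VSS active chassis is upgraded.
    active_chassis = sups[state["vss_active"]].chassis
    sups[find(state, active_chassis, "ICS")].version = NEW
```

Running the three stages in order from `initial_state()` leaves all four supervisors on the new version, with sw2slot2 as the VSS active and sw1slot1 as the VSS standby, and with the line cards in each chassis having reloaded once.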

Conclusion

The Cisco Catalyst Virtual Switching System continues to evolve with new innovative capabilities, including VS4O Quad-Supervisor SSO support. The increased availability and consistent bandwidth provided by the VS4O configurations provide higher resiliency for campus network deployments such as:

Core and distribution network designs

Service module deployments

Designs with single attached devices

Designs with stringent bandwidth requirements

For More Information

Borderless Campus 1.0 Overview and Framework http://www.cisco.com/en/US/partner/solutions/ns340/ns414/ns742/ns815/landing_cOverall_design.html

Enterprise Campus 3.0 Design Guide http://www.cisco.com/en/US/partner/docs/solutions/Enterprise/Campus/campover.html

Smart Business Architecture Borderless Networks Design Guides http://www.cisco.com/en/US/partner/netsol/ns982/networking_solutions_program_home.html#~bng

Nonstop Forwarding with Stateful Switchover on the Cisco Catalyst 6500 http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/prod_white_paper0900aecd801c5cd7.html

Cisco Catalyst 6500 Series Virtual Switching System http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps9336/white_paper_c11_429338.pdf



[1] Control plane: Refers to the Cisco IOS Software processes running on the device. These include network-level and device-level controls such as routing protocols, spanning tree protocols, and Simple Network Management Protocol processing.
[2] Data plane: Refers to the hardware-accelerated traffic forwarding.
[3] When the VSS is configured with a single supervisor module per chassis, the VSS provides for 1+1 chassis redundancy. In the unlikely event of a supervisor module failure, the local chassis (chassis associated with the failed supervisor module) is also rendered nonoperational. However, the surviving chassis remains operational.
[4] EFSU: Refers to the upgrade process supported by the Cisco Catalyst 6500; uses the Cisco IOS Software In-Service Software Upgrade infrastructure.
[5] Cisco IOS Software 15.1(1)SY1 is the first software release for the Cisco Catalyst 6500 Supervisor Engine 2T to support a redundant in-chassis supervisor module in VSS mode.
[6] Standalone chassis refers to a Cisco Catalyst 6500 not converted to VSS mode.
[7] White paper reference for SSO: Nonstop Forwarding with Stateful Switchover on the Cisco Catalyst 6500: http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/prod_white_paper0900aecd801c5cd7.html.
[8] The Supervisor Engine 720-10G supports VSS Quad-Sup Uplink Forwarding beginning with the 12.2(33)SXI4 Cisco IOS Software release.
[9] Fast/aggressive timers is a term used to describe network control protocols using hold time values less than 1 second.
[11] Note that the VSS will continue to operate with just a single interface in the VSL. However, that interface will be providing the entire bandwidth for interchassis control plane and data plane traffic; in addition, the EFSU process will not be supported. This configuration should only be allowed as a redundant fallback situation in the event of one or more VSL interface failures.