The Cisco® Catalyst® 6500 has been deployed in many of the most critical parts of enterprise and service provider networks. Having such a vital position in the network, the Cisco Catalyst 6500 must achieve close to 100-percent high availability. The platform has evolved over the years to achieve higher levels of availability by providing more advanced resiliency mechanisms. Examples of high-availability mechanisms include redundant supervisors, redundant switches with redundant links, Cisco EtherChannel® technology, the Spanning Tree Protocol, the Unidirectional Link Detection Protocol (UDLD), the Hot Standby Router Protocol (HSRP), the Gateway Load Balancing Protocol (GLBP), and routing protocol equal-cost paths.
The newest addition to the family of high-availability features is nonstop forwarding (NSF) with stateful switchover (SSO). NSF with SSO is a supervisor redundancy mechanism introduced on the Supervisor Engine 2 and the Supervisor Engine 720 in Cisco IOS® Software Release 12.2(18)SXD to provide intrachassis SSO at Layer 2-4. NSF with SSO reduces the mean time to repair (MTTR) by allowing extremely fast supervisor switchover, with packet loss on the order of 0 to 3 seconds. NSF with SSO can be deployed in the most critical parts of an enterprise or service provider network. It is an essential feature for single points of termination in the network, and it minimizes downtime when voice over IP (VoIP), video, and other packet loss-sensitive applications are involved.
This paper discusses the NSF and SSO supervisor redundancy operations for the Cisco Catalyst 6500 in Cisco IOS Software. It covers the NSF with SSO platform-specific details; the NSF with SSO supported features, including Multicast Multilayer Switching (MMLS) NSF with SSO; and the NSF with SSO performance results. Although network design is outside the scope of this paper, readers should understand how to design a highly available network with NSF and SSO. For high-availability campus network design information, in-depth information about generic NSF with SSO operations, an NSF with SSO configuration guide, an exhaustive list of all Cisco Catalyst 6500 high-availability mechanisms, and supervisor redundancy information about the Cisco Catalyst Operating System for the Cisco Catalyst 6500, see the "References" section of this paper.
Supervisor Redundancy
Switch Redundancy Components
The Cisco Catalyst 6500 Series switches are built with the design goal of redundant hardware system architecture as the basis for a highly available system. The following components in the Cisco Catalyst 6500 switches provide switch redundancy:
• Supervisor engine-Every Cisco Catalyst 6500 chassis can support redundant supervisors to provide for system high availability. Supervisors operate in active and standby modes and support a variety of redundancy mechanisms for failover.
• Switch fabric-The switch fabric provides a data path for fabric-enabled line cards and increases the available system bandwidth from the shared bus capacity of 32 Gbps to 256 Gbps for the Supervisor Engine 2 with switch fabric module 2 (SFM2) or 720 Gbps for the Supervisor Engine 720. If a switch fabric fails, the redundant switch fabric (if present) takes over.
• Power supplies-Every Cisco Catalyst 6500 chassis supports redundant power supplies so that a power supply failure does not affect operations.
• Fan trays-Each fan tray has multiple fans. The Cisco Catalyst WS-C6509-NEB-A chassis also provides optional fan-tray redundancy.
• Line-card online insertion and removal (OIR)-New modules can be added without affecting the system, and line cards can be exchanged without losing the configuration. When a module with a local forwarding engine (also referred to as distributed forwarding card) is inserted, the local forwarding-engine hardware tables are repopulated with the most current forwarding information.
Supervisor Redundancy Definitions
Supervisor redundancy on the Cisco Catalyst 6500 requires the following:
• Two supervisors per chassis
• A redundancy protocol to synchronize information between these two supervisors
The supervisor engine that boots first becomes the active supervisor engine. The active supervisor is responsible for control-plane and forwarding decisions. The second supervisor is the standby supervisor, which does not participate in the control or data-plane decisions. The active supervisor synchronizes configuration and protocol state information to the standby supervisor. As a result, the standby supervisor is ready to take over the active supervisor responsibilities if the active supervisor fails. This "take-over" process from the active supervisor to the standby supervisor is referred to as switchover.
Only one supervisor is active at a time, and supervisor-engine redundancy does not provide supervisor-engine load balancing. However, the interfaces on a standby supervisor engine are active when the supervisor is up and thus can be used to forward traffic in a redundant configuration.
Supervisor Redundancy Operations
The supervisor redundancy operations have evolved from route processor redundancy (RPR) and RPR plus (RPR+) to single router mode (SRM) with SSO and NSF with SSO. Each of these redundancy modes of operation improves upon the functions of the previous mode.
• RPR-RPR is the first redundancy mode of operation introduced in Cisco IOS Software. In RPR mode, the startup configuration and boot registers are synchronized between the active and standby supervisors, the standby is not fully initialized, and images between the active and standby supervisors do not need to be the same. Upon switchover, the standby supervisor becomes active automatically, but it must complete the boot process. In addition, all line cards are reloaded and the hardware is reprogrammed. The RPR switchover time is 2 or more minutes.
• RPR+-RPR+ is an enhancement to RPR in which the standby supervisor is completely booted and line cards do not reload upon switchover. The running configuration is synchronized between the active and the standby supervisors. All synchronization activities inherited from RPR are also performed. The synchronization is done before the switchover, and the information synchronized to the standby is used when the standby becomes active to minimize the downtime. No link layer or control-plane information is synchronized between the active and the standby supervisors. Interfaces may bounce after switchover, and the hardware contents need to be reprogrammed. The RPR+ switchover time is 30 or more seconds.
• SRM with SSO-SSO expands the RPR+ capabilities to provide transparent failover of Layer 2 protocols when a supervisor failure occurs. SSO is stateful for Layer 2 protocols. Policy-feature-card (PFC) and distributed-forwarding-card (DFC) hardware tables are maintained across a switchover. This allows for transparent failover at Layer 2 and Layer 4. SSO is a requirement for SRM with SSO and NSF with SSO. SSO can be used independently of SRM and NSF, which provide extra Layer 3 routing functions. When using SRM with SSO, the routing protocols restart upon switchover. However, SRM with SSO uses the existing PFC and DFC Layer 3 switching information to forward traffic for a configurable route-convergence interval while the newly active Multilayer Switch Feature Card (MSFC) builds its routing table. This minimizes downtime, but peers still need to reconverge around the supervisor failure. The SRM-with-SSO switchover time is 0 to 3 seconds for Layer 2 unicast traffic.
• NSF with SSO-NSF works in conjunction with SSO to ensure Layer 3 integrity following a switchover. It allows a router experiencing the failure of an active supervisor to continue forwarding data packets along known routes while the routing protocol information is recovered and validated. This forwarding can continue to occur even though peering arrangements with neighbor routers have been lost on the restarting router. NSF relies on the separation of the control plane and the data plane during supervisor switchover. The data plane continues to forward packets based on pre-switchover Cisco Express Forwarding information. The control plane implements graceful restart routing protocol extensions to signal a supervisor restart to NSF-aware neighbor routers, re-form its neighbor adjacencies, and rebuild its routing protocol database following a switchover. An NSF-capable router implements the NSF functionality and continues to forward data packets after a supervisor failure. An NSF-aware router understands the NSF graceful restart mechanisms: it does not tear down its neighbor relationships with the NSF-capable restarting router, and it can help a neighboring NSF-capable router restart, thus avoiding unnecessary route flaps and network instability. An NSF-capable router is also NSF-aware.
MMLS NSF with SSO enables the system to maintain multicast forwarding state in the PFC3 and DFC3 hardware during a supervisor-engine switchover, minimizing multicast service interruption. Prior to MMLS NSF with SSO, the multicast forwarding entries were not synchronized to the standby supervisor engine. The NSF with SSO switchover time is 0 to 3 seconds for Layer 2-4 unicast or multicast traffic.
Table 1 gives the minimum software version for each redundancy mode supported on the Cisco Catalyst 6500.
Table 1. Supervisor Redundancy Mode Support
Supervisor Engine | RPR and RPR+ | SRM with SSO | NSF with SSO
Supervisor Engine 1A | 12.1(13)E | - | -
Supervisor Engine 2 | 12.1(13)E or 12.1(17d)SXB | - | 12.2(18)SXD
Supervisor Engine 720 | 12.2(14)SX | 12.2(17b)SXA and 12.2(17d)SXB only | 12.2(18)SXD
The default redundancy mode of operation with two Supervisor Engine 720s is SSO in Cisco IOS Software Release 12.2(17b)SXA and later releases. The default redundancy mode of operation with two Supervisor Engine 2s is SSO in Cisco IOS Software Release 12.2(18)SXD and later releases. In earlier Cisco IOS Software 12.2SX releases, the default redundancy mode of operation is RPR+.
In order to run in RPR+ or SSO redundancy mode, image versions must be the same on the redundant and active supervisors. In these redundancy modes, the active supervisor engine checks the image version of the redundant supervisor engine when the redundant supervisor engine comes online. If the image on the redundant supervisor engine does not match the image on the active supervisor engine, the software sets the redundancy mode to RPR while doing a software upgrade and sets it back to SSO when the software upgrade is complete.
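The image-version check described above can be summarized in a few lines of Python. This is an illustrative sketch only: the function name and the mode strings are assumptions made for this example and do not correspond to Cisco IOS Software internals.

```python
def effective_redundancy_mode(configured_mode, active_image, standby_image):
    """Return the redundancy mode a chassis would run in (illustrative only)."""
    # RPR does not require the two supervisors to run the same image.
    if configured_mode == "RPR":
        return "RPR"
    # RPR+ and SSO require identical images. On a mismatch, the software
    # falls back to RPR until the software upgrade is complete.
    if active_image != standby_image:
        return "RPR"
    return configured_mode


# Hypothetical example: the standby supervisor still runs an older release.
mode = effective_redundancy_mode("SSO", "12.2(18)SXD", "12.2(17d)SXB")
```

With matching images on both supervisors, the function returns the configured mode unchanged.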
Note that future Cisco In Service Software Upgrades (ISSUs) will allow software upgrades in SSO redundancy mode. NSF with SSO is the building block for Cisco ISSUs.
Supervisor Fault Detection
Supervisor redundancy is just one part of system high availability. Detecting hardware and software faults is a primary requirement for providing resilient supervisor switchover mechanisms. Generic Online Diagnostics (GOLD) and platform-dependent diagnostics provide the framework for this fault detection.
GOLD defines a common architecture for diagnostic operation on Cisco Systems® platforms. GOLD works together with platform-specific online diagnostics to help ensure that a system booting up and a live system are healthy. Given that most of the intelligence of a Cisco Catalyst 6500 is hardware-based, it is very important to make sure the hardware functions are tested regularly. Fault-detection diagnostics mechanisms are enabled on most modules in a Cisco Catalyst 6500 system, including the active and standby supervisors. Diagnostics test results can be used to make switchover decisions. With online diagnostics being integrated on the Cisco Catalyst 6500, switchover triggers are not limited to software crashes or keepalive mechanisms. Instead, switchovers can be triggered when the supervisor control and data paths are inconsistent or faulty, or when runtime diagnostics detect a malfunctioning piece of hardware. In addition to helping trigger switchover decisions, GOLD regularly monitors the standby supervisor to make sure that it is ready to take over if the need to switch over occurs. GOLD also integrates a feature that allows scheduling of switchovers: an administrator can schedule a switchover at a specific time through an online diagnostics command-line interface (CLI).
GOLD detects the following problems to make supervisor switchover decisions:
• Faulty hardware components
• Faulty connectors
• Failed interfaces
• Memory errors
• Inconsistencies between the data plane and the control plane
SSO
SSO Operation
SSO Synchronization Operation
Figure 1 depicts SSO synchronization during normal operations. In SSO mode, Layer 2 protocols and PFC hardware contents are synchronized from the active supervisor to the standby supervisor. In the figure, RP is the route processor, SP is the switch processor, PFC is the policy feature card, and DFC is the distributed forwarding card.
Figure 1. SSO Synchronization Operation
SSO expands the synchronization capabilities of RPR+ to allow transparent failover at Layer 2 and Layer 4. Synchronization from the active to the standby supervisor is not limited to startup configuration, startup variables, and running configuration; it also applies to runtime data. This dynamic data synchronization, referred to as checkpointing, relies on the Cisco IOS Redundancy Facility and the Checkpoint Facility to initiate failovers and provide ordered and reliable communication between peer protocol processes on the active and standby supervisors. SSO bulk synchronization occurs at boot time. When a system is operational, configuration synchronization and state checkpointing for various protocols happen as changes occur within the system.
SSO synchronizes runtime data for Layer 2 dynamic protocols. As Layer 2 control-plane, configuration, or other network-related changes occur, the Cisco IOS Checkpoint Facility running between the peer processes on the active and standby supervisors communicates the changes. Table 2 gives the list of Layer 2 protocols supported with SSO. For example, the Spanning Tree Protocol database on the standby supervisor is kept up-to-date by checkpointing both protocol information and port states from the active supervisor.
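The two synchronization phases described above, a bulk copy at boot time followed by incremental updates as changes occur, can be sketched as follows. This is a simplified model and not the Cisco IOS Checkpoint Facility itself: the class names, the dictionary-based state, and the port names are all hypothetical.

```python
class StandbySupervisor:
    """Hypothetical stand-in for the standby supervisor's copy of Layer 2 state."""

    def __init__(self):
        self.state = {}


class ActiveSupervisor:
    """Hypothetical stand-in for the active supervisor."""

    def __init__(self):
        self.state = {}
        self.standby = None

    def attach_standby(self, standby):
        # Bulk synchronization: copy the full state once when the standby boots.
        self.standby = standby
        standby.state = dict(self.state)

    def update(self, key, value):
        # Incremental checkpoint: propagate each change as it occurs.
        self.state[key] = value
        if self.standby is not None:
            self.standby.state[key] = value


active = ActiveSupervisor()
active.update("Gi1/1", "blocking")      # state learned before the standby is up
standby = StandbySupervisor()
active.attach_standby(standby)          # bulk synchronization
active.update("Gi1/1", "forwarding")    # spanning-tree port state change
```

After the last call, the standby copy holds the same port state as the active supervisor, which is what allows it to take over without relearning Layer 2 state.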
SSO also synchronizes the hardware forwarding tables between the active and standby supervisors. The PFC is a supervisor daughter card that contains the application-specific integrated circuit (ASIC) responsible for hardware switching. When new hardware table entries need to be downloaded to the PFC, entries also are downloaded to all other forwarding engines in the system. This allows the standby supervisor PFC to contain the same forwarding information as the active PFC and the DFCs. The MAC address table, the Forwarding Information Base (FIB), the adjacency table, the access control lists (ACLs), and the quality-of-service (QoS) hardware table contents can be used for switching decisions after switchover.
Figure 2 depicts the supervisor switchover operation. Upon switchover, traffic can be forwarded without disruption. The numbers 1 through 4 in the figure represent the switchover steps.
Figure 2. Supervisor Switchover Operation
During normal operation, hardware tables and Layer 2 protocol state are synchronized, as shown in Figure 1. When a switchover occurs in SSO redundancy mode, steps 1 through 4 in Figure 2 take place as follows:
1. The system detects a software or hardware fault on the active supervisor and triggers a switchover. The fault can be detected by software exception handlers, GOLD background checks, keepalive failures between the route processor (RP) and the switch processor (SP), or fabric-switching-module state changes on a Supervisor Engine 720. A switchover can also be initiated by a user.
2. Line-card synchronization helps ensure that all modules in the system understand that a switchover has occurred. The standby supervisor assumes the role of active supervisor and data is forwarded by the PFC on the newly active supervisor.
3. The switch processor and route processor on the newly active supervisor start processing protocol and data packets. SSO-aware protocols are not affected by the switchover, and these protocols start processing updates from the network.
4. Non-SSO-aware protocols and routing protocols are initialized. SRM with SSO purges the preswitchover FIB information after a configurable route-convergence interval, which allows for Layer 3 forwarding to continue in hardware while the routing protocols converge. Peers need to reconverge around the failure. Static routes are maintained across a switchover because they are based on static configuration and are not dynamic. Supported Layer 2 control protocols and Layer 4 policies derived from QoS or ACL policies are not affected by a switchover.
NSF WITH SSO
NSF with SSO Operation
NSF with SSO Synchronization Operation
Figure 3 depicts supervisor synchronization with NSF and SSO. Orange steps are control plane-driven, whereas blue steps are data-driven. Green arrows show synchronization operations for the software Cisco Express Forwarding tables and the PFC Cisco Express Forwarding tables.
Figure 3. NSF with SSO Synchronization Operation
Packet forwarding in a Cisco router is provided by Cisco Express Forwarding, which maintains two tables: a FIB and an adjacency table. The FIB table is a distilled version of the routing table, containing only information relevant to the forwarding process and not to particular routing protocols. For example, a routing protocol's administrative distance is not relevant to the forwarding process. The adjacency table is a collection of next-hop rewrite information for adjacent nodes.
During normal operation, the system collects the routes calculated by each routing protocol into a common database called the Routing Information Base (RIB). When information for all routing protocols is present in the RIB, the RIB is scanned to determine the lowest-cost next-hop destination for each network and subnet. At that point, routing prefix and adjacency information for lowest-cost paths are populated to the Cisco Express Forwarding tables. As routing-protocol changes occur, the software Cisco Express Forwarding databases are checkpointed from the active supervisor to the standby supervisor, and the Cisco Express Forwarding tables are downloaded to the hardware on all PFCs and DFCs present in the system, including the standby PFC. This ensures forwarding-table synchronization at the software and hardware level and ensures that postswitchover data forwarding relies on the most accurate and up-to-date forwarding-table information.
An epoch number per Cisco Express Forwarding entry is introduced in order to allow differentiation between old and new Cisco Express Forwarding entries. This is known as FIB and adjacency database versioning. Only software Cisco Express Forwarding tables keep track of the epoch number, and this version number does not impact the forwarding path. A "global epoch number" is incremented when a switchover occurs. The version number for the Cisco Express Forwarding entries is updated with the global epoch number when new routing information is populated after switchover on the newly active supervisor. When the routing protocols signal that they have converged, all FIB and adjacency entries that have version numbers older than the current epoch are cleared.
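The epoch-based versioning described above can be illustrated with a small model. This is a minimal sketch under stated assumptions: the `CefTable` class, its method names, and the example prefixes and next hops are hypothetical and only mirror the behavior described in the text, not the Cisco Express Forwarding implementation.

```python
class CefTable:
    """Toy model of software forwarding entries tagged with an epoch number."""

    def __init__(self):
        self.global_epoch = 0
        self.entries = {}  # prefix -> (next_hop, epoch)

    def install(self, prefix, next_hop):
        # New or refreshed entries are stamped with the current global epoch.
        self.entries[prefix] = (next_hop, self.global_epoch)

    def switchover(self):
        # The global epoch is incremented when a switchover occurs.
        # Existing entries keep their old epoch and remain usable for forwarding.
        self.global_epoch += 1

    def flush_stale(self):
        # Called once the routing protocols signal convergence: remove every
        # entry whose epoch does not match the current global epoch.
        stale = sorted(p for p, (_, e) in self.entries.items()
                       if e != self.global_epoch)
        for prefix in stale:
            del self.entries[prefix]
        return stale


table = CefTable()
table.install("10.1.0.0/16", "192.0.2.1")
table.install("10.2.0.0/16", "192.0.2.2")
table.switchover()                          # both entries are now one epoch behind
table.install("10.1.0.0/16", "192.0.2.1")   # route relearned after switchover
removed = table.flush_stale()               # only the entry that was not refreshed
```

In this model, the route that was relearned after the switchover survives the flush, while the route that was never refreshed is removed.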
Supervisor Switchover Operation
The separation between the data plane and the Layer 3 control plane is critical for the correct function of NSF upon switchover. Whereas the control plane builds a new routing protocol database and restarts peering agreements, the data plane relies on preswitchover forwarding-table synchronization to continue forwarding traffic. The following section assumes the presence of an NSF-aware neighbor. Without the help of NSF-aware neighbors, NSF-capable systems can neither rebuild their databases nor maintain their neighbor adjacencies across a switchover. (Note that the Cisco Intermediate System-to-Intermediate System [IS-IS] NSF implementation does not require any NSF-aware neighbor.)
The same switchover operations as described in Figure 2 occur. However, reinitialization of the NSF-capable routing protocol does not cause route flaps. Figure 4 describes the generic routing protocol NSF with SSO operations that take place. Figure 4 depicts an NSF-aware neighbor router and an NSF-capable Cisco Catalyst 6500. The Cisco Catalyst 6500 newly active supervisor is represented along with NSF with SSO operation steps. This figure does not represent the failing former active supervisor. Note that the steps applying to the supervisor switch processor (SP) and policy feature card (PFC) apply also to the line-card (LC) processor and DFCs. Orange steps are control-plane driven, whereas blue steps are data-driven.
Figure 4. NSF with SSO Operation
Figure 4 steps 1 through 12 are described as follows. All these steps occur on the "newly active" supervisor.
1. Switchover is triggered.
2. Routing-protocol processes are informed of the supervisor failover. In order to provide control- and data-plane separation, the FIB is detached from the RIB until the routing protocol reconverges.
3. Packet forwarding continues based on last-known FIB and adjacency entries while the standby takes over.
4. The global epoch number is incremented: if the preswitchover global epoch was 0, it is incremented to 1.
5. The supervisor starts processing control-plane traffic.
6. The software adjacency table is populated with the preswitchover Address Resolution Protocol (ARP) table contents. Updated Cisco Express Forwarding entries receive the new global epoch number. The epoch number is available only in the route processor software Cisco Express Forwarding entries. It is not present in the hardware table. New adjacency entries are downloaded to the hardware.
7. The routing protocol-specific neighbor and adjacency reacquisition occurs: the restarting NSF-capable router notifies its neighbor that the adjacency is being reacquired and that the NSF-aware neighbor should not reinitialize the neighbor relationship. Upon receiving the restart indication, protocol-specific procedures occur to allow adjacencies to be maintained. In most cases, the restart indication consists of setting a restart flag in hello packets and sending hello packets at a shorter interval for the duration of the recovery process. NSF-aware neighbors might also indicate their NSF awareness to restarting routers. Non-NSF-aware neighbors ignore the restart indication and bring down the adjacency. Note also that the current NSF implementation does not support multiple NSF-capable neighbor restarts at once.
8. The routing protocol-specific database synchronization occurs: routing protocol processes rebuild their database using database information from NSF-aware neighbors.
9. When the routing databases are synchronized, distance-vector, path-vector, or shortest-path-first (SPF) algorithm computations determine the best route for specific prefix destinations. The RIB is repopulated with new routing entries. The corresponding Cisco Express Forwarding entries are updated.
10. As the software Cisco Express Forwarding databases are populated with updated information, updated entries receive the global epoch number to indicate that they have been refreshed. Corresponding FIB entries and hardware entries are updated.
11. Each routing protocol notifies Cisco Express Forwarding that it has converged. After all of them have converged, the last routing protocol flushes the stale route and adjacency information: software Cisco Express Forwarding entries with an epoch number not corresponding to the current global epoch number are flushed. Corresponding FIB and adjacency hardware entries are also flushed.
12. The Cisco IOS Software Cisco Express Forwarding tables on the route processor and the forwarding tables on the switch processor and PFC and DFCs are now synchronized.
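Step 11 above, in which stale entries are flushed only after every routing protocol has reported convergence, can be sketched as a simple tracker. The class name, the callback, and the protocol names passed in are assumptions made for illustration; they do not represent the Cisco IOS Software interfaces.

```python
class ConvergenceTracker:
    """Trigger a stale-entry flush once all routing protocols have converged."""

    def __init__(self, protocols, flush_stale):
        self.pending = set(protocols)   # protocols that have not yet converged
        self.flush_stale = flush_stale  # hypothetical callback that purges stale entries
        self.flushed = False

    def notify_converged(self, protocol):
        # Each routing protocol reports convergence independently.
        self.pending.discard(protocol)
        if self.pending or self.flushed:
            return False
        # The last protocol to converge triggers the flush, exactly once.
        self.flushed = True
        self.flush_stale()
        return True


flush_calls = []
tracker = ConvergenceTracker(["OSPF", "BGP"], lambda: flush_calls.append("flush"))
first = tracker.notify_converged("OSPF")   # BGP is still pending, nothing is flushed
second = tracker.notify_converged("BGP")   # last protocol converged, flush runs
```

Deferring the flush until the last protocol converges keeps preswitchover entries available for forwarding during the whole recovery period.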
NSF graceful restart routing protocol extensions follow IETF drafts and RFCs. For additional NSF protocol-specific information, see the "References" section.
MMLS NSF with SSO
Prior to Cisco IOS Software Release 12.2(18)SXD, the switchover model for IPv4 multicast was RPR+. Even with SRM with SSO configured for IPv4 unicast, the multicast forwarding entries in the PFC3 hardware were not synchronized to the standby supervisor engine.
When a switchover occurs, the multicast forwarding entries on the PFC3 and DFC3s (if present) are purged, causing service interruption for multicast traffic. After the new active route processor comes online, it must establish Protocol Independent Multicast (PIM) neighbor relationships, process Internet Group Management Protocol (IGMP) packets, and otherwise reconverge multicast state before it can repopulate the hardware forwarding entries in the PFC3 and DFC3 forwarding engines.
MMLS NSF with SSO enables the system to maintain multicast forwarding state in the PFC3 and DFC3 hardware during a supervisor-engine switchover, minimizing multicast service interruption.
In a steady state, the active supervisor engine synchronizes the standby supervisor engine with the hardware multicast forwarding entries. If a supervisor-engine switchover occurs, the entries in the PFC3 and DFC3 hardware forwarding tables are preserved and the system continues to forward multicast traffic using the last-known good copy of the multicast forwarding table.
When the new active route processor comes online, converges with the network, and relearns the multicast forwarding state, it repopulates the hardware forwarding tables on the PFC3 and DFC3 using the new information.
SSO AND NSF WITH SSO FEATURES
Tables 2 through 5 provide a description of the Layer 2, Layer 3, WAN, and hardware features that are synchronized for SSO and NSF with SSO. Features designated as "coexistent" can be used with SSO and NSF with SSO but their protocol state is not synchronized from the active to the standby supervisors and the protocol is reinitialized upon switchover. Release details can be found in the Cisco Catalyst 6500 release notes.
Layer 2 Features Synchronization
Table 2 provides a description of the Layer 2 protocols that are synchronized for Layer 2 SSO. The features listed are available with Supervisor Engine 720 in Cisco IOS Software Release 12.2(17b)SXA and Supervisor Engine 2 in Cisco IOS Software Release 12.2(18)SXD.
Table 2. Layer 2 SSO Supported Features
• Cisco Discovery Protocol (stateful only for power-related information)
• Port security
• Diagnostics
• Switched Port Analyzer (SPAN) and Remote SPAN (RSPAN)
• 802.1Q
• Spanning Tree Protocol
• 802.1X
• Traffic storm
• Dynamic Trunking Protocol (DTP)
• UDLD
• IGMP snooping
• VLAN trunks
• Interface and port state
• VLAN Trunking Protocol (VTP)
• Layer 2 protocol tunneling
• Voice VLAN and inline power
• Port channeling: Port Aggregation Protocol (PAgP) and Link Aggregation Control Protocol (LACP)
Layer 3 Features Synchronization
Table 3 provides a description of the Layer 3 protocols that are synchronized for SRM with SSO and NSF with SSO. NSF capability for Enhanced Interior Gateway Routing Protocol (EIGRP), Open Shortest Path First (OSPF), IS-IS, and Border Gateway Protocol (BGP) is available with Supervisor Engine 720 and Supervisor Engine 2 in Cisco IOS Software Release 12.2(18)SXD. Cisco IOS Software Release 12.2(18)SXD introduces MMLS NSF with SSO for Supervisor Engine 720 only.
Table 3. Layer 3 Features Synchronization
Layer 3 Information Synchronization
NSF with SSO Supported Features | SRM with SSO Supported Features | Coexistent Features
ARP | ARP | HSRP, GLBP, and Virtual Router Redundancy Protocol (VRRP)
BGP | Hardware Cisco Express Forwarding tables | Multiprotocol Label Switching (MPLS), Internetwork Packet Exchange (IPX), and IPv6
Cisco Express Forwarding tables (software and hardware) | - | PIM snooping
EIGRP | - | Routing Information Protocol (RIP)
IS-IS | - | -
MMLS NSF with SSO | - | -
OSPFv2 | - | -
VRF Lite | - | -
WAN Features Synchronization
Table 4 provides a description of the WAN protocols that are synchronized for SSO. The features listed are available with Cisco IOS Software Release 12.2(18)SXD.
Table 4. WAN Features Synchronization
Layer 2 SSO Supported Features
SSO Supported Features | Coexistent Features
ATM | Distributed link fragmentation and interleaving over ATM (dLFIoATM) and dLFI over Frame Relay (dLFIoFR)
Automatic protection switching (APS) | Multilink PPP (MLPPP) and Multilink Frame Relay (MFR)
Frame Relay | IP header compression
High-Level Data Link Control (HDLC) | MPLS and Any Transport over MPLS (AToM)
Point-to-Point Protocol (PPP) | QoS for WAN cards
Spatial Reuse Protocol (SRP) | -
Hardware Layer 2-4 Features Synchronization
Table 5 gives a description of the PFC information that is available on the standby supervisor upon SSO switchover. Most of the features listed are available with Supervisor Engine 720 in Cisco IOS Software Release 12.2(17b)SXA and Supervisor Engine 2 in Cisco IOS Software Release 12.2(18)SXD.
Table 5. Hardware Layer 2-4 Features Synchronization
• Hardware ACL-based features
• Hardware Forwarding Information Base (FIB)
• Hardware adjacency table
• Hardware MAC address table
• Hardware IP multicast information
• Hardware QoS-based features
SWITCHOVER PERFORMANCE
NSF with SSO Failover Time
Figure 5 depicts the test setup for NSF-with-SSO performance testing.
Figure 5. NSF with SSO Performance Tests
The NSF with SSO performance on the Cisco Catalyst 6500 was measured using the setup shown in Figure 5. This setup integrates simulation devices to record the failover time corresponding to different modes of operation. It also includes real-life applications, namely video and VoIP, to verify that they are not affected by an NSF with SSO switchover.
The setup consists of three Cisco Catalyst 6500 switches. All supervisors in the setup are loaded with Cisco IOS Software Release 12.2(18)SXD. The device under test (DUT) is a Cisco Catalyst 6500 with redundant Supervisor Engine 720s switching bidirectional Layer 2 and Layer 3 traffic from the neighbor routers. This bidirectional traffic includes simple Layer 2 and Layer 3 traffic generated from a traffic simulator at 100,000 pps, as well as voice and video traffic from VoIP phones and a video client and server. The testing procedure consists of Layer 2 and Layer 3 tests run for each of the following failover mechanisms: RPR+, SSO with NSF capability disabled, and NSF with SSO. Layer 3 test runs were performed with 1000 routes injected for OSPF, EIGRP, IS-IS, and BGP. All neighbors are NSF-aware.
The failover time can be derived by comparing the number of packets sent with the number of packets received across a switchover: at a given packet rate (100,000 pps in the test run), the failover time corresponds to (packets transmitted - packets received) / packet rate.
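The calculation above can be expressed as a short Python function. This is a sketch for illustration: the function name and the packet counts in the example are hypothetical, and only the 100,000 pps rate comes from the test description.

```python
def failover_time_seconds(packets_sent, packets_received, packet_rate_pps):
    """Estimate failover time from packet loss at a constant packet rate."""
    if packet_rate_pps <= 0:
        raise ValueError("packet rate must be positive")
    if packets_received > packets_sent:
        raise ValueError("received count cannot exceed sent count")
    # Every lost packet represents 1 / packet_rate seconds of outage.
    return (packets_sent - packets_received) / packet_rate_pps


# Hypothetical counts: 55,000 packets lost at 100,000 pps.
outage = failover_time_seconds(6_000_000, 5_945_000, 100_000)
```

With these example counts, the 55,000 lost packets correspond to an outage of about 0.55 seconds at 100,000 pps.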
Table 6 lists failover times for different scenarios when traffic flows between two ports on a WS-X6748-GE-TX module. Overall, failover times range from 0 to 3 seconds with NSF with SSO, depending on test conditions.
Table 6. NSF with SSO Failover Times
Failover Time | Layer 2 Traffic | Layer 3 EIGRP Routed Traffic | Layer 3 OSPF Routed Traffic | Layer 3 IS-IS Routed Traffic (Cisco method) | Layer 3 IS-IS Routed Traffic (IETF method) | Layer 3 BGP Routed Traffic
RPR+ | 62.00s | 70.00s | 140.00s | 82.00s | 82.00s | 130.00s
SSO (NSF capability disabled) | 0.50s | 6.00s | 11.00s | 0.55s | 0.55s | 54.00s
NSF with SSO | 0.50s | 0.55s | 0.55s | 0.55s | 0.55s | 0.55s
Comparison of Cisco Catalyst Operating System and Cisco IOS Software High Availability
Table 7 compares the Cisco Catalyst Operating System, hybrid, and Cisco IOS Software switchover performance numbers for equivalent features. More information about the Cisco Catalyst Operating System supervisor redundancy mechanisms can be found at http://www.cisco.com/warp/public/cc/pd/si/casi/ca6000/tech/hafc6_wp.pdf.
The fast-switchover capability in the Cisco Catalyst Operating System is comparable to the Cisco IOS Software RPR+ function. The Cisco Catalyst Operating System High-Availability feature is comparable to the Cisco IOS Software SSO feature in that they both provide Layer 2 protocol synchronization and Layer 2-4 hardware synchronization.
The hybrid high availability with SRM redundancy method is the hybrid equivalent to SSO. The hybrid model does not support the NSF functionality.
Table 7 compares performance numbers for the Cisco Catalyst Operating System, hybrid, and Cisco IOS Software redundancy features.
Table 7. Cisco Catalyst Operating System and Cisco IOS Software Supervisor Redundancy Features
Cisco Catalyst Operating System | Hybrid | Cisco IOS Software
- | - | RPR: >120.00s
Fast switchover: >30.00s | Fast switchover: >30.00s | RPR+: 30.00s
High availability: 0.50-5.00s | High availability with SRM: 0.50-5.00s | SSO: 0.00-3.00s
High availability: 0.50-5.00s | High availability with SRM: 0.50-5.00s | NSF with SSO: 0.00-3.00s
High availability versioning | High availability versioning | FSU
STATISTICS AND SIMPLE NETWORK MANAGEMENT PROTOCOL
Statistics
The various statistics maintained by an active supervisor are not synchronized to the redundant supervisor because they may change often and the degree of synchronization they require is substantial. A network-management system should be used to poll affected statistics regularly to maintain accurate statistics.
SNMP
Simple Network Management Protocol (SNMP) data is synchronized between redundant supervisors when the supervisor is operating in SSO mode. This is done to ensure that the standby and the active supervisor are indistinguishable from a network-management perspective. Some of the SNMP objects that are synchronized include interface-related features such as ifindex and SNMP configuration.
The Cisco High-Availability MIB, CISCO-RF-MIB, reports redundancy information to an administrator. This information includes identification of the primary and secondary supervisors, the current redundancy state, the reason for the last switchover, and the time at which the last switchover occurred. When a switchover occurs, the ciscoRFSwactNotif notification is sent to signal it.
In addition to using the Cisco High-Availability MIB, syslog messages and SNMP traps are sent to notify the administrator of any component failure.
SNMP data synchronization is not available in the RPR and RPR+ modes of operations.
It is important for services modules to continue working through an NSF with SSO supervisor failover event. Many of the services modules have specific high-availability mechanisms in place today to allow intrachassis or interchassis module-to-module switchover. Supervisor NSF with SSO support for services modules complements the high-availability mechanism of each of these modules by minimizing the impact of a supervisor failover.
Each of the properties pertaining to SSO with standard switching modules holds true for services modules: the services modules do not reboot, their interfaces stay up, and they are not affected by a supervisor switchover except for the short period corresponding to line-card synchronization.
Optical services modules (OSMs) and FlexWAN modules are supported with redundant supervisor engines and continue working through an NSF with SSO supervisor failover event.