Information About SSO
SSO Overview
The switch is supports fault resistance by allowing a redundant supervisor engine to take over if the primary supervisor engine fails. Cisco SSO (frequently used with NSF) minimizes the time a network is unavailable to its users following a switchover while continuing to forward IP packets. The switch is supports route processor redundancy (RPR). For more information, see Chapter10, “Route Processor Redundancy (RPR)”
SSO is particularly useful at the network edge. Traditionally, core routers protect against network faults using router redundancy and mesh connections that allow traffic to bypass failed network elements. SSO provides protection for network edge devices with dual Route Processors (RPs) that represent a single point of failure in the network design, and where an outage might result in loss of service for customers.
SSO has many benefits. Because the SSO feature maintains stateful feature information, user session information is maintained during a switchover, and line cards continue to forward network traffic with no loss of sessions, providing improved network availability. SSO provides a faster switchover than RPR by fully initializing and fully configuring the standby RP, and by synchronizing state information, which can reduce the time required for routing protocols to converge. Network stability may be improved with the reduction in the number of route flaps had been created when routers in the network failed and lost their routing tables.
SSO is required by the Cisco Nonstop Forwarding (NSF) feature (see Chapter 9, “Nonstop Forwarding (NSF)”).
Figure 8-1 illustrates how SSO is typically deployed in service provider networks. In this example, Cisco NSF with SSO is primarily at the access layer (edge) of the service provider network. A fault at this point could result in loss of service for enterprise customers requiring access to the service provider network.
For Cisco NSF protocols that require neighboring devices to participate in Cisco NSF, Cisco NSF-aware software images must be installed on those neighboring distribution layer devices. Additional network availability benefits might be achieved by applying Cisco NSF and SSO features at the core layer of your network; however, consult your network design engineers to evaluate your specific site requirements.
Figure 8-1 Cisco NSF with SSO Network Deployment: Service Provider Networks
Additional levels of availability may be gained by deploying Cisco NSF with SSO at other points in the network where a single point of failure exists. Figure 8-2 illustrates an optional deployment strategy that applies Cisco NSF with SSO at the enterprise network access layer. In this example, each access point in the enterprise network represents another single point of failure in the network design. In the event of a switchover or a planned software upgrade, enterprise customer sessions would continue uninterrupted through the network.
Figure 8-2 Cisco NSF with SSO Network Deployment: Enterprise Networks
SSO Operation
SSO establishes one of the RPs as the active processor while the other RP is designated as the standby processor. SSO fully initializes the standby RP, and then synchronizes critical state information between the active and standby RP.
During an SSO switchover, the line cards are not reset, which provides faster switchover between the processors. The following events cause a switchover:
- A hardware failure on the active supervisor engine
- Clock synchronization failure between supervisor engines
- A manual switchover or shutdown
An SSO switchover does not interrupt Layer 2 traffic. An SSO switchover preserves FIB and adjacency entries and can forward Layer 3 traffic after a switchover. SSO switchover duration is between 0 and 3 seconds.
Route Processor Synchronization
Synchronization Overview
In networking devices running SSO, both RPs must be running the same configuration so that the standby RP is always ready to assume control if the active RP fails. SSO synchronizes the configuration information from the active RP to the standby RP at startup and whenever changes to the active RP configuration occur. This synchronization occurs in two separate phases:
- While the standby RP is booting, the configuration information is synchronized in bulk from the active RP to the standby RP.
- When configuration or state changes occur, an incremental synchronization is conducted from the active RP to the standby RP.
Bulk Synchronization During Initialization
When a system with SSO is initialized, the active RP performs a chassis discovery (discovery of the number and type of line cards and fabric cards, if available, in the system) and parses the startup configuration file.
The active RP then synchronizes this data to the standby RP and instructs the standby RP to complete its initialization. This method ensures that both RPs contain the same configuration information.
Even though the standby RP is fully initialized, it interacts only with the active RP to receive incremental changes to the configuration files as they occur. Executing CLI commands on the standby RP is not supported.
Synchronization of Startup Configuration
During system startup, the startup configuration file is copied from the active RP to the standby RP. Any existing startup configuration file on the standby RP is overwritten.
The startup configuration is a text file stored in the NVRAM of the RP. It is synchronized whenever you perform the following operations:
- CLI command copy system:running-config nvram:startup-config is used.
- CLI command copy running-config startup-config is used.
- CLI command write memory is used.
- CLI command copy filename nvram:startup-config is used.
- SNMP SET of MIB variable ccCopyEntry in CISCO_CONFIG_COPY MIB is used.
- System configuration is saved using the reload command.
- System configuration is saved following entry of a forced switchover CLI command.
Incremental Synchronization
Incremental Synchronization Overview
After both RPs are fully initialized, any further changes to the running configuration or active RP states are synchronized to the standby RP as they occur. Active RP states are updated as a result of processing feature information, external events (such as the interface becoming up or down), or user configuration commands (using CLI commands or Simple Network Management Protocol [SNMP]) or other internal events.
CLI Commands
CLI changes to the running configuration are synchronized from the active RP to the standby RP. In effect, the CLI command is run on both the active and the standby RP.
SNMP SET Commands
Configuration changes caused by an SNMP set operation are synchronized on a case-by-case basis. Currently only two SNMP configuration set operations are supported:
- shut and no-shut (of an interface)
- link up/down trap enable/disable
Routing and Forwarding Information
Routing and forwarding information is synchronized to the standby RP:
- State changes for SSO-aware features (for example, SNMP) are synchronized to the standby RP.
- Cisco Express Forwarding updates to the Forwarding Information Base (FIB) are synchronized to the standby RP.
Chassis State
Changes to the chassis state due to line card insertion or removal are synchronized to the standby RP.
Line Card State
Changes to the line card states are synchronized to the standby RP. Line card state information is initially obtained during bulk synchronization of the standby RP. Following bulk synchronization, line card events, such as whether the interface is up or down, received at the active processor are synchronized to the standby RP.
Counters and Statistics
The various counters and statistics maintained in the active RP are not synchronized because they may change often and because the degree of synchronization they require is substantial. The volume of information associated with statistics makes synchronizing them impractical.
Note
Not synchronizing counters and statistics between RPs may create problems for external network management systems that monitor this information.
SSO Conditions
An automatic or manual switchover may occur under the following conditions:
- A fault condition that causes the active RP to crash or reboot—automatic switchover
- The active RP is declared dead (not responding)—automatic switchover
- The CLI is invoked—manual switchover
The user can force the switchover from the active RP to the standby RP by using a CLI command. This manual procedure allows for a “graceful” or controlled shutdown of the active RP and switchover to the standby RP. This graceful shutdown allows critical cleanup to occur.
Note
This procedure should not be confused with the graceful shutdown procedure for routing protocols in core routers—they are separate mechanisms.
Caution
The SSO feature introduces a number of new command and command changes, including commands to manually cause a switchover. The
reload command does not cause a switchover. The
reload command causes a full reload of the box, removing all table entries, resetting all line cards, and interrupting nonstop forwarding.
Switchover Time
The time required by the device to switch over from the active RP to the standby RP is between zero and three seconds.
Although the newly active processor takes over almost immediately following a switchover, the time required for the device to begin operating again in full redundancy (SSO) mode can be several minutes, depending on the platform. The length of time can be due to a number of factors including the time needed for the previously active processor to obtain crash information, load code and microcode, and synchronize configurations between processors.
On DFC-equipped switching modules, forwarding information is distributed, and packets forwarded from the same line card should have little to no forwarding delay; however, forwarding packets between line cards requires interaction with the RP, meaning that packet forwarding might have to wait for the switchover time.
Online Removal of the Active RP
Online removal of the active RP automatically forces a stateful switchover to the standby RP.
Fast Software Upgrade
You can use Fast Software Upgrade (FSU) to reduce planned downtime. With FSU, you can configure the system to switch over to a standby RP that is preloaded with an upgraded Cisco IOS software image. FSU reduces outage time during a software upgrade by transferring functions to the standby RP that has the upgraded Cisco IOS software preinstalled. You can also use FSU to downgrade a system to an older version of Cisco OS or have a backup system loaded for downgrading to a previous image immediately after an upgrade.
SSO must be configured on the networking device before performing FSU.
Note
During the upgrade process, different images will be loaded on the RPs for a short period of time. During this time, the device will operate in RPR mode.
Core Dump Operation
In networking devices that support SSO, the newly active primary processor runs the core dump operation after the switchover has taken place. Not having to wait for dump operations effectively decreases the switchover time between processors.
Following the switchover, the newly active RP will wait for a period of time for the core dump to complete before attempting to reload the formerly active RP. The time period is configurable. For example, on some platforms an hour or more may be required for the formerly active RP to perform a coredump, and it might not be site policy to wait that much time before resetting and reloading the formerly active RP. In the event that the core dump does not complete within the time period provided, the standby is reset and reloaded regardless of whether it is still performing a core dump.
The core dump process adds the slot number to the core dump file to identify which processor generated the file content.
Note
Core dumps are generally useful only to your technical support representative. The core dump file, which is a very large binary file, must be transferred using the TFTP, FTP, or remote copy protocol (rcp) server and subsequently interpreted by a Cisco Technical Assistance Center (TAC) representative that has access to source code and detailed memory maps.
SSO-Aware Features
A feature is SSO-aware if it maintains, either partially or completely, undisturbed operation through an RP switchover. State information for SSO-aware features is synchronized from active to standby to achieve stateful switchover for those features.
The dynamically created state of SSO-unaware features is lost on switchover and must be reinitialized and restarted on switchover.
The output of the show redundancy clients command displays the SSO-aware features (see the “Verifying SSO Features” section).