Nonstop Forwarding with Stateful Switchover

Cisco Nonstop Forwarding (NSF) works with the Stateful Switchover (SSO) feature to minimize the amount of time a network is unavailable to users following a switchover. The main objective of NSF SSO is to continue forwarding IP packets following a Route Processor (RP) switchover.

Prerequisites for Cisco Nonstop Forwarding with Stateful Switchover

  • Cisco nonstop forwarding (NSF) must be configured on a networking device that has been configured for stateful Switchover (SSO).

  • Border Gateway Protocol (BGP) support in NSF requires that neighbor networking devices be NSF-aware; that is, devices must have the graceful restart capability and advertise that capability in their OPEN message during session establishment. If an NSF-capable device discovers that a particular BGP neighbor does not have graceful restart capability, it does not establish an NSF-capable session with that neighbor. All other neighbors that have graceful restart capability continue to have NSF-capable sessions with this NSF-capable networking device.

  • Open Shortest Path First (OSPF) support in NSF requires that all neighbor networking devices be NSF-aware. If an NSF-capable device discovers that it has non-NSF-aware neighbors on a particular network segment, it disables NSF capabilities for that segment. Other network segments composed entirely of NSF-capable or NSF-aware devices continue to provide NSF capabilities.

Restrictions for Cisco Nonstop Forwarding with Stateful Switchover

The following are restrictions for configuring NSF with SSO:

  • NSF capability is supported for IPv4 routing protocols only. NSF capability is not supported for IPv6 routing protocols.

  • NSF does not support IP Multicast Routing, as it is not SSO-aware.

  • For NSF operation, you must have SSO configured on the device.

  • All Layer 3 neighboring devices must be an NSF helper or NSF-capable to support graceful restart capability.

  • For IETF, all neighboring devices must be running an NSF-aware software image.

  • The Hot Standby Routing Protocol (HSRP) is not supported with NSF SSO. Do not use HSRP with NSF SSO.

  • An NSF-aware device cannot support two NSF-capable peers performing an NSF restart operation at the same time. However, both neighbors can reestablish peering sessions after the NSF restart operation is complete.

  • For SSO operation, ensure that both active and standby devices run the same version of the Cisco IOS XE image. If the active and standby devices are operating different images, SSO failover might cause an outage.

Information About Cisco Nonstop Forwarding with Stateful Switchover

Overview of Cisco Nonstop Forwarding with Stateful Switchover

Cisco NSF works with the SSO feature. The device supports fault resistance by allowing a standby switch to take over if the active device becomes unavailable. NSF works with SSO to minimize the amount of time a network is unavailable.

Usually, when a networking device restarts, all routing peers of that device detect that the device went down and then came back up. This transition results in what is called a routing flap, which could spread across multiple routing domains. Routing flaps caused by routing restarts create routing instabilities, which are detrimental to the overall network performance. Cisco NSF helps to suppress routing flaps in SSO-enabled devices, thus reducing network instability.

Cisco NSF with SSO allows for the forwarding of data packets to continue along known routes while the routing protocol information is being restored following a switchover. With NSF/SSO, peer networking devices do not experience routing flaps. Data traffic is forwarded through intelligent line cards or dual forwarding processors (FPs) while the standby router processor (RP) assumes control from the failed active RP during a switchover. NSF with SSO operation provides the ability of line cards and FPs to remain active through a switchover and to be kept current with the Forwarding Information Base (FIB) on the active RP.

NSF provides the following benefits:

  • Improved network availability—NSF continues forwarding network traffic and application state information so that user session information is maintained after a switchover.

  • Overall network stability—Network stability can be improved with the reduction in the number of route flaps that are created when devices in the network fail, and lose their routing tables.

  • Neighboring devices do not detect a link flap—Because interfaces remain active during a switchover, neighboring devices do not detect a link flap (the link does not go down and come back up).

  • Prevents routing flaps—Because SSO continues forwarding network traffic during a switchover, routing flaps are avoided.

  • Maintains user sessions established prior to the switchover.

  • If a stack member does not respond, that member is removed from the stack.

  • If the standby device does not respond, a new standby device is elected.

  • If the active device does not respond, the standby device becomes the active device.

SSO Operation

When a standby device runs in SSO mode, the standby device starts up in a fully-initialized state and synchronizes with the persistent configuration and the running configuration on the active device. It subsequently maintains the state of the protocols, and all changes in hardware and software states for features that support SSO are kept in synchronization. Consequently, it offers minimum interruption to Layer 2 sessions in a redundant active device configuration.

If the active device fails, the standby device becomes the active device. This new active device uses existing Layer 2 switching information to continue forwarding traffic. Layer 3 forwarding is delayed until routing tables are repopulated in the newly active device.


Note

  • If the standby device is not programmed with a Federal Information Processing Standards (FIPS) key, it will print warning messages since it is not in the correct operating mode.

  • The switch will work in SSO mode even if one supervisor module is in FIPS mode and the other, in non-FIPS mode.


Cisco Nonstop Forwarding Operation

NSF always runs with SSO, and provides redundancy for Layer 3 traffic. NSF is supported by BGP, Enhanced Interior Gateway Routing Protocol (EIGRP), and OSPF routing protocols and also by Cisco Express Forwarding for forwarding. These routing protocols have been enhanced with NSF-capability and awareness, which means that devices running these protocols can detect a switchover and take necessary actions to continue forwarding network traffic and to recover route information from peer devices.

Each protocol depends on Cisco Express Forwarding to continue forwarding packets during switchover while routing protocols rebuild the Routing Information Base (RIB) tables. After the convergence of routing protocols, Cisco Express Forwarding updates the FIB table and removes stale route entries. Cisco Express Forwarding then updates the hardware with the new FIB information.

If the active device is configured (with the graceful-restart command) for BGP, OSPF, or EIGRP routing protocols, routing updates are automatically sent during the active device election.

NSF has two primary components:

  • NSF-aware: A networking device is NSF-aware if it is running NSF-compatible software. If neighboring devices detect that an NSF device can still forward packets when an active device election happens, this capability is referred to as NSF-awareness. Enhancements to the Layer 3 routing protocols (BGP, OSPF, and EIGRP) are designed to prevent route-flapping so that the Cisco Express Forwarding routing table does not time out or the NSF device does not drop routes. An NSF-aware device helps to send routing protocol information to the neighboring NSF device. NSF-awareness is enabled by default for EIGRP-stub, EIGRP, and OSPF protocols. NSF-awareness is disabled by default for BGP.

  • NSF-capability: A device is NSF-capable if it is configured to support NSF; it rebuilds routing information from NSF-aware or NSF-capable neighbors. NSF works with SSO to minimize the amount of time that a Layer 3 network is unavailable following an active device election by continuing to forward IP packets. Reconvergence of Layer 3 routing protocols (BGP, OSPFv2, and EIGRP) is transparent to the user and happens automatically in the background. Routing protocols recover routing information from neighbor devices and rebuild the Cisco Express Forwarding table.


    Note

    NSF does not support IPv6 and is IPv4 Unicast only.


Cisco Express Forwarding

A key element of NSF is packet forwarding. In a Cisco networking device, packet forwarding is provided by Cisco Express Forwarding. Cisco Express Forwarding maintains the Forwarding Information Base (FIB), and uses the FIB information that is current at the time of a switchover to continue forwarding packets during a switchover, to reduce traffic interruption during the switchover.

During normal NSF operation, Cisco Express Forwarding on the active device synchronizes its current FIB and adjacency databases with the FIB and adjacency databases on the standby device . Upon switchover, the standby device initially has FIB and adjacency databases that are mirror images of those that were current on the active device . Cisco Express Forwarding keeps the forwarding engine on the standby device current with changes that are sent to it by Cisco Express Forwarding on the active device . The forwarding engine can continue forwarding after a switchover as soon as the interfaces and a data path are available.

As the routing protocols start to repopulate the RIB on a prefix-by-prefix basis, the updates cause prefix-by-prefix updates to Cisco Express Forwarding, which it uses to update the FIB and adjacency databases. Existing and new entries receive the new version (“epoch”) number, indicating that they have been refreshed. The forwarding information is updated on the forwarding engine during convergence. The device signals when the RIB has converged. The software removes all FIB and adjacency entries that have an epoch older than the current switchover epoch. The FIB now represents the newest routing protocol forwarding information.

Routing Protocols

Routing protocols run only on the active RP, and receive routing updates from neighbor devices. Routing protocols do not run on the standby RP. Following a switchover, routing protocols request that the NSF-aware neighbor devices send state information to help rebuild routing tables. Alternately, the Intermediate System-to-Intermediate System (IS-IS) protocol can be configured to synchronize state information from the active to the standby RP to help rebuild the routing table on the NSF-capable device in environments where neighbor devices are not NSF-aware.


Note

For NSF operation, routing protocols depend on Cisco Express Forwarding to continue forwarding packets while routing protocols rebuild the routing information.


BGP Operation

When a NSF-capable device begins a BGP session with a BGP peer, it sends an OPEN message to the peer. Included in the message is a declaration that the NSF-capable device has “graceful restart capability.” Graceful restart is the mechanism by which BGP routing peers avoid a routing flap following a switchover. If the BGP peer has this capability, it is aware that the device sending the message is NSF-capable. Both the NSF-capable device and its BGP peer(s) need to exchange the Graceful Restart Capability in their OPEN messages, at the time of session establishment. If both peers do not exchange the Graceful Restart Capability, the session is not graceful restart capable.

If the BGP session is lost during the RP switchover, the NSF-aware BGP peer marks all routes associated with the NSF-capable device as stale; however, it continues to use these routes to make forwarding decisions for a set period of time. This functionality means that no packets are lost while the newly active RP is waiting for convergence of the routing information with the BGP peers.

After an RP switchover occurs, the NSF-capable device reestablishes the session with the BGP peer. In establishing the new session, it sends a new graceful restart message that identifies the NSF-capable device as having restarted.

At this point, the routing information is exchanged between two BGP peers. Once this exchange is complete, the NSF-capable device uses the routing information to update the RIB and the FIB with the new forwarding information. The NSF-aware device uses the network information to remove stale routes from its BGP table. Following that, the BGP protocol is fully converged.

If a BGP peer does not support the graceful restart capability, it will ignore the graceful-restart capability in an OPEN message; but will establish a BGP session with the NSF-capable device. This function allows interoperability with non-NSF-aware BGP peers (and without NSF functionality), but the BGP session with non-NSF-aware BGP peers will not be graceful restart capable.


Note

BGP support in NSF requires that neighbor networking devices be NSF-aware; that is, devices must have the Graceful Restart Capability and advertise that capability in their OPEN message during session establishment. If an NSF-capable device discovers that a particular BGP neighbor does not have Graceful Restart Capability, it will not establish an NSF-capable session with that neighbor. All other neighbors that have Graceful Restart Capability will continue to have NSF-capable sessions with this NSF-capable networking device.


EIGRP Operation

Enhanced Interior Gateway Routing Protocol (EIGRP) NSF capabilities are exchanged by EIGRP peers in hello packets. The NSF-capable device notifies its neighbors that an NSF restart operation has started by setting the restart (RS) bit in a hello packet. When an NSF-aware device receives notification from an NSF-capable neighbor that an NSF-restart operation is in progress, the NSF-capable and NSF-aware devices immediately exchange their topology tables. The NSF-aware device sends an end-of-table update packet when the transmission of its topology table is complete. The NSF-aware device then performs the following actions to assist the NSF-capable device:

  • The EIGRP hello hold timer is expired to reduce the time interval set for hello packet generation and transmission. This allows the NSF-aware device to reply to the NSF-capable device more quickly reducing the amount of time required for the NSF-capable device to rediscover neighbors and rebuild the topology table.

  • The route-hold timer is started. This timer is used to set the period of time that the NSF-aware device will hold known routes for the NSF-capable neighbor. This timer is configured with the timers nsf route-hold command. The default time period is 240 seconds.

  • In the peer list, the NSF-aware device notes that the NSF-capable neighbor is restarting, maintains adjacency, and holds known routes for the NSF-capable neighbor until the neighbor signals that it is ready for the NSF-aware device to send its topology table, or the route-hold timer expires. If the route-hold timer expires on the NSF-aware device, the NSF-aware device discards held routes and treats the NSF-capable device as a new device joining the network and reestablishes adjacency accordingly.

  • The NSF-aware device continues to send queries to the NSF-capable device which is still in the process of converging after a switchover, effectively extending the time before a stuck-in-active condition can occur.

When the switchover operation is complete, the NSF-capable device notifies its neighbors that it has reconverged and has received all of their topology tables by sending an end-of-table update packet to assisting devices. The NSF-capable device then returns to normal operation. The NSF-aware device will look for alternate paths (go active) for any routes that are not refreshed by the NSF-capable (restarting device). The NSF-aware device will then return to normal operation. If all paths are refreshed by the NSF-capable device, the NSF-aware device will immediately return to normal operation.


Note

NSF-aware devices are completely compatible with non-NSF aware or -capable neighbors in an EIGRP network. A non-NSF aware neighbor will ignore NSF capabilities and reset adjacencies and otherwise maintain the peering sessions normally.


OSPF Operation

When an OSPF NSF-capable device performs a supervisor engine switchover, it must perform the following tasks in order to resynchronize its link state database with its OSPF neighbors:

  • Relearn the available OSPF neighbors on the network without causing a reset of the neighbor relationship.

  • Reacquire the contents of the link state database for the network.

As quickly as possible after a supervisor engine switchover, the NSF-capable device sends an OSPF NSF signal to neighboring NSF-aware devices. Neighbor networking devices recognize this signal as an indicator that the neighbor relationship with this device should not be reset. As the NSF-capable device receives signals from other devices on the network, it can begin to rebuild its neighbor list.

After neighbor relationships are reestablished, the NSF-capable device begins to resynchronize its database with all of its NSF-aware neighbors. At this point, the routing information is exchanged between the OSPF neighbors. Once this exchange is complete, the NSF-capable device uses the routing information to remove stale routes, update the RIB, and update the FIB with the new forwarding information. The OSPF protocols are then fully converged.


Note

OSPF support in NSF requires that all neighbor networking devices be NSF-aware. If an NSF-capable device discovers that it has non-NSF -aware neighbors on a particular network segment, it disables NSF capabilities for that segment. Other network segments composed entirely of NSF-capable or NSF-aware devices continue to provide NSF capabilities.


How to Configure Cisco Nonstop Forwarding with Stateful Switchover

Configuring Stateful Switchover

You must configure SSO in order to use NSF with any supported protocol.

SUMMARY STEPS

  1. enable
  2. show redundancy states
  3. redundancy
  4. mode sso
  5. end
  6. show redundancy states
  7. debug redundancy status

DETAILED STEPS

  Command or Action Purpose
Step 1

enable

Example:

Device> enable

Enables privileged EXEC mode.

  • Enter your password if prompted.

Step 2

show redundancy states

Example:

Device# show redundancy states

Displays the operating redundancy mode.

Step 3

redundancy

Example:

Device(config)# redundancy

Enters redundancy configuration mode.

Step 4

mode sso

Example:

Device(config-red)# mode sso

Configures stateful switchover.

  • When this command is entered, the standby switch is reloaded and begins to work in SSO mode.

Step 5

end

Example:

Device(config-red)# end

Exits redundancy configuration mode and returns to privileged EXEC mode.

Step 6

show redundancy states

Example:

Device# show redundancy states

Displays the operating redundancy mode.

Step 7

debug redundancy status

Example:

Device# debug redundancy status

Enables the debugging of redundancy status events.

Configuration Examples for Nonstoop Forwarding with Stateful Switchover

Example: Configuring Stateful Switchover

This example shows how to configure the system for SSO and displays the redundancy state:

Device(config)# redundancy 
Device(config-red)# mode sso
Device(config-red)# end
Device#

The following is sample output from the show redundancy command:
Device# show redundancy states

       my state = 13 -ACTIVE 
     peer state = 1  -DISABLED 
           Mode = Simplex
           Unit = Primary
        Unit ID = 3

Redundancy Mode (Operational) = Non-redundant
Redundancy Mode (Configured)  = sso
Redundancy State              = Non Redundant
     Maintenance Mode = Disabled
    Manual Swact = disabled (system is simplex (no peer unit))
 Communications = Down      Reason: Simplex mode

   client count = 103
 client_notification_TMR = 30000 milliseconds
           RF debug mask = 0x0 

The following is sample output from the show redundancy clients command:
Device# show redundancy clients 

 clientID = 29       group_id = 1        clientSeq = 60       Redundancy Mode RF
 clientID = 139      group_id = 1        clientSeq = 62       IfIndex
 clientID = 25       group_id = 1        clientSeq = 71       CHKPT RF
 clientID = 10001    group_id = 1        clientSeq = 85       QEMU Platform RF
 clientID = 77       group_id = 1        clientSeq = 87       Event Manager
 clientID = 1340     group_id = 1        clientSeq = 104      RP Platform RF
 clientID = 1501     group_id = 1        clientSeq = 105      CWAN HA
 clientID = 305      group_id = 1        clientSeq = 110      Multicast ISSU Consolidation RF
 clientID = 304      group_id = 1        clientSeq = 111      IP multicast RF Client
 clientID = 22       group_id = 1        clientSeq = 112      Network RF Client
 clientID = 88       group_id = 1        clientSeq = 113      HSRP
 clientID = 114      group_id = 1        clientSeq = 114      GLBP
 clientID = 4700     group_id = 1        clientSeq = 118      COND_DEBUG RF
 clientID = 1341     group_id = 1        clientSeq = 119      IOSXE DPIDX
 clientID = 1505     group_id = 1        clientSeq = 120      IOSXE SPA TSM
 clientID = 75       group_id = 1        clientSeq = 130      Tableid HA
 clientID = 501      group_id = 1        clientSeq = 137      LAN-Switch VTP VLAN
 clientID = 71       group_id = 1        clientSeq = 139      XDR RRP RF Client
 clientID = 24       group_id = 1        clientSeq = 140      CEF RRP RF Client
 clientID = 146      group_id = 1        clientSeq = 142      BFD RF Client
 clientID = 301      group_id = 1        clientSeq = 146      MRIB RP RF Client
 clientID = 306      group_id = 1        clientSeq = 150      MFIB RRP RF Client
 clientID = 402      group_id = 1        clientSeq = 161      TPM RF client
 clientID = 520      group_id = 1        clientSeq = 162      RFS RF
 clientID = 210      group_id = 1        clientSeq = 163      Auth Mgr
 clientID = 10101    group_id = 1        clientSeq = 164      NGMOD HMS RF client
 clientID = 5        group_id = 1        clientSeq = 165      Config Sync RF client
 clientID = 10007    group_id = 1        clientSeq = 170      NGWC FEC Rf client
 clientID = 10009    group_id = 1        clientSeq = 173      NGWC POWERNET Rf client
 clientID = 10100    group_id = 1        clientSeq = 174      NGMOD XCVR RF client
 clientID = 502      group_id = 1        clientSeq = 187      LAN-Switch Port Manager
 clientID = 530      group_id = 1        clientSeq = 189      Access Tunnel
 clientID = 519      group_id = 1        clientSeq = 190      Mac address Table Manager
 clientID = 209      group_id = 1        clientSeq = 209      L2FIB
 clientID = 207      group_id = 1        clientSeq = 215      CFM RF
 clientID = 208      group_id = 1        clientSeq = 218      LLDP
 clientID = 226      group_id = 1        clientSeq = 219      LACP

The following is sample output from the show redundancy counters command:
Device# show redundancy counters 

Redundancy Facility OMs
               comm link up = 0 
             comm link down = 0 

          invalid client tx = 0 
          null tx by client = 0 
                tx failures = 0 
      tx msg length invalid = 0 

      client not rxing msgs = 0 
 rx peer msg routing errors = 0 
           null peer msg rx = 0 
        errored peer msg rx = 0 

                 buffers tx = 7250 
     tx buffers unavailable = 0 
                 buffers rx = 6786 
      buffer release errors = 0 

 duplicate client registers = 0 
  failed to register client = 0 
       Invalid client syncs = 0 

The following is sample output from the show redundancy states command:
Device# show redundancy states

       my state = 13 -ACTIVE 
     peer state = 1  -DISABLED 
           Mode = Simplex
           Unit = Primary
        Unit ID = 3

Redundancy Mode (Operational) = Non-redundant
Redundancy Mode (Configured)  = sso
Redundancy State              = Non Redundant
     Maintenance Mode = Disabled
    Manual Swact = disabled (system is simplex (no peer unit))
 Communications = Down      Reason: Simplex mode

   client count = 103
 client_notification_TMR = 30000 milliseconds
           RF debug mask = 0x0 

Verifying Cisco Express Forwarding with Cisco Nonstop Forwarding

SUMMARY STEPS

  1. show cef state

DETAILED STEPS


show cef state

Displays the state of Cisco Express Forwarding on a networking device.

Example:

Device# show cef state 

CEF Status:
RP instance
common CEF enabled
IPv4 CEF Status:
CEF enabled/running
dCEF enabled/running
CEF switching enabled/running
universal per-destination load sharing algorithm, id DEA83012
IPv6 CEF Status:
CEF disabled/not running
dCEF disabled/not running
universal per-destination load sharing algorithm, id DEA83012
RRP state:
I am standby RRP: no
RF Peer Presence: yes
RF PeerComm reached: yes
RF Progression blocked: never
Redundancy mode: rpr(1)
CEF NSF sync: disabled/not running
CEF ISSU Status:
FIBHWIDB broker
No slots are ISSU capable.
FIBIDB broker
No slots are ISSU capable.
FIBHWIDB Subblock broker
No slots are ISSU capable.
FIBIDB Subblock broker
No slots are ISSU capable.
Adjacency update
No slots are ISSU capable.
IPv4 table broker
No slots are ISSU capable.
CEF push
No slots are ISSU capable.


Additional References for Nonstop Forwarding with Stateful Switchover

Related Documents

Related Topic Document Title

For complete syntax and usage information for the commands used in this chapter.

See the High Availability section of the Command Reference (Catalyst 9400 Series Switches)

Feature History and Information for Cisco Nonstop Forwarding with Stateful Switchover

The following table provides release information about the feature or features described in this module. This table lists only the software release that introduced support for a given feature in a given software release train. Unless noted otherwise, subsequent releases of that software release train also support that feature.

Use the Cisco Feature Navigator to find information about platform support and Cisco software image support. To access Cisco Feature Navigator, go to www.cisco.com/go/cfn. An account on Cisco.com is not required

Table 1. Feature Information for Nonstop Forwarding with Stateful Switchover

Feature Name

Release

Feature Information

Nonstop Forwarding with Stateful Switchover

Cisco IOS XE Everest 16.6.2

This feature was introduced.

Cisco NSF works with the SSO feature. NSF works with SSO to minimize the amount of time a network is unavailable to users following a switchover. The main objective of NSF SSO is to continue forwarding IP packets following a Route Processor (RP) switchover.