PDIF Session Recovery


In the telecommunications industry, over 90 percent of all equipment failures are software-related. With robust hardware failover and redundancy protection, any card-level hardware failures on the system can quickly be corrected. However, software failures can occur for numerous reasons, often without prior indication. For this reason, we have introduced a solution to recover subscriber sessions in the event of failure.
The Session Recovery feature provides seamless failover and reconstruction of subscriber session information in the event of a hardware or software fault within the system, preventing a fully connected user session from being disconnected.
The PDIF supports a session recovery procedure wherein the system can be configured to store redundant call state information and recover this information after certain types of hardware and software failures.
Important: Session Recovery can only be enabled through a feature use license key. If you have not previously purchased this enhanced feature, contact your sales representative for more information.
The session recovery feature, even when the feature use key is present, is disabled by default on the system.
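This chapter provides information on the following topics: Session Recovery, Hardware Requirements and Configuration, Enabling or Disabling Session Recovery from the CLI, Preserved Session States, Scope of Data Recovery, Possible Recovery Failures, and Show Session Recovery Status Command.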
 
Session Recovery
This section examines how the PDIF recovers from two types of failure: migration failures and task failures.
 
How Session Recovery Works in PDIF
PDIF call traffic is encrypted by the IPSec protocol. When an encrypted packet arrives on the ASR 5000, it must first be decrypted in the data path. The Daughter Card holds the IPSec SA used for the decryption. Consequently, it is very important that the Daughter Card recover before any software recovery takes place, so that call traffic processing can continue.
When an IPSec Manager fails, the IPSec Controller does not send an IPSec Manager death notification to any subsystem. This allows both the Session Manager and the Daughter Card to continue carrying subscriber traffic, using the existing NPU flows and IPSec SAs to transmit the data.
It also allows the Daughter Card to continue to receive and decrypt IPSec tunnel data.
When the IPSec Manager recovers, the IPSec Controller sends it the configured PDIF crypto template. Upon receiving the template, the IPSec Manager initializes the template policy and creates the service entry for this template policy. It also initializes an IKEv2 stack instance. The IPSec Manager uses the AAA API to recover calls in bulk. On receiving a response from the AAA, the IPSec Manager repopulates each subscriber policy data structure. It also takes care of information that needs to be generated dynamically (reprogramming the Datapath, creating skip lists, and so on).
To provide faster Datapath recovery without affecting call traffic, the Daughter Card Manager is used to repopulate the Daughter Cards with the SAs quickly and efficiently; meanwhile, the IPSec Manager and Session Manager can recover in the background at a relatively slower pace without interrupting call data.
On restart, IPSec Manager allocates the Datapath instance used to update the Daughter Card with the IPSec SAs and to create an NPU flow mapped to the Daughter Card IPSec SA entry.
When an IPSec tunnel is established, two IPSec SAs are created: one for receiving encrypted data and another for encrypting and transmitting data. In the interests of speed, a standby Daughter Card Manager on a standby PSC stores IPSec SAs for the whole system.
Once the standby PSC becomes active, the Daughter Card Manager quickly programs the Daughter Card with the available IPSec SAs. The most active calls are given priority in order to prevent traffic interruption.
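The priority rule described above can be pictured with a short sketch. The Python below is illustrative only and is not the system's implementation: the SecurityAssociation record, its packets_last_interval activity measure, and the programming_order helper are all hypothetical names introduced for this example. It models each tunnel as a pair of SAs and orders the stored SAs so that the busiest tunnels would be programmed first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityAssociation:
    """Hypothetical record for one stored IPSec SA."""
    tunnel_id: int
    direction: str              # "in" (receive/decrypt) or "out" (encrypt/transmit)
    packets_last_interval: int  # assumed activity measure for the owning tunnel


def sas_for_tunnel(tunnel_id: int, activity: int) -> list[SecurityAssociation]:
    """Each established tunnel contributes two SAs, one per direction."""
    return [
        SecurityAssociation(tunnel_id, "in", activity),
        SecurityAssociation(tunnel_id, "out", activity),
    ]


def programming_order(sas: list[SecurityAssociation]) -> list[SecurityAssociation]:
    """Order SAs so that the most active tunnels come first.

    Ties are broken by tunnel id, then by direction, to keep the
    result deterministic.
    """
    return sorted(
        sas,
        key=lambda sa: (-sa.packets_last_interval, sa.tunnel_id, sa.direction),
    )


# Three tunnels with made-up activity figures.
stored = sas_for_tunnel(1, 5) + sas_for_tunnel(2, 900) + sas_for_tunnel(3, 40)
order = [(sa.tunnel_id, sa.direction) for sa in programming_order(stored)]
print(order)
```

In this sketch, tunnel 2 is the busiest, so both of its SAs come ahead of those for tunnels 3 and 1. How the system actually measures call activity is not stated in this chapter.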
 
Migration vs. Task Failure
Planned migration requires that all the task data be transferred to the new CPU. Task migration time (and hence total outage time) depends upon task size. In general, a longer outage can be anticipated, since the whole task has to be successfully migrated before it can be executed. Task failure, on the other hand, creates a new task that is operational right away, so it can recover its own sessions while it services other sessions as they are recovered. The perceived outage is therefore shorter for failure recovery than for a migration.
 
Planned PSC Migration
This is the case when a system administrator decides to migrate an ASR 5000 application card (PSC). When the migration command is issued, all the tasks on the affected card are notified to get ready for migration. After task migration is complete, all the tasks are notified again and any post-migration adjustments are made. At the end of the migration process, each IPSec Manager is returned to the normal operational state it was in before the migration started.
 
Unplanned PSC Migration
This is the case when there is a system fault and a card has failed, or when an individual task fails. When tasks are restarted as a result of either card failure or task failure, they are recovered with saved information and put back into their previous state of operation. In either planned or unplanned recovery, only tunnels that are already in an established state can be recovered. Any non-established tunnels may be lost.
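The rule that only established tunnels can be recovered can be sketched as a simple partition. This is an illustration under stated assumptions, not the system's logic: the TunnelState values and the split_recoverable helper are hypothetical, and the real tunnel state machine is more detailed than the two states shown here.

```python
from enum import Enum


class TunnelState(Enum):
    """Hypothetical tunnel states used only for this example."""
    NEGOTIATING = "negotiating"   # tunnel setup still in progress
    ESTABLISHED = "established"   # tunnel fully set up


def split_recoverable(tunnels: dict[str, TunnelState]) -> tuple[list[str], list[str]]:
    """Return (recoverable, at_risk) tunnel names, each sorted by name."""
    recoverable = sorted(
        name for name, state in tunnels.items() if state is TunnelState.ESTABLISHED
    )
    at_risk = sorted(
        name for name, state in tunnels.items() if state is not TunnelState.ESTABLISHED
    )
    return recoverable, at_risk


# Made-up subscriber tunnels at the moment of a card or task failure.
tunnels = {
    "sub-a": TunnelState.ESTABLISHED,
    "sub-b": TunnelState.NEGOTIATING,
    "sub-c": TunnelState.ESTABLISHED,
}
recoverable, at_risk = split_recoverable(tunnels)
print("recoverable:", recoverable)
print("may be lost:", at_risk)
```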
 
Hardware Requirements and Configuration
Because session recovery requires numerous hardware resources, such as memory, control processors, and NPU processing capacity, some additional hardware may be required to ensure that enough resources are available to fully support this feature.
Important: PDIF is designed to run on an ASR 5000 chassis using only SMC controller cards and PSC application cards.
Important: A minimum of four PSCs (three active and one standby) per individual chassis is required to use this feature.
To allow for complete session recovery in the event of a hardware failure during a PSC migration, a minimum of three active and two standby PSCs should be deployed.
To assist you in your network design and capacity planning, the following should be considered.
A PSC migration in progress may temporarily impact the ability to perform session recovery, because hardware resources (for example, memory and processors) that may be needed are not available during this operation. To avoid this condition, a minimum of two standby PSCs should be configured.
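For capacity planning, the card minimums stated in this section can be captured in a small check. The sketch below is only a planning aid: the function name and the three result labels are hypothetical, and it encodes nothing beyond the PSC counts given above (three active plus one standby as the minimum, and two standby PSCs to cover a hardware failure during a PSC migration).

```python
def session_recovery_readiness(active_pscs: int, standby_pscs: int) -> str:
    """Classify a PSC layout against the minimums stated in this section."""
    if active_pscs < 3 or standby_pscs < 1:
        # Fewer than three active and one standby PSC: below the stated minimum.
        return "unsupported"
    if standby_pscs < 2:
        # Meets the minimum, but recovery may be impacted during a PSC migration.
        return "supported"
    # Two or more standby PSCs: also covers a failure during a PSC migration.
    return "supported-during-migration"


for layout in [(2, 1), (3, 1), (3, 2)]:
    print(layout, session_recovery_readiness(*layout))
```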
 
Enabling or Disabling Session Recovery from the CLI
 
Enabling or disabling session recovery is done on a chassis-wide basis.
 
Enabling Session Recovery on an Out-of-Service System
The following procedure is for a system that does not have any contexts configured.
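Step 1
Verify that the Session Recovery feature is licensed on the system by entering the following command:
 
show license info
The output of this command appears similar to the example shown below. Look for the Session Recovery entry in the list of enabled features. If the current license does not include Session Recovery, a new license must be applied before Session Recovery can be enabled.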
Enabled Features:
 
# Feature                         Applicable Part Numbers
# ----------------------------------------
# HA:                             [ 600-00-7502 / 600-00-7505
#                                   600-00-7592 / 600-00-7593 ]
#  + RADIUS AAA Server Groups     [ none ]
# IPv4 Routing Protocols          [ none ]
# Proxy MIP:                      [ 600-00-7512 / 600-00-7549
#                                   600-00-8538 ]
#  + FA                           [ none ]
# IPv6                            [ 600-00-7521 / 600-00-7576 ]
# Lawful Intercept                [ 600-00-7522 ]
#                                   600-00-7643 / 600-00-7663 ]
# Enhanced Lawful Intercept       [ 600-00-7567 / 600-00-8534 ]
# PDIF:                           [ none ]
#  + FA                           [ none ]
#  + Session Recovery             [ 600-00-7513 / 600-00-7546
#                                   600-00-7552 / 600-00-7554
#                                   600-00-7566 / 600-00-7594
#                                   600-00-9100 / 600-00-9101
#                                   600-00-7638 / 600-00-7640
#                                   600-00-7634 / 600-00-7595 ]
#  + RADIUS AAA Server Groups     [ none ]
# PDIF:                           [ 600-00-8539 ]
#  + FA                           [ none ]
#  + IPSec                        [ 600-00-7507 / 600-00-8537 ]
#  + RADIUS AAA Server Groups     [ none ]
# Session Limits:
#   Sessions   Session Type
#   --------   -----------------------
#   20000      HA
Step 2
Enable session recovery by entering the following commands:
 
configure
  require session recovery
  end
Step 3
Save your configuration as described in Verifying and Saving Your Configuration.
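The license check in Step 1 can be automated once the output of show license info has been captured as text. The sketch below is a minimal example that assumes the output resembles the sample in Step 1, with a Session Recovery line followed by bracketed part numbers. The function name is hypothetical, how the text is collected from the system is out of scope, and the real output format may differ between releases.

```python
import re

# Matches a feature line such as "# + Session Recovery [ 600-00-7513 / ...".
# Leading "#", "+", and whitespace are tolerated; the part-number list may
# continue on later lines, so the closing bracket is not required.
FEATURE_LINE = re.compile(r"^[#\s+]*Session Recovery\s*\[\s*(?P<parts>[^\]]*)")


def session_recovery_licensed(license_output: str) -> bool:
    """Return True if Session Recovery is listed with at least one part number."""
    for line in license_output.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            return match.group("parts").strip().lower() != "none"
    return False


# Excerpt modeled on the sample output in Step 1.
sample = """\
# PDIF: [ none ]
# + FA [ none ]
# + Session Recovery [ 600-00-7513 / 600-00-7546
# 600-00-7552 / 600-00-7554 ]
"""
print(session_recovery_licensed(sample))
```

A result of False means either that the feature line is missing or that it lists no part numbers; in both cases the license should be reviewed before continuing.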
 
Enabling Session Recovery on an In-Service System
When enabling session recovery on a system that already has a saved configuration, the session recovery commands are automatically placed before any service configuration commands in the configuration file.
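Step 1
Verify that the Session Recovery feature is licensed by entering the show license info command, as described in the previous procedure.
Step 2
Enable session recovery by entering the following commands: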
 
configure
  require session recovery
  end
Important: This feature does not take effect until after the system has been restarted.
Step 3
Save your configuration as described in Verifying and Saving Your Configuration.
Step 4
Restart the system by entering the following command:
 
reload
The following prompt appears:
 
Are you sure? [Yes|No]:
Answer:
 
Yes
More advanced users may opt to simply insert the require session recovery command into an existing configuration file using a text editor or other means, and then apply the configuration file manually. When doing this, take care to place the command among the first few lines of the configuration file so that it appears before the creation of any non-local context.
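When a configuration file is edited by hand, the placement rule can be checked before the file is applied. The sketch below is a rough check under stated assumptions: it assumes each context is introduced by a line of the form context <name> and that the command appears on a line of its own. The function name is hypothetical, and the check does not validate any other part of the file.

```python
def require_precedes_contexts(config_text: str) -> bool:
    """Return True if 'require session recovery' appears before any non-local context.

    Assumes each context begins with a line of the form 'context <name>'.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == "require session recovery":
            return True
        if line.startswith("context ") and line.split()[1] != "local":
            # A non-local context is created before the command was seen.
            return False
    # The command was never found.
    return False


# Two made-up configuration files; the context name pdif-in is invented.
good_config = """\
config
  require session recovery
  context local
  exit
  context pdif-in
  exit
end
"""
bad_config = """\
config
  context local
  exit
  context pdif-in
  exit
  require session recovery
end
"""
print(require_precedes_contexts(good_config))
print(require_precedes_contexts(bad_config))
```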
 
Disabling the Session Recovery Feature
To disable the session recovery feature on a system, enter the no require session recovery command in Global Configuration mode:
configure
  no require session recovery
  end
Important: If this command is issued on an in-service system, then the system must be restarted by issuing the reload command.
 
Preserved Session States
Session state is periodically stored (checkpointed) by using redundant processes to carry the existing call state information.
Preserved session states include the following:
 
Scope of Data Recovery
In addition to maintaining call state information, information is kept in order to:
 
 
Possible Recovery Failures
There are some situations wherein session recovery may not operate properly. These include:
 
 
Show Session Recovery Status Command
Information about the current state of the session recovery subsystem is available from an Exec Mode command.
show session recovery status verbose
The output (below) shows the current state of the session recovery system. It displays all the tasks involved in the recovery process. It also shows when the last checkpoint (save) was done and how many sessions are to be recovered on each manager:
show session recovery status verbose
Session Recovery Status:
Overall Status        : Ready For Recovery
Last Status Update    : 10 seconds ago
Overall Status Update : 3 seconds ago
             ----sessmgr---  ----aaamgr----  demux
cpu  state   active standby  active standby  active  status
---- ------- ------ -------  ------ -------  ------  -------------
2/0  Standby      0      10       0      10       0  Good
4/0  Active       0       0       0       0       4  Good (Demux)
6/0  Active      10       1      10       1       0  Good
11/0 Active      10       1      10       1       0  Good
13/0 Standby      0      10       0      10       0  Good
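For monitoring, the per-CPU rows of this output can be parsed and checked automatically. The sketch below assumes the row layout of the sample above (CPU, state, five counters, then a status string). The CpuRow type and both helper functions are hypothetical names, and the real output may differ between releases.

```python
import re
from typing import NamedTuple


class CpuRow(NamedTuple):
    """One per-CPU row of the status table (field names are illustrative)."""
    cpu: str
    state: str
    sessmgr_active: int
    sessmgr_standby: int
    aaamgr_active: int
    aaamgr_standby: int
    demux_active: int
    status: str


# cpu, state, five integer counters, then free-form status text.
ROW = re.compile(
    r"^\s*(\d+/\d+)\s+(Active|Standby)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$"
)


def parse_status(output: str) -> list[CpuRow]:
    """Extract the per-CPU rows, ignoring headers and separator lines."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            cpu, state, *counts, status = match.groups()
            rows.append(CpuRow(cpu, state, *map(int, counts), status))
    return rows


def cpus_not_good(rows: list[CpuRow]) -> list[str]:
    """Return the CPUs whose status does not begin with 'Good'."""
    return [row.cpu for row in rows if not row.status.startswith("Good")]


sample = """\
cpu state active standby active standby active status
---- ------- ------ ------- ------ ------- ------ -------------
2/0 Standby 0 10 0 10 0 Good
4/0 Active 0 0 0 0 4 Good (Demux)
6/0 Active 10 1 10 1 0 Good
11/0 Active 10 1 10 1 0 Good
13/0 Standby 0 10 0 10 0 Good
"""
rows = parse_status(sample)
print(len(rows), "CPU rows parsed")
print("CPUs needing attention:", cpus_not_good(rows))
```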
 
